This is a remarkable summary and demonstration. Thank you!
For this reason, I am implementing A2C for my master's. Thank you.
This is amazing. Especially for someone like me who knows A3C but hasn't implemented it yet. Thanks!
You can't really say you know an algorithm if you've never implemented it.
04:00 Why save s_prime if you don't use it?
6:30 - You talk about adding gamma * V(S') to the value of state S. You say that you're actually adding gamma^{num_steps} - I don't see how this is true. It looks to me like you're just adding (gamma times) the value of one step in the past, not recursively adding the value of all states that lead to state S as the Bellman equation describes. Please, can you clarify?
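For what it's worth, the gamma^{num_steps} factor shows up naturally in an n-step return: each reward is discounted by its delay, and the bootstrap term V(S') is discounted by the number of steps unrolled. A minimal sketch of that idea (function and variable names are my own, not from the video):

```python
def n_step_return(rewards, v_s_prime, gamma):
    """N-step bootstrapped return: sum_k gamma^k * r_k + gamma^n * V(s')."""
    G = 0.0
    for k, r in enumerate(rewards):
        G += gamma ** k * r
    # The bootstrap value is discounted by gamma^len(rewards),
    # i.e. gamma^num_steps -- not just a single factor of gamma.
    G += gamma ** len(rewards) * v_s_prime
    return G

# With 3 rewards, V(s') is weighted by gamma^3 = 0.729:
print(n_step_return([1.0, 1.0, 1.0], 10.0, 0.9))  # prints 10.0
```

So the recursion of the Bellman equation is folded into V(S') itself; only the discounting of that single bootstrap term is explicit.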
06:18, line 32: why do you use ReLU for the value network? The value of a state can be negative! It shouldn't have any activation, should it?
@Paval Koryakin ReLU is not used in the last layer of the value head, so the output can still be negative.
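To illustrate the reply: with ReLU only on the hidden layer and a linear output, the value estimate can come out negative. A toy sketch (the weights are made up to force a negative output):

```python
import numpy as np

def value_head(x, w_hidden, w_out):
    h = np.maximum(0.0, x @ w_hidden)  # ReLU on the hidden layer only
    return h @ w_out                   # linear output: any real number

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))
w_hidden = rng.normal(size=(4, 8))
w_out = -np.ones((8, 1))  # all-negative output weights => V(s) <= 0 here

v = value_head(x, w_hidden, w_out)
print(float(v))  # prints a non-positive value
```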
0:30 What are “actual estimated Q-values”?
Dom McKean In deep Q-learning, the output of the neural net is the "actual estimated Q-value" for each action in a specific state. You then take the highest one to get the action you should perform.
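In code, the greedy selection described here is just an argmax over the network's per-action outputs. A minimal sketch (the Q-values below are made up, not from the video):

```python
# Hypothetical Q-values the network might output for one state,
# one entry per action.
q_values = [0.2, 1.5, -0.3, 0.9]

# Greedy policy: pick the index of the highest estimated Q-value.
best_action = max(range(len(q_values)), key=lambda a: q_values[a])
print(best_action)  # prints 1
```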
S4ndwichGurk3 ‘actual estimated’ sounds like an oxymoron.
@@Kingstanding23 ah that's what you mean :D sorry haha i didn't even notice
What is a prime state? Is it the state with the max V(s), or something else?
It smells like Maxim Lapan's book. Anyway, the tutorial was great! :D
Good videos, but you can't just copy code and pretend it is your own. This is just copied from 'Deep Reinforcement Learning Hands-On' by Maxim Lapan, and looking back at some of your other videos, it is clear that you copied them as well. I have no problem with you using other people's code, you just have to properly reference it in your video and description, and not just link to your own GitHub. That is shady as shit.
Thanks for sharing.