Jeez, these are the most comprehensive tutorials that I've found so far. Thank you soooo much. Everything that is needed is here: a general overview, the necessary mathematics, and code.
Hey Colin, when specifying a = max_a(Q(s,a)), I think that the 'argmax' operator would be more suitable, since 'max' returns a value rather than an action, while 'argmax' returns the action itself that maximizes the term inside the parentheses.
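To make the distinction concrete, here is a minimal sketch in plain Python. The action names and Q-values are made up for illustration and are not taken from the tutorial; `q` stands in for the Q-values of a single state s.

```python
# Hypothetical Q-values for one state s, keyed by action (illustrative numbers only).
q = {"left": 0.1, "right": 0.7, "stay": 0.3}

# max_a Q(s, a): the largest Q-value, i.e. a number.
best_value = max(q.values())

# argmax_a Q(s, a): the action whose Q-value is largest.
best_action = max(q, key=q.get)

print(best_value, best_action)
```

So a greedy policy picks `best_action`, while `best_value` is the quantity that appears inside the Q-learning update target.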
By far the best explanation of this material. You have a knack for explaining this in the simplest terms.
Thank you for making this, very helpful.
So I think we can all agree that the only good question to be asked is where you got that pic of the cute game-playing robot.
Shouldn't the subscript in step 6 be s'?
Can we make it interactive with more diagrams?
This is a nice copy paste from Hands-on Deep Reinforcement Learning.