YouTube algorithm, please bump this video's relevance score to 10/10. This video is too good to be ignored.
Thank you! Now if only the YouTube gods would listen.
You just made a video on exactly what I'm about to study 😃
Thanks a lot!! 😀
Can you prepare a video on Double Q-Learning Networks and Dueling Double Q-Learning Networks, please?
So there are only two categories of RL methods, value-based and policy-gradient-based, and actor-critic methods fall under the policy-gradient category? Can you confirm that's correct, and what's the source for this? Or would you like to cover some actor-critic RL methods in future videos?
Thank you.
I was confused. You made me more confused. This doesn't explain the intuition.
lol, what was confusing here? He simply covered policy generation, both directly and via value functions, and then the two types of value functions a policy can be derived from: V(s) and Q(s,a). The simple intuition is being able to find the state (or action) with the highest expected return. You should watch the Markov decision process video first; then it will make sense.
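A minimal sketch of the V(s) vs Q(s,a) point above, assuming a made-up two-state, two-action environment (all names, numbers, and the transition table `P` are hypothetical, for illustration only). With Q(s,a), the greedy policy is just an argmax over actions in each state. With only V(s), you also need a model of the environment to do a one-step lookahead.

```python
# Hypothetical toy setup: 2 states, 2 actions, made-up values.
STATES = ["s0", "s1"]
ACTIONS = ["left", "right"]
GAMMA = 0.9  # assumed discount factor

# Q[s][a]: estimated expected return for taking action a in state s
Q = [[1.0, 2.5],
     [0.5, -1.0]]

# V[s]: estimated expected return from state s
V = [0.0, 5.0]

# Assumed environment model: P[s][a] = list of (probability, next_state, reward)
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
}


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def greedy_policy_from_q(Q):
    """With Q(s,a): in each state, pick the action with the largest Q value."""
    return {STATES[s]: ACTIONS[argmax(Q[s])] for s in range(len(Q))}


def greedy_policy_from_v(V, P, gamma):
    """With only V(s): score each action by r + gamma * V(s'), using the model P."""
    policy = {}
    for s in P:
        scores = [
            sum(p * (r + gamma * V[s_next]) for p, s_next, r in P[s][a])
            for a in sorted(P[s])
        ]
        policy[STATES[s]] = ACTIONS[argmax(scores)]
    return policy


print(greedy_policy_from_q(Q))            # {'s0': 'right', 's1': 'left'}
print(greedy_policy_from_v(V, P, GAMMA))  # {'s0': 'right', 's1': 'right'}
```

The extra model `P` needed in the V(s) version is why model-free value-based methods such as Q-learning learn Q(s,a) rather than V(s).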
Confused :(