At 40min40sec, there is an error in the calculation of V(S1). It should be -3.5, NOT -4.5, because (-4 - 3)/2 = -3.5.
Yes, in newer versions of the slides this is corrected already. Consider checking the value calculations; there can be a typo, or values that were changed without the results being updated. Thanks, however.
that's why i pay for a youtube subscription, this content was exactly what i needed
Very good and detailed explanation!!! Thank you
@Saeed Saeedvand would you please share the PPT?
Nice lectures. Thank you for the example. I was wondering how the trajectory would change if the policy is stochastic.
Yes, you can have a stochastic policy.
In a stochastic policy, π gives the probability of taking each possible action in a state: π(a|s). So if we consider the policy as stochastic, then given a state s, the policy π assigns probabilities to each possible action rather than always picking the same action. Therefore, actions other than the argmax also have a chance of being selected by the policy, even though they are worse.
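A minimal sketch of the idea above, with a hypothetical state `s1` and made-up action probabilities (not from the lecture): instead of taking the argmax, the agent samples an action from π(a|s), so the low-probability actions still get picked occasionally.

```python
import numpy as np

# Hypothetical stochastic policy: pi[s][a] = pi(a|s).
# The probabilities below are made up for illustration.
pi = {
    "s1": {"up": 0.7, "down": 0.1, "left": 0.1, "right": 0.1},
}

def sample_action(state, rng):
    """Sample an action from pi(.|state) instead of taking the argmax."""
    actions = list(pi[state].keys())
    probs = list(pi[state].values())
    return rng.choice(actions, p=probs)

rng = np.random.default_rng(0)
counts = {a: 0 for a in pi["s1"]}
for _ in range(10000):
    counts[sample_action("s1", rng)] += 1
# "up" dominates, but the other three actions are still
# chosen roughly 30% of the time in total.
```

This is what makes the trajectory change between runs under a stochastic policy: even from the same start state, different episodes can take different actions.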
@SaeedSaeedvand Thank you
@Saeed would you please share the PPT URL?
Very good explanation
You do great work, Maqsood bhai 🫂
Hi Sir, in the Monte Carlo Exploring Starts example, if we know that the robot starts from s20, why are we generating episodes randomly from s7 or s8? Can't we start generating episodes from s20?
best explanation on youtube
@SaeedSaeedvand would you please share the PPT link?
good example for learning
How do you determine that each episode is 6 steps long?
It is only an example here in the slide to show the process (6 is the developer's choice and is considered small here). The episode length should be defined depending on the problem. It must be long enough (usually at least two or three times longer than the optimal number of steps the agent would need) to allow the agent to explore and, of course, be able to reach the objective.
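A minimal runnable sketch of the process being discussed, under assumed setup (a hypothetical 5x5 grid with toy dynamics, not the lecture's exact example): in Monte Carlo Exploring Starts, each episode begins from a random state-action pair and then follows the policy for a fixed number of steps.

```python
import random

# Assumed setup for illustration only.
STATES = [f"s{i}" for i in range(1, 26)]   # hypothetical 5x5 grid world
ACTIONS = ["up", "down", "left", "right"]
EPISODE_LEN = 6                            # developer's choice; small here

def step(state, action):
    """Toy deterministic dynamics, just to make the sketch runnable."""
    i = (STATES.index(state) + ACTIONS.index(action) + 1) % len(STATES)
    return STATES[i], -1                   # -1 reward per step

def generate_episode(policy, rng):
    """Exploring start: random first state AND action, then follow policy."""
    state = rng.choice(STATES)
    action = rng.choice(ACTIONS)
    episode = []
    for _ in range(EPISODE_LEN):
        next_state, reward = step(state, action)
        episode.append((state, action, reward))
        state, action = next_state, policy(next_state)
    return episode

rng = random.Random(0)
episode = generate_episode(lambda s: "up", rng)  # fixed policy for the demo
```

The random start (rather than always starting from s20) is what guarantees every state-action pair keeps being visited, which Monte Carlo ES needs in order to estimate Q(s, a) for all pairs.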
Thank you for the great content
this is golden!
Hi Sir, for (s1, L, r1), shouldn't that be 50?
I am not sure which slide you are mentioning. However, there were some updates to the slides after the videos were recorded; I tried to note them under the video. Since some examples have been changed, some final values may still show the previous example's results, which have not been updated on the slides. So consider the solution approach first.
Hey, at least learn to speak English first, bhai.
very nice explanation