RL CH4 - Monte-Carlo Methods on Reinforcement Learning

  • Published on 11 Dec 2024

Comments • 23

  • @UPGadgetscom 9 months ago +4

    At 40:40 there is an error in the calculation of V(S1). It should be -3.5, NOT -4.5, because (-4 - 3)/2 = -3.5 (a quick check appears after this thread).

    • @SaeedSaeedvand 9 months ago +2

      Yes, this is already corrected in newer versions of the slides. Please double-check the value calculations; there can be the occasional typo, or values that were changed without the final results being updated. Thanks, however.
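
      As a quick check of the corrected number, here is a minimal Python sketch of the Monte-Carlo average, assuming the two returns sampled from S1 in this example are -4 and -3 (as stated in the comment above):

        # Hypothetical returns observed from state S1 in the two sampled episodes.
        returns_s1 = [-4, -3]

        # Monte-Carlo estimate: V(S1) is the average of the observed returns.
        v_s1 = sum(returns_s1) / len(returns_s1)
        print(v_s1)  # -3.5, the corrected value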

  • @Ujjayanroy 6 months ago

    That's why I pay for a YouTube subscription; this content was exactly what I needed.

  • @andresariaslondono7003 1 year ago +1

    Very good and detailed explanation!!! Thank you

  • @gulamsarwar7502 9 months ago +1

    @Saeed Saeedvand would you please share the PPT?

  • @adershm3510 1 year ago +1

    Nice lectures. Thank you for the example. I was wondering how the trajectory would change if the policy is stochastic.

    • @SaeedSaeedvand 1 year ago

      Yes, you can have a stochastic policy.
      With a stochastic policy, π gives the probability of taking each possible action in a state, written π(a∣s). That is, given a state s, the policy π assigns a probability to each possible action rather than always picking the same one. Therefore, actions other than the argmax also have a chance of being chosen by the policy, even though they are worse (see the sketch after this thread).

    • @adershm3510 1 year ago +1

      @SaeedSaeedvand Thank you

    • @gulamsarwar7502 9 months ago

      @Saeed would you please share the PPT URL?
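
      To make the stochastic-policy explanation above concrete, here is a minimal Python sketch (not from the lecture) of sampling an action from π(a∣s) instead of always taking the argmax; the state, actions, and probabilities are made-up examples:

        import random

        # Hypothetical stochastic policy: for each state, a probability for each action.
        pi = {
            "s1": {"L": 0.7, "R": 0.2, "U": 0.1},
        }

        def sample_action(state):
            """Sample an action according to pi(a|s); non-greedy actions can still be chosen."""
            actions, probs = zip(*pi[state].items())
            return random.choices(actions, weights=probs, k=1)[0]

        def greedy_action(state):
            """Deterministic alternative: always pick the argmax action."""
            return max(pi[state], key=pi[state].get)

        print(sample_action("s1"))  # usually "L", but sometimes "R" or "U"
        print(greedy_action("s1"))  # always "L"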

  • @asifkhankhosa 1 year ago +2

    Very good explanation

  • @KaranBhanushali-gt6ts 1 year ago +1

    You're doing great work, Maqsood bhai 🫂

  • @SURAJITPAL-j2f 2 months ago

    Hi Sir, in the Monte Carlo Exploring Starts example, if we know that the robot starts from s20, why are we generating episodes randomly from s7 or s8? Couldn't we just start generating episodes from s20?

  • @abdullah.montasheri 8 months ago +1

    Best explanation on YouTube

  • @gulamsarwar7502 9 months ago

    @SaeedSaeedvand
    would you please share the PPT link?

  • @lizhenhuang7053 1 year ago +1

    good example for learning

  • @effortlessjapanese123 1 year ago

    How do you determine that each episode is 6 steps long?

    • @SaeedSaeedvand 1 year ago +1

      It is only an example in the slide to show the process (6 is a developer choice and deliberately small here). The episode length should be chosen depending on the problem. It must be long enough (usually at least two or three times the number of steps an optimal agent would need) to allow the agent to explore and, of course, to reach the objective.
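
      For illustration only, a minimal Python sketch of rolling out one episode capped at a fixed number of steps, assuming a hypothetical env with reset()/step() methods and a lookup-table policy (none of these names come from the lecture):

        def generate_episode(env, policy, max_steps=6):
            """Roll out one episode, truncated at max_steps if the goal is not reached earlier."""
            episode = []                   # list of (state, action, reward) tuples
            state = env.reset()
            for _ in range(max_steps):
                action = policy[state]     # simple lookup-table policy (assumption)
                next_state, reward, done = env.step(action)
                episode.append((state, action, reward))
                if done:                   # stop early if the objective is reached
                    break
                state = next_state
            return episode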

  • @MohamedHassan-vy3wi 5 months ago

    Thank you for the great content

  • @effortlessjapanese123 1 year ago +1

    this is golden!

  • @camelacml 7 months ago

    Hi Sir, for (s1, L, r1), shouldn't that be 50?

    • @SaeedSaeedvand 6 months ago

      I am not sure which slide you are referring to. However, there have been some updates to the slides since the videos were recorded; I have tried to note them under the video. Since some examples were changed, some final values may still show results from the previous examples that have not been updated on the slides. So focus on the solution approach first.

  • @razarizvi7085 1 year ago

    Hey, learn to speak English first, brother.

  • @amirhossein1108 10 months ago +2

    very nice explanation