Iterative Policy Evaluation Algorithm in Python and OpenAI Gym - Reinforcement Learning Tutorial

  • Published on 3 Dec 2024

Comments • 15

  • @aleksandarhaber  2 years ago +1

    It takes a significant amount of time and energy to create these free video tutorials. You can support my efforts in this way:
    - Buy me a Coffee: www.buymeacoffee.com/AleksandarHaber
    - PayPal: www.paypal.me/AleksandarHaber
    - Patreon: www.patreon.com/user?u=32080176&fan_landing=true
    - You can also press the YouTube Thanks (dollar) button

  • @RahulYadav-w1v4l  1 year ago +2

    Your explanation of the concept was beautiful. Thank you so much.

  • @peralser  1 year ago +1

    Thanks for your explanation.

  • @pulkitprajapat7862  1 year ago +1

    Thanks a lot for these videos. I am loving them.

    • @aleksandarhaber  1 year ago

      Great! Thank you for the encouraging comments!

  • @samlee9126  2 years ago +1

    Thank you for your tutorial! It really helps me with my project. A small gift has been sent to you via PayPal.

  • @northstar6887  1 year ago +1

    Do you have a video that covers policy iteration?

    • @aleksandarhaber  1 year ago

      Check if there is a tutorial in the reinforcement learning list.

  • @dulanjanaperera988  1 year ago +2

    Isn't the value of the goal state 1?

    • @aleksandarhaber  1 year ago +3

      First of all, the state value function of a state is defined as the expected sum of rewards that you will obtain by going from that state to the next states. That is, the sum does not include the reward obtained by reaching the current state. Since the goal state is a terminal state, you do not go to any other state from it. Consequently, the sum of rewards is zero, which means that the state value function in the terminal state is zero. This holds by definition (see Sutton and Barto's book on reinforcement learning).
      Sutton and Barto's book recommends initializing the state value function in all terminal states to zero. The goal state value function is not updated in the iterative algorithm, so if you set an initial value, it will stay at that value. Its value function is not relevant. You only get a reward of +1 by reaching the goal state. I think the value function at the goal state acts as a boundary condition in the Bellman equation. I am not sure what will happen if you initialize the state value function in the goal state to a non-zero value. You can try.
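The experiment suggested in the reply above can be tried on a toy problem. The following is a minimal sketch, not the code from the video: it assumes a hand-built 4-state corridor (the names `build_model`, `evaluate_policy`, and `goal_init` are hypothetical), a uniform random policy, a reward of +1 for reaching the goal, and a discount factor of 0.9. The transition table only imitates the `P[state][action] -> [(prob, next_state, reward, done)]` layout that Gym's FrozenLake uses. The goal state is skipped in every sweep, so it keeps whatever value it was initialized with.

```python
import numpy as np

# Hypothetical corridor: states 0..3, state 3 is the terminal goal.
N_STATES, GOAL = 4, 3
LEFT, RIGHT = 0, 1


def build_model():
    """Hand-built deterministic model in a FrozenLake-like layout (assumption)."""
    P = {}
    for s in range(N_STATES):
        P[s] = {}
        for a, step in ((LEFT, -1), (RIGHT, +1)):
            if s == GOAL:
                # Terminal state: self-loop with zero reward.
                P[s][a] = [(1.0, s, 0.0, True)]
            else:
                nxt = min(max(s + step, 0), N_STATES - 1)
                reward = 1.0 if nxt == GOAL else 0.0
                P[s][a] = [(1.0, nxt, reward, nxt == GOAL)]
    return P


def evaluate_policy(P, gamma=0.9, goal_init=0.0, tol=1e-10, max_iter=10_000):
    """Iterative policy evaluation for a uniform random policy."""
    V = np.zeros(N_STATES)
    V[GOAL] = goal_init  # boundary value; never updated below
    for _ in range(max_iter):
        V_new = V.copy()
        for s in range(N_STATES):
            if s == GOAL:
                continue  # the terminal state is skipped in every sweep
            total = 0.0
            for a in (LEFT, RIGHT):
                for prob, nxt, reward, _done in P[s][a]:
                    # 0.5 is the probability of each action under the policy
                    total += 0.5 * prob * (reward + gamma * V[nxt])
            V_new[s] = total
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


P = build_model()
V_zero = evaluate_policy(P, goal_init=0.0)
V_one = evaluate_policy(P, goal_init=1.0)
print("goal initialized to 0:", V_zero)
print("goal initialized to 1:", V_one)
```

In this sketch a non-zero initial goal value stays fixed and raises the values of the other states, because the update for a neighbouring state adds `gamma * V[GOAL]` on top of the +1 reward. That is consistent with treating the terminal value as a boundary condition and with the recommendation to set it to zero.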

    • @dulanjanaperera988  1 year ago +1

      @aleksandarhaber Thanks for the clarification. I understand the reason now.

    • @aleksandarhaber  1 year ago +1

      @dulanjanaperera988 Good!