Continuous Action Space Actor Critic Tutorial

  • Published Dec 7, 2024

Comments • 26

  • @E4asyTut0rial
    @E4asyTut0rial 6 years ago +10

    I rarely write comments, but I really like your tutorials. As an RL beginner, I didn't find anyone explaining it as simply and clearly as you. Thank you!

  • @nadni1
    @nadni1 5 years ago +1

    This tutorial series is great. Linear, concise, clear. Thank you so much

  • @ivanof90
    @ivanof90 1 year ago

    This explanation helps me a lot! Thank you!

  • @curumo_curunir
    @curumo_curunir 2 years ago

    Thank you for the nice explanations and video. It is useful. I hope your videos about ML & Data Science will continue.

  • @miraclemaxicl
    @miraclemaxicl 5 years ago

    Thanks for the video. I just started looking into RL and this helped me solve OpenAI's mountain car in continuous action space.

  • @bjarke7886
    @bjarke7886 3 years ago +4

    Why use obscure libraries like ptan in your code? It just makes it frustrating to work with...

  • @santhoshjayaraman1112
    @santhoshjayaraman1112 1 year ago +1

    Sir, can you share a book name or a link to resources to learn more about continuous action space RL concepts? Please, please.

  • @whyzzy2683
    @whyzzy2683 2 years ago

    This is a great tutorial. Thanks for the talk.

  • @jackhuang468
    @jackhuang468 5 years ago

    Very clear explanation. Thank you so much.

  • @ekorudiawan
    @ekorudiawan 4 years ago

    Very good explanation. Please follow up with more algorithms like DDPG and TD3.

  • @ShusenWang
    @ShusenWang 4 years ago

    Great lecture. Is there a paper that studies what you introduced?

    • @CommanderCraft98
      @CommanderCraft98 3 years ago +1

      The code he uses is from the book "Deep Reinforcement Learning Hands-On" by Maxim Lapan.

  • @sedi4361
    @sedi4361 5 years ago +2

    Nice explanation, but you could mention Maxim Lapan, since you take all the code from him.

  • @gideonprior4842
    @gideonprior4842 4 years ago

    Colin, great tutorial. Can you explain how the new policy probs are different from the old policy probs? The new policy is given the same observations and actions taken, and since at the onset of training the old and new policy are the same neural net, how do we get an update? My score of np.exp(new_log_probs - old_log_probs) is 1 because the policies are the same, the update is nonzero initially only due to the entropy bonus. Do I need a target network similar to DDQN? Thanks for making these btw, they are solid.
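
An aside on the ratio question above: in a PPO-style update, the old log-probabilities are a frozen snapshot of the policy that collected the rollout, so exp(new_log_probs - old_log_probs) equals 1 only before the first optimizer step. The gradient still flows through new_log_probs (the old ones are detached constants), so that first step generally moves the policy even without the entropy bonus, and on later passes over the same batch the ratio drifts away from 1; no DDQN-style target network is needed. Below is a minimal sketch, assuming PyTorch and an illustrative Gaussian policy with made-up sizes and data (this is not the code from the video):

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 2
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
log_std = nn.Parameter(torch.zeros(act_dim))
opt = torch.optim.Adam(list(policy.parameters()) + [log_std], lr=3e-4)

def log_prob(obs, act):
    # Gaussian policy: network outputs the mean, log_std is a learned parameter
    dist = torch.distributions.Normal(policy(obs), log_std.exp())
    return dist.log_prob(act).sum(-1)

obs = torch.randn(32, obs_dim)   # fake rollout batch, purely illustrative
act = torch.randn(32, act_dim)
adv = torch.randn(32)

# Frozen snapshot of the rollout policy's log-probs (no gradient)
old_log_probs = log_prob(obs, act).detach()

for epoch in range(4):           # several passes over the same batch
    new_log_probs = log_prob(obs, act)
    ratio = torch.exp(new_log_probs - old_log_probs)
    print(epoch, ratio.mean().item())   # 1.0 on epoch 0, drifts afterwards
    clipped = torch.clamp(ratio, 0.8, 1.2)
    loss = -torch.min(ratio * adv, clipped * adv).mean()
    opt.zero_grad()
    loss.backward()                     # gradient flows through new_log_probs only
    opt.step()
```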

  • @tanujajoshi1901
    @tanujajoshi1901 2 years ago

    Is this tutorial about a stochastic policy or about continuous action spaces?

  • @sunaryaseo
    @sunaryaseo 2 years ago

    Hi! It's a nice video. If I have continuous action values ranging from 50 to 150, which activation function should I use on the actor network's output, and how do I sample values in that range from its probability distribution?

    • @cuongnguyenuc1776
      @cuongnguyenuc1776 10 months ago

      I think you can still use the tanh activation: multiply its output by half the range of your action values and add a bias. The bias should shift the tanh output to the mean of the action values!
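
A sketch of the scaling suggested in this reply, assuming PyTorch and bounds of [50, 150]: the tanh output in [-1, 1] is multiplied by half the range (50) and shifted by the midpoint (100) to give the mean of a Gaussian, from which the action is sampled and clamped. The layer sizes and the name mu_head are illustrative assumptions, not the video's code:

```python
import torch
import torch.nn as nn

LOW, HIGH = 50.0, 150.0
mid, half_range = (LOW + HIGH) / 2.0, (HIGH - LOW) / 2.0   # 100 and 50

obs_dim = 8
# Actor head ends in tanh, so its raw output lies in [-1, 1]
mu_head = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Tanh())
log_std = nn.Parameter(torch.zeros(1))

obs = torch.randn(16, obs_dim)                # fake observation batch
mu = mid + half_range * mu_head(obs)          # mean rescaled into [50, 150]

# Sample from a Gaussian around that mean, then clamp to the valid range
dist = torch.distributions.Normal(mu, log_std.exp())
action = dist.sample().clamp(LOW, HIGH)
log_prob = dist.log_prob(action)              # used by the policy-gradient loss
```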

  • @tato_good
    @tato_good 3 years ago

    Very good man!

  • @vadimavkhimenia5806
    @vadimavkhimenia5806 2 years ago

    Is AC still the leading algorithm for tasks such as self-driving?

  • @ecoflex0030
    @ecoflex0030 6 years ago

    Thanks for this tutorial!

  • @ravingswe
    @ravingswe 4 years ago

    Really helpful! Thx a lot.

  • @anilkurkcu3389
    @anilkurkcu3389 6 years ago

    When is the next video coming?

  • @camus6525
    @camus6525 4 years ago

    Thanks!!!
    U r awesome !!!!

  • @FluxProGaming
    @FluxProGaming 5 years ago

    Thanks!!

  • @FluxProGaming
    @FluxProGaming 5 years ago +4

    Moo

  • @Mohammadmohammad-ze7ru
    @Mohammadmohammad-ze7ru 5 years ago

    Thanks!!