Hi! It's a nice video. I wonder if I have continuous action values ranging from 50 to 150, which activation function should I use in the output Actor-network, and how to sample between those values from its probability?
I think you can still use the tanh activation: multiply it by half the range of your action values and add a bias. The bias shifts the tanh output to the mean of the action values!
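Here's a minimal sketch of that idea, assuming a PyTorch Gaussian actor; the names (ACTION_LOW, ACTION_HIGH, ActorHead, sample_action, the hidden size) are illustrative, not taken from the video or Lapan's code:

```python
import torch
import torch.nn as nn

ACTION_LOW, ACTION_HIGH = 50.0, 150.0
SCALE = (ACTION_HIGH - ACTION_LOW) / 2.0   # 50: tanh spans [-1, 1], so use half the range
BIAS = (ACTION_HIGH + ACTION_LOW) / 2.0    # 100: centre of the action range

class ActorHead(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.mu = nn.Linear(hidden_size, 1)          # mean of the Gaussian policy
        self.log_std = nn.Parameter(torch.zeros(1))  # state-independent log std

    def forward(self, features):
        # tanh keeps the raw mean in [-1, 1]; scale + bias maps it into [50, 150]
        mean = torch.tanh(self.mu(features)) * SCALE + BIAS
        std = self.log_std.exp().expand_as(mean)
        return mean, std

def sample_action(mean, std):
    # Sample from the Gaussian policy, then clamp so the action stays in range
    dist = torch.distributions.Normal(mean, std)
    action = dist.sample()
    return action.clamp(ACTION_LOW, ACTION_HIGH), dist.log_prob(action)

head = ActorHead(hidden_size=64)
features = torch.randn(1, 64)   # dummy features from the shared network body
action, log_prob = sample_action(*head(features))
```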
Colin, great tutorial. Can you explain how the new policy probs are different from the old policy probs? The new policy is given the same observations and actions taken, and since at the onset of training the old and new policy are the same neural net, how do we get an update? My score of np.exp(new_log_probs - old_log_probs) is 1 because the policies are the same, the update is nonzero initially only due to the entropy bonus. Do I need a target network similar to DDQN? Thanks for making these btw, they are solid.
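A self-contained sketch of why the ratio moves away from 1 without a target network, assuming a standard PPO update loop (every name and size below is illustrative, not the video's exact code): old_log_probs is frozen once before the inner epochs start, while the policy's weights change after every optimizer step, so only the very first pass over the batch sees a ratio of exactly 1.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))  # toy head: [mu, log_std]
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

states = torch.randn(64, 4)       # dummy batch of observations
actions = torch.randn(64, 1)      # actions taken under the old policy
advantages = torch.randn(64, 1)   # dummy advantage estimates
CLIP_EPS, PPO_EPOCHS = 0.2, 4

def log_prob(states, actions):
    out = policy(states)
    mu, log_std = out[:, :1], out[:, 1:]
    return torch.distributions.Normal(mu, log_std.exp()).log_prob(actions)

with torch.no_grad():
    old_log_probs = log_prob(states, actions)   # frozen snapshot, taken once

for epoch in range(PPO_EPOCHS):
    new_log_probs = log_prob(states, actions)   # recomputed with the current weights
    ratio = torch.exp(new_log_probs - old_log_probs)  # exactly 1 only on epoch 0
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS) * advantages
    loss = -torch.min(surr1, surr2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()   # weights change here, so later epochs see ratio != 1
    print(epoch, ratio.mean().item())
```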
I rarely write comments but I really like your tutorials, as a RL beginner I didn't find anyone explaining it as simple and clear as you. Thank you!
This tutorial series is great. Linear, concise, clear. Thank you so much
Thank you for the nice explanations and video. It is useful. I hope your videos about ML & Data Science will continue.
This explanation helps me a lot! Thank you!
Thanks for the video. I just started looking into RL and this helped me solve OpenAI's mountain car in continuous action space.
Sir, can you share any book or resource links for learning more about continuous action space RL concepts? Please, please.
Why use obscure libraries like ptan in your code? It just makes it frustrating to work with...
Very clear explanation. Thank you so much.
Is AC still the leading algorithm for tasks such as self-driving?
Nice explanation, but you could mention Maxim Lapan, since you take all the code from him.
This is a great tutorial. Thanks for the talk.
Great lecture. Is there a paper that studies what you introduced?
The code he uses is from a book "Deep reinforcement learning hands on" by Maxim Lapan
Very good explanation, please follow up with more algorithms like DDPG and TD3.
Is this tutorial for a stochastic policy or for continuous action spaces?
Thanks for this tutorial!
Very good man!
When is the next video coming?
Really helpful! Thx a lot.
Thanks !!!
U r awesome !!!!
Thanks!!
Moo
Thanks!!