This is the best explanation video I can find for Dueling DQN!
Agreed
This is exactly what I wanted to write.
Same!! Fantastic video!
This is a clear, careful, and organized presentation of dueling DQNs that you need to see if you're new to these networks!
Thank you so much, Andrew for your video.
I read the paper and was so confused about the intuition behind the aggregation step that subtracts the mean of the action advantages. Now I get it in just the first 4 minutes of this video. Amazing explanation, clearly illustrates the intuition, thank you so much!
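For anyone else puzzling over that step, here is a minimal PyTorch-style sketch of the mean-subtracting aggregation being described (layer sizes and names are placeholders of my own, not code from the video or the paper):

```python
import torch.nn as nn

class DuelingHead(nn.Module):
    """Separate value and advantage streams, combined as Q = V + (A - mean(A))."""
    def __init__(self, feature_dim, num_actions):
        super().__init__()
        self.value = nn.Linear(feature_dim, 1)                # V(s)
        self.advantage = nn.Linear(feature_dim, num_actions)  # A(s, .)

    def forward(self, features):
        v = self.value(features)       # shape (batch, 1)
        a = self.advantage(features)   # shape (batch, num_actions)
        # Subtracting the per-state mean advantage makes the V/A split
        # identifiable and keeps the advantages centered around zero.
        return v + (a - a.mean(dim=1, keepdim=True))
```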
Phenomenal explanation. Very illustrative and concise. Thank you!
Best explanation
This video did not exactly answer the question I came here for, but it gave me several new ideas that help me understand Dueling DQN better. Good work, mate.
Amazing video, thank you very much for the explanation!
Excellent explanation man. Thanks a lot for your effort.
Very, very good explanation. Now I'm hyped to implement it :)
Good video, deserves more publicity man :)
❤ Such a good tutorial 👍
Great video, very clear explanation of the advantage of dueling networks! Thanks
Thanks
Thanks a lot
This is excellent
Thanks for the awesome video! One question though: I didn't quite understand why the mean term acts as a regularizer at 6:55. I understand that A - A.mean() would have values around 0, but I don't see why A.mean() itself enables the layer output A to be centered around 0. Could you briefly explain it if possible? Thank you :)
I was confused about this too, but if you take the gradient of any action's Q-value with respect to the advantage outputs, the entries turn out to sum to zero, so SGD updates leave the mean of the advantages unchanged and they stay centered around 0.
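To make that reply concrete, here is a tiny autograd check (a standalone sketch, not code from the video): for Q = V + A[a] - mean(A), the gradient of any single Q with respect to the advantage vector has entries that sum to zero, so an SGD step through that Q cannot move the mean of the advantages.

```python
import torch

num_actions = 4
A = torch.randn(num_actions, requires_grad=True)  # advantage outputs for one state
V = torch.randn(())                               # state value

a = 2                                             # any action index
q = V + A[a] - A.mean()                           # Q(s, a) with the mean subtracted
q.backward()

print(A.grad)        # (1 - 1/n) for the chosen action, -1/n for every other action
print(A.grad.sum())  # 0 (up to floating-point error): mean(A) is untouched by the update
```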
Andrew, can you please explain the *Implicit* Quantile Network (IQN)? arxiv.org/abs/1806.06923
There is literally no user-friendly explanation on the web as of November 2018, and I can't understand it (although I understood C51 and quantile-regression DQN).
It would be a great contribution & help!
Do you have the code for the visual example?
Can this be applied to PPO and other on-policy advantage-based methods by simply flipping the equation (instead of Q = A + V, you use A = Q - V)?
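For what it's worth, that flipped form is roughly how on-policy methods already estimate advantages: a return estimate stands in for Q, and the value baseline is subtracted from it. A minimal sketch, assuming a one-step TD return for simplicity (PPO implementations usually use GAE instead); the function name and arguments are my own illustration:

```python
import numpy as np

def one_step_advantages(rewards, values, next_values, dones, gamma=0.99):
    # One-step TD return as a stand-in estimate of Q(s, a): r + gamma * V(s')
    q_estimate = rewards + gamma * next_values * (1.0 - dones)
    # Advantage as the question describes: A = Q - V
    return q_estimate - values

advantages = one_step_advantages(
    rewards=np.array([1.0, 0.0]),
    values=np.array([0.5, 0.2]),
    next_values=np.array([0.3, 0.0]),
    dones=np.array([0.0, 1.0]),
)
print(advantages)  # [ 0.797 -0.2  ]
```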
I find it unfortunate that it's called "dueling" because it's more cooperative, like a pilot (advantage) and a navigator (value).