Deep Q-Network & Dueling network architectures for deep reinforcement learning

  • Published on Nov 9, 2024

Comments • 24

  • @hangchen
    @hangchen 5 years ago +21

    This is the best explanation video I can find for Dueling DQN!

    • @IgorAherne
      @IgorAherne 5 years ago +2

      Agreed

    • @SandwichMitGurke
      @SandwichMitGurke 5 years ago +1

      This is exactly what I wanted to write.

    • @kansasllama
      @kansasllama 4 years ago +1

      Same!! Fantastic video!

  • @julian540
    @julian540 4 years ago +3

    This is a clear, careful, and organized presentation of dueling DQNs that you need to see if you're new to these networks!
    Thank you so much, Andrew, for your video.

  • @sharp7j
    @sharp7j 2 years ago +1

    I read the paper and was so confused about the intuition behind their loss function that subtracted the mean of the action rewards. Now I get it in just the first 4 minutes of this video. Amazing explanation; it clearly illustrates the intuition. Thank you so much!
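
    A minimal PyTorch sketch of the aggregation this comment refers to (the mean that gets subtracted is taken over the action advantages). The class name DuelingHead and the layer sizes are illustrative assumptions, not code from the video or the paper.

    import torch
    import torch.nn as nn

    class DuelingHead(nn.Module):
        """Dueling head: separate value and advantage streams combined into Q-values."""
        def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
            super().__init__()
            self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
            self.value = nn.Linear(hidden, 1)              # V(s): one scalar per state
            self.advantage = nn.Linear(hidden, n_actions)  # A(s, a): one value per action

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            h = self.feature(state)
            v = self.value(h)      # shape (batch, 1)
            a = self.advantage(h)  # shape (batch, n_actions)
            # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
            return v + (a - a.mean(dim=1, keepdim=True))

    For example, DuelingHead(state_dim=8, n_actions=4)(torch.randn(32, 8)) returns a (32, 4) tensor of Q-values, one per action.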

  • @cmunozcortes
    @cmunozcortes 3 years ago +2

    Phenomenal explanation. Very illustrative and concise. Thank you!

  • @bagumamartin
    @bagumamartin 5 months ago

    Best explanation

  • @aashishadhikari8144
    @aashishadhikari8144 3 years ago

    This video did not exactly answer the question I came here for, but it gave me several new ideas that help me understand Dueling DQN better. Good work, mate.

  • @coroamalarisa5188
    @coroamalarisa5188 3 years ago +1

    Amazing video, thank you very much for the explanation!

  • @muhammadusama6040
    @muhammadusama6040 6 years ago +4

    Excellent explanation man. Thanks a lot for your effort.

  • @SandwichMitGurke
    @SandwichMitGurke 5 years ago +1

    Very, very good explanation. Now I'm hyped to implement it :)

  • @niektuytel9519
    @niektuytel9519 3 years ago +1

    Good video, underrated; it deserves more publicity, man :)

  • @Nissearne12
    @Nissearne12 1 year ago

    ❤ Such a good tutorial 👍

  • @thomasdelteil1158
    @thomasdelteil1158 6 years ago

    Great video, very clear explanation of the advantage of dueling networks! Thanks

  • @ЭдуардПольников
    @ЭдуардПольников 2 years ago

    Thanks

  • @Hideonbush-fm2td
    @Hideonbush-fm2td 2 years ago

    Thanks a lot

  • @Eijgey
    @Eijgey 2 years ago

    This is excellent.

  • @andychoi3589
    @andychoi3589 4 years ago

    Thanks for the awesome video! One question, though: I didn't quite understand why the mean term acts as a regularizer at 6:55. I understand that A - A.mean() would have values around 0, but I don't see why A.mean() itself enables the layer output A to be centered around 0. Could you briefly explain it if possible? Thank you :)

    • @westoncook981
      @westoncook981 3 years ago

      I was confused about this too, but if you take the gradient of any action's Q-value with respect to the advantage function, it turns out to sum to zero, so the mean will not be changed by SGD updates.
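
      A quick numerical check of this reply, not code from the video: with the mean-subtracted aggregation, the gradient of any single Q-value with respect to the advantage vector sums to zero, so SGD updates leave the mean of A unchanged and the advantages stay centered around 0.

      import torch

      n_actions = 4
      a = torch.randn(n_actions, requires_grad=True)  # advantage outputs A(s, a), one per action
      v = torch.randn(1)                              # state-value output V(s)

      q = v + (a - a.mean())   # dueling aggregation: Q = V + (A - mean(A))
      q[2].backward()          # gradient of one action's Q-value w.r.t. the advantages

      print(a.grad)            # tensor([-0.2500, -0.2500, 0.7500, -0.2500])
      print(a.grad.sum())      # sums to 0: the update direction has zero mean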

  • @IgorAherne
    @IgorAherne 6 years ago +1

    Andrew, can you please explain the *Implicit* Quantile Network (IQN)? arxiv.org/abs/1806.06923
    There is literally no user-friendly explanation on the web as of November 2018, and I can't understand it (although I understood C51 and quantile-regression DQN).
    It would be a great contribution & help!

  • @ahmettavli4205
    @ahmettavli4205 5 years ago +3

    Do you have the code for the visual example?

  • @revimfadli4666
    @revimfadli4666 11 months ago

    Can this be applied to PPO and other on-policy advantage-based methods by simply flipping the equation (instead of q = a + v, you use a = q - v)?

  • @revimfadli4666
    @revimfadli4666 1 year ago

    I find it unfortunate that it's called "dueling", because it's more cooperative, like a pilot (advantage) and a navigator (value).