Deep Learning 67: Backpropagation in Long Short-Term Memory (LSTM) Architecture

  • Published on 11 Sep 2024
  • This lecture discusses backpropagation in the Long Short-Term Memory (LSTM) architecture.
    #deeplearning #lstm #backpropagation

Comments • 26

  • @iliasaarab7922
    @iliasaarab7922 1 year ago +10

    The only lecture series I've found that actually goes into the needed details, great job!

  • @thatipelli1
    @thatipelli1 3 years ago +2

    Thank you so much! This was the most in-depth video for LSTMs.

  • @Pankajjadwal
    @Pankajjadwal 4 years ago +6

    You deserve more likes, sir... I loved it.

  • @jordicarbonell1359
    @jordicarbonell1359 2 years ago +1

    Congratulations sir, a very clear explanation, not only here but in your other videos as well. Very helpful!!

  • @Тима-щ2ю
    @Тима-щ2ю 5 months ago

    A huge amount of work done by you. Thanks, teacher!!!

  • @GCS557
    @GCS557 3 years ago +5

    Great lecture. Could you please upload your next video soon? I wanted to know why the vanishing gradient problem does not happen in LSTMs.

  • @sourabhverma9034
    @sourabhverma9034 2 months ago +1

    This is not backpropagation through time; it is just normal backpropagation. That does not work for LSTMs or even RNNs, because the derivative of the loss with respect to the input weights does not depend only on the hidden state: each hidden state at time t depends on the state at t-1, which in turn depends on the input weights again, so the derivative itself propagates backwards through all time steps. That was the whole point of the paper "Backpropagation through time".
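
    A minimal sketch of the accumulation this comment describes, using a plain RNN recurrence rather than the lecture's LSTM (all names, shapes, and the squared-error loss here are illustrative assumptions, not the lecture's setup):

        # Vanilla-RNN BPTT sketch: h_t = tanh(W_x x_t + W_h h_{t-1}),
        # loss L = sum_t 0.5 * ||h_t - y_t||^2.
        # The point: dL/dW_x and dL/dW_h pick up a contribution at EVERY
        # time step, because h_{t-1} itself depends on the weights.
        import numpy as np

        rng = np.random.default_rng(0)
        T, D, H = 5, 3, 4                       # time steps, input size, hidden size
        W_x = 0.1 * rng.normal(size=(H, D))
        W_h = 0.1 * rng.normal(size=(H, H))
        xs = rng.normal(size=(T, D))
        ys = rng.normal(size=(T, H))

        # forward pass, caching hidden states
        hs = [np.zeros(H)]
        for t in range(T):
            hs.append(np.tanh(W_x @ xs[t] + W_h @ hs[-1]))

        # backward pass through time: weight gradients are summed over all steps
        dW_x, dW_h = np.zeros_like(W_x), np.zeros_like(W_h)
        dh_next = np.zeros(H)                   # gradient flowing in from step t+1
        for t in reversed(range(T)):
            dh = (hs[t + 1] - ys[t]) + dh_next  # local loss grad + grad from the future
            da = dh * (1.0 - hs[t + 1] ** 2)    # back through tanh
            dW_x += np.outer(da, xs[t])         # step t's contribution to dL/dW_x
            dW_h += np.outer(da, hs[t])         # step t's contribution to dL/dW_h
            dh_next = W_h.T @ da                # propagate to h_{t-1}

        print(dW_x.shape, dW_h.shape)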

  • @meetvardoriya2550
    @meetvardoriya2550 4 years ago +2

    Sir, your lectures are really awesome ✨
    A small request, sir: if you can make videos on machine learning algorithms, it'll be very helpful ✨

  • @satyajitdas2780
    @satyajitdas2780 4 years ago +1

    Thanks, Sir! Nicely explained backpropagation!

  • @nickparana
    @nickparana 2 years ago +1

    Why is the Ht-1 dimension HxH? Shouldn't it be BxH?

  • @doyugen465
    @doyugen465 3 years ago +1

    Great lecture, Sir. I was just wondering: at 20:00 the derivative of Ct has its first component as (Yhat - Y), but a few seconds later you refer to the same derivative as (Yhat - Yhat). Is this a mistake? If not, does it simply mean subtracting each element of Yhat from itself?

  • @buh357
    @buh357 5 months ago

    This is gold.

  • @nickparana
    @nickparana 2 years ago +1

    Awesome explanation. One question: shouldn't we calculate the derivative of the loss with respect to all the biases as well?

    • @Тима-щ2ю
      @Тима-щ2ю 5 months ago

      Yes, the biases are also tunable parameters; he just forgot about them.

  • @AmithAdiraju1994
    @AmithAdiraju1994 7 months ago +1

    It looks like the backward propagation here is only done for the current time step, and does not go back in time the way it does for an RNN. For example: since both g2 and h1 indirectly contain Wg in their formulas, shouldn't we add dg2/dh1 to dg2/dWg, as was the case for the RNN?
    Could anybody help clarify?

    • @Тима-щ2ю
      @Тима-щ2ю 5 months ago

      Yeah, I think you are correct; these formulas look simpler than the RNN ones, when it should be the other way around. I think the real formulas are more complex.
      But according to this paper the formulas are correct: arxiv.org/pdf/1610.02583.pdf. I am confused...

  • @bangarrajumuppidu8354
    @bangarrajumuppidu8354 2 years ago

    Clear-cut explanation, thank you sir.

  • @YohaneesHutagalung
    @YohaneesHutagalung 2 years ago

    Nice and thanks 👍👍

  • @devran4169
    @devran4169 3 years ago

    Finally, subtitles were added.

  • @BoneySinghal
    @BoneySinghal 3 years ago

    Deep Learning lectures 68 and 69 are not available yet.

  • @kunalbharali5181
    @kunalbharali5181 3 years ago

    Sir, thanks a lot.

  • @012345678952752
    @012345678952752 3 years ago

    Thank you!

  • @NagulSrnurthy
    @NagulSrnurthy 3 years ago

    Hi... Is the vanishing gradient video for LSTM available?

  • @Тима-щ2ю
    @Тима-щ2ю 5 months ago

    The LSTM architecture is more complex than the RNN architecture, but the gradient formulas look much simpler than the RNN ones. I think your formulas are incorrect. For example: dL/dWi = ... dit/dWi, and you write that dit/dWi = ht-1, but ht-1 also depends on Wi, because ht-1 = Ot-1 * tanh(ft-1 * ct-2 + it-1 * gt-1), where it-1 also depends on Wi.
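
    For reference, the full chain-rule expansion that this comment (and the earlier thread) points at can be written as follows. This assumes a common input-gate parameterization it = sigmoid(Wi [ht-1; xt] + bi); the lecture's own symbols and layout may differ:

        % Total derivative of the input gate w.r.t. its weight matrix.
        % The first term treats h_{t-1} as a constant (the per-time-step formula);
        % the second term carries the dependence of h_{t-1} on W_i back through
        % earlier steps, since h_{t-1} = o_{t-1} \odot \tanh(c_{t-1}) with
        % c_{t-1} = f_{t-1} \odot c_{t-2} + i_{t-1} \odot g_{t-1}, and i_{t-1}
        % is again a function of W_i. Accumulating these contributions over t
        % is what backpropagation through time does.
        \frac{\mathrm{d} i_t}{\mathrm{d} W_i}
          = \underbrace{\frac{\partial i_t}{\partial W_i}}_{h_{t-1}\ \text{held fixed}}
          + \frac{\partial i_t}{\partial h_{t-1}}\,
            \frac{\mathrm{d} h_{t-1}}{\mathrm{d} W_i}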

  • @jordiwang
    @jordiwang 1 year ago

    Goodddd jobbbb, my man. You are the freaking only person who goes into the details. I love you, please marry me.

  • @user-ux8iv1zk5e
    @user-ux8iv1zk5e 2 years ago

    As expected, it's Indian English 😥