Tutorial 7 - Vanishing Gradient Problem

  • Published on Sep 27, 2024

Comments • 199

  • @kumarpiyush2169
    @kumarpiyush2169 4 years ago +125

    Hi Krish.. shouldn't dL/dW'11 be [dL/dO21 · dO21/dO11 · dO11/dW'11] +
    [dL/dO21 · dO21/dO12 · dO12/dW'11], as per the last chain rule illustration? Please confirm

    • @rahuldey6369
      @rahuldey6369 4 years ago +12

      ...but O12 is independent of W'11; in that case, won't the 2nd term be zero?

    • @RETHICKPAVANSE
      @RETHICKPAVANSE 3 years ago +1

      wrong bruh

    • @ayushprakash3890
      @ayushprakash3890 3 years ago +2

      we don't have the second term

    • @Ajamitjain
      @Ajamitjain 3 years ago +1

      Can anyone clarify this? I too have this question.

    • @grahamfernando8775
      @grahamfernando8775 3 years ago +29

      @@Ajamitjain dL/dW'11 should be [dL/dO21 · dO21/dO11 · dO11/dW'11]
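
For reference, the single-path form the replies above converge on, written as a display equation (a sketch assuming the two-hidden-layer network from the video, where W'11 feeds only O11, so the path through O12 contributes nothing):

```latex
\frac{\partial L}{\partial W'_{11}}
  = \frac{\partial L}{\partial O_{21}}
    \cdot \frac{\partial O_{21}}{\partial O_{11}}
    \cdot \frac{\partial O_{11}}{\partial W'_{11}}
% The second term in the original question drops out because O_{12} does not depend on W'_{11}.
```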

  • @Vinay1272
    @Vinay1272 2 years ago +6

    I have been taking a well-known, world-class course on AI and ML for the past 2 years, and none of the lecturers have made me as interested in any topic as you have in this video. This is probably the first time I have sat through a 15-minute lecture without distracting myself. What I realise now is that I didn't lack motivation or interest, nor was I lazy - I just did not have lecturers whose teaching inspired me enough to take interest in the topics; yours did.
    You have explained the vanishing gradient problem very well and very clearly. It shows how strong your concepts are and how knowledgeable you are.
    Thank you for putting out your content here and sharing your knowledge with us. I am so glad I found your channel. Subscribed forever.

  • @Xnaarkhoo
    @Xnaarkhoo 4 years ago +16

    Many years ago in college I enjoyed watching videos from IIT - before the MOOC era, India had and still has many good teachers! It brings me joy to see that again. It seems Indians have a gene for pedagogy.

  • @tosint
    @tosint 4 years ago +11

    I hardly comment on videos, but this is a gem. One of the best videos explaining the vanishing gradient problem.

  • @PeyiOyelo
    @PeyiOyelo 4 years ago +43

    Sir, or as my Indian friends say, "Sar", you are a very good teacher, and thank you for explaining this topic. It makes a lot of sense. I can also see that you're very passionate; however, the passion kind of makes you speed up the explanation a bit, making it a bit hard to understand sometimes. I am also very guilty of this when I try to explain things that I love. Regardless, thank you very much for this and the playlist. I'm subscribed ✅

    • @amc8437
      @amc8437 3 years ago +3

      Consider reducing playback speed.

  • @ltoco4415
    @ltoco4415 4 years ago +7

    Thank you sir for making this confusing concept crystal clear. Your knowledge is GOD level 🙌

  • @sapnilpatel1645
    @sapnilpatel1645 1 year ago +1

    So far the best explanation of the vanishing gradient.

  • @bhavikdudhrejiya4478
    @bhavikdudhrejiya4478 4 years ago

    Very nice way of explaining.
    What I learned from this video (a small numeric sketch follows this comment):
    1. Get the error: (Actual Output - Model Output)^2
    2. To reduce the error we backpropagate, i.e., we have to find a new weight for each connection
    3. New weight = Old weight - Change in the weight
    4. Change in the weight = Learning rate x d(Error)/d(Old weight)
    5. The new weight comes out almost equal to the old weight, because the derivative of the sigmoid lies between 0 and 0.25, so there is effectively no update to the weight
    6. This is the vanishing gradient problem
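
A minimal numeric sketch of steps 3-5 above (the depth, pre-activation values, and weights here are made-up assumptions, not taken from the video): each extra sigmoid layer multiplies the gradient by a factor of at most 0.25, so the update to an early-layer weight becomes negligible.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # never exceeds 0.25

# Hypothetical pre-activations at 5 hidden layers and a shared path weight
pre_activations = [0.8, -1.2, 0.3, 2.0, -0.5]
weight_on_path = 0.5

grad = 1.0                        # gradient of the loss arriving from the output
for z in pre_activations:         # chain rule: one sigmoid'(z) * w factor per layer
    grad *= sigmoid_prime(z) * weight_on_path

learning_rate = 0.1
old_weight = 0.5
new_weight = old_weight - learning_rate * grad

print(f"gradient at the early layer: {grad:.2e}")                 # tiny -> vanishing gradient
print(f"old weight: {old_weight}, new weight: {new_weight:.6f}")  # nearly unchanged
```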

  • @marijatosic217
    @marijatosic217 3 years ago +3

    I am amazed by the level of energy you have! Thank you :)

  • @vikrantchouhan9908
    @vikrantchouhan9908 2 years ago +2

    Kudos to your genuine efforts. One needs sincere efforts to ensure that the viewers are able to understand things clearly and those efforts are visible in your videos. Kudos!!! :)

  • @satyadeepbehera2841
    @satyadeepbehera2841 4 years ago +3

    Appreciate your way of teaching, which answers fundamental questions.. This "derivative of sigmoid ranging from 0 to 0.25" concept was mentioned nowhere else.. thanks for clearing up the basics...

    • @mittalparikh6252
      @mittalparikh6252 3 years ago

      Look for Mathematics for Deep Learning. It will help

  • @piyalikarmakar5979
    @piyalikarmakar5979 3 years ago

    One of the best videos on clarifying the Vanishing Gradient problem.. Thank you sir..

  • @classictremonti7997
    @classictremonti7997 3 years ago

    So happy I found this channel! I would have cried if I found it and it was given in Hindi (or any other language than English)!!!!!

  • @rushikeshmore8890
    @rushikeshmore8890 4 years ago

    Kudos sir, I am working as a data analyst and have read lots of blogs and watched videos, but today I finally got the concept cleared. Thanks for all the stuff

  • @deepthic6336
    @deepthic6336 4 years ago

    I must say this: normally I am the kind of person who prefers to study on my own and crack it. I never used to listen to any lectures till date, because I just don't understand them, and I dislike the way they explain without passion (not all, though). But you are a gem and I can see the passion in your lectures. You are the best, Krish Naik. I appreciate it and thank you.

  • @koraymelihyatagan8111
    @koraymelihyatagan8111 2 years ago

    Thank you very much, I was wandering around the internet to find such an explanatory video.

  • @himanshubhusanrath2492
    @himanshubhusanrath2492 3 years ago

    One of the best explanations of vanishing gradient problem. Thank you so much @KrishNaik

  • @skiran5129
    @skiran5129 3 years ago

    I'm lucky to see this wonderful class.. Thank you..

  • @venkatshan4050
    @venkatshan4050 2 years ago +1

    Marana mass ("killer") explanation🔥🔥. Simple and very clearly said.

  • @meanuj1
    @meanuj1 5 years ago +4

    Nice presentation.. so helpful...

  • @MrSmarthunky
    @MrSmarthunky 4 years ago

    Krish.. You are earning a lot of Good Karmas by posting such excellent videos. Good work!

  • @mittalparikh6252
    @mittalparikh6252 3 years ago +1

    Overall I got the idea that you are trying to convey. Great work

  • @yousufborno3875
    @yousufborno3875 4 years ago

    You should get an Oscar for your teaching skills.

  • @manujakothiyal3745
    @manujakothiyal3745 4 years ago +1

    Thank you so much. The amount of effort you put is commendable.

  • @YashSharma-es3lr
    @YashSharma-es3lr 3 years ago

    Very simple and nice explanation. I understood it the first time.

  • @maheshsonawane8737
    @maheshsonawane8737 1 year ago

    Very nice, now I understand why the weights don't update in an RNN. The main point is that the derivative of the sigmoid is between 0 and 0.25. The vanishing gradient is mainly associated with the sigmoid activation function. 👋👋👋👋👋👋👋👋👋👋👋👋

  • @it029-shreyagandhi5
    @it029-shreyagandhi5 26 days ago

    Great teaching skills !!!

  • @benoitmialet9842
    @benoitmialet9842 3 years ago +1

    Thank you so much, great quality content.

  • @lekjov6170
    @lekjov6170 4 years ago +36

    I just want to add this mathematically, the derivative of the sigmoid function can be defined as:
    *derSigmoid = x * (1-x)*
    As Krish Naik well said, we have our maximum when *x=0.5*, giving us back:
    *derSigmoid = 0.5 * (1-0.5) --------> derSigmoid = 0.25*
    That's the reason the derivative of the sigmoid function can't be higher than 0.25

    • @ektamarwaha5941
      @ektamarwaha5941 4 years ago

      COOL

    • @thepsych3
      @thepsych3 4 years ago

      cool

    • @tvfamily6210
      @tvfamily6210 4 years ago +13

      should be: derSigmoid(x) = Sigmoid(x)[1-Sigmoid(x)], and we know it reaches maximum at x=0. Plugging in: Sigmoid(0)=1/(1+e^(-0))=1/2=0.5, thus derSigmoid(0)=0.5*[1-0.5]=0.25

    • @benvelloor
      @benvelloor 4 years ago

      @@tvfamily6210 Thank you!

    • @est9949
      @est9949 4 years ago

      I'm still confused. The weight w should be in here somewhere. This seems to be missing w.
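
Putting the replies above together, a sketch of the standard derivation (the last line addresses the question of where the weight w enters during backpropagation):

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)

% With s = \sigma(z) \in (0, 1), the product s(1 - s) is largest at s = 1/2
% (i.e. at z = 0), so 0 < \sigma'(z) \le 0.25.

% The weight enters through the chain rule: for a neuron with input a,
% pre-activation z = w a + b and output \sigma(z), each backpropagated factor is
\frac{\partial\, \sigma(z)}{\partial a} = \sigma'(z)\, w
```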

  • @MauiRivera
    @MauiRivera 3 years ago

    I like the way you explain things, making them easy to understand.

  • @faribataghinezhad
    @faribataghinezhad 2 years ago

    Thank you sir for your amazing video. that was great for me.

  • @vishaljhaveri6176
    @vishaljhaveri6176 3 years ago

    Thank you, Krish Sir. Nice explanation.

  • @benvelloor
    @benvelloor 4 years ago +1

    Very well explained. I can't thank you enough for clearing all my doubts!

  • @naresh8198
    @naresh8198 1 year ago

    crystal clear explanation !

  • @sumeetseth22
    @sumeetseth22 4 years ago

    Love your videos. I have watched and taken many courses, but no one is as good as you.

  • @adityashewale7983
    @adityashewale7983 1 year ago

    Hats off to you sir, your explanation is top level. Thank you so much for guiding us...

    • @DEVRAJ-np2og
      @DEVRAJ-np2og 2 months ago

      Did you complete his full playlist?

  • @elielberra2867
    @elielberra2867 2 years ago

    Thank you for all the effort you put into your explanations, they are very clear!

  • @aaryankangte6734
    @aaryankangte6734 2 years ago

    Sir, thank you for teaching us all the concepts from the basics. Just one request: if there is a mistake in your videos, please rectify it, as it confuses a lot of people who watch them - not everyone reads the comment section, and some will just blindly believe what you say. Therefore please look into this.

  • @classictremonti7997
    @classictremonti7997 3 years ago

    Krish...you rock brother!! Keep up the amazing work!

  • @hiteshyerekar2204
    @hiteshyerekar2204 5 years ago +4

    Nice video Krish. Please make practical videos on gradient descent, CNN, and RNN.

  • @nabeelhasan6593
    @nabeelhasan6593 2 years ago

    Very nice video sir, you explained the inner intricacies of this problem very well.

  • @sekharpink
    @sekharpink 5 years ago +33

    You specified the derivative of the loss with respect to w11 dash incorrectly; you missed the derivative of the loss with respect to O21 in the equation. Please correct me if I am wrong.

    • @sekharpink
      @sekharpink 5 years ago

      Please reply

    • @ramleo1461
      @ramleo1461 5 years ago

      Even I have this doubt

    • @krishnaik06
      @krishnaik06  5 years ago +28

      Apologies for the delay...I just checked the video and yes I have missed that part.

    • @ramleo1461
      @ramleo1461 5 years ago +12

      @@krishnaik06 Hey!
      You don't have to apologise; on the contrary, you are doing us a favour by uploading these useful videos. I was a bit confused and wanted to clear my doubt, that's all. Thank you for the videos... Keep up the good work!!

    • @rajatchakraborty2058
      @rajatchakraborty2058 4 years ago

      @@krishnaik06 I think you have also missed the w12 part in the derivative. Please correct me if I am wrong

  • @nikunjlahoti9704
    @nikunjlahoti9704 2 years ago

    Great Lecture

  • @skviknesh
    @skviknesh 3 years ago +1

    I understood it. Thanks for the great tutorial!
    My query is:
    the gradient vanishes with respect to more layers. When the new weight ~= the old weight, the result becomes useless.
    What would the O/P of that model look like, or will we even achieve the global minimum??

  • @GunjanGrunge
    @GunjanGrunge 3 years ago

    that was very well explained

  • @అరుణాచలశివ3003
    @అరుణాచలశివ3003 8 months ago

    You are a legend, Naik sir

  • @b0nnibell_
    @b0nnibell_ 4 years ago

    You sir made neural networks so much fun!

  • @krishj8011
    @krishj8011 3 years ago

    Very nice series... 👍

  • @abdulqadar9580
    @abdulqadar9580 2 years ago

    Great efforts Sir

  • @muhammadarslankahloon7519
    @muhammadarslankahloon7519 3 years ago +2

    Hello sir, why is the chain rule explained in this video different from the very last chain rule video? Kindly clarify, and thanks for such an amazing series on deep learning.

  • @shmoqe
    @shmoqe 2 years ago

    Great explanation, Thank you!

  • @nazgulzholmagambetova1198
    @nazgulzholmagambetova1198 2 years ago

    great video! thank you so much!

  • @nola8028
    @nola8028 2 years ago

    You just earned a +1 subscriber ^_^
    Thank you very much for the clear and educative video

  • @susmitvengurlekar
    @susmitvengurlekar 3 years ago

    Understood completely! If the weights hardly change, there is no point in training and training. But I have a question: where can I use this knowledge and understanding I just acquired?

  • @spicytuna08
    @spicytuna08 2 years ago

    You teach better than Ivy League professors. What a waste of money spending $$$ on college.

  • @abhinavkaushik6817
    @abhinavkaushik6817 3 years ago

    Thank you so much for this

  • @hokapokas
    @hokapokas 5 years ago +4

    Good job bro as usual... Keep up the good work.. I had a request: please make a video on implementing backpropagation.

    • @krishnaik06
      @krishnaik06  5 years ago +1

      The video has already been made. Please have a look at my deep learning playlist.

    • @hokapokas
      @hokapokas 5 years ago

      @@krishnaik06 I have seen that video, but it's not implemented in Python.. If you have a notebook you can refer me to, please share it.

    • @krishnaik06
      @krishnaik06  5 years ago +3

      With respect to the implementation in Python, please wait till I upload some more videos.

  • @daniele5540
    @daniele5540 4 years ago +1

    Great tutorial man! Thank you!

  • @AnirbanDasgupta
    @AnirbanDasgupta 3 years ago

    excellent video

  • @Thriver21
    @Thriver21 1 year ago

    nice explanation.

  • @sunnysavita9071
    @sunnysavita9071 5 years ago

    Your videos are very helpful, good job and good work, keep it up...

  • @anusuiyatiwari1800
    @anusuiyatiwari1800 3 years ago

    Very interesting

  • @lalithavanik5022
    @lalithavanik5022 3 years ago

    Nice explanation sir

  • @naughtyrana4591
    @naughtyrana4591 4 years ago

    Salutations to the Guru🙏

  • @naimamushfika1167
    @naimamushfika1167 1 year ago

    nice explanation

  • @suryagunda4038
    @suryagunda4038 3 years ago

    May god bless you ..

  • @magicalflute
    @magicalflute 4 years ago

    Very well explained. The vanishing gradient problem, as per my understanding, is that the optimizer is not able to perform its job (to reduce the loss), as the old weight and the new weight will be almost equal. Please correct me if I am wrong. Thanks!!

  • @manikosuru5712
    @manikosuru5712 5 years ago +4

    As usual extremely good, outstanding...
    And a small request: can we expect this in code (Python) in the future??

  • @AA-yk8zi
    @AA-yk8zi 4 years ago

    Thank you so much

  • @amitdebnath2207
    @amitdebnath2207 4 months ago

    Hats Off Brother

  • @arunmeghani1667
    @arunmeghani1667 3 years ago

    great video and great explanation

  • @Kabir_Narayan_Jha
    @Kabir_Narayan_Jha 5 years ago +1

    This video is amazing and you are an amazing teacher, thanks for sharing such amazing information.
    Btw, where are you from - Bangalore?

  • @neelanshuchoudhary536
    @neelanshuchoudhary536 4 years ago +1

    Very nice explanation, great :)

  • @gaurawbhalekar2006
    @gaurawbhalekar2006 4 years ago

    excellent explanation sir

  • @gouthamkarakavalasa4267
    @gouthamkarakavalasa4267 1 year ago

    Gradient descent will be applied to the cost function, right? -(1/m) Σ [ y·log(ŷ) + (1-y)·log(1-ŷ) ]... In this case, if they had applied it to the activation function, how would the algorithm reach the global minimum?
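
One way to reconcile this (a standard derivation for a single sigmoid output with cross-entropy loss, not something stated in the video): gradient descent is indeed taken on the cost with respect to the weights, but the activation's derivative still appears inside the chain rule.

```latex
L = -\bigl(y \log \hat{y} + (1-y)\log(1-\hat{y})\bigr), \qquad
\hat{y} = \sigma(z), \qquad z = w x + b

\frac{\partial L}{\partial w}
  = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w}
  = \frac{\hat{y} - y}{\hat{y}(1-\hat{y})} \cdot \sigma'(z) \cdot x
  = (\hat{y} - y)\,x
```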

  • @narsingh2801
    @narsingh2801 4 years ago

    You are just amazing. Thnx

  • @dhananjayrawat317
    @dhananjayrawat317 4 years ago

    best explanation. Thanks man

  • @nirmalroy1738
    @nirmalroy1738 5 years ago

    super video...extremely well explained.

  • @gautam1940
    @gautam1940 5 years ago +3

    This is an interesting fact to know. Makes me curious to see how ReLU overcame this problem
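
A rough intuition, as a toy comparison (the depth and pre-activation values below are made-up assumptions, not from the video): for active units the ReLU derivative is exactly 1, so the chain-rule product does not shrink with depth the way the sigmoid's ≤ 0.25 factors do.

```python
import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)            # at most 0.25

def relu_prime(z):
    return (z > 0).astype(float)    # exactly 1 for positive pre-activations

# Hypothetical positive pre-activations across 10 layers (made-up values)
zs = np.linspace(0.1, 2.0, 10)

print(np.prod(sigmoid_prime(zs)))   # shrinks toward zero as depth grows
print(np.prod(relu_prime(zs)))      # stays 1.0 for these active units
```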

  • @feeham
    @feeham 3 years ago

    Thank you !!

  • @sunnysavita9071
    @sunnysavita9071 5 years ago

    very good explanation.

  • @winviki123
    @winviki123 5 years ago +2

    Could you please explain why bias is needed in neural networks along with weights?

    • @Rising._.Thunder
      @Rising._.Thunder 4 years ago

      It is because you want to control or fix the output of a given neuron within a certain range; for example, if the neuron is always receiving inputs between 9 and 10, you can put a bias of -9 so as to bring the neuron's output into a useful part of the 0 to 1 range.
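
A tiny sketch of that point, using the reply's own numbers (the weight of 1.0 is an assumed value):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([9.0, 9.5, 10.0])   # inputs that always fall between 9 and 10
w = 1.0                          # assumed weight

print(sigmoid(w * x))            # no bias: the neuron saturates near 1 for every input
print(sigmoid(w * x - 9.0))      # bias = -9 re-centres the pre-activation near 0
```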

  • @rayyankhattak544
    @rayyankhattak544 2 years ago

    Great-

  • @ArthurCor-ts2bg
    @ArthurCor-ts2bg 4 years ago

    Excellent 👌

  • @grownupgaming
    @grownupgaming 3 years ago

    Why can't they just make the activation function curve aggressively spike up at x=0?

  • @Joe-tk8cx
    @Joe-tk8cx 1 year ago

    Great video, one question: when you calculate the new weights using old weight - learning rate x derivative of loss with respect to weight, is that derivative of loss w.r.t. weight the sigmoid function?

  • @shahidabbas9448
    @shahidabbas9448 4 years ago +1

    Sir, I'm really confused about the actual y value, please can you tell us about that. I thought it would be our input value, but here there are so many input values with one predicted output.

  • @aishwaryaharidas2100
    @aishwaryaharidas2100 4 years ago

    Should we again add bias to the product of the output from the hidden layer O11, O12 and weights W4, W5?

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    Thanks Krish. The video was superb, but I have an apprehension that I might get lost somewhere. Please provide some reading references regarding this topic, considering I am a beginner. Cheers

  • @niazmorshedulhaque4519
    @niazmorshedulhaque4519 4 years ago

    Dear Sir, splendid tutorial indeed. Please share a link to the reference book that you are following.

  • @ganeshkharad
    @ganeshkharad 4 years ago

    Nice explanation

  • @_jiwi2674
    @_jiwi2674 4 years ago

    you meant that the derivative of the sigmoid is between 0 and 0.25, right? I wanted to clarify about that range written in red color. The sigmoid of z would be between 0 and 1, from what I understood. Any reply will be appreciated :)

  • @zoroXgamings
    @zoroXgamings 4 years ago

    If you ask me which one is easier to understand between Andrew Ng and Krish Naik, I think this would be my choice.

  • @abhisheksainani
    @abhisheksainani 4 years ago +1

    In red color you've written that sigmoid of z is between 0 and 0.25 whereas you meant to say that the derivative of sigmoid of z is between 0 and 0.25.

  • @RAZONEbe_sep_aiii_0819
    @RAZONEbe_sep_aiii_0819 4 years ago +1

    There is a very big mistake at 4:14 sir, you didn't apply the chain rule correctly; check the equation.

  • @rhul0017
    @rhul0017 1 year ago

    You talk about changing the weights during backpropagation via derivatives, and then suddenly talk about differentiating the activation function; I don't understand that part. Is the activation function's derivative used during backpropagation?

  • @ilyoskhujayorov8498
    @ilyoskhujayorov8498 3 years ago

    Thank you !

  • @omernaeem1388
    @omernaeem1388 4 years ago

    Sir, please make tutorials on GANs as well.

  • @kirankumarj8229
    @kirankumarj8229 4 years ago

    Hi Krish, thanks for the good videos.. May God bless you and your family..
    Do we have material for ML and DL? If you have it, how do we get it?

  • @ngelospapoutsis9389
    @ngelospapoutsis9389 4 years ago

    So if we have 2 layers, and as we know 1 forward and backward pass is 1 epoch: if we now have 100 epochs, is the derivative going to get smaller every time? Or is the vanishing problem due to many hidden layers and not
    dependent on the number of epochs?