Just want to say thank you. I was following Andrej Karpathy's makemore series and trying to implement everything in C# from scratch, with no tensors and no automatic differentiation; I only have raw two-dimensional arrays, and thanks to your explanation I was able to calculate the gradients to train the network! Thanks dude, your explanation is the best!
Thanks so much, brother! I was struggling with this so much. I am following the NNs from Scratch book by Sentdex and I was stuck at the derivative of softmax because I was not able to understand the notation. Now I understand that j=k refers to the diagonal elements of the gradient matrix :) Thanks
Wish you had described Y better in the video … I eventually figured it out after going back and forth, but so far this makes a lot of sense, thanks.
Quick question. In most other videos, such as the StatQuest neural networks part 7 video deriving the backprop for CCE loss and softmax, they seem to arrive at the answer that the derivative of the loss with respect to the inputs to the softmax would be (in the case that y_true = [0,0,1,0] for ease) [a3, a3, a3-1, a3], whereas you get [a1, a2, a3-1, a4], which is also what I get. Do you know whether this is a discrepancy in their work or yours?
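For what it's worth, a quick numerical check agrees with [a1, a2, a3-1, a4]: with L = -sum(y*log(a)) and a = softmax(z), the gradient dL/dz works out to a - y_true elementwise. A minimal sketch in Python/NumPy (the logit values are made up) comparing the closed form against finite differences:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())              # shift for numerical stability
    return e / e.sum()

def cross_entropy(z, y):
    return -np.sum(y * np.log(softmax(z)))

z = np.array([0.5, -1.0, 2.0, 0.3])      # hypothetical logits
y = np.array([0.0, 0.0, 1.0, 0.0])       # y_true, true class at index 2

a = softmax(z)
closed_form = a - y                      # i.e. [a1, a2, a3 - 1, a4]

# central finite-difference gradient of the loss w.r.t. each logit
eps = 1e-6
numeric = np.array([
    (cross_entropy(z + eps * np.eye(4)[i], y) -
     cross_entropy(z - eps * np.eye(4)[i], y)) / (2 * eps)
    for i in range(4)
])

print(closed_form)
print(numeric)                           # matches a - y, not [a3, a3, a3-1, a3]
```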
This video is everything I asked for! Thank you so much, Meerkat!
Such a good explanation. Appreciate it, boss.
Just like magic, a matrix-vector multiplication turned into a vector difference. Thanks a lot.
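That collapse can be checked directly: the softmax Jacobian is diag(a) - a aᵀ, and multiplying it by the cross-entropy gradient -y/a gives exactly a - y. A small sketch (Python/NumPy, arbitrary example numbers) illustrating it:

```python
import numpy as np

z = np.array([1.0, -0.5, 0.2])          # hypothetical logits
y = np.array([0.0, 1.0, 0.0])           # one-hot target

a = np.exp(z) / np.exp(z).sum()         # softmax output

# Jacobian of softmax: da_i/dz_j = a_i * (delta_ij - a_j)
J = np.diag(a) - np.outer(a, a)

dL_da = -y / a                          # gradient of -sum(y*log(a)) w.r.t. a

print(J @ dL_da)                        # matrix-vector product ...
print(a - y)                            # ... equals the simple vector difference
```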
Thanks bro, I was struggling to get this for 3 days.. at last got it.. thanks a lot..
Just wanted to say thank you!!! Very well explained and so intuitive 👍
YOU ARE THE BEST, THANK YOU! YOU HELPED ME PREPARE FOR MY EXAM. I LOVE U!!!!! HELLO FROM RUSSIA
This is one of the best explanations of softmax I have found, thank you so much!!! Hello from Ukraine
Glory to Ukraine!
@@MeerkatStatistics Glory to the heroes!
Why not use the Jacobian and apply the cross-entropy derivative as a row vector in the final step?
thanks man
thank you boss
At 3:30, I don't understand why e^(z1) + e^(z2) + e^(z3) = 1.
Can someone please explain?
Thanks
Because each of them is also divided by that exact same sum, so the whole thing turns into 1 (see the quick check after this thread).
@@MeerkatStatistics Oh 🤦‍♀️🤦‍♀️ duh. Thanks
BTW, can you tell me when we will actually need that diagonal part?
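To make the point in that exchange concrete, here is a tiny sketch (Python/NumPy, arbitrary example logits) showing that the raw exponentials do not sum to 1, but once each e^(z_i) is divided by their common sum, the softmax outputs add up to exactly 1:

```python
import numpy as np

z = np.array([2.0, 1.0, 0.1])        # arbitrary example logits

exp_z = np.exp(z)                    # e^(z1), e^(z2), e^(z3) -- these do NOT sum to 1
a = exp_z / exp_z.sum()              # each term divided by the same common sum

print(exp_z.sum())                   # some positive number, not 1
print(a, a.sum())                    # softmax outputs; their sum is exactly 1.0
```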
Excellent video
wow!!!
Thanks