The full Neural Networks playlist, from the basics to deep learning, is here: th-cam.com/video/CqOfi41LfDw/w-d-xo.html
Support StatQuest by buying my books The StatQuest Illustrated Guide to Machine Learning, The StatQuest Illustrated Guide to Neural Networks and AI, or a Study Guide or Merch!!! statquest.org/statquest-store/
The quotient rule: low-D-high minus high-D-low, square the bottom and away we go! My teacher told me this 14 years ago, and I never forgot! Also, thanks for posting! I love these videos!
bam! :)
That’s a good teacher. Stealing this!
My teacher used, "low-D-high minus high-D-low, square the bottom and put it below!" hahaha
@@charlesrambo7845 It's just as easy to reconstruct it as d(f * (1/g)).
I don't remember the last time I subscribed to a TH-cam channel, but you got my subscription, and my gratitude is all I can give you in exchange for this magnificent, clear, SHORT and understandable video. THANKS!!!
Welcome aboard!
I literally spent 15 minutes trying to figure out this derivative while I had the (original) video on pause. As soon as I pressed resume, you pointed to this explanation! I now officially consider myself “hard-core” :)
bam!
Josh, thanks for these great videos, they really help me and so many others who love machine learning! You make great videos teaching people the ideas, but I really hope there can be more videos on how to code these techniques. It would be wonderful if your videos combined coding and theory.
I'll keep that in mind.
Your videos helped me understand ML better, and more simply, than the literature. Thank you, and I'm waiting for your next videos.
Thank you very much! The next videos should come out on Monday.
Thank you very much, I was struggling to understand the SoftMax derivative, and I finally managed to understand it.
Hooray! :)
Absolutely awesome, I love going over ML with these kinds of videos.
More to come!
I am taking your NN series online to have that protective bubble of easily understandable memories of knowledge before my professor dives into the real "academic knowledge" lol Thanks for the video
Bam!
The step-by-step derivative explanation is good.
bam!
Excellent video, brother ❤ Really so addicted to your videos and the way you explain every topic. Thanks, man! 🙌
Thanks!
I see there is an error in the multiplication of -Psetosa and Pversicolor at 5:57; the correct value would be -0.069, not -0.07. Anyway, thank you for the video, it was most useful! You are doing a great favor to this world with this series.
Thanks!
my bad lol I was a little sleepy, you just rounded the value haha thank you for the content! :)
Statquest should be declared as Universal Treasure !
Triple bam! :)
Of the many AI videos on TH-cam, yours are definitely among the very best.
I'm even getting used to your constant singing, all the time. 🤣
Wow, thank you!
Quotient rule: d/dx(U(x)/V(x)) = (U'(x) V(x) - U(x) V'(x)) / V(x)^2. In our case, U is a function of three variables (setosa, versicolor, and virginica), so we use partial derivatives.
d( e^setosa / (e^setosa + Constant) ) is the first expression we differentiate, where Constant = e^versicolor + e^virginica. With clever identifications, Josh arrives at a simple final formula :)
:)
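For anyone who wants to see that written out, here is a sketch of the quotient rule applied to the setosa output (my own notation: x_s, x_ve, x_vi for the three raw output values, and C = e^{x_ve} + e^{x_vi} for the part treated as a constant in the first derivative):

```latex
p_{setosa} = \frac{e^{x_s}}{e^{x_s} + C}, \qquad C = e^{x_{ve}} + e^{x_{vi}}

% low-D-high minus high-D-low, square the bottom:
\frac{\partial p_{setosa}}{\partial x_s}
  = \frac{e^{x_s}(e^{x_s} + C) - e^{x_s} \cdot e^{x_s}}{(e^{x_s} + C)^2}
  = \frac{e^{x_s}}{e^{x_s} + C} \cdot \frac{C}{e^{x_s} + C}
  = p_{setosa}\,(1 - p_{setosa})

% differentiating with respect to a different raw output, e.g. versicolor:
\frac{\partial p_{setosa}}{\partial x_{ve}}
  = \frac{0 \cdot (e^{x_s} + e^{x_{ve}} + e^{x_{vi}}) - e^{x_s} \cdot e^{x_{ve}}}{(e^{x_s} + e^{x_{ve}} + e^{x_{vi}})^2}
  = -\,p_{setosa}\,p_{versicolor}
```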
I love that you used 'hard core' to describe the people watching this video.
bam! :)
Remarkable sense of humor :D Laughing while studying
bam! :)
You are great! And cute sometimes when saying "Quotient Rule"!!!!
Thank you! 😃
Thank you for the derivative! I want to ask a question about 6:30: after we calculate the 0.21/-0.07/-0.15 values, what is the next step for this network? I mean, at which point will the network use 0.21/-0.07/-0.15? Thank you for reading my question!
These derivatives will come in handy when we do backpropagation with Cross Entropy. The videos on Cross Entropy will be out soon.
@@statquest I see! I never connected it to cross entropy!!! Thank you so much!!!
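For anyone else wondering where the 0.21 / -0.07 / -0.15 go next: they are the derivatives of the predicted setosa probability with respect to each raw output value, and backpropagation multiplies them by the derivative of the loss (cross entropy) with respect to that probability, per the chain rule, on the way back to the weights and biases. A minimal sketch, with made-up raw output values used only for illustration:

```python
import numpy as np

raw = np.array([1.43, -0.40, 0.23])   # made-up raw output values: setosa, versicolor, virginica
p = np.exp(raw) / np.exp(raw).sum()    # softmax turns them into predicted probabilities

# Derivatives of p[0] (setosa) with respect to each raw output value,
# exactly as derived in the video with the quotient rule:
dp_setosa_draw = np.array([ p[0] * (1 - p[0]),  # w.r.t. raw setosa
                           -p[0] * p[1],         # w.r.t. raw versicolor
                           -p[0] * p[2]])        # w.r.t. raw virginica

# Chain rule: if setosa is the true class, cross entropy is -log(p[0]),
# so dLoss/dp[0] = -1/p[0], and the gradient with respect to the raw outputs is:
dloss_draw = (-1.0 / p[0]) * dp_setosa_draw
print(p.round(2), dp_setosa_draw.round(2), dloss_draw.round(2))
```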
Great video. I wonder if we will reach Gaussian process regression in the quest soon?
I'll keep that in mind.
As always, Josh, thank you.
Thanks!
I sincerely thank you.
Thanks!
Excellent series!!! I think for the regression use case you could try a different example. The drug dosage example looks like a classification problem, and there we used SSR as the loss function. The iris flowers example was perfect for the classification problem.
What time point, minutes and seconds, are you referring to?
Thank you so much!
You're welcome!
I'm not hardcore, the coursework is...Thank you for helping me out.
Good luck! :)
The video is great! But why is "probability of setosa" in quotes? Doesn't softmax convert the raw outputs into probabilities based on the relative sizes of the logits?
The outputs from the SoftMax abide by the technical definition of probabilities, but the actual values are dependent on the random values we used to initialize the model, so they shouldn't be trusted in the same way that we might trust the probabilities that were derived from some statistical framework. To see an example of what I'm talking about, see: th-cam.com/video/KpKog-L9veg/w-d-xo.htmlsi=FlU4E3gH2M0UJsJP&t=489
Hey, great video! One question: when you talk about RAWsetosa, what is its value exactly? Is it like the x in exp(x)? Thanks!
Early on, at 0:49, I say that the "raw setosa" value is the "raw output value for setosa". In other words, "raw setosa" is the output value from the neural net for setosa before we apply softmax. For more details, see the video that introduces softmax: th-cam.com/video/KpKog-L9veg/w-d-xo.html
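In case a concrete (made-up) example helps: the raw values are the x inside e^x, not e^x itself.

```python
import numpy as np

raw = np.array([1.43, -0.40, 0.23])  # hypothetical raw output values for setosa, versicolor, virginica
p = np.exp(raw) / np.exp(raw).sum()   # softmax: probabilities that sum to 1, roughly [0.68, 0.11, 0.21]
```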
Hi Josh! Could you please make a video on what metric to look at to evaluate whether a ML model is overfitting?
I'll keep that in mind.
So do you add all 3 probabilities together when you get the derivatives, assuming the results for all three predicted outcomes are matrices? Simply put, do you need all 3 derivatives or just the derivative of the output you are predicting?
Also, if setosa is the correct prediction, i.e. your yhat, what values represent the incorrect predictions in your code? (From what I understand, setosa is when i = j and the other two are when i != j.)
OK, I guess the correct prediction, when i = j, is yhat(1 - yhat), and when i != j it is -yhat_i * yhat_j. When I add or subtract this to the first function, my predictions return NaN, so I'm kind of lost on what to do with the equation when i != j. Any help appreciated.
The next video in this series shows how these derivatives are used in practice. So, to answer your question, see: th-cam.com/video/xBEh66V9gZo/w-d-xo.html
@@statquest thanks, gonna check it now.
@@statquest Do the outputs need to be 'labeled'? For example, my softmax outputs a one-hot encoded vector of results [awaywin, draw, homewin], but they aren't labeled logically, so if I want to differentiate with respect to each outcome, would you say I need to select each element in the vector individually? e.g.
dwin/result = dwin/daway + daway/ddraw + ddraw/dhome + dhome/result.
You can tell me to fuck off if I'm annoying you : )
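For anyone else stuck on the i = j vs i != j question in this thread: in general you keep all the derivatives, one per (probability, raw value) pair, and together they form a Jacobian matrix. You don't add or subtract them to the predictions; backpropagation multiplies them by the loss derivatives via the chain rule. A minimal sketch (NumPy, made-up values, my own variable names):

```python
import numpy as np

def softmax(raw):
    e = np.exp(raw - raw.max())             # subtract the max for numerical stability
    return e / e.sum()

p = softmax(np.array([1.43, -0.40, 0.23]))  # made-up raw outputs -> predicted probabilities

# Jacobian of the softmax: J[i, j] = dp[i]/draw[j]
#   i == j ->  p[i] * (1 - p[i])
#   i != j -> -p[i] * p[j]
J = np.diag(p) - np.outer(p, p)

# Chain rule: multiply (don't add) the Jacobian by dLoss/dp to get dLoss/draw.
y = np.array([1.0, 0.0, 0.0])               # one-hot label; order just has to match the softmax outputs
dloss_dp = -y / p                            # derivative of cross entropy w.r.t. each probability
dloss_draw = J.T @ dloss_dp                  # gradient with respect to the raw output values
print(dloss_draw)
```

The outcomes don't need special labels; the label vector y just has to use the same element order as the softmax output vector.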
Hey any chance you could do a video on missing data and multiple imputation methods ?
I'll keep that in mind.
Can you do the backpropagation for this example, please?
I show it in part 7 of this series. So, you need to see part 6 first... th-cam.com/video/6ArSys5qHA/w-d-xo.html ...then part 7... th-cam.com/video/xBEh66V9gZo/w-d-xo.html
Hi, I'm having trouble relating setosa and P_setosa. What is rawSetosa? Is it e^A or A?? Please, somebody help.
"rawSetosa" is the value we calculate before calculating the softmax - so, it is one of the 3 input values for the softmax function.
Great video as usual! In one of your earlier videos you referred to Elements of Statistical Learning as the Bible of machine learning. This text is comparatively light on NNs. Do you have a Bible for NNs that you would recommend?
Not yet. I'm writing one, though. I hope for it to be out next year.
Hey Josh, why would we even want to find a derivative for the output layer, as you did with the SoftMax function?
We need the derivative to do backpropagation.
@@statquest I understand, thank you :)
Hi! Thanks for this great video!! Do you have (or plan to have) a video on the jackknife?
I'll keep it in mind. I have a video on bootstrap here (which is related): th-cam.com/video/isEcgoCmlO0/w-d-xo.html
Can you please make a tutorial on RNN, LSTM and RL?
I am working on them.
@@statquest Thank you so much Sir. You are a real Guru.
"When using softmax as the activation function in the output layer of a neural network, the error for each class (or category) can be calculated as the difference between the predicted probability (y_hat) and the true label (y)."
Is this all you have to do for softmax backprop? My networks already do this, so I guess I can skip the softmax layer on backprop? So confusing.
With softmax, the loss function is cross entropy (see: th-cam.com/video/6ArSys5qHAU/w-d-xo.html ) and I show how that works with backpropagation here: th-cam.com/video/xBEh66V9gZo/w-d-xo.html
@@statquest Yes. I do use cross entropy to SHOW the loss, but that's all I do with it. The error that gets back propagated is still just the difference between the predicted output and the target output. Everything else I've tried fails big time and my network doesn't learn.
EDIT: Note: I'm a programmer trying to do math, not a math guy trying to program lol.
@@easyBob100 Check out those links I put in the last comment. They might help.
@@statquest This is most likely a confusion on my part with terminology. Namely between calling something the "error" vs "gradient".
I've watched many of your videos, sometimes many times too lol. And ya, I've watched those vids as well :).
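For what it's worth, the quoted rule is correct when the output layer is softmax and the loss is cross entropy: chaining the softmax derivatives from this video through the cross-entropy derivative, the terms cancel and the gradient with respect to each raw output value is just the predicted probability minus the one-hot label. A sketch of the algebra (my notation: p_j for the softmax outputs, x_j for the raw outputs, c for the true class):

```latex
L = -\log p_c
\qquad
\frac{\partial L}{\partial x_j}
  = \frac{\partial L}{\partial p_c} \, \frac{\partial p_c}{\partial x_j}
  = -\frac{1}{p_c} \times
    \begin{cases}
      p_c\,(1 - p_c) & j = c \\
      -\,p_c\,p_j    & j \neq c
    \end{cases}
  = \begin{cases}
      p_c - 1 & j = c \\
      p_j     & j \neq c
    \end{cases}
  = p_j - y_j
```

So backpropagating "predicted output minus target" is exactly the cross-entropy gradient here; the "error" and the "gradient" happen to coincide for this particular pairing.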
I'm in love with Josh..
:)
Hard-core StatQuest I am, StatQuest
bam! :)
up
double up! :)
Hmmm, I think your drawing is messy; it is hard to read, even though I know how to do the derivative.
Can you tell me what time point, minutes and seconds, is confusing?
@@statquest I truly understand all the stuff; it's just hard to read, so someone new to this topic may have some trouble.
@@ccuuttww OH I see. Understood.
@@statquest If you don't mind, I can help you tidy it up and send it to you tomorrow.
@@ccuuttww Sure!