The full Neural Networks playlist, from the basics to deep learning, is here: th-cam.com/video/CqOfi41LfDw/w-d-xo.html
Support StatQuest by buying my books The StatQuest Illustrated Guide to Machine Learning, The StatQuest Illustrated Guide to Neural Networks and AI, or a Study Guide or Merch!!! statquest.org/statquest-store/
Josh once again demonstrates his amazing ability to simplify complicated topics into elemental concepts that can be easily understood. BAM!
Thanks Neil!
@@statquest it is amazing ! thank you SQ!
I’ve inquired about the reasons behind using the logarithm to calculate the loss for so long, but no one could explain it well enough to develop intuition about it. This did it. Thank you!
Thanks!
I was sure I would need to watch several videos to grasp this concept. OMG!! You have explained it so intuitively. Thanks a lot for saving my time and energy.
Thank you! :)
Your videos are the best for fundamental knowledge regarding ML/AI. I've been working on transformers for 4 months, and I come back very often for the fundamentals. THANKS Josh !!
Thank you very much!
Happy Teacher's Day from India! It's Teacher's Day today in India. Thanks for all your teaching.
Thank you very much!
I just can’t believe how you opened my eyes. How can you be so awesome 👌👌. Sharing this knowledge for free is amazing.
Happy to help!
What is this guy made of??? what does he eat??? Are you a God?? An alien?? You are so smart and dope man!!! How do you do all this? He should be a lecturer at MIT! SO underrated content💞💞💞💞💞💞
Wow, thanks!
@@statquest OMEGA BAAM!
I admire this professor a lot. I hope one day to be a good teacher like you. Salute from Brazil. In my classes, I try to do the same: take a subject and make it as easy as possible.
Muito obrigado!
This is a life saver! Thank you so much again and again. Love your simple and elegant explanations.
Thank you very much! :)
So refreshing and so different from the mathematical riddles that are used in university to teach us this stuff. Thank you!
You're very welcome!
Another fantastic video. You make these topics so straightforward to understand, while most lecturers overcomplicate them by writing down unnecessarily long formulas and just showing off their knowledge. Thanks a lot!
Glad it was helpful!
Hello Josh!
I have to say WOW!! I love every single one of your videos!! They are so educational. I recently started studying ML for my master's degree, and from the moment I found your channel, ALL the questions I wonder about get answered! Also, I noticed that you reply to every post in the comment section. I am astonished.. no words. A true professor!
Thanks for everything! Thank you for being a wonderful teacher.
Thank you very much! :)
Thank you sooo much! I have a master's degree in CS, and this is substantially better than anything I learned in college; I understand it at an intuitive level. Thank you sooo much!!
Thank you!
Thank you for saving so much of my time. I have wasted so many hours and days on blogs about NNs across various topics; then I found your channel. Thank God for that.
Glad I could help!
Josh, you are a savior, man. I cannot emphasize this enough. I would have given up on understanding these concepts long ago had you not made these videos.
Glad they are helpful! :)
This is the shortest and the easiest explanation. Excellent job Josh!
Awesome, thank you!
I would not be able to get how neural networks fundamentally work without this series. Thank you so much Josh! Amazing and clear explanations!
Happy to help!
What an amazing video. Never found any content or video better than this one anywhere on this topic. Thank you so much.
Glad it was helpful!
What an awesome channel. I've been learning and using ML for years and still these videos help me build intuition around basic concepts that I realize I never had. Also, love the songs and the BAMs. Thank you!
Thank you very much!
Whenever I 'wonder' about something while watching StatQuest, Josh tells me the solution right after :)
bam! :)
He just can't stop getting better, THANK YOU MA MAN!
Thanks!
This really clarified so many of the concepts of this topic! I always wondered what the purpose of cross entropy is when we can use other loss functions like mean squared error! Thank you so much!
Awesome! I'm glad the video was helpful.
The hero we wanted, and the hero we needed, StatQuest...
Thanks!
I am currently starting my bachelor's thesis on particle physics, and I was told that a big part of it consists of running a neural network with PyTorch. Your videos are really, really useful, and thanks to you I at least have a vague idea of how a NN works. Looking forward to watching the rest of your Neural Networks videos!! TRIPLE BAM!!
Awesome! And good luck with your thesis!
Love each and every video by StatQuest. Thank you Josh and team for providing such clear, easy-to-digest concepts with a bonus of fun and entertainment. Quadruple BAM!!!
Thank you!
This video helps a lot! The explanation is brief and clear.
Thank you!
The softmax fuzzy bear cracks me up so much :D Fantastic video Josh!
Thanks! 😀
Really good explanation. The difference between squared error and cross entropy is very well explained.
Thank you very much!
Hey Josh!
the way you teach is incredible. THANKS A LOT!❤
Thank you! 😃
Wow, I feel that when I say thank you it's nothing compared with what you do! Very impressive ❤❤
Thank you!
This is just the best channel on YouTube!
Thank you very much! :)
Your video makes my mind triple BAM!!
HOORAY! :)
That wasn't shameless self-promotion. That was selfless giving.
Thank you! :)
Very, very intuitive and a great explanation.
Glad you liked it!
amazing work, can't wait to start on the book once I finish all your videos
BAM! :)
That is an outstanding teaching video, thank you tons!
BAM! :)
Just wow, thumbs up, great explanation sir
Thanks!
@@statquest Just bought the PDF version of your book 'The StatQuest Illustrated Guide to Machine Learning'....really excited
@@ShujaurRehmanToor Hooray!!! Thank you so much for your support! I hope you enjoy it!
@@statquest you are welcome, I am doing PhD so yes I hope it helps me build a good understanding of machine learning
Love your videos, they are so intuitive!
Thanks!
It's right on time. I am actually using TensorFlow for an image classifier. Thank you for your video :)
Glad it was helpful!
Excellent teaching skill
Thank you! :)
Thank you for this video! It and others helped me pass my exam! :D
TRIPLE BAM!!! Glad it helped!
Amazing explanation as always! BAM!
:)
Thank you. Another complicated topic made simple!!!!
bam! :)
Great explanation in an entertaining way. Bam!
Glad you liked it!
I'm in a stats PhD program and we had a guest speaker last week. During lunch, the speaker asked us which course we liked most at our school, and one of my classmates said, actually, none of them; he likes the StatQuest "course" the most. And I was nodding my head 100 times per minute. We discussed why US universities hire professors who are good at research but not professors who are good at teaching, and why there are no tenure-track teaching positions... The US education system really needs to change.
Thank you! And that's a good question. One day I'd like to teach a real course somewhere. I love making videos, and want to do it forever, but it's also very lonely, and maybe teaching in person would change that.
@@statquest I hope you don't feel lonely, as I value your feelings and I am very willing to give you feedback. I think video is a gorgeous format: we can pause and ponder, and we can revisit when we forget. For example, right now it's my 2nd time rewatching this video after months to solidify my memory! Thank you Josh! I love you so much!
@@exoticcoder5365 Thank you!
2:30 wtf just happened 😂 I enjoy watching your videos. Thank you for the great explanations!!
:)
Thank you so much! Your videos are always very clear and easy to follow :)
Glad you like them!
YOUR VIDEOS and work are so GREAT man
Thank you!
I found my treasure. So great!
Thanks!
Great Explanation! Thank you!
Thank you!
as always a truly wonderful presentation. It could be good to do the KL Divergence first and then explain that minimizing the KL divergence results in minimizing the cross entropy.
I'll keep that in mind.
Really well explained! Thanks Josh :)
Thank you!
An exceptional video as always! I'm just getting into ML and I haven't found a single difficulty yet thanks to your videos; it all seems so natural and rational. I really appreciate you, man!
I'd like to point out, though, that in the cross entropy function maybe you mean -ln(p) and not log, because the math there isn't mathing :))
In statistics and machine learning, log = log base 'e' = ln. This is the standard convention, which is why I use it and I tried to clarify this at 2:33. However, I've also made a little song that might make it easier to remember: th-cam.com/video/iujLN48gumk/w-d-xo.html
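To convince yourself of this convention, here's a tiny sketch (assuming NumPy; not from the video) showing that log() in common ML libraries is the natural log:

```python
import numpy as np

# In NumPy (and similarly in PyTorch/TensorFlow), log() is the natural log, base e
print(np.log(np.e))     # 1.0, because ln(e) = 1
print(np.log(10.0))     # ~2.3026, not 1.0
print(np.log10(10.0))   # 1.0; the base-10 log has its own separate function
```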
I love this way of learning!
Hooray! :)
Thank you, this makes sense now.
Happy to help!
thank you so much for your explanation!
Glad it was helpful!
This is an amazing video
Thanks!
This video is godlike. Thank you.
:)
great video as always.
Thank you!
Josh, your videos are amazing, great work 👏 ❤
Thank you so much 😀
Wonderful explanation
Thank you! :)
Please note that the image for -log("p") at 7:07 is incorrect. Both -log(x) and -ln(x) are 0 at x=1. The image also makes it look like the function asymptotically approaches 0, but that is not the case. It is actually a much steeper descent than pictured.
I agree that the y-axis is confusing, but I wouldn't say it is incorrect since the y-axis values are not labeled.
Pure Brilliance
Thanks!
love your songs!
Thank you!
Hi. Can I ask where the formula for Cross Entropy is defined? It appears around minute 2:22. Is that the definition of it? It looks like the definition of Entropy, although at minute 2:22 it doesn't have the sigma sign at the beginning. In Wikipedia, I see a definition, but it is not exactly this one; it is -E sub p of log q. I didn't see a definition of Cross Entropy in this video, though. Is there another video where Josh defines cross entropy? I saw his supremely wonderful video on Entropy, but I don't see any more. I more or less understand the argument that the observed probabilities that the data comes from Virginica and Versicolor are zero. Any help would be greatly appreciated! BAM
See 2:51
It's amazing !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Thanks!
Thank you for the videos. They are really helpful. I have a question about the softmax process at 5:13. All the outputs for the Versicolor species were already smaller than 1. Even if all the outputs are smaller than 1, do we have to continue with the softmax process? And can't we use the raw outputs for cross entropy? Thank you again.
Not only should probabilities be between 0 and 1, but when we add up all possible options (Setosa, Versicolor and Virginica) they should add up to 1. Using the softmax function ensures that both of those are true.
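As a minimal sketch of that idea (the raw output values below are made up, not the video's exact numbers):

```python
import numpy as np

def softmax(raw_outputs):
    """Turn raw neural network outputs into probabilities."""
    exps = np.exp(raw_outputs - np.max(raw_outputs))  # subtract the max for numerical stability
    return exps / exps.sum()

raw = np.array([1.43, 0.40, 0.23])  # hypothetical raw outputs for Setosa, Versicolor, Virginica
p = softmax(raw)
print(p)        # every value is between 0 and 1...
print(p.sum())  # ...and together they add up to 1
```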
@@statquest Oh I see. Thanks. BAM! :)))
This is great! Thank you!!!
Glad you liked it!
you rock josh!!
Thanks!
Kudos!!!! 🙌🏻 BAM!!!!!
Thanks!
@4:49 Is it because we know the data is from Virginica that we are putting its probability [0.58] there, or is it that 0.58 is the maximum and thus we are taking the observation as Virginica? What if the probabilities were [Setosa, Virginica, Versicolor] = [0.5, 0.2, 0.3]; in that case, would we still take 0.2, or take it as Setosa?
We save 0.58 as the probability, and then use it for cross entropy, because we know the data is from Virginica. If we knew it was from Setosa or Versicolor, we would have used 0.22.
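In code, that selection step looks roughly like this (a sketch using the approximate SoftMax values mentioned in this thread, not the video's actual code):

```python
import numpy as np

# SoftMax output for one row of training data, ordered [Setosa, Versicolor, Virginica]
probs = np.array([0.22, 0.22, 0.58])
true_class = 2  # we know this row is Virginica

# Cross entropy for this row uses only the probability of the known species
cross_entropy = -np.log(probs[true_class])
print(cross_entropy)  # -ln(0.58) is about 0.54
```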
@@statquest Thank you, Sir, it clears my doubt.
Josh, thanks for such an explanatory video. But I couldn't understand why the residual^2 graph looks linear at 7:31.
It's because the range of values for cross entropy is much larger than the range of values for the residual^2. In other words, the scale of the y-axis makes the residual^2 look flat.
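A quick numeric comparison (a sketch, not from the video) makes the scale difference obvious:

```python
import numpy as np

p = np.array([0.9, 0.5, 0.1, 0.01])  # predicted probability for the correct class
print(-np.log(p))    # cross entropy:     [0.105, 0.693, 2.303, 4.605]
print((1 - p) ** 2)  # squared residual:  [0.010, 0.250, 0.810, 0.980]
```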
@@statquest ooh, thank you for responding. Now it makes sense. BAM!
I am a student at Merton College, Oxford University. Please consider visiting our university sometime. Thank you for your absolutely brilliant content.
I would love to! Feel free to put me in touch with anyone who could make it happen.
3:28 Please correct me if I'm wrong, but the negative sign is before the summation sign, so shouldn't it be - ObservedSetosa*log(PredictedSetosa) + ObservedVersicolor*log(PredictedVersicolor) + ObservedVirginica*log(PredictedVirginica)?
Like shouldn't the negative signs be positive instead?
Just like -1 * (a + b + c) = -a - b - c, the minus sign outside of the summation carries through and turns the addition into subtraction.
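Written out with the video's three terms:

```latex
-\sum_{c} \text{Observed}_c \log(\text{Predicted}_c)
= -\text{Obs}_{\text{Setosa}}\log(\text{Pred}_{\text{Setosa}})
  - \text{Obs}_{\text{Versicolor}}\log(\text{Pred}_{\text{Versicolor}})
  - \text{Obs}_{\text{Virginica}}\log(\text{Pred}_{\text{Virginica}})
```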
Thanks! One question: at 5:40 you measure the total cross entropy as the sum of the cross entropies over the training set. Could that be biased for unbalanced datasets? Do you recommend this method for those datasets? Thanks again
I'm not sure I understand your question. Can you clarify it?
I can imagine it takes a lot of time to make these videos. Thanks for all the effort! The cross entropy function is convex, while the squared error is not, due to the logit function in softmax.
Thank you! I'm not sure it is correct to say that the squared error is not convex - it's just that, over the range from 0 to 1, it doesn't do much.
@@statquest th-cam.com/video/HIQlmHxI6-0/w-d-xo.html
@@statquest In Applied Logistic Regression, MSE is actually one of several applicable cost functions for LR.
you nailed it!
Thanks!
I come for the great content, but stay for the "beep boop beep boop" calculation noises. Please make a coffee mug with the calculation noises and I will buy it.
That's a great idea.
Hello Josh. Could you explain to me how you used the entropy function here? I've seen the entropy StatQuest for data science, and now I'm wondering why you used the observed probability outside of the log and the predicted probability inside it, not the opposite (observed inside and predicted outside). I know that would result in log(0), which is undefined, but I'm seeking the exact intuition. Thank you in advance :)
See 2:57.
@@statquest Thank you. I watched it again, so let me express my question differently. We know that generally the entropy equation is: entropy = -sum(p*log2(p)), and in this equation we have two p's: one multiplies the log and the other is its argument. Now in cross entropy we have two probabilities: one we know from the training set and the other is calculated by the NN (i.e., the first is observed and the second is predicted). I want to know why we have this equation for cross entropy: -sum(Pobserved*log(Ppredicted)), and not this one: -sum(Ppredicted*log(Pobserved)).
@@mahdimohammadalipour3077 For details, see: en.wikipedia.org/wiki/Cross_entropy (by the way, I plan on covering the Kullback-Leibler divergence soon). Anyway the key part of that article is in the section titled "Motivation". It says... "Therefore, cross-entropy can be interpreted as the expected message-length per datum when a wrong distribution q is assumed while the data actually follows a distribution p. That is why the expectation is taken over the true probability distribution p and not q."
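In symbols, the standard definition quoted there is (with p the observed/true distribution and q the predicted one):

```latex
H(p, q) = -\mathbb{E}_{p}[\log q] = -\sum_{c} p(c)\,\log q(c)
```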
@@statquest I really appreciate it. Thank you :))))
Thank you so much
Bam! :)
Hi Josh, it is me again. Thank you for all these amazing videos. I'm currently upgrading my own programmed NN to support classification. However, my own softmax function results in slightly different values around 1:30, namely .68, .11 and .21. So, is this just due to rounding, or is there something wrong with my function? Many thanks in advance!
It looks like it is probably due to rounding.
@@statquest ah okay thank you :) Good to know I am not going insane :)
As you said in the current video, if the cross entropy function helps us more in the gradient descent process than the sum of squares function does, why don't we use the same cross entropy for the optimization of linear models such as linear regression? Why do we use SS there and not entropy? Thank you for the wonderful videos that help us understand the math and functions.
For linear regression we use SSR because it works better when the range of y-axis values is not limited.
Thank you!
Thanks!
Hi Josh. You have prepared an amazing video again. Firstly, thanks a lot for this.
I have a question. You said that the sum of the predicted probabilities must be equal to 1, but the sum of the probabilities in the video differs from 1. Are they random values used to explain things for the video, or am I wrong? Please clarify this issue. Thanksss a lot in advance :) Bam !!
What time point, minutes and seconds, are you asking about?
@@statquest At 05:49 we can see it in the "p" column (0.57+0.58+0.52 = 1.67, not 1).
I realized my mistake :) Bamm!!!
@@technojos The column of numbers for "p" is not supposed to add up to 1 because each row is a value taken from a completely different set of SoftMax values (however, each set of SoftMax values does, in fact, add up to 1). Let me explain: At 2:14 we are running the first row of the training data through the neural network, and at 2:22 we are applying SoftMax to the raw Output values. The corresponding probabilities for Setosa, Versicolor and Virginica are: 0.57, 0.20 and 0.23. If we add those up, we get 1. Bam. However, because the first row of training data is for Setosa, we select the probability for Setosa, 0.57, and add it to the table (and then use that value to calculate the cross entropy). We then do the same thing for the second row of training data, which is for Virginica. In this case, the SoftMax values at 4:32 are 0.22, 0.22 and 0.58, which add up to 1.02 instead of 1 because of rounding errors, but the idea is the same. We then select the one probability associated with Virginica (because the training data is for Virginica), 0.58, and add it to the table (and we use that to calculate cross entropy). Thus, the column in the table, "p", refers to the specific probability selected from a set of probabilities calculated for that row, and so there is no need for the column itself to also add up to 1. Does that make sense?
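Here is a small sketch of that bookkeeping (the raw output values are invented just to illustrate; only the structure matches the video):

```python
import numpy as np

def softmax(row):
    exps = np.exp(row - row.max())
    return exps / exps.sum()

# One row of raw outputs per training example; columns are [Setosa, Versicolor, Virginica]
raw_outputs = np.array([[1.6, 0.6, 0.7],   # a Setosa example
                        [0.2, 0.2, 1.2],   # a Virginica example
                        [0.3, 1.4, 0.4]])  # a Versicolor example
true_classes = [0, 2, 1]

total_cross_entropy = 0.0
for row, true_c in zip(raw_outputs, true_classes):
    probs = softmax(row)
    print(probs.sum(), probs[true_c])  # each row's probabilities sum to 1; keep only the true class's p
    total_cross_entropy += -np.log(probs[true_c])
print(total_cross_entropy)             # the sum over rows, as in the video's "p" table
```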
@@statquest Thanks Josh. I understood. You are my ideal. Thanks for everything :).
Baam!!!! Thank you~!!!!!!!! It is so clear~!
bam!
What happens if we want to use soft labels? Would the cross entropy loss still be a good loss function that would help the neural net converge to a good predictor of the soft labels?
I don't know off the top of my head.
Hi Josh, just want to let you know that the link for "Neural networks with multiple inputs and outputs" in the description is broken (though I was able to find the video in your Neural Network playlist).
Thanks for the note! I've fixed the link.
Thanks a lot for the simple and elegant explanation. Can you please provide a download link for the slides of this video?
This will be in my next book.
@@statquest Thanks a lot for all your vids. You are the only one on this planet who makes calculus as simple as playing Candy Crush.
Thank you so much~!
:)
My friend! I'm on part 6. How can I learn the differences between LSTMs, Bi-LSTMs and Recursive Neural Networks?
I'm working on those videos. Hopefully they'll be available soon.
@@statquest I'm looking forward to watching them!
@@franciscoruiz6269 bam! :)
Thank you sir
You're welcome! :)
Hey Josh, I saw one of your videos about entropy in general - which is a way to measure uncertainty or surprise. Regarding Cross Entropy, the idea is the same - but now it's for the SoftMax outputs for the predicted Neural Network values?
To be honest, I have no idea how this (cross entropy) is related to the general concept of entropy for data science.
@@statquest They both seem to be calculated in a similar way, I assumed they measured similar things
@@amnont8724 They probably do. It's just not something I've thought about before or know about.
But in a regression problem we still use SSR, right?
So what would happen if we kept using SSR in a classification problem and, after backpropagation finishes its work, checked for the maximum output? Is it that if we have an output = 1.64 and an observed value = 1, SSR also tends to decrease that distance, so we needed to invent a function to control what the minimum and maximum values are, in our case 0 and 1?
For a regression problem we would still use SSR. Cross Entropy is only for classification, and I believe it makes training easier.
@@statquest Thank you, Josh, for making everything easy for us.
Are argmax and softmax used only with classification? Or could we also use them with regression?
And the same question for cross entropy: is it used for classification only?
Thank you
I'm pretty sure they are only used for classification.
@@statquest What about accuracy for regression? MSE?
@@Noor.kareem6 SSR or MSE are used for regression.
Still trying to wrap my head around how this is related to entropy. If entropy is the expected surprise, it's like we are using a different distribution for the surprise than for the expected value.
It's easier to see the relationship when you focus on the full equation at 2:57. For more details, see: en.wikipedia.org/wiki/Cross-entropy
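Side by side, the only difference is which distribution supplies the surprise, -log(probability): cross entropy averages the surprise from the predicted distribution q using the weights from the observed distribution p.

```latex
\text{Entropy}(p) = -\sum_{c} p(c)\,\log p(c)
\qquad
\text{CrossEntropy}(p, q) = -\sum_{c} p(c)\,\log q(c)
```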
love your song :))
:)
🎉
:)
Gold!
Thank you! :)
Thanks a lot
:)