Wow, this guy is a deep learning/ML genius!
I've been studying deep learning for 2 months now, and I consider myself quite good at math and coding. I've been looking for an explanation of what is happening under the hood when the model is training - an "explain like I'm 5" type of explanation.
But the only things I could find were academic explanations of how a deep neural network trains with matrix multiplication of weight, bias, backpropagation, etc.
I've probably watched 30 videos of those that are all copycats of each other, and I think those people don't know what they are talking about, just spitting out what they saw or read in academic papers/courses.
This video was an eye-opener; the guy really knows what is happening behind the scenes, and his 30 years of expertise in the field really shows in those simple yet very easy-to-understand explanations.
Thank you! 🙏
I greatly appreciate this effort to uplift the community worldwide
The quadratic section is a beautifully crafted example. Thanks
yeah that made it fully click for me
I’ve watched so many videos, read so many blogs and books, trying to understand what a neural network is and how it learns. You explained it perfectly, making all the words just fit. The meanings become obvious when presented like this, and you did this in 15 minutes 🔥
The quadratic example was a really good illustration of how gradient descent works - it is really good for building intuition. Then, the Excel example cements the understanding really well with a solid dataset. This is my favourite of the 3 lectures so far.
I "knew" that deep learning models used the sum of wi*xi + b function, I "knew" that it supposedly was used because it was an "all purpose" function, but now thanks to you Jeremy I know WHY it's an "all purpose" function
10/10 explanation. Math should always be explained like this, it's actually beautiful to see it all unfold.
Great foundational lecture. Jeremy has a relaxed, non-intimidating approach that works for me. Brilliant step by step walk into the deep end of the pool without getting us lost or scared :) Thank you for taking the time to put this together.
Glad you enjoyed it!
I couldn't understand why ReLu was needed and now I understand. I'm a programmer and I think this is the DL course for me. The explanation is very easy to understand. Thank you!
Great lesson!! Jeremy deciding to approach chapter 4 differently after seeing many students quit at this point really shows that he cares about students' learning. The effort is greatly appreciated!🙏
I first did this course about a year ago before landing my first Data Science Job. I am now doing it again as a refresher before going on to part 2 to try and get an even more technical DS job. Thank you Jeremy! and good luck to anyone doing the same!!
I've gone through many great courses in all sorts of subjects, but I think this course might be the best. Kudos for putting this fantastic content out there for free for everyone to learn.
Great to hear!
I had already heard many terms, like loss function, fitting a model, activation function, relu.
JH is such an amazing teacher that these things are now crystal clear in my mind.
Thank you so much JH
For those following along, there was a mistake in the spreadsheet range when calculating total loss, both at 1:14:27 and 1:17:40: it selects from row 662 instead of row 4. The correct solved losses are 0.144 and 0.143.
Probably the most easy to digest material I've seen on the subject, thank you.
This is god-tier educational content, sir. Thanks for sharing it!
The explanation of deep learning foundations here is too good! As Jeremy said, one has to remind oneself: that is it, there is no more.
The excel example blew my mind. Loved this lesson. Thank you.
I am a newbie in machine learning, but the approach you took in this lesson to explain difficult concepts makes them so easy to understand. Great work.
Great to hear!
I was lucky to have good math teachers in high school. Jeremy explaining the concepts reminded me of them. Thanks.
Thank you so much jeremy for making this course, I am going slow but learning a lot everyday, you are a very patient teacher. Thank you.
1:05:02 - "There's a competition I've actually helped create many years ago called Titanic"
Biggest flex ever.
New didactic and methodological ideas - like them very much - still a bit rough in execution - but discovers amazing new territory to approach neural networks - deep learning ... well done!
Amazing talk! Thanks thanks thanks! You're making the machine learning field so much easier to understand, and that's something invaluable.
17:27 minor correction: it's error rate going down instead of accuracy
Unbelievable content! Thanks to all who have made it possible!
Simply amazing! Excellent lecture.
i am in love with this course
what a great lesson. mind blown! Thank you so much! You are a great teacher!
Quadratic example was just superb. 🎉
Wow, great explanation! Thanks!
This is mind blowing! Great job explaining all these concepts.
(around 17:40) Is taking the ratio of the two `error_rate`s standard practice? I find the "30% improvement" statistic a little misleading. The original error rate is 7.2% and the new error rate is 5.6% (rounded from 5.548, but this is a detail). In other words the accuracy goes from 92.8% to 94.4%. This can be seen as significant or not depending on which scale you adopt: a linear or a logarithmic one.
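For anyone who wants to check the arithmetic behind this, here is a minimal sketch that uses only the two error rates quoted in the comment above. The variable names are made up for illustration. It shows three ways of describing the same change, which is probably why the headline number feels larger than the accuracy gain.

```python
# Error rates quoted in the comment above, in percent.
old_error = 7.2
new_error = 5.548

# Ratio of the two error rates: how many times larger the old error is.
ratio = old_error / new_error

# Relative reduction, measured against the old error rate.
relative_reduction = (old_error - new_error) / old_error

# Absolute change in percentage points (identical to the accuracy gain).
absolute_change = old_error - new_error

print(f"ratio of error rates: {ratio:.3f}")
print(f"relative reduction:   {relative_reduction:.1%}")
print(f"absolute change:      {absolute_change:.3f} points")
```

The ratio comes out at roughly 1.30 (the "30%" figure), the relative reduction at roughly 23%, and the absolute change at roughly 1.65 percentage points.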
I think one way to address the slow/fast issue is to recognise that it is sometimes actually both. The parts that need to go faster should go faster, or have the unnecessary bits trimmed out.
The parts that are complicated could slow down a bit.
Then add a very short/fast "teaching" for each topic, and go into details after each short teaching (a short teaching is not a summary). That way people who get it can move ahead to the next topic.
At 28:40 I believe you run the cell again and it changes the tensors slightly - drove me a bit mad trying to figure out why my results were different.
1:00:50 How did we go from trying to fit a function to computer vision's pixels?
The jump from ReLU functions applied to linear functions to talking about pixels in an image is not clear. Can you please elaborate?
Why did you say each pixel will have a variable of its own? What is the mapping from computer vision to function fitting in this context?
Why is every single pixel in an image a single variable? What is the rationale?
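One way to picture the mapping asked about here, as a rough sketch and not the lesson's own code: an image is just a grid of numbers, so flattening it gives a list of inputs x1, x2, ..., and the linear function gets one weight per input. The 2x2 image, the weights, and the bias below are all made-up values for illustration.

```python
# A hypothetical 2x2 grayscale "image": every pixel is one number.
image = [[0.0, 0.5],
         [1.0, 0.25]]

# Flatten the grid so each pixel becomes one input variable (x1..x4).
inputs = [pixel for row in image for pixel in row]

# Made-up weights, one per pixel, plus a bias. In training these are learned.
weights = [0.2, -0.4, 0.1, 0.8]
bias = 0.05

def relu(value):
    """Replace negative values with zero."""
    return max(value, 0.0)

# Same recipe as in function fitting: multiply, add up, then apply ReLU.
linear_output = sum(w * x for w, x in zip(weights, inputs)) + bias
activation = relu(linear_output)

print(len(inputs), "input variables ->", activation)
```

A real image simply has many more pixels, so the function has many more inputs and weights.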
At 1:14:13 Jeremy describes calculating a loss. Can anyone explain this more, i.e. why subtracting whether the passenger survived (0 or 1) from the output of the linear equation for each row, and then squaring the result, equates to a loss or error? It seems arbitrary and I'm not understanding why this is how we judge an error rate.
We want to make the prediction equal to the actual value,
so we don't want a large gap between the actual and predicted values.
Thus we define the loss as the square of the distance between the actual and predicted values (squaring makes the loss grow faster when the distance is large).
Now we just have to minimize the loss, which happens by changing the weights and biases.
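To make the reply above concrete, here is a small sketch of a squared-error loss. It assumes the per-row errors are averaged into one number; the `survived` labels and both sets of predictions are invented for illustration and are not taken from the spreadsheet.

```python
def mean_squared_error(predictions, actuals):
    """Average of the squared gaps between predictions and actual values."""
    squared_gaps = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return sum(squared_gaps) / len(squared_gaps)

# Hypothetical labels: 1 = survived, 0 = did not survive.
survived = [1, 0, 1, 0]

# Predictions close to the labels give a small loss...
close_predictions = [0.9, 0.1, 0.8, 0.2]
# ...and predictions far from the labels give a large one.
far_predictions = [0.1, 0.9, 0.2, 0.8]

close_loss = mean_squared_error(close_predictions, survived)
far_loss = mean_squared_error(far_predictions, survived)
print(close_loss, far_loss)
```

The loss is a single number that shrinks as predictions approach the actual values, which is what gives the optimizer something to minimize.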
Thanks, Jeremy! Great lecture. I never got into NLP, but now I am understanding it.
Excellent!
@@howardjeremyp Hi Jeremy, You mentioned that there will be part 2 of this course. When can we expect those videos? Thanks
@@jordankuzmanovik5297 you can see videos now
Thank you for providing this insightful course, which has been instrumental in cementing intuition. I have a question regarding the updating loop at the 41:30 mark. It appears that there may be a minor oversight. Shouldn't we consider resetting all gradients to zero prior to each subsequent call of the backward() function? PyTorch, by default, accumulates gradients from previous iterations, which eventually leads to inaccuracies in the gradient computation.
At 43:00, isn’t there supposed to be abc.zero_grad() to zero out the gradients?
I was wondering the same... otherwise wouldn't each backward call be accumulating progressively larger gradients, from keeping around the prior gradient before the updates occurred?
@@solaxun Yup, exactly. It's one of the worst bugs (it's bitten me in the neck several times)
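A short sketch of the behaviour discussed in this thread, assuming PyTorch is installed. The function `f`, the `target` values, and the learning rate are made up for illustration; only the name `abc` comes from the comments. Note that for a plain tensor the reset is written `abc.grad.zero_()`, since `zero_grad()` is a method of optimizers and modules, not of tensors.

```python
import torch

def f(x):
    # Simple function whose gradient is easy to check by hand: d/dx x^2 = 2x.
    return (x ** 2).sum()

x = torch.tensor([3.0], requires_grad=True)

f(x).backward()
first = x.grad.clone()          # gradient after one backward call

f(x).backward()                 # no reset: the new gradient is added to the old
accumulated = x.grad.clone()

x.grad.zero_()                  # reset in place
f(x).backward()
after_reset = x.grad.clone()

print(first, accumulated, after_reset)

# The same idea inside a manual update loop (hypothetical target and step size).
abc = torch.tensor([1.0, 1.0, 1.0], requires_grad=True)
target = torch.tensor([3.0, 2.0, 1.0])
losses = []
for step in range(5):
    loss = ((abc - target) ** 2).sum()
    loss.backward()
    with torch.no_grad():
        abc -= abc.grad * 0.1   # gradient descent step
        abc.grad.zero_()        # clear the gradient before the next backward()
    losses.append(loss.item())
```

Without the `zero_()` call the second gradient is double the first, which is the accumulation the commenters describe.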
Skip 10 minutes to start the lesson
Thanks Jeremy, great tutorial.
So Paperspace appears to not be free. When I try starting a notebook it forces me to upgrade to 8 USD/month. Is this still the recommended platform? Is it worth it?
Looks like it's not worth it at all. I purchased the subscription only to get an error message that 'The VM I selected is currently not available please select another'. They did show me a list of available VMs. The available ones were at an additional cost of 0.7-3.50 USD per hour. Yes, that's on top of the 8 USD/month subscription.
great course! so weird that the videos have less than 100k views.
loved the excelTorch!!
Just a quick question: by "reproduce the code", does it mean that one should be able to write out the code from memory/understanding, as in knowing all of the parameters within the arguments as well as the defined functions? Of course that would be the best-case scenario, but I feel it would get in the way of moving through the course, as one does not need to be able to reproduce the code perfectly, just understand what the parameters are doing, right?
basically we have data, now let's create a general function (from those data) that can kind of produce those data and also predict what the next data would be.
Isn't the sum of two ReLUs basically like two nodes in a single layer? I'm not sure if we can call it a neural network, let alone a deep learning network.
I just made a NN in Excel. Wow. If you want to predict two different things, do you just have a separate set of weights and Lins for the second item?
I'm slightly confused about the intuition behind how multiple ReLUs can lead to a squiggly line. Wouldn't it more specifically lead to a line that is always either stagnant or gradually increasing because of how the output must be >=0 ?
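A possible answer, sketched in plain Python: each ReLU's output is never negative, but its slope can be. A ReLU over a line with negative slope falls and then goes flat, so adding it to a rising one gives a curve that goes down and then up. The function names and the particular slopes and intercepts below are chosen just for illustration.

```python
def rectified_linear(m, b, x):
    """A line m*x + b with everything below zero clipped to zero."""
    return max(m * x + b, 0.0)

def double_relu(m1, b1, m2, b2, x):
    """Sum of two rectified lines."""
    return rectified_linear(m1, b1, x) + rectified_linear(m2, b2, x)

xs = [0, 1, 2, 3, 4]

# First piece rises (slope 1); second piece falls (slope -2) until it hits zero.
ys = [double_relu(1.0, 0.0, -2.0, 4.0, x) for x in xs]
print(ys)  # [4.0, 3.0, 2.0, 3.0, 4.0]
```

With more pieces, each bending at a different point, the sum can follow an arbitrarily squiggly shape.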
Excellent tutorial! I have one question, in the excel, why are Parch and SibSp not normalized? Because they are not "big enough" to negatively interfere?
How large a difference between train_loss and validation_loss is acceptable?
Where can I find the walk through of Gradio?
48:55 the computer draws the owl :)
1:11:41 was a nice contradiction :D
building a neural net in spreadsheet. Heck yea!
great content.
I don't quite see how the Excel example qualifies as a "deep" neural network, since the layers were not stacked on top of each other but added together. The example is still great, though, and I could see how to stack the layers.
Hi, can you elaborate a bit more on this? How does stacking differ from the approach in the video?
@@elnur0047 Rather than both multiplying the same inputs the 2nd one would multiply the products from the previous output. I was also a little confused when he just added them up at the end instead of feeding one into the other.
yeah I have exactly the same doubt when I saw that, these are still 2 independent layers.
Jeremy actually confirms that at 1:16:15
@@yaptor0 How would that calculation work? Doesn't he have to first sum up all the products from a given layer and RELU them (i.e. take the max of the sumproduct and 0)? If the 2nd layer simply accepted the individual products as inputs, wouldn't this 2-layer network just be a linear function?
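Here is one way to see the difference discussed in this thread, as a sketch with made-up weights and not a copy of the spreadsheet. Each hidden unit sums its products and then applies ReLU, as the last reply says. Adding the two results is the same as feeding them into an output layer whose weights are all 1 and whose bias is 0. Stacking means that output layer gets its own weights, and it can in turn feed further layers.

```python
def relu(v):
    return max(v, 0.0)

def linear(weights, bias, inputs):
    """Sum of products plus a bias (the spreadsheet's SUMPRODUCT step)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Hypothetical parameters for two hidden units that read the same inputs.
W1, B1 = [0.5, -1.0], 0.1
W2, B2 = [-0.3, 0.8], 0.0

def hidden(inputs):
    # Sum the products first, then apply ReLU to each unit's total.
    return [relu(linear(W1, B1, inputs)), relu(linear(W2, B2, inputs))]

def added(inputs):
    # The "add them up at the end" version.
    return sum(hidden(inputs))

def stacked(inputs, out_weights, out_bias):
    # The hidden activations become the inputs of a second linear layer.
    return linear(out_weights, out_bias, hidden(inputs))

x = [1.0, 2.0]
same_as_added = stacked(x, [1.0, 1.0], 0.0)   # unit weights reproduce the sum
with_own_weights = stacked(x, [2.0, -1.0], 0.5)
print(added(x), same_as_added, with_own_weights)
```

The ReLU between the layers matters: without it, a linear layer applied to a linear layer collapses into a single linear function.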
Excellent!
👏👏👏 applause from online
=IF([@Embarked]="S" , 1, 0) and other IF statements like this don't seem to work for me.
Has anyone experienced the same thing?
I tried to make a Paperspace account and accidentally mistyped the phone verification, so they decided that I'm no longer allowed to verify with my phone number. Disappointing.
Vpn.
lesson 1 - needing math is a myth, awesome, let's continue
lesson 3 - here are all these math terms/equations, and you have no idea what they are or what you are looking at. Now I'm overwhelmed and feel defeated.
try doing at least high school math
❤
I don't even know how to use Excel.