Stochastic Gradient Descent, Clearly Explained!!!
- Published Jun 25, 2024
- Even though Stochastic Gradient Descent sounds fancy, it is just a simple addition to "regular" Gradient Descent. This video sets up the problem that Stochastic Gradient Descent solves and then shows how it does it. Along the way, we discuss situations where Stochastic Gradient Descent is most useful, and some cool features that aren't that obvious.
NOTE: There is a small typo at 9:03. The values for the intercept and slope should be the most recent estimates, 0.86 and 0.68, instead of the original random values, 0 and 1.
NOTE: This StatQuest assumes you already understand "regular" Gradient Descent. If not, check out the 'Quest: • Gradient Descent, Step...
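The notes above can be made concrete with a minimal sketch of Stochastic Gradient Descent fitting a line: instead of using every sample to compute each step (as in "regular" Gradient Descent), one randomly chosen sample drives each update. The data points, learning rate, and step count below are made-up illustration values, not the ones used in the video:

```python
import random

# Minimal sketch: fit y = intercept + slope * x with Stochastic Gradient
# Descent, using ONE randomly chosen sample per step. Made-up data.
random.seed(42)
data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]

intercept, slope = 0.0, 1.0      # random-ish starting values
learning_rate = 0.01

for step in range(2000):
    x, y = random.choice(data)   # the "stochastic" part: one sample
    residual = y - (intercept + slope * x)
    intercept -= learning_rate * (-2 * residual)      # d(SSR)/d(intercept)
    slope -= learning_rate * (-2 * residual * x)      # d(SSR)/d(slope)

print(round(intercept, 2), round(slope, 2))
```

Because each step sees only one sample, the estimates jitter around the least-squares solution instead of settling exactly on it; that jitter is the price paid for much cheaper steps on big data sets.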
When I was researching Stochastic Gradient Descent, I found a ton of cool websites that provided lots of details. Here are some of my favorites:
Sebastian Ruder has a nice write-up: ruder.io/optimizing-gradient-d...
...as the Unsupervised Feature Learning and Deep Learning Tutorial: deeplearning.stanford.edu/tuto...
For a complete index of all the StatQuest videos, check out:
statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshirt.com/statques...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
Corrections:
9:03. The values for the intercept and slope should be the most recent estimates, 0.86 and 0.68, instead of the original random values, 0 and 1.
9:33 the slope should be 0.7.
#statquest #sgd
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
Nice to see this upon getting confused XD. Great Job!
At 9:40, in the top right corner: "...and the new line..." has slope 0.07, which is a typo too. Should be 0.7!
@@louislesage3856 You are correct. Dang, I hate typos. ;)
I came down to the comments to check if I am right and thanks god I am right :)
I have another question regarding the new data sample
what if this new data sample is an outlier?
the step will make the line fit to this new point only and the old samples will be ignored
Do we need to add a check for the outliers before we apply the stochastic gradient descent?
@@adelsalaheldeen You should always check for outliers, no matter what you are doing.
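That advice could look something like this in practice; a hedged sketch using the common 1.5 × IQR rule on the target values before running SGD (one of many reasonable outlier checks, and the data below are made up):

```python
import statistics

def drop_outliers(samples):
    """Keep only (x, y) pairs whose y falls inside the 1.5 * IQR fences."""
    ys = sorted(y for _, y in samples)
    q1, _, q3 = statistics.quantiles(ys, n=4)
    lo = q1 - 1.5 * (q3 - q1)
    hi = q3 + 1.5 * (q3 - q1)
    return [(x, y) for x, y in samples if lo <= y <= hi]

# nine well-behaved points plus one obvious outlier
data = [(i, 2.0 + 0.1 * i) for i in range(9)] + [(9, 50.0)]
clean = drop_outliers(data)
print(len(clean))
```

Filtering first means a single stochastic step can't yank the line toward one wild point, which is exactly the worry raised in the question.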
I love that you go slow, really slow, and don't assume people have understood everything or skip any steps. Truly a wonderful explanation of seemingly hard-to-grasp stuff! Keep up your good work!!
Thank you!
Without this channel... Machine learning is incomplete...
MEGAAAA BAMMMMM
Can u tell me about batch gradient descent
Any video ?????????
Pretty sure you mean... MECHA BAM! I’ll see myself out...
@@dani123456785 ruder.io/optimizing-gradient-descent/index.html#batchgradientdescent
Every time I type some notion and one of your videos pops out, I know the probability of understanding that notion is 100%, and that the effort function is already minimized so I quickly converge towards optimal comprehension :D
Hail to great JS !
This is awesome!!! Can I quote you on my website?
this video is so much better than what we have in university. Thank you man, you are a legend
Thank you!
Our university's professor just screenshotted the whole Gradient Descent and Stochastic Gradient Descent videos and that was our notes for the topic 😶‍🌫️😶‍🌫️, should've just pasted the link to Josh's videos tbfr 😮‍💨😮‍💨
Best explanation ever. at first I was sceptical but the BAMs kinda grow on you after a while :)
Nice! :)
This is like my second video on your channel and holy moly everything you explain is so clear and just clicks in my head. I truly appreciate this, you are a blessing to the learners
Awesome! Thank you very much! :)
Just leave a mark here to appreciate every work you did Mr. Josh, thank you very much
Thank you! :)
Super helpful, taught me more than my uni prof, your teaching method is effective and hilarious at the same time.
Thanks! :)
@@statquest Btw, what is the criterion for mini-batch stochastic gradient descent?
Like, if I have a number of data points, should I group the ones whose values are closely related?
E.g.: 10, 30, 69, 38, 59, 16
Then group them into (10, 16) (30, 38) (59, 69) and randomly select an element from each group to do the math?
Or do I just take three random data points and do the math?
Thanks!
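For reference, mini-batches are typically formed by random selection rather than by grouping similar values together: shuffle the data once, then take consecutive slices. A hedged sketch, reusing the example numbers from the question above:

```python
import random

def mini_batches(samples, batch_size):
    """Shuffle once, then yield consecutive slices as mini-batches."""
    shuffled = samples[:]
    random.shuffle(shuffled)
    for i in range(0, len(shuffled), batch_size):
        yield shuffled[i:i + batch_size]

data = [10, 30, 69, 38, 59, 16]   # values from the question above
for batch in mini_batches(data, 3):
    print(batch)
```

Random batches keep each gradient an unbiased estimate of the full gradient; hand-picked groups of similar values would bias each step.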
Bro, I don't know how you did it. You are gooood!
Your subscriber count has increased like crazy since the last time I came here too!
Thank you very much! Hooray! The channel is growing and that is very exciting for me. It's an inspiration to keep making videos. :)
This man should be awarded !!
I feel ML is simple math put out in a complicated way, and people like Josh pop in and make the math simply understandable...!
And ofc his teachings are BAAAAAAAAAAM!
Can't believe I'm taking in wisdom he recorded 4 years ago!
Thanks!
Thank you Mr. Josh. Your videos are really game changers. I love them and your songs even more. I will buy so much of your merchandise when I am employed
BAM!
You are great. I'm glad I found you. Whenever I get stuck with the theory of something, you're there to help.
bam! :)
I am sooo sad that I did not find this channel sooner, but now I know what I'm going to do the next weeks or month :) Great job! Really informative videos!
Awesome! Thank you!
good name you got there lol
Thank you for explaining this clearly. Your videos are easy to understand. Thank you so much. Please make a video on SGD with momentum and issues of SGD with saddle points.
Another great and simple video. It's always a pleasure to see that there is a stat-quest video about a thing I'm looking for. Thank you!
Glad you enjoyed it!
You are the best ML instructor I have so far come across !!!
Wow, thanks!
I gotta say this channel is amazing. Its especially nice as an amazing complement to the math side i learn in university.
This is gold, appreciate it! I really like how you take things one step at a time. It helps me understand better!! BAM!!!
BAM! :)
Thanks for these amazing videos and especially for the smile you bring on my face with each BAM :)
Thank you so much! :)
Brilliant explanation. I was forgetting one little thing which was bugging me about SGD. This helped a lot!
Awesome! :)
I am super dunked off of vodka and coffee right now and I feel like I just understood every complex maths class I've ever taken before. I understand now! THANK YOU! even my impaired mind can comprehend this at X2 speed.
bam!
Josh, you make everything easy to understand! Many Thanks!
Thank you!
Thanks for your work! Your explanation is well thought, clear, and entertaining.
You're very welcome!
I am here to study for an exam I'll have soon, and you are saving me lots of time. Plus, it's so much more entertaining than my incomprehensible slides ! THANK YOU !
Good luck on your exam! :)
@@statquest Thank you so much !! Please keep it up with your amazing videos !
How did the exam go?
@@statquest you are the nicest person in this world! It is on Thursday so we will see :oo
@@statquest I passed my exam and got my bachelor this year!! Thank you so much!
I'm writing my thesis, and you are my hero
Good luck! :)
Dude you’re just freakin good at explaining this stuff
Thank you! :)
Man, when you love something you can achieve great things. This lecture is graduate and doctoral level, yet I would say even a high school student can understand it, and that is the trick! Sorry, but not everyone can do it... Joshua, congratulations!
@@mohammedouallal2 Thank you very much! :)
I'm literally liking this video and commenting after the intro song. Well done!!
BAM! :)
Very helpful! Everything is clear and well explained: super BAM!
Thank you! :)
Your videos are simply amazing. A big thank you!!!
Thanks! :)
best channel for machine learning with quality content
Thank you! :)
I'm not sure what I like more: the clear examples or Josh's silky smooth voice on the "double bam"
Great job! Thanks Josh!
This channel is a GEM
Thank you!
Triple BAM!? 💥 My heart can’t take it! Quest on. 👍
You were born to explain ML!
Thanks!
One word ! Revolutionary
Lots of love from India 🇮🇳
Bam 💥
Thank you! :)
Short and Clear explanation. Thanks a lot!!!
Thanks!
This + bandcamp?? Dude you are my hero
Thanks! :)
please upload every day, you R my Machine Learning Hero
Thank you!
Best explanation in the shortest time possible
Thank you!
You're a special kind of awesome! I have learned so much from your videos! Thank you!
Thanks!
Another solid video. Thanks a million!! I had to go through a bit of problem solving to neatly wrap my functions and compare the execution times in Python. But yeah, I found that on the same data set (small in size) regular gradient descent (batch gradient descent) was faster but was less accurate than Stochastic gradient descent in calculating the slope & intercept for the line of best fit.
My Example:
- 13 data points
- Solved for Slope & Intercept using both types of gradient descent
- Used Sum of squared residuals derivative
Batch Gradient Descent Time = 0.0967 s
Stochastic Gradient Descent Time = 1.2740 s
Linear Regression function from scipy stats = 0.0015 s
Very cool! :)
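A hedged sketch of the trade-off behind those timings: batch gradient descent evaluates a per-sample gradient term for EVERY point each step, while SGD evaluates only one but usually needs many more steps, so on a tiny data set the extra bookkeeping can make SGD slower in wall-clock time. The step counts below are made-up illustration values, not the commenter's actual settings:

```python
import random

# 13 points, like the comment above, lying exactly on y = 0.7x + 1
data = [(x, 0.7 * x + 1.0) for x in range(13)]
lr = 0.001

def batch_fit(steps):
    """Batch gradient descent: sum gradients over ALL samples each step."""
    b, m, evals = 0.0, 0.0, 0
    for _ in range(steps):
        db = sum(-2 * (y - (b + m * x)) for x, y in data)
        dm = sum(-2 * (y - (b + m * x)) * x for x, y in data)
        evals += len(data)            # 13 gradient terms per step
        b -= lr * db
        m -= lr * dm
    return b, m, evals

def sgd_fit(steps):
    """Stochastic gradient descent: one random sample per step."""
    b, m, evals = 0.0, 0.0, 0
    for _ in range(steps):
        x, y = random.choice(data)
        r = y - (b + m * x)
        b += lr * 2 * r
        m += lr * 2 * r * x
        evals += 1                    # 1 gradient term per step
    return b, m, evals

b, m, evals = batch_fit(2000)
print(round(b, 2), round(m, 2), evals)
```

On 13 points, batch steps are cheap, so batch wins; SGD's economy only pays off when n is so large that touching every sample per step is prohibitive.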
Thank you.. Your video is very helpful in breaking down the concepts to basics :)
Glad it was helpful!
Your channel is super incredible, it has helped me a lot and I always recommend it to everybody! What about a StatQuest on time series analysis? Pleaseeeeeee! thanks! :) Triple BAAAAAM!!!
Time series is on the to-do list. It will still be awhile before I get to it, but I'll do my best.
@@statquest Super BAAAAM! thanks for the answer! It will be life changing! :) keep up the amazing work!
Great videos. Suggestions for future videos: Kernel / Support Vector Machines. ICA (Independent Component Analysis). SOM (Self-Organizing Map). Convolutional Nets. Backpropagation algs for NN training.
Yes, eagerly waiting for the video on Support vector machines.
Thank you so much!! Could you please do a video on normal equations?? Subbed!!!
Thank you for such a good explanation!!
Thanks! :)
I love your channel! Could you make videos on reinforcement learning?
Thanks man. It looks easy when learned from your channel.
Bam! :)
brief and clear explanation, great
Glad you liked it!
5 seconds into the intro: *smashes subscribe*
BAM!!!
Dude... you are funny... honestly... I didn't think I would laugh while learning about SGD... I was surprised, entertained, and amazed by your video. Thanks for that. Now I go back to writing my stuff :D
Glad you enjoyed it!
I would love if you make a series/a playlist of all the basics of Machine Learning videos. Found the best channel for ML
See: statquest.org/video-index/#machine
@@statquest 0.o you are a savior!!!
The musical intro was LIT
Clearly explained indeed! Great video!
Thank you! :)
I loved the way you explain the concepts in a very simple and cool way. Can you please explain how the leaning rate is calculated for each iteration?
I'll consider it.
Thank you. It's very very very clear and helpful.
Thanks!
Super, super, super explanation... until watching this video, I was very confused with GD. Thanks a lot!
Thanks! :)
Better than any stats class I’ve ever taken
Thanks! :)
Great tutorial,loved it!
Thanks! :)
Excellent Explanation.Thank You
Thanks! :)
Am I correct that you can use Stochastic Gradient Descent to update a fit when a new entry is added to a dataset that was originally fit with regular Gradient Descent, starting from the last slope and intercept it used?
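A hedged sketch of that idea: if a line was already fit, a single stochastic step can fold in one new observation without re-running gradient descent on the whole data set. The starting estimates below reuse the 0.86 / 0.68 values from the video's correction; the new observation and learning rate are hypothetical:

```python
learning_rate = 0.01
intercept, slope = 0.86, 0.68   # last estimates from a previous fit

x_new, y_new = 3.0, 3.1         # a hypothetical new observation
residual = y_new - (intercept + slope * x_new)

# one stochastic step using only the new sample
intercept += learning_rate * 2 * residual
slope += learning_rate * 2 * residual * x_new
print(round(intercept, 3), round(slope, 3))
```

One caveat, echoing the outlier discussion above: a single step nudges the line only slightly toward the new point, so one reasonable sample won't erase the old fit, but a wild outlier still deserves a sanity check first.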
Super effective instructional approach...best wishes
Thank you very much! :)
Love your videos! Will you consider doing some on stochastic variational inference and stochastic processes? ;)
One day I will, but not for a while. I want to cover neural networks first.
@@statquest Can't wait for your take on Neural Nets!
I think I am addicted to ML after following your channel
Nice! :)
Thanks for the great video Josh!
Just a quick question, why is it so that at 9:02, the values for the intercept and slope in the derivatives are the original random values of 0 and 1 instead of the most recent estimates of 0.86 and 0.68?
That's just a typo. I've now included a note about this in the video's description. Unfortunately YouTube will not let me edit videos after they are posted.
Thank u very much. Really clear explanation!
Glad you liked it!
Hi Mr. Starmer, I hope you have a wonderful day.
Thank you!
Very Cool JOHN !
Thank You !
Thanks! :)
Fantastic explanation, Man!!👍👍
Thank you! :)
Hey Man you are awesome. Please make videos about more sophisticated deep learning models, CNN, RNN, and Reinforced learning
Thank you! I'm always working on new stuff and excited about what's coming up.
bro is a savior
:)
Thanks for clearly explaining stochastic gradient descent. :)
Thanks!
These are amazing thank you
Thanks! :)
Hi Josh,
Thanks again for another awesome video, I can't stress enough how lost I would be in my masters degree without your channel.
Could I please trouble you with a question: is it safe to assume that each sample or subset of samples picked by Stochastic Gradient Descent is picked at random without replacement? (Just wanted to be sure...)
Thank you and looking forward to more of your wonderful content!
Ben
Typically stochastic gradient descent is performed *with* replacement. However, there are variations on this, including doing it *without* replacement. For more details, see: sebastianraschka.com/faq/docs/sgd-methods.html
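The two sampling schemes can be contrasted in a couple of lines; a minimal sketch using Python's standard library:

```python
import random

random.seed(1)
data = list(range(6))

# with replacement: each step draws independently, so a sample can repeat
# within an "epoch" and another might not appear at all
picks_with = [random.choice(data) for _ in range(len(data))]

# without replacement: one shuffled pass, every sample seen exactly once
picks_without = random.sample(data, k=len(data))

print(picks_with)
print(picks_without)
```

The without-replacement variant is what shuffled-epoch training loops do in practice; with replacement is the textbook definition.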
Just ordered a red Tshirt from you. Thanks for the great work.
Awesome, thank you!
Very good video! Thanks
Glad you liked it!
Excellent explanation. Very helpful.
Thanks! :)
Excellent and Informative and Bam!!!!!!!!!!!!!
BAM! :)
Thanks for the video sir
StatQuest you're YYDS !!!!
Thank you!
Thank you for helping with my PhD research!
Good luck with your PhD! :)
Definitely better than reading my black and white book full of jargon
:)
Josh, thank you for the awesome video. I have one question regarding the cost function used in this video. It seems that the 'slope' will be much smaller for Stochastic Gradient Descent compared to regular Gradient Descent, because SGD sums just one term while regular Gradient Descent sums n terms, where n is the number of observations. Is that true?
For general Gradient Descent, the cost function I use will result in a larger number, but the minimum, even though it will be larger than for SGD in an absolute sense, will still be at the same location.
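A small numeric check of that point, using made-up data that lie exactly on y = 0.7x + 1: the full-batch derivative is the sum of the per-sample derivatives, so it is larger in magnitude away from the minimum, but both vanish at the same intercept and slope:

```python
data = [(0.0, 1.0), (1.0, 1.7), (2.0, 2.4)]   # points on y = 0.7x + 1

def d_slope(samples, intercept, slope):
    """Derivative of the sum of squared residuals w.r.t. the slope."""
    return sum(-2 * (y - (intercept + slope * x)) * x for x, y in samples)

# away from the minimum, the full sum is larger than a single term...
full = d_slope(data, 0.0, 1.0)
single = d_slope(data[1:2], 0.0, 1.0)
print(full, single)

# ...but at the true minimum (intercept 1, slope 0.7) both are ~0
print(round(d_slope(data, 1.0, 0.7), 10))
```

So the gradient's scale differs, but its root (the location of the minimum) does not, which is why SGD can use one term and still head to the right place on average.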
Your examples are super clear thanks! How would you figure out if there are redundancies in your data when there are many columns in high dimensional space?
When you have a ton of data in high dimensions, it is often safe to assume that you have redundancies. And when you have a lot of data, SGD might be your only option anyway.
Outstanding! Thanks a lot
Thank you! :)
Really nice. Subscribed.
Thank you!
I could frankly say I learned the theory of stats from u.
double bam! :)
Josh, could you please explain the difference between GBM, Xgboost and Light GBM, etc?
You just made machine learning look so simple
BAAAAAAMMMMMMMMM
Thanks!
Seriously man this was the first video i watched on your channel and you are amazing. I am going to watch all the videos now
Thank you for sharing your knowledge
Your explanations are great, thanks. In fact what do you think about selling your slides on your website? I would certainly buy some.
I'm glad you like my videos! I sell PDF study guides for some of my videos here: statquest.org/studyguides/
Can not appreciate your channel more!!! CAN NOT!
Thank you! :)
Math is not that hard to understand when it's explained properly. For me, this concept went from being super complex to something super simple and logical. Thanks for all the work you put in these videos, you explained stuff in a magnificent way.
Thanks for another great video. I have a question not directly related to stochastic gradient descent. When there are not a lot of images available for a convolutional neural network, what other methods would you suggest for image classification? I have around 100 images per class.
Look for MAML
You made my day thanks!
Hooray!
What do I get when completing all your quests? 🤗
So much thank you at this point too!
TRIPLE BAM!!! :)
Nice you explained that clearly 👌👍🙂
Thank you 🙂
These videos are amazing.
When adding a new sample, it looked like a bit of an outlier compared to the clusters where you took the original random points. So how much weight do you give one new sample compared to random values from tightly packed clusters?
I'm pretty sure all values are given equal weights.
@Josh, why would people dislike these videos, I wonder! The SGD is a cost saver on large datasets.
bam!