Gradient descent simple explanation|gradient descent machine learning|gradient descent algorithm
- Published 28 Sep 2024
#gradientdescent #unfolddatascience
Hello All,
My name is Aman and I am a data scientist. In this video I explain gradient descent piece by piece, with the intention of making it extremely simple to understand. Gradient descent is a very important algorithm for machine learning and deep learning, and hence a must-know topic for every data scientist. The following questions are answered in this video:
1. What is gradient descent?
2. How does gradient descent work?
3. What is the gradient descent algorithm?
4. What is gradient descent in machine learning?
5. What is gradient descent in deep learning?
6. How does the gradient descent algorithm work?
About Unfold Data Science: This channel helps people understand the basics of data science through simple examples, in an easy way. Anybody without prior knowledge of computer programming, statistics, machine learning, or artificial intelligence can get a high-level understanding of data science through this channel. The videos uploaded are not very technical in nature and hence can be easily grasped by viewers from different backgrounds as well.
Join Facebook group :
www.facebook.c...
Follow on medium : / amanrai77
Follow on quora: www.quora.com/...
Follow on twitter : @unfoldds
Get connected on LinkedIn : / aman-kumar-b4881440
Follow on Instagram : unfolddatascience
Watch Introduction to Data Science full playlist here : • Data Science In 15 Min...
Watch python for data science playlist here:
• Python Basics For Data...
Watch statistics and mathematics playlist here :
• Measures of Central Te...
Watch End to End Implementation of a simple machine learning model in Python here:
• How Does Machine Learn...
Learn Ensemble Model, Bagging and Boosting here:
• Introduction to Ensemb...
Access all my codes here:
drive.google.c...
Have question for me? Ask me here : docs.google.co...
My Music: www.bensound.c...
My question is: when we calculate the partial derivative with respect to 'c' and 'm', we should treat one as a constant. For example,
to calculate the partial derivative of the cost function J with respect to c, ∂J/∂c, we should treat 'm' as constant. So the above calculation should be: -2[2 - (c+m)] + (-2)[4 - (c+3m)] => -2[2-(c)] + (-2)[4-(c)] => -2[2] - 2[4] => -4 - 8 => -12.
Please confirm.
Yep, when we calculate w.r.t. c, m is constant, and vice versa.
Hi Anjani, why is it -2[2-(c+m)] as the derivative of [2-(c+m·1)]^2? Don't you think it should be 2[2-(c+m)] from the differentiation rule?
Why -2? I still didn't get it... it should be 2[2-(c)] + 2[4-(c)], right?
Could you please elaborate on your derivative method, @Anjani Kumar? I guess the value -4 in the video is correct.
Why -2?
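The sign question in this thread can be checked numerically. A minimal sketch, assuming the data points from the video are (x=1, y=2) and (x=3, y=4) and the model is y = c + m·x:

```python
# Checking the -2 sign numerically, using the video's assumed data points
# (x=1, y=2) and (x=3, y=4) and the model y = c + m*x.
def cost(c, m):
    return (2 - (c + m*1))**2 + (4 - (c + m*3))**2

def dcost_dc(c, m):
    # Chain rule: d/dc (y - (c + m*x))**2 = 2*(y - (c + m*x)) * (-1),
    # so each squared term contributes a factor of -2, not +2.
    return -2*(2 - (c + m*1)) - 2*(4 - (c + m*3))

# Central-difference check at the video's starting guess c=0, m=1:
h = 1e-6
numeric = (cost(0 + h, 1) - cost(0 - h, 1)) / (2*h)
print(dcost_dc(0, 1), round(numeric, 4))  # both are -4
```

The finite-difference slope agrees with the analytic -2 form, which is consistent with the -4 shown in the video.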
This is the first time I'm learning about Gradient Descent, and I understood how the algorithm works. This video is amazing. Thank you so much.
You're very welcome Pankaj.
Went through lots of articles but didn't understand the core. Your video made it clear within 15 minutes :) Just awesome, keep up the good work :)
Thanks Aparna, your comments are very valuable for me.
You're just amazing! Anyone can understand gradient descent by watching this video. Thanks!
The greatest lecture on gradient descent I have ever seen... thank you so much for sharing your knowledge, sir ❤️
So nice of you, Mani. Your comments mean a lot.
I have gone through lots of explanations and didn't understand. But through this video I got the confidence to continue my learning. Super, sir, thank you.
Keep watching
Simply wow. After a month, I finally understood Gradient Descent today. Thank you so much for the video 😊
Welcome Himanshi. Means a lot to me.
Your teaching method is masterful! None of the books I read go to such depths. Thanks!!
You're very welcome!
Well explained. Tomorrow I have my project.
Thanks Gayatri.
One of the best explanations of gradient descent. Thank you so much. Very informative.
Thanks Surajit.
Amazing video! How did you work out the slope value to be -4?
Very beneficial video, thank you so much. Love from Pakistan.
Thanks Usman. Stay safe. Tc
great ... simple and best. Thank you.
I don't know what this world would be without Indian YouTubers. Thank you very much, at least I got something.
Glad you found it helpful
Great work sir, I finally understood it from your video. Thanks a lot.
You are most welcome
Awesome video. This is the best explanation. Please make more videos.
Thanks Madhurima.
Sir, please post videos on deep learning. You do a great job, sir. Amazing videos.
Thanks for your positive feedback. Please share with others who could benefit from such content.
Amazing explanation !
I don't know if I have bitten off more than I can chew by deciding to learn machine learning; this gradient descent is giving me a hard time. I'm learning it on Coursera with the same issue. I will keep reading; hopefully I'll understand it one day.
You can do it!
When computing the partial derivative, where did you get the negative 2 from? The exponent 2 is positive; how is it negative when differentiating?
We have discussed this in the pinned comment.
Thank you for the detailed explanation with simple example :)
Welcome Jagadish.
Awesome ... very simple explanation
Thank you. Happy learning. Tc
I like your explanation; it is very smart.
Thanks a lot.
This is one of the best explanation videos about Gradient Descent; I like your detailed explanation. Looking forward to more videos on various optimizers.
Thank you
Thanks a lot Bala. Yes will create on those topics as well.
Sir, a video on gradient checking please. By the way, amazing explanation, please keep it up.
Sure.
Great job, my dear Aman... a nice and crisp lecture. I understood everything except one dark area. Could you please elaborate: why -2 in the derivative with respect to c? Rather, it should be 2[2-(c)] + 2[4-(c)], right? My apologies in advance if this is due to my ignorance... please enlighten me.
Hi Ashwini, thank you. This -2 question has been discussed before. Please see the pinned comment at the top.
Hi, simply explained, thanks
Welcome :)
Thank you for such an informative video.
I am from a B.Com background, so all of this is new to me.
I have a doubt: I have seen other videos as well to understand how we calculate derivatives,
but in your example, why are we multiplying by -2 rather than 2?
Please read the first few comments; it's discussed there.
5:36
What is the last video that you mentioned about derivatives? Providing that link would be a great help. I must mention, your way of teaching is top notch.
Keep up the good work, sir.
Thanks.
Please watch this video:
th-cam.com/video/WCp1D-wSolo/w-d-xo.html
Brother, I really enjoyed it.
🫡🫡
Amazing. Thank you so much for this video. You included everything and it's very well explained.
You're very welcome Goundo. Please share the link within data science groups. Thank you.
Thanks a lot..Awesome explanation
💥💥👌👌👌
Thanks Farhan :)
thank you so much for making this video you are amazing
Thanks Mandar.
thank you so much !!!! you are a great teacher
You're very welcome! Please share with friends.
Best explanation in such a short time.
Thank you so much, sir.
Welcome Shahriar.
I already know Gradient Descent, but still going to watch the whole video for some new insights of GD.
Thanks for watching. Stay Safe. tc
thanks sir ! Very helpful video.
Welcome Arun
Awesome explanation.
Thanks Ajay, hope you are doing well and staying safe.
The (1/2) factor is missing, e.g., while discussing the cost function of linear regression. Please correct me if I am wrong...
Very nice video ,Please make a video on what are the other optimization techniques and compare ,that will be very helpful.
Noted Kalyan.
super
Thanks Yatin.
great video
Thanks for the visit
Hello everyone, here you are asking about the -2, am I right? We calculate the partial derivative with respect to c, so the derivative of [2-(c+m)]^2 is 2[2-(c+m)] times the derivative of the inner term. Since d/dc of (-c) is -1, we multiply -1 by 2[2-(c+m)] and get -2[2-(c+m)]. Is it clear?
Brilliant. Thank you
Welcome.
Thank you. You made me understand this.
Your videos are gold. Thank you for all your efforts
Your comments motivate me. Much appreciated!
Bro, next time what will the value of m be? How will we get that value?
Sir, in the last example of partial differentiation, why do we take a negative sign? Can you please tell me?
Because the derivative of the inner term is negative.
Thank you very much, sir...
Most welcome Tejas.
Very good video. Clear explanation
Glad you liked it
Good explanation, right to the point 😄
Thank you
Great explanation. Thank you for this.
Well explained sir. Thank you
You are welcome
Thanks
One of the best video on GD. Thank you very much.
I really enjoyed this video, but it lacks code. I found a great video implementing SGD in Python! Feel free to check it out: th-cam.com/video/uXuBUkW_0tA/w-d-xo.html
Glad it was helpful Sainath.
Good job
Thanks.
The Best Video. 😀
Thanks Bhavya.
Can you make a video on Nelder-Mead downhill simplex for local minimization?
I have a question. I am just a beginner, and anyone's answer will be highly appreciated. Here is what I need to know: if we have a cost function for simple linear regression, what is the need for gradient descent? What I think is that simple linear regression doesn't give an output close to the local minimum, but then what is the use of the cost function?
😮
Hi sir, did you also assume the learning rate?
Yes, I think so.
Love you bro
Thank you soo much
Always welcome
Hi Aman, I have a doubt regarding Gradient Descent: if we have local minima, how does gradient descent handle them to find the global minimum? Could you please explain it in depth?
Yes Hemanth, it uses different techniques; I will cover them in a separate video.
Great sir 👍
4:29 The graph you have drawn is a parabola, and you have taken x^2 to plot it. If we take x = -1 and square it, we get +1, so the graph will never move in the negative direction.
Yes, there will be no negative y; how is that an issue?
@@UnfoldDataScience Yes sir, that's what I am saying, there will be no negative values... but you drew the graph on the board with negative values 😊 Correct me if I am wrong 😊
Nice video. What will the new m be? Initially you assumed m=1, c=0.
It will be calculated the same way.
Brilliant!
Thanks Sahil.
My question is, how did you get the learning rate value?
Hi Harsh, the recommended value in industry is suggested as a range.
Brother, you didn't take the 1/N factor before the summation in the MSE formula.
I understood sir, thank you very much
You are most welcome Vijayalaxmi.
Thank you for the explanation. I have a doubt: shouldn't we multiply by 1/2n in the cost function equation, where n is the number of data points (2 in this case)?
I can check that once. Thanks for pointing it out.
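The 1/n (or 1/2n) question in this thread can be sketched numerically: scaling the cost only scales the gradient, so the best-fit line is unchanged and the scale can be absorbed into the learning rate. The data points (1, 2) and (3, 4) are assumed from the video.

```python
# Scaling the cost by 1/n (or 1/2n) rescales the gradient but not the
# minimizer. Data points (1,2) and (3,4) are assumed from the video.
data = [(1.0, 2.0), (3.0, 4.0)]

def grad(c, m, scale=1.0):
    """Gradient of scale * sum((y - (c + m*x))**2) w.r.t. (c, m)."""
    gc = sum(-2 * (y - (c + m*x)) for x, y in data)
    gm = sum(-2 * (y - (c + m*x)) * x for x, y in data)
    return scale * gc, scale * gm

g_sse = grad(0.0, 1.0)                     # plain sum of squared errors
g_mse = grad(0.0, 1.0, scale=1/len(data))  # with the 1/n factor
print(g_sse, g_mse)  # same direction, half the magnitude
```

Both gradients point the same way; only the step length differs, which the learning rate absorbs.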
Sir, can we take any numbers as initial assumptions, or do we have to take 0 and 1?
Not sure which part of the video or which parameter you are asking about.
Awesome sir.thank u for the valuable content 😀
Thanks a lot kuppuswamy, happy learning. Tc
Thank you
Welcome
great brother thanks.
This explanation is amazing!
Thanks Bharath.
Sir you are great!!
Thank you sir
Good explanation. I have a doubt: can we use the gradient descent method to maximize a function as well? If so, the formula at 3:54 might not hold, I think; it would go in the wrong direction. Please clarify whether it can be used for maximizing a function or not.
We never maximize a cost function; why would we need to do so?
@@UnfoldDataScience In some scenarios, there are cases where we need to maximize an objective function. Then we need to go in the direction of the slope, unlike travelling against the slope as shown in the video (so we add rather than subtract at 3:54).
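The commenter's point can be sketched in a few lines: to maximize, flip the minus in the update at 3:54 to a plus and step along the slope (gradient ascent). The objective f(x) = -(x - 2)^2 + 3, which peaks at x = 2, is an assumed toy example, not one from the video.

```python
# Gradient ascent sketch on the assumed toy objective
# f(x) = -(x - 2)**2 + 3, whose maximum sits at x = 2.
def ascend(x, lr=0.1, steps=100):
    for _ in range(steps):
        x = x + lr * (-2 * (x - 2))   # x_new = x_old + lr * slope
    return x

print(ascend(0.0))  # climbs to (approximately) the maximum at x = 2
```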
this is amazing stuff
Thanks Deen. Pls share with friends
lol, while following along I reached the "who was Ram?" stage; I even forgot what we were trying to find out, because the x, y table never got connected to the whole long story :( But a good initial explanation of gradient descent and the derivative.
Great explanation
Thank you.
Welcome brenda.
Hi, amazing intuition, but can you please explain how the derivative indicates the direction to go in?
Thanks Akshay , very good question, This video is a must watch for you:
th-cam.com/video/WCp1D-wSolo/w-d-xo.html
Can we always take a learning rate of 0.001?
Thank you.
You're welcome Brenda.
Hi, in gradient descent, since we subtract LR*slope to reach the minimum, would we, in maximum ascent, add LR*slope to the old value?
No, the sign of the slope will just reverse automatically (negative slope). Watch this video to understand this better:
th-cam.com/video/WCp1D-wSolo/w-d-xo.html
nice video... keep going sir.
Thank you Roshan.
Great job
Thank you.
Sir, with love 🥰🥰
What's the theory/reasoning behind minimizing the function? And can someone please elaborate on where the learning rate originated, from a pure mathematics rather than a machine learning point of view?
For the reasoning behind minimizing the loss function, watch the videos below:
th-cam.com/video/2-Cg_1FtHk8/w-d-xo.html
th-cam.com/video/hSAQkeMOdiI/w-d-xo.html
About Learning rate:
In plain English, it decides at what speed you want to change your assumptions. For example, let's say you started at x=15 and gradient descent says x needs to be increased, so you want to make it 17 or 20.
This "shift", or technically the "step size", is decided by the learning rate.
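The step-size idea above can be sketched in code. The cost f(x) = (x - 20)^2 is an assumed toy example chosen so that its minimum sits at x = 20, meaning from x = 15 the update must push x upward, to 17 or 20 depending on the rate:

```python
# One gradient-descent update on the assumed toy cost f(x) = (x - 20)**2.
def update(x, lr):
    slope = 2 * (x - 20)   # f'(x)
    return x - lr * slope  # gradient-descent rule: new = old - lr * slope

print(update(15, 0.2))  # -> 17.0  (smaller learning rate, smaller shift)
print(update(15, 0.5))  # -> 20.0  (larger learning rate, bigger shift)
```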
Amazing Content
Hello Sir,
My question is: since we have the freedom to choose the LR as per our needs, can't I always keep it at "1"? Or will changing the value of the LR lead to different answers that may be wrong? I want to know the impact of changing the LR. Would we get incorrect/less accurate results? If yes, can you suggest a value that I can choose for the LR?
Good question; there are a few values that are suggested by the outcomes of experimental processes.
For example, the p-value boundary of 0.05.
Similarly, the LR range is suggested as a decimal number: we should start with a large value like 0.1, then try exponentially lower values: 0.01, 0.001, and so on.
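The exponential sweep described above can be sketched on the video's line-fitting example. The data points (1, 2) and (3, 4) and the initial guesses c=0, m=1 are assumed from the video; on this tiny problem 0.1 itself overshoots, so the sweep here starts one notch lower.

```python
# Learning-rate sweep sketch: fit y = c + m*x to the assumed video points
# (1,2) and (3,4) with plain gradient descent, for several learning rates.
data = [(1.0, 2.0), (3.0, 4.0)]

def final_cost(lr, steps=200):
    c, m = 0.0, 1.0                                   # initial guesses from the video
    for _ in range(steps):
        gc = sum(-2 * (y - (c + m*x)) for x, y in data)   # dJ/dc
        gm = sum(-2 * (y - (c + m*x)) * x for x, y in data)  # dJ/dm
        c, m = c - lr * gc, m - lr * gm
    return sum((y - (c + m*x))**2 for x, y in data)

for lr in (0.05, 0.01, 0.001):
    print(lr, final_cost(lr))  # remaining cost shrinks as lr grows (while stable)
```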
Awesome!
Thank you! Cheers!
Are the cost function and loss function the same?
No; search for "Unfold Data Science cost function vs loss function" on TH-cam to learn the difference.
@@UnfoldDataScience OK, thank you very much. I saw the video and got the difference between the two: it is the level at which we calculate and optimize these errors. The cost function is aimed at the model level, whereas the loss function is for a data point or an observation, as you say.
My head hurts but I learned a lot.
Very good Aman
Thank you :)
Sir, why can't we directly set the partial derivatives equal to 0 and then calculate the values of c and m?
Hi Pramod, we need to take care of both "m" and "c", hence we take this approach of calculating the derivatives individually, one for "m" and the other for "c".
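As a side note for this thread: for plain linear regression, setting both partial derivatives to zero gives a solvable system (the normal equations), which is exactly what the question suggests; gradient descent matters because it also works at scale and for models with no closed form. A sketch with the data points (1, 2) and (3, 4) assumed from the video:

```python
# Solving dJ/dc = 0 and dJ/dm = 0 directly for the assumed video data,
# i.e. the normal equations of simple linear regression:
#   n*c  + sx*m  = sy
#   sx*c + sxx*m = sxy
data = [(1.0, 2.0), (3.0, 4.0)]
n   = len(data)
sx  = sum(x for x, _ in data)
sy  = sum(y for _, y in data)
sxx = sum(x*x for x, _ in data)
sxy = sum(x*y for x, y in data)

m = (n*sxy - sx*sy) / (n*sxx - sx*sx)
c = (sy - m*sx) / n
print(c, m)  # -> 1.0 1.0, i.e. the line y = 1 + x through both points
```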
Hello sir! On what basis have you taken the learning rate as 0.04? Can you please make it clear for me?
Hello, the learning rate is usually taken in the range 0.001 to 0.9, depending on how aggressively we want to converge. For a lower learning rate, convergence is slower but it can converge better; on the other hand, if we take a higher learning rate, let's say 0.8, convergence might be fast but we run the risk of missing the global minimum. Here 0.04 is taken just as an example. Hope that answers it. Happy learning. Tc
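The trade-off described above can be sketched on a toy cost. The function f(x) = x^2 and the starting point x = 10 are assumed for illustration:

```python
# Learning-rate trade-off sketch on the assumed toy cost f(x) = x**2.
def run(lr, steps=20, x0=10.0):
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x   # f'(x) = 2x
    return x

print(run(0.001))  # tiny step size: barely moves away from 10
print(run(0.4))    # moderate: converges quickly toward the minimum at 0
print(run(1.1))    # too large: each step overshoots further, it diverges
```

Too small a rate wastes steps, a well-chosen one converges fast, and too large a rate bounces past the minimum with growing amplitude.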