The sign of a great teacher is the ability to make complicated concepts simple to the student. You, my friend, are a great teacher. Thank you!
I have watched many videos about ridge regression, and this is the best one I have seen. The majority of videos just talk about working with a few parameters and doing a linear fit. You go beyond that and discuss how to generalize ridge regression. This video is the best.
Thank you so much for watching! I am glad you found the video useful! Please let me know if there are other topics you would like to see detailed videos on.
That inverse writing is f...ng awesome
After struggling for days, I think you have finally made it clear to me how regularization can reduce the effect of theta (or, we could say, the slope). I checked most of the videos about regularization and, to be honest, none helped me understand the regularization term and how it really affects the slope/steepness. You used the normal equation to elaborate the idea of regularization, which was magnificent for getting a clear view of how you can decrease the steepness of theta by varying lambda: the larger lambda is, the less steep theta becomes, and vice versa.
Unfortunately, most videos/sources don't elaborate the intuition behind this term and how it really changes the thetas/slopes. They all say the same thing about penalizing the steepness, without showing why and how.
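For anyone else stuck on the same point, here is a tiny sketch I put together myself (not code from the video) of the closed-form ridge solution theta = (X^T X + lambda*I)^(-1) X^T y on toy data, so you can see the slope shrink as lambda grows:

```python
import numpy as np

# Toy 1-D data with a known slope of 3
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 3.0 * x + rng.normal(scale=0.1, size=x.size)

# Design matrix with a bias column of ones
X = np.column_stack([np.ones_like(x), x])

for lam in [0.0, 0.1, 1.0, 10.0]:
    # Ridge normal equation: theta = (X^T X + lambda*I)^(-1) X^T y
    theta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    print(f"lambda={lam:5.1f}  intercept={theta[0]:+.3f}  slope={theta[1]:+.3f}")
```

Running it shows exactly the intuition above: the fitted slope moves toward zero as lambda increases.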
Yeah. The same for me too.
If you don't do any normalization, a reasonable choice for theta can easily be much larger than 1. Since with least squares we have a convex error surface, you don't need to normalize. However, I agree that in general normalizing your data doesn't hurt, and in that case your suggestion of picking a value between 0 and 1 makes a lot of sense! Kudos for the nice explanation and derivation!
Thanks for watching! Glad you found the video enjoyable.
Very well explained. Your channel should have a lot more views.
Thank you for watching! I am glad you found it useful
Awesome... I was totally confused by ridge regression, as I am new to data science. Thanks a lot for your help.
Hi Abhishek, glad to help! Thank you for watching
Madone: This was brilliant. It's going straight from your video into Matlab. I'm beginning to understand the maths of the reservoir computing echolocation model I'm writing!
I got this ridge regression equation from Tholer's PhD thesis: Wout = (Tm'*M)*((M'*M)+B*eye(N))^-1; and at 8:51 its derivation is explained. Thanks.
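In case it helps anyone working in Python instead, here is roughly the same readout equation as a sketch of my own (Tm, M, B, and N are as in the Matlab line above: Tm the target matrix, M the reservoir state matrix, B the ridge parameter, N the number of reservoir states):

```python
import numpy as np

def ridge_readout(Tm, M, B):
    """Ridge readout: Wout = Tm^T M (M^T M + B*I)^(-1).

    Tm : (samples, outputs) target matrix
    M  : (samples, N) reservoir state matrix
    B  : scalar ridge parameter
    """
    N = M.shape[1]
    A = M.T @ M + B * np.eye(N)
    # A is symmetric, so C @ inv(A) == solve(A, C.T).T; this avoids
    # forming an explicit inverse, which is better conditioned
    return np.linalg.solve(A, (Tm.T @ M).T).T
```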
Thank you for watching, glad you found it useful
Great explanation. Followed along just fine after reading ISLR ridge section. Helped me see the approach of RR behind the code and text.
Thank you for watching, glad you found it useful
Amazing explanation!
That's the kind of video I was looking for. There are a lot of videos with obvious information and nothing about the mathematical representation and derivatives. You did it very well.
What about the constant term, theta_{0}? A lot of sources say that theta_{0} shouldn't be regularized, and that in the equation, instead of the identity matrix, we should use a modified identity matrix with the first row full of zeros.
Hey Adrian. Thanks for watching.
You bring up a good point here, and I think the answer is that it depends. Every model and every dataset may have different scaling requirements, and whether the theta_0 (bias) term is regularized or not depends on that. I have personally always implemented it with regularization, and have not needed to take it out. I would be interested to see how that affects the results. Maybe I can do a test example and make a video on that!
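In the meantime, here is a quick sketch (not code from the video) of the variant Adrian describes, where the first diagonal entry of the identity matrix is zeroed out so theta_0 escapes the penalty:

```python
import numpy as np

def ridge_theta(X, y, lam, regularize_bias=True):
    """Closed-form ridge solution; optionally skip penalizing theta_0.

    X is assumed to have a leading column of ones for the bias term.
    """
    I = np.eye(X.shape[1])
    if not regularize_bias:
        I[0, 0] = 0.0  # modified identity: no penalty on the bias term
    return np.linalg.solve(X.T @ X + lam * I, X.T @ y)
```

Comparing the two settings on the same data would be an easy way to test how much it actually matters.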
Thanks for the wonderful explanation. Could you please make the same kind of video for lasso and elastic net?
Nice video, I appreciate it!
Thank you for watching! Glad you found it clear and useful
Great video 😁
How can I apply this to a small artificial dataset? Do you have any examples of that?
Great explanation, but why does lambda have to be multiplied by the identity matrix?
Thank you for watching, glad you found it useful
Lambda is a scalar, and we can't add a scalar to a vector/matrix directly, so we need to multiply it by an identity matrix of the proper size.
Excellently explained! Thank you
Thank you for watching! Glad you found it useful
This was a great help. From Korea, thank you!
Thank you for watching! I am glad you found the video useful
At the beginning, why do you put a bar on top of x?
The bar is to show that this is the vector x_bar = [1, x]^T. We prepend a 1 to the vector x to make the equation compact.
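For example (a small illustration of my own), the prepended 1 lets the intercept ride along in the same matrix product:

```python
import numpy as np

x = np.array([2.0, 5.0, 7.0])                  # raw inputs
X_bar = np.column_stack([np.ones_like(x), x])  # each row is x_bar = [1, x]
theta = np.array([0.5, 2.0])                   # [theta_0 (bias), theta_1 (slope)]

# y = theta_0 + theta_1 * x, written compactly as X_bar @ theta
print(X_bar @ theta)                           # [ 4.5 10.5 14.5]
```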
In the end I was like "wait what are all those formulas???"
It would be superb if you could do the same from scratch in Python, i.e. formulating the matrices X and Y, optimizing the cost function (finding the minimum), and arriving at theta.
Hi Pranjal, I am working on this for a potential next video. Just like I did the python implementation for linear regression from scratch, I am planning for a video that does ridge regression in python. Stay tuned! And thanks for watching
Thank you. You explain things clearly.
Thank you for watching!
Thank you for your explanation, it was wonderful. I have a question: how can I use ridge regression in Matlab? If I have my input and output, how do I use them in the ridge regression code, and what will the coefficients be? Please help me, I can't figure it out.
Hi Fatma, thank you for your comment. I have a video on coding ridge regression in Python (see the link below), and the code for that video is on GitHub as well. Unfortunately I do not have any code in Matlab, but the concepts should translate directly to a Matlab implementation.
th-cam.com/video/WatqxWFhcZk/w-d-xo.html
Nice explanation
excellent work
Thank you! Glad you liked it!
Thank you, good stuff
Thank you for watching! Glad you found it helpful
What I didn't understand is: now lambda will be only on the diagonal, so how does it help? In X^T X + lambda*(identity matrix), why just the identity entries, why not all of them?
Hi Yatin.
Lambda is a way to penalize the model parameters so they don't get too large; if you set lambda=1 you add the identity matrix itself. But in practice that is typically a large penalty weight; usually lambda is a positive number that is < 1.
I love you so much, sir
Very helpful
Glad it helped! Thanks for watching
thank you!
Thank you for watching Ali!
nice
The final formula is not correct. You should not get the identity matrix $I$ in the formula.
Hi Anar, thank you for the comment.
The reason an identity matrix is required is mathematical consistency. The first term in brackets (X^T X) is a square matrix, and we can't add a scalar (lambda) to a square matrix, so for the notation to be correct the identity is required.
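As a quick numpy illustration of that point (my own example, not from the video): adding a bare scalar would touch every entry, while lambda times the identity only touches the diagonal, which is what the ||theta||^2 penalty calls for:

```python
import numpy as np

XtX = np.array([[4.0, 2.0],
                [2.0, 3.0]])
lam = 0.1

print(XtX + lam)              # numpy broadcasts: adds 0.1 to EVERY entry (not ridge)
print(XtX + lam * np.eye(2))  # adds 0.1 only on the diagonal (ridge)
```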
@@EndlessEngineering Why shouldn't we have considered a matrix where all the values are 1, instead of only the diagonal being 1?
@@yatinarora9650 A matrix of all ones does not follow from penalizing the norm of the model parameters with lambda; that penalty is what produces the diagonal form. In practice lambda is a positive number that is usually < 1. We do not want to penalize the norm of the model parameters too much, as that might cause us not to fit the data well.
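To sketch why the diagonal falls out of the math: the penalty lambda * theta^T theta differentiates to 2*lambda*theta = 2*lambda*I*theta, whereas an all-ones matrix J would correspond to penalizing lambda * (sum of the thetas)^2 instead of the norm. A small numerical check (my own example):

```python
import numpy as np

lam, theta = 0.1, np.array([1.0, -2.0, 3.0])

# Gradients of the two candidate penalties:
grad_norm = 2 * lam * np.eye(3) @ theta       # from lambda * ||theta||^2
grad_sum  = 2 * lam * np.ones((3, 3)) @ theta # from lambda * (sum theta_i)^2

print(grad_norm)  # [ 0.2 -0.4  0.6] -> pushes each theta_i toward 0 (ridge)
print(grad_sum)   # [ 0.4  0.4  0.4] -> only pushes the SUM of thetas toward 0
```

Only the identity version shrinks each parameter individually, which is the whole point of the ridge penalty.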
I love you