Ridge Regression (L2 Regularization)

  • Published Jan 25, 2025

Comments • 57

  • @jehushaphat
    @jehushaphat 11 months ago

    The sign of a great teacher is the ability to make complicated concepts simple to the student. You, my friend, are a great teacher. Thank you!

  • @prasama705
    @prasama705 4 years ago

    I watched many videos about ridge regression, and this is the best one I have seen. The majority of them just talk about working with a few parameters and doing a linear fit. You go beyond that and discuss how to generalize ridge regression. This video is the best.

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Thank you so much for watching! I am glad you found the video useful! Please let me know if there are other topics you would like to see detailed videos on.

  • @yonko5787
    @yonko5787 3 years ago

    that inverse writing is f...ng awesome

  • @hussameldinrabah5018
    @hussameldinrabah5018 3 years ago +5

    After struggling for days, I think you've finally made it almost clear to me how regularization can reduce the effect of theta (or, we could say, the slope). I checked most of the videos about regularization and, to be honest, none of them helped me understand the regularization term and how it really affects the slope/steepness. You used the Normal Equation to elaborate the idea of regularization, which was magnificent for getting a clear view of how you can decrease the steepness of theta by varying lambda. The larger lambda is, the less steep theta becomes, and vice versa.
    Unfortunately, most videos/sources don't elaborate the intuition behind this term and how it really changes the thetas/slopes. They all say the same thing about penalizing the steepness without showing why and how.
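
    A minimal numpy sketch of the effect described here, assuming the closed-form ridge solution theta = (X^T X + lambda*I)^(-1) X^T y; the data is made up for illustration:

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(50, 1))                        # one feature
        y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=50)  # true slope ~ 3
        Xb = np.hstack([np.ones((50, 1)), X])               # prepend bias column

        for lam in [0.0, 1.0, 10.0, 100.0]:
            # closed-form ridge: theta = (Xb^T Xb + lambda*I)^(-1) Xb^T y
            theta = np.linalg.solve(Xb.T @ Xb + lam * np.eye(2), Xb.T @ y)
            print(f"lambda = {lam:6.1f}   slope = {theta[1]: .3f}")

    Running this prints a slope near 3 for lambda = 0 that shrinks toward 0 as lambda grows, which is exactly the "less steep theta" behavior described above.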

  • @ahans123
    @ahans123 3 years ago +2

    If you don't do any normalization, a reasonable choice for theta can easily be much larger than 1. Since with least squares we have a convex error surface, you don't need to normalize. However, I agree that in general normalizing your data doesn't hurt, and in that case your suggestion of picking a value between 0 and 1 makes a lot of sense! Kudos for the nice explanation and derivation!

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Thanks for watching! Glad you found the video enjoyable.

  • @julianocamargo6674
    @julianocamargo6674 4 years ago

    Very well explained. Your channel should have a lot more views.

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Thank you for watching! I am glad you found it useful

  • @MrAbhishek75
    @MrAbhishek75 4 years ago +2

    Awesome... I was totally confused by ridge regression, as I am new to data science. Thanks a lot for your help.

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Hi Abhishek, glad to help! Thank you for watching

  • @themadone7568
    @themadone7568 3 years ago +1

    Madone: This was brilliant. It's going straight from your video into MATLAB. I'm beginning to understand the maths of the reservoir computing model I'm writing for echolocation!
    I got this ridge regression equation from Tholer's PhD: Wout= (Tm'*M)*((M'*M)+B*eye(N))^-1; and its derivation is explained there at 8:51. Thanks.

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Thank you for watching, glad you found it useful
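
      For readers following along, a rough numpy translation of that MATLAB line, keeping the commenter's names (Tm, M, B, N); the shapes are assumptions, not taken from the cited thesis:

          import numpy as np

          N, T = 100, 500              # reservoir size and time steps (assumed)
          M = np.random.randn(T, N)    # collected reservoir states
          Tm = np.random.randn(T, 1)   # target signal
          B = 1e-2                     # ridge parameter (beta)

          # Wout = (Tm'*M) * ((M'*M) + B*eye(N))^-1
          Wout = (Tm.T @ M) @ np.linalg.inv(M.T @ M + B * np.eye(N))
          print(Wout.shape)            # (1, N)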

  • @nathancruz2843
    @nathancruz2843 3 years ago +1

    Great explanation. Followed along just fine after reading ISLR ridge section. Helped me see the approach of RR behind the code and text.

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Thank you for watching, glad you found it useful

  • @mathhack8647
    @mathhack8647 2 years ago

    Amazing explanation!

  • @adrianbrodowicz3485
    @adrianbrodowicz3485 4 years ago +1

    That's the kind of video I was looking for. There are a lot of videos full of obvious information and nothing about the mathematical representation and derivatives. You did it very well.
    What about the constant, theta_{0}? A lot of sources say that theta_{0} shouldn't be regularized, and that in the equation, instead of the identity matrix, we should use a modified identity matrix with the first row full of zeros.

    • @EndlessEngineering
      @EndlessEngineering  3 years ago +1

      Hey Adrian. Thanks for watching.
      You bring up a good point here, and I think the answer is that it depends. Every model and every dataset may have different scaling requirements, and whether the theta_0 (bias) term is regularized or not depends on that. I have personally always implemented it with regularization and have not needed to take it out. I would be interested to see how that affects the results. Maybe I can do a test example and make a video on that!
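
      Adrian's variant is easy to try: a minimal sketch (mine, not the video's code) that zeroes the first diagonal entry of the identity so theta_0 goes unpenalized:

          import numpy as np

          def ridge_fit(Xb, y, lam, penalize_bias=True):
              """Closed-form ridge; Xb already has a leading column of ones."""
              I = np.eye(Xb.shape[1])
              if not penalize_bias:
                  I[0, 0] = 0.0    # leave the bias term theta_0 unshrunk
              return np.linalg.solve(Xb.T @ Xb + lam * I, Xb.T @ y)

      Comparing ridge_fit(Xb, y, lam) against ridge_fit(Xb, y, lam, penalize_bias=False) would show exactly how much the results differ.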

  • @rohitkamble8515
    @rohitkamble8515 1 year ago

    Thanks for the wonderful explanation. Could you please make the same kind of video for lasso and elastic net?

  • @step_by_step867
    @step_by_step867 3 years ago

    Nice video, I appreciate it !

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Thank you for watching! Glad you found it clear and useful

  • @alanp9936
    @alanp9936 9 months ago

    Great video 😁

  • @CarpoMedia
    @CarpoMedia 1 year ago

    How can I apply this to a small artificial dataset? Do you have any examples of that?
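
    One way to try it on a small artificial dataset (a sketch with made-up data, not code from the channel):

        import numpy as np

        rng = np.random.default_rng(42)
        X = rng.uniform(-1, 1, size=(20, 1))    # tiny synthetic dataset
        y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.1, size=20)

        Xb = np.hstack([np.ones((20, 1)), X])   # add the bias column
        lam = 0.1
        theta = np.linalg.solve(Xb.T @ Xb + lam * np.eye(2), Xb.T @ y)
        print(theta)                            # roughly [1.0, 2.0]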

  • @bastiano2939
    @bastiano2939 3 years ago +1

    Great explanation, but why does lambda have to be multiplied by the identity matrix?

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Thank you for watching, glad you found it useful.
      Lambda is a scalar, and we can't add a scalar to a vector/matrix directly, so we need to multiply it by an identity matrix of the proper size.
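
      The shape requirement is easy to see in numpy (illustrative only):

          import numpy as np

          A = np.arange(9.0).reshape(3, 3)  # stands in for X^T X (square)
          lam = 0.5
          # "A + lam" would broadcast lambda onto every entry of A,
          # which is not what ridge does; ridge adds lambda only to
          # the diagonal, i.e. A + lam * I:
          print(A + lam * np.eye(3))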

  • @maxbarnes2240
    @maxbarnes2240 3 years ago

    Excellently explained! Thank you

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Thank you for watching! Glad you found it useful

  • @HaramChoi-l4z
    @HaramChoi-l4z 4 years ago

    This helped me a lot. From Korea, thank you!

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Thank you for watching! I am glad you found the video useful

  • @EW-mb1ih
    @EW-mb1ih 3 years ago

    At the beginning, why do you put a bar on top of x?

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      The bar is to show that this is the vector x_bar = [1, x]^T. We add a 1 to the vector x to make the equation compact.
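
      In code, the bar simply amounts to prepending a 1 (or a column of ones), for example:

          import numpy as np

          x = np.array([2.0, 5.0, 7.0])    # raw feature values
          X_bar = np.column_stack([np.ones_like(x), x])
          print(X_bar)
          # [[1. 2.]
          #  [1. 5.]
          #  [1. 7.]]

      With that 1 in place, theta^T x_bar = theta_0 + theta_1 * x in a single dot product, which is why the equation becomes compact.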

  • @WestcoastJourney
    @WestcoastJourney 4 years ago

    In the end I was like "wait what are all those formulas???"

  • @pranjaldureja9346
    @pranjaldureja9346 4 years ago +2

    It would be superb if you could do the same from scratch in Python, i.e. formulating the matrices X and Y, optimizing the cost function (finding the minimum), and arriving at theta.

    • @EndlessEngineering
      @EndlessEngineering  4 years ago +1

      Hi Pranjal, I am working on this for a potential next video. Just like the Python implementation of linear regression from scratch, I am planning a video that does ridge regression in Python. Stay tuned! And thanks for watching.
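
      Until that video exists, here is a rough sketch of the route Pranjal describes, assuming the usual ridge cost J = ||y - X@theta||^2 + lambda*||theta||^2 and plain gradient descent (the learning rate may need tuning for other data):

          import numpy as np

          def ridge_gd(X, y, lam=0.1, lr=1e-3, n_iter=10_000):
              """Minimize ||y - X@theta||^2 + lam*||theta||^2 by gradient descent."""
              theta = np.zeros(X.shape[1])
              for _ in range(n_iter):
                  # gradient of the ridge cost with respect to theta
                  grad = -2.0 * X.T @ (y - X @ theta) + 2.0 * lam * theta
                  theta -= lr * grad
              return theta

      The result should match the closed-form answer np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y) to within numerical tolerance.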

  • @Amin-gs9mn
    @Amin-gs9mn 3 years ago

    Thank you. You explain things clearly.

  • @fatmahm5787
    @fatmahm5787 4 years ago

    Thank you for your explanation, it was wonderful. I have a question: how can I use ridge regression in MATLAB? If I have my input and output, how do I use them in the ridge regression code, and what will the coefficients in ridge regression be? Please help me, I can't figure it out.

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Hi Fatma, thank you for your comment. I have a video on coding ridge regression in Python (see the link below), and the code for that video is on GitHub as well. Unfortunately I do not have any code in MATLAB, but the concepts should translate directly to a MATLAB implementation.
      th-cam.com/video/WatqxWFhcZk/w-d-xo.html
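
      If Python is an option, scikit-learn's Ridge wraps the same math; a minimal sketch with placeholder data (MATLAB's Statistics Toolbox also ships a built-in ridge function, if I remember correctly):

          import numpy as np
          from sklearn.linear_model import Ridge

          X = np.random.randn(100, 3)   # your inputs
          y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)

          model = Ridge(alpha=1.0)      # alpha plays the role of lambda
          model.fit(X, y)
          print(model.coef_, model.intercept_)   # the ridge coefficients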

  • @usmanjavaid4195
    @usmanjavaid4195 4 years ago

    Nice explanation

  • @mohamadabdulkarem206
    @mohamadabdulkarem206 4 years ago

    excellent work

  • @Account-fi1cu
    @Account-fi1cu 3 years ago

    Thank you, good stuff

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Thank you for watching! Glad you found it helpful

  • @yatinarora9650
    @yatinarora9650 3 years ago

    What I didn't understand is: lambda only appears on the diagonal, so how does it help? In (X^T X + lambda * I), why only the diagonal elements, why not all of them?

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Hi Yatin.
      Lambda is a way to penalize the model parameters so they don't get too large. If you set lambda = 1 you add the full identity matrix, but in practice that is typically a heavy penalty on the model parameters; usually lambda is a positive number that is less than 1.
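
      A concrete illustration of why that diagonal term helps: with collinear features, X^T X is singular and plain least squares cannot invert it, while adding lambda on the diagonal restores full rank (made-up numbers):

          import numpy as np

          X = np.array([[1.0, 2.0],
                        [2.0, 4.0],
                        [3.0, 6.0]])    # 2nd column = 2 * 1st column

          A = X.T @ X
          print(np.linalg.matrix_rank(A))                    # 1 -> singular
          print(np.linalg.matrix_rank(A + 0.1 * np.eye(2)))  # 2 -> invertible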

  • @ganesh3189
    @ganesh3189 2 years ago

    i love you so much sir

  • @masterj2245
    @masterj2245 4 years ago

    Very helpful

  • @alideeb4228
    @alideeb4228 3 years ago

    thank you!

  • @analistaremoto
    @analistaremoto 3 years ago

    nice

  • @anarbay24
    @anarbay24 4 years ago +1

    The final formula is not correct. You should not get the identity matrix $I$ in the formula.

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Hi Anar, thank you for the comment.
      The reason an identity matrix is required is mathematical consistency. The first term in brackets (X^T X) is a square matrix, and we can't add a scalar (lambda) to a square matrix, so for the notation to be correct the identity is required.

    • @yatinarora9650
      @yatinarora9650 3 years ago

      @@EndlessEngineering Why shouldn't we have considered a matrix where all the values are 1, instead of only the diagonal being 1?

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      @@yatinarora9650 A matrix of all ones does not follow from the formulation of penalizing the norm of the model parameters with lambda. In practice lambda is a positive number that is usually < 1. We do not want to penalize the norm of the model parameters too much, as that might cause us not to fit the data well.
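
      The identity also falls straight out of the calculus: the penalty is $\lambda \|\theta\|^2 = \lambda \theta^T \theta$, whose gradient is $2\lambda\theta = 2\lambda I \theta$, which is where the $\lambda I$ in $(X^T X + \lambda I)\theta = X^T y$ comes from. A matrix of all ones $J$ would instead come from penalizing $(\sum_i \theta_i)^2 = \theta^T J \theta$, a different penalty that couples the parameters together.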

  • @chiaosun3505
    @chiaosun3505 3 years ago

    I love you