Linear regression (2): Gradient descent

  • Published 8 Oct 2024

Comments • 29

  • @emmanuel5566
    @emmanuel5566 4 years ago +77

    so, how many of you came here from Andrew NG's ML Course?

  • @yemaneabrha6637
    @yemaneabrha6637 4 years ago +2

    Simple, clear, and gentle explanation. Thanks, Prof.

  • @JulianHarris
    @JulianHarris 9 years ago +1

    I'm very visual so I particularly loved the visualisations showing the progressive improvement of the hypothesis as the parameters were refined.

  • @redfield126
    @redfield126 5 years ago

    Very interesting. Thanks for the clear and visual explanation, which gave me good intuition about the different versions of gradient descent.

  • @sagardolas3880
    @sagardolas3880 6 years ago +1

    This was the simplest explanation, as well as the most beautiful and precise one.

  • @AronBordin
    @AronBordin 9 years ago

    Exactly what I was looking for, thx!

  • @poltimmer
    @poltimmer 8 years ago

    Thanks! I'm writing an essay on machine learning, and this really helped me out!

  • @mdfantacherislam4401
    @mdfantacherislam4401 6 years ago

    Thanks for such a helpful lecture.

  • @JanisPundurs
    @JanisPundurs 9 years ago

    This helped a lot, thanks

  • @NicoCarosio
    @NicoCarosio 8 years ago

    Thanks!

  • @叶渐师
    @叶渐师 8 years ago

    thx a lot

  • @fyz5689
    @fyz5689 8 years ago

    excellent

  • @anthamithya
    @anthamithya 5 years ago

    First of all, how do we know that the J(theta) curve has that shape? The curve would only be obtained after gradient descent has already been run, or by evaluating a thousand or so random theta values...

  • @sagarbhat7932
    @sagarbhat7932 4 years ago

    Wouldn't online gradient descent cause the problem of overfitting?

    • @AlexanderIhler
      @AlexanderIhler  4 years ago

      Overfitting is not really related to the *method* of doing the optimization (online=stochastic GD, versus batch GD, or second order methods like BFGS, etc.) but rather to the complexity of the model, and the *degree* to which the optimization process is allowed to fit the data. So, early stopping (incomplete optimization) can reduce overfitting, for example. Changing optimization methods can appear to change overfitting simply because of stopping rules interacting with optimization efficiency, but they don't really change the fundamental issue.
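
      A minimal sketch of this point (my own toy example, not from the video; the data, the degree-12 features, the step size, and all variable names are made up): the optimizer below is plain batch gradient descent, and what limits overfitting is how early we stop, tracked here with a held-out validation set.

      import numpy as np

      rng = np.random.default_rng(0)

      # Small noisy dataset and an over-flexible degree-12 polynomial feature map.
      x_tr = rng.uniform(-1, 1, 30)
      y_tr = np.sin(3 * x_tr) + 0.3 * rng.normal(size=30)
      x_va = rng.uniform(-1, 1, 30)
      y_va = np.sin(3 * x_va) + 0.3 * rng.normal(size=30)

      def poly(x, degree=12):
          return np.vander(x, degree + 1, increasing=True)

      Phi_tr, Phi_va = poly(x_tr), poly(x_va)
      theta = np.zeros(Phi_tr.shape[1])

      alpha = 0.05
      best_va, best_step = np.inf, 0
      for step in range(1, 20001):
          grad = Phi_tr.T @ (Phi_tr @ theta - y_tr) / len(y_tr)  # gradient of (1/2)*MSE
          theta -= alpha * grad
          va_mse = np.mean((Phi_va @ theta - y_va) ** 2)
          if va_mse < best_va:  # early-stopping bookkeeping: remember the best step so far
              best_va, best_step = va_mse, step

      print("validation MSE after the full run:", np.mean((Phi_va @ theta - y_va) ** 2))
      print("best validation MSE, reached at step", best_step, ":", best_va)

      By construction the fully optimized model never beats the early-stopped one on the validation set, which is the sense in which stopping early (rather than the choice of optimizer) controls overfitting.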

  • @宗宝冯
    @宗宝冯 7 years ago

    nice

  • @sidbhatia4230
    @sidbhatia4230 5 years ago

    What modifications can we make to use the L2 norm instead?

  • @UmeshMorankar
    @UmeshMorankar 8 years ago

    What if we set the learning rate α to too large a value?

    • @SreeragNairisawesome
      @SreeragNairisawesome 8 years ago

      +Umesh Morankar Then it might diverge or overshoot the minimum. For example, if the minimum is at 2, the latest value of Θ is 4, and α = 8 (suppose), then it would jump to 4 - 8 = -4, which is even farther from 2, whereas if α = 1 (small) it would reach the local minimum within the next couple of iterations.
      I haven't actually applied the algorithm; this is just for explanation purposes.
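
      As a quick sketch of the same intuition (my own toy example, not from the video): gradient descent on the one-dimensional objective J(Θ) = (Θ - 2)², whose minimum is at Θ = 2, converges when α is small but diverges when α is too large.

      # Gradient descent on J(theta) = (theta - 2)**2, so dJ/dtheta = 2*(theta - 2).
      # The minimum is at theta = 2; we start from theta = 4 as in the comment above.
      def gradient_descent(alpha, theta=4.0, steps=10):
          for _ in range(steps):
              theta -= alpha * 2 * (theta - 2)   # theta <- theta - alpha * dJ/dtheta
          return theta

      print(gradient_descent(alpha=0.1))   # ~2.21: the error shrinks every step
      print(gradient_descent(alpha=1.1))   # ~14.4: each step overshoots farther, so it diverges

      Each step multiplies the error (Θ - 2) by (1 - 2α), so for this objective any α > 1 makes the error grow instead of shrink.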

  • @prithviprakash1110
    @prithviprakash1110 6 years ago

    Can someone explain how the derivative of ⍬X(t) wrt ⍬ becomes X and not X(t)?

    • @patton4786
      @patton4786 6 years ago

      Because the derivative is taken with respect to theta0, the derivative of theta0*x is 1*theta0^(1-1)*x = 1*1*x = x. (Btw, this is a partial derivative, so all other terms are treated as constants, except theta0*x0.)
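
      The same partial derivative written out (my notation, not from the slides; x^{(t)} is the t-th training example and x_j^{(t)} is its j-th feature):

      \[
        \frac{\partial}{\partial \theta_j}\Bigl(\theta^{\mathsf T} x^{(t)}\Bigr)
        = \frac{\partial}{\partial \theta_j}\sum_{k} \theta_k\, x_k^{(t)}
        = x_j^{(t)},
      \]

      because every term with k ≠ j does not contain θ_j and differentiates to zero, leaving only θ_j x_j^{(t)}, whose derivative is x_j^{(t)}.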

  • @EngineersLife-Vlog
    @EngineersLife-Vlog 6 years ago

    Can I get these slides, please?