Robust Regression with the L1 Norm [Python]

  • Published 21 Dec 2024

Comments • 18

  • @drskelebone
    @drskelebone 3 years ago +2

    Can you comment on L1 "robustification" vs weighting schemes like IRLS (iteratively reweighted least squares)? Obviously L1 should be faster (no need to I R the LS), but is the fit better/able to reject less obviously bad outliers?

  • @sacramentofwilderness6656
    @sacramentofwilderness6656 2 years ago +1

    Can one explain the vulnerability of L2 regression to outliers by the fact that L2 regression is based on the assumption that the data come from a normal distribution with light tails (quickly decaying away from the mean)? For a more robust algorithm one should use a distribution with heavier tails, say Cauchy. However, I would think that not every choice of noise distribution admits a simple analytical solution the way L2 regression does.
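
    As a concrete illustration of that point (a toy example, not from the video): the L2 fit is the maximum-likelihood estimate under Gaussian noise, while the L1 fit is the MLE under heavier-tailed Laplace noise. For a single location parameter those estimates are the sample mean and the sample median, and the median barely notices an outlier.

      import numpy as np

      # Location estimation: the Gaussian MLE is the mean (L2 fit),
      # the Laplace MLE is the median (L1 fit). The outlier drags the
      # mean but barely moves the median.
      data = np.array([1.9, 2.1, 2.0, 1.8, 2.2, 100.0])
      print(np.mean(data))    # ~18.3, ruined by the outlier
      print(np.median(data))  # 2.05, robust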

  • @neophytefilms1268
    @neophytefilms1268 3 years ago +2

    Very interesting video! It would have been nice to compare the estimated slopes from the L2 and L1 norms without the outlier. The L2 norm is the MLE in the case of normally distributed noise, which makes it very valuable for clean data. In case someone is interested: a compromise between the MLE property of the L2 norm and robustness is, for example, weight iteration in a least-squares adjustment. In this method the adjustment is done iteratively while the weights of the individual observations are updated based on the size of their errors (see the sketch at the end of this thread).

    • @nomansbrand4417
      @nomansbrand4417 1 year ago

      You could even iterate your way towards a certain cost function / norm this way. Reweighting each error by the reciprocal of its absolute value would eventually recover the L1 fit, if I'm not mistaken.
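
      A minimal sketch of that weight-iteration idea for a one-parameter fit y ≈ a·x (the function name and the data are made up for illustration):

        import numpy as np

        def irls_l1_slope(x, y, n_iter=50, eps=1e-8):
            """Approximate the L1 fit of y ~ a*x by reweighted least squares."""
            a = np.sum(x * y) / np.sum(x * x)              # start from the plain L2 slope
            for _ in range(n_iter):
                r = y - a * x                              # residuals at the current slope
                w = 1.0 / np.maximum(np.abs(r), eps)       # weights ~ 1/|residual|
                a = np.sum(w * x * y) / np.sum(w * x * x)  # weighted least-squares slope
            return a

        x = np.arange(1, 11, dtype=float)
        y = 2.0 * x
        y[-1] = 100.0                                      # one bad measurement
        print(irls_l1_slope(x, y))                         # close to 2, unlike the plain L2 fit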

  • @aj35lightning
    @aj35lightning 3 years ago +5

    I might have missed it in another video, but if the L1 is so robust and makes more sense in real-world use cases, why is the L2 so popular? Is there an explicit trade-off, or should everything just use L1?

    • @aj35lightning
      @aj35lightning 3 years ago

      @@taktoa1 thank you, this makes sense now

    • @tommclean9208
      @tommclean9208 3 years ago +1

      @@taktoa1 With today's processing power, are there basically no downsides to using the L1 norm instead of the L2 norm?

    • @jafetriosduran
      @jafetriosduran 3 years ago

      To minimize the L1 norm you need subdifferential calculus, because its definition uses the absolute-value function, and when you apply the subgradient at the kink there are infinitely many tangents.
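
      A minimal subgradient-descent sketch of that point, again for the toy fit y ≈ a·x (illustrative, not from the video):

        import numpy as np

        x = np.arange(1, 11, dtype=float)
        y = 2.0 * x
        y[-1] = 100.0                             # outlier

        # Subgradient descent on the L1 loss sum|y - a*x|. Where a residual
        # is exactly zero, any value in [-1, 1] is a valid subgradient;
        # np.sign returns 0 there, which is one admissible choice.
        a, step = 0.0, 0.01
        for k in range(2000):
            g = -np.sum(x * np.sign(y - a * x))   # a subgradient w.r.t. a
            a -= step / np.sqrt(k + 1) * g        # diminishing step size
        print(a)                                  # close to 2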

  • @yerooumarou5340
    @yerooumarou5340 5 months ago

    I have a question, if someone could answer: the L1 norm is not differentiable at zero, and the default method for scipy is BFGS, which uses gradient information for its updates. How is that possible?
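
    One hedged answer, with a small sketch (not the video's code): when no analytic gradient is supplied, scipy.optimize.minimize approximates it by finite differences, and the L1 loss is differentiable everywhere except where a residual is exactly zero, so BFGS usually works in practice; a derivative-free method such as Nelder-Mead sidesteps the issue entirely.

      import numpy as np
      from scipy.optimize import minimize

      x = np.arange(1, 11, dtype=float)
      y = 2.0 * x
      y[-1] = 100.0                              # outlier

      def l1_loss(a):
          return np.sum(np.abs(y - a[0] * x))    # L1 norm of the residuals

      # BFGS falls back on finite-difference gradients, which usually work
      # away from the kink; Nelder-Mead needs no gradients at all.
      print(minimize(l1_loss, x0=[1.0], method='BFGS').x)
      print(minimize(l1_loss, x0=[1.0], method='Nelder-Mead').x)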

  • @pierregravel5941
    @pierregravel5941 1 year ago

    We use the L2 norm everywhere because we can easily differentiate it in order to minimize it. Differentiation is simple because the L2 norm is based on the squares of the error terms. Try differentiating the L1 norm, which contains absolute values of the error terms.
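
    Concretely, for a one-parameter fit y ≈ a·x (a standard derivation, not specific to the video):

      \frac{d}{da}\sum_i (y_i - a x_i)^2 = -2\sum_i x_i (y_i - a x_i) = 0
      \quad\Longrightarrow\quad a = \frac{\sum_i x_i y_i}{\sum_i x_i^2}

      \frac{d}{da}\sum_i \lvert y_i - a x_i \rvert = -\sum_i x_i \operatorname{sign}(y_i - a x_i)

    The second expression is undefined wherever some y_i = a x_i, so there is no closed-form solution; one falls back on subgradients or direct numerical minimization.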

  • @MrZitrex
    @MrZitrex 3 years ago

    Thanks for this vid. Perfectly timed

  • @pythonking1705
    @pythonking1705 3 years ago

    Which one is better for math, Matlab or Python? Please help me.

    • @lena191
      @lena191 3 years ago +3

      It doesn't really matter as long as you know how to use one of them. However, Python is free whereas Matlab is not. So that should make it easier for you to choose.

    • @pythonking1705
      @pythonking1705 3 years ago

      Thank you so much

  • @insightfool
    @insightfool 3 years ago +1

    There's a lot of talk these days about the tradeoffs of using L1 vs. L2 norms related to racial/gender bias in machine learning algos. Isn't there some way to get the best of both worlds?

    • @Eigensteve
      @Eigensteve  3 years ago

      I've been hearing more about this too, which is quite interesting. There are lots of mixed norms that capture aspects of L1 and L2, and you can also have both penalties at once, as in the elastic net (which combines the L1 lasso and L2 ridge penalties).
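
      A minimal scikit-learn sketch of the elastic net mentioned above (note that here L1/L2 enter as penalties on the coefficients, a different role than the L1 loss on the residuals in the video; the data and parameter values are illustrative):

        import numpy as np
        from sklearn.linear_model import ElasticNet

        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 5))
        y = X @ np.array([2.0, 0.0, 0.0, -1.0, 0.0]) + 0.1 * rng.normal(size=100)

        # l1_ratio=1.0 would be pure lasso (L1 penalty), 0.0 pure ridge (L2);
        # 0.5 mixes the two penalties.
        model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
        print(model.coef_)   # sparse-ish coefficients, shrunk toward zero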

  • @prashantsharmastunning
    @prashantsharmastunning 3 years ago

    fat finger entry :P