What I learned from this video is that as we move toward higher-order norms, we tend to go from distinction to similarity. Sensitivity to outliers also increases (in regression). Maybe one reason is that L2 regularization already sets the bar for sensitivity high, and people hardly use L3 regularization. One use case I can think of is identifying outliers, since the more sensitive the norm is to outliers, the more clearly we could see them.
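A quick numpy sketch of that sensitivity point (my own illustration, not from the video): as p grows, a single large residual dominates the total L_p loss. The residual values here are made up for the example.

```python
import numpy as np

# Residuals from a hypothetical regression fit: three typical
# points and one outlier (numbers invented for illustration).
residuals = np.array([1.0, 1.0, 1.0, 10.0])

def outlier_share(p):
    """Fraction of the total sum of |r|^p contributed by the largest residual."""
    losses = np.abs(residuals) ** p
    return losses.max() / losses.sum()

# The outlier's share of the loss climbs toward 1 as p increases.
for p in (1, 2, 4):
    print(f"p={p}: outlier contributes {outlier_share(p):.1%} of the loss")
```

With p=1 the outlier accounts for about 77% of the loss; by p=4 it is essentially all of it, which is why higher-order norms make a fit (or an outlier detector) so sensitive to extreme points.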
Great videos, but I don’t know what L1 regularization is. I also wonder why we expect the objective function to hit a corner instead of overlapping with the shape or sitting far from it, why there are multiple objective functions in the first place, and why we’re limited to a 2D plane. Still, I found the video interesting, so you did a great job.
You want to minimize the objective (whose contours/level curves are shown as ellipses here) while staying inside the convex shape (a diamond for the L1 norm). At 2:33, you see that the constrained minimizers are located at the corners of the diamond.
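A tiny numpy illustration of why those corners matter (my own sketch, not from the video): the proximal step for an L1 penalty, soft-thresholding, sets small weights exactly to zero, landing on a corner of the diamond, while the corresponding L2 step only shrinks weights and never zeros them. Function names and numbers here are my own.

```python
import numpy as np

def prox_l1(w, lam):
    # Soft-thresholding: minimizer of (1/2)(x - w)^2 + lam * |x|.
    # Any weight with |w| <= lam lands exactly on the corner at zero.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def prox_l2(w, lam):
    # Minimizer of (1/2)(x - w)^2 + lam * x^2: pure shrinkage,
    # never exactly zero for a nonzero w.
    return w / (1.0 + 2.0 * lam)

w = np.array([0.3, -0.2, 1.5])
print(prox_l1(w, 0.5))  # small entries become exactly 0
print(prox_l2(w, 0.5))  # every entry shrinks but stays nonzero
```

This is the algebraic counterpart of the picture at 2:33: the L1 diamond's corners sit on the axes, so the constrained minimizer tends to have some coordinates exactly zero, which is where sparsity comes from.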
I was thinking: what difference would larger versions of the L_n norm make compared with placing all features in the same column? My other thought was, would it encode similar information to L1, since -1 is anti-correlated, 0 is no correlation, and 1 is correlated, but just lack the ability to state no correlation? I'll definitely have to think a bit more on this.
I notice that when we use regularization for a model trained on centered, standardized data, its effect is barely noticeable and doesn't change the parameters much. Why? I'm new to data science, btw.
You could've also quickly talked about the $L_0$ "norm". Its level sets are even more pointy and spiky.
Thanks for the note, and good suggestion for a later video!
Awesome