Sparsity and the L1 Norm

  • Published 21 Dec 2024

Comments • 56

  • @danbochman · 4 years ago (+13)

    It's amazing how you can work for years with a function and still gain new insights about it and improve your intuition. Thank you so much for these videos.

  • @alexhartung2369 · 4 years ago (+6)

    Mind blown. This guy is a rare breed. Loving the content.

  • @SamAndHenryPlaz · 4 years ago (+6)

    Finally got a nibble of insight into the L1 and L2 norms they bring up in Ng's Coursera classes. Seems like it would be good to use when you want to prune the size of a network after training. I love the chunks of insight these videos give and the very understandable way they're explained.

  • @sahilchawla2485 · 2 years ago

    Best 11 minutes spent! Thanks a lot for distinguishing your explanation from the rest of the internet!

  • @PunmasterSTP · 2 years ago

    Sparsity? More like “Super explanation that’s gotta be”…one of the best I’ve ever heard on the topic. Thank you so much for making and sharing all of these incredibly high-quality videos!

  • @chaser27 · 3 years ago (+3)

    You've convinced me to buy your book based on these videos.

    • @Eigensteve · 3 years ago

      Awesome, hope you like it!

  • @jasleenkaur1208 · 2 years ago

    Really the best videos... I have been reading about dictionary creation for the last two weeks, and now I understand the concept.

  • @aantoine5819 · 4 years ago

    This was a perfect supplement to a lecture I just watched for class

  • @pevogam · 4 years ago (+1)

    These are all very visual and well made lectures, thanks a lot for the high quality videos and openly available materials!

  • @andreygrant8676 · 4 years ago (+3)

    I don't know why this came up on my suggestions list, but it was a good explanation! Thank you Steve :)

    • @TheLuceArs · 4 years ago

      Same. I was literally browsing YT suggestions while waiting overnight at the airport and saw one of these videos. Who would've known that this would become the most useful night of my university years.

  • @nmerrillvt · 3 years ago

    Amazing videos. The quality and clarity are above and beyond. Thanks!

  • @cDtag00 · 4 years ago (+5)

    Visually, l_{2/3} looks like an attractive choice as well, as it approximates l_0 even better than l_1 does. What's the advantage of l_1 over l_{2/3}?

    • @NowanIlfideme · 4 years ago (+4)

      I think, like the 0-norm, it's not possible to use convex optimization to find solutions with a p-norm for p < 1 (the unit ball is no longer convex).

    • @cDtag00 · 4 years ago

      @NowanIlfideme Thank you, that makes sense... I was indeed thinking of neural networks and other forms of non-linear optimization.
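
A quick numerical check of the convexity point in this thread, as a sketch (the helper lp_quasi_norm is a made-up name for illustration): for p < 1 the quantity (sum_i |x_i|^p)^(1/p) violates the triangle inequality, so its unit ball is non-convex and convex solvers no longer apply; p = 1 is the smallest exponent that keeps convexity while still promoting sparsity.

```python
import numpy as np

def lp_quasi_norm(x, p):
    """(sum_i |x_i|^p)^(1/p); a true (convex) norm only for p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
for p in (2.0 / 3.0, 1.0):
    lhs = lp_quasi_norm(x + y, p)                     # ||x + y||_p
    rhs = lp_quasi_norm(x, p) + lp_quasi_norm(y, p)   # ||x||_p + ||y||_p
    print(f"p = {p:.3f}: ||x+y|| = {lhs:.3f}, ||x|| + ||y|| = {rhs:.3f}, "
          f"triangle inequality holds: {lhs <= rhs + 1e-12}")
# p = 0.667 gives 2.828 > 2.000 (triangle inequality fails, unit ball is non-convex);
# p = 1.000 gives 2.000 <= 2.000, the smallest exponent for which the ball stays convex.
```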

  • @nguyenngocly1484 · 4 years ago (+3)

    You can train neural networks with sparse mutations. Continuous Gray Code Optimization mutations work well. Each GPU gets the full neural model and part of the training set. Not much data needs to move around: just a sparse mutation list, the costs sent back, and an accept-or-reject decision.
    A fixed random pattern of sign flips is a simple random projection. If you follow it with a Walsh-Hadamard transform you get a much better random projection. The WHT acts as a system of fixed orthogonal dot products. The variance equation for linear combinations of random variables applies to dot products. The CLT applies not just to sums but to any combination of adding and subtracting (cf. the WHT).
    You can have inside-out neural networks with fixed dot products and parametric (adjustable) activation functions, i.e. fast-transform fixed-filter-bank neural nets. You do a simple random projection before the net to stop the first layer from taking a spectrum, and a final fast transform acts as a pseudo-readout layer. f(x) = x·a_i, i = 0 to n-1, is a suitable parametric activation function.
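
A rough sketch of the sign-flip plus Walsh-Hadamard random projection described above (a toy implementation for illustration, not the commenter's code): both steps are orthogonal, so lengths are preserved while the energy of the input is spread across all coordinates.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform of a length-2^k vector, with orthonormal scaling."""
    x = x.astype(float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(len(x))

rng = np.random.default_rng(0)
n = 16
signs = rng.choice([-1.0, 1.0], size=n)    # fixed random pattern of sign flips
v = rng.standard_normal(n)
projected = fwht(signs * v)                # sign flips followed by the WHT

# Both steps are orthogonal, so the length of v is preserved exactly.
print(np.linalg.norm(v), np.linalg.norm(projected))
```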

  • @hoaxuan7074 · 3 years ago

    It is quite simple to use smoothing to solve compressed sensing, though I found it quite slow. You repeatedly correct with the compressed sensing data, smooth, correct, smooth... The binomial filter is quite good. These days I would prefer to solve it with a neural net in one step, which is likely to be much faster.
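
A heavily hedged sketch of the correct-then-smooth iteration described above, as one possible reading (it assumes a smooth underlying signal, a pseudoinverse-based correction step, and a binomial [1, 2, 1]/4 filter; no claim that this matches the commenter's exact scheme or that it converges in general):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 256, 96
t = np.linspace(0.0, 1.0, n)
x_true = np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 5 * t)  # smooth test signal

A = rng.standard_normal((m, n)) / np.sqrt(m)    # random measurement matrix
y = A @ x_true                                  # compressed measurements

A_pinv = np.linalg.pinv(A)                      # pushes the iterate back toward {x : Ax = y}
kernel = np.array([1.0, 2.0, 1.0]) / 4.0        # binomial smoothing filter

x = np.zeros(n)
for _ in range(200):
    x = x + A_pinv @ (y - A @ x)                # "correct" using the measurement data
    x = np.convolve(x, kernel, mode="same")     # "smooth"

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```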

  • @yuhuawei1763 · 3 years ago

    Excellent explanation! Thank you!

  • @adityamrai8508 · 3 years ago

    You are simply amazing. Keep inspiring.😎

  • @hashirroshinvaliyaparambil70 · 4 years ago (+1)

    Hey Prof. Steve, could you please make videos about Lyapunov stability, and especially nonlinear systems?

  • @meccamiles7816 · 4 years ago (+1)

    Does someone have a simple way of describing why the L0 sparse solution is NP complete?

  • @deez_gainz · 4 years ago (+2)

    Does L1 norm regularization for a neural network layer also promote sparse weights in the layer?

    • @SamAndHenryPlaz · 4 years ago

      I'm wondering this as well. It does seem that way.
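
A small PyTorch sketch of the idea in this thread (a toy example for illustration, not from the video): adding an L1 penalty on a layer's weights does shrink many of them toward zero, though with plain SGD they only get close to zero; exact zeros typically require a proximal/soft-thresholding update.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(20, 50), torch.nn.ReLU(), torch.nn.Linear(50, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
X, y = torch.randn(512, 20), torch.randn(512, 1)
lam = 1e-2                                        # hypothetical regularization strength

for step in range(2000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X), y)
    l1_penalty = model[0].weight.abs().sum()      # L1 penalty on the first layer's weights
    (loss + lam * l1_penalty).backward()
    opt.step()

w = model[0].weight.detach()
print("fraction of near-zero first-layer weights:", (w.abs() < 1e-3).float().mean().item())
```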

  • @yasir9909 · 3 years ago

    Sir, do you also have a similarly detailed lecture on the sparsity introduced by the l-infinity norm?

  • @EduardoGarcia-tv2fc · 4 years ago (+1)

    Are we gonna get more videos on chapter 3?

  • @proteuswave · 8 months ago

    Great video. Also... your setup is nuts! Writing backward? Mirror image? Does not compute!

  • @xinyichen4332 · 3 years ago (+2)

    Hi, I still don’t understand why finding the L0 norm of S is NP hard. Could you please explain a bit? I’d appreciate your help. Thanks!

    • @Eigensteve · 3 years ago (+13)

      Good question. So computing the L0 norm of a vector is simple: just count all of the non-zero entries. But finding the solution "x" of a system of equations Ax=b where x has the minimum L0 norm is NP-hard. You would have to search through all possible sparse vectors (all vectors with only one non-zero entry, then all vectors with only two, then three, and so on) until you find the sparsest solution. This is a combinatorial search over roughly 2^n possible supports (where n is the dimension of x), which grows far too fast for modern computers to handle.
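
To make the combinatorial blow-up concrete, here is a small sketch (an illustration, not code from the book or video): a brute-force L0 search over candidate supports next to the L1 problem posed as a linear program. The brute-force loop may touch on the order of 2^n supports, while the linear program is solved in polynomial time.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 12, 6, 2                                  # small enough for the brute-force search
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

def l0_solve(A, b, tol=1e-8):
    """Brute-force L0: try every support of size 1, 2, ... until Ax = b is solvable."""
    m, n = A.shape
    for s in range(1, n + 1):
        for support in itertools.combinations(range(n), s):
            cols = list(support)
            xs, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.linalg.norm(A[:, cols] @ xs - b) < tol:
                x = np.zeros(n)
                x[cols] = xs
                return x

# L1 as a linear program: min sum(u) subject to Ax = b and -u <= x <= u.
c = np.concatenate([np.zeros(n), np.ones(n)])
A_eq = np.hstack([A, np.zeros((m, n))])
A_ub = np.block([[np.eye(n), -np.eye(n)], [-np.eye(n), -np.eye(n)]])
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n), A_eq=A_eq, b_eq=b,
              bounds=[(None, None)] * n + [(0, None)] * n)
x_l1 = res.x[:n]

print("L0 support:", np.flatnonzero(np.abs(l0_solve(A, b)) > 1e-6))
print("L1 support:", np.flatnonzero(np.abs(x_l1) > 1e-6))   # typically the same support
```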

  • @zrmsraggot · 2 years ago

    What's the difference between a sparse solution coming up on S1 and one on S2?

  • @meganauger354 · 4 years ago (+7)

    Sort of a random question, but what would happen if, when using the L1 norm, the s vector was perfectly parallel to one of the sides of the L1 square? Would it return a dense solution, instead of sparse? Would that mean you are in an inappropriate basis for a sparse solution? Thanks!

    • @viniciusvilela5371 · 3 years ago

      I guess that's why he said it often brings a sparse solution, not always...

    • @aymericmelt8083 · 5 months ago

      I think you can't solve the problem uniquely then; the algorithm won't converge to a single answer. It's a case where the L0 norm is better. But as he said, the more dimensions you have, the better it works, and I think that's because the probability of the line being aligned with a face decreases exponentially in higher dimensions.
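
A tiny sketch of the degenerate case discussed in this thread (an illustrative example): when the constraint line is parallel to a face of the L1 diamond, every point on that face is an L1 minimizer, so the minimizer is not unique and a solver may return a dense point instead of a sparse corner.

```python
import numpy as np

# Constraint a·x = 1 with a = (1, 1): this line is parallel to a face of the L1 diamond.
a = np.array([1.0, 1.0])
candidates = {
    "sparse corner (1, 0)   ": np.array([1.0, 0.0]),
    "sparse corner (0, 1)   ": np.array([0.0, 1.0]),
    "dense midpoint (.5, .5)": np.array([0.5, 0.5]),
}
for name, x in candidates.items():
    print(f"{name}: a·x = {a @ x:.1f}, ||x||_1 = {np.abs(x).sum():.1f}")
# All three are feasible and all attain the minimum L1 norm of 1, so the minimizer is not
# unique; in higher dimensions an exactly parallel face becomes increasingly unlikely.
```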

  • @alegian7934 · 4 years ago (+1)

    I love how the Y variable sounds like "why"

  • @praveencv431 · 2 years ago

    Is there any formal proof that l1 norm minimisation gives the sparsest solution in higher dimensions? I was trying to find a reference for it but was unable to get one.

  • @NowanIlfideme · 4 years ago (+7)

    Hi, will you also be covering the mathematical background of the L1 sparsity property? I understand it intuitively, but I haven't found a good intuitive and rigorous mathematical proof/explanation of it.
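
For the two questions above, rigorous statements live in the compressed-sensing literature (Candès, Romberg and Tao; Donoho). One standard sufficient condition, paraphrased here rather than taken from the video or book, is the following:

```latex
% Informal statement of a standard RIP-based exact-recovery result (Candes, 2008).
\textbf{Theorem (informal).} Let $x^\star \in \mathbb{R}^n$ be $k$-sparse and let
$A \in \mathbb{R}^{m \times n}$ satisfy the restricted isometry property of order $2k$,
\[
  (1-\delta_{2k})\,\|v\|_2^2 \;\le\; \|Av\|_2^2 \;\le\; (1+\delta_{2k})\,\|v\|_2^2
  \qquad \text{for all } 2k\text{-sparse } v,
\]
with constant $\delta_{2k} < \sqrt{2} - 1$. Then $x^\star$ is the unique solution of
\[
  \min_{x} \|x\|_1 \quad \text{subject to} \quad Ax = Ax^\star,
\]
i.e.\ $\ell_1$ minimization recovers the sparsest solution exactly.
```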

  • @adityagarg259 · 1 year ago

    What if the line and the one-norm diamond become parallel?

  • @mikefischbein3230 · 3 years ago

    I keep picturing scenes from Cheers where Norm walks into the bar and everyone shouts "Norm!"
    Then Sam asks "What'll it be today, Norm? L1 or L2?"

  • @TheMazyProduction · 4 years ago

    Is the L_infinity norm the most uniform solution?

  • @umangyadav4787 · 4 years ago

    Hey Steve, I have a question: in lasso we have the regularisation term |w|. If this is changed to w^2/(1+w^2), what change does it make to the coefficients?

  • @varunjohnson9570 · 4 years ago

    Sir when can we expect continuation videos for Chapter 3?

  • @johnathancorgan3994 · 3 years ago

    Steve, I love how your abbreviation of "solution" is "sol" to nth power 😏

  • @davidmadsen2761 · 4 years ago

    Isn't the taxi-cab norm L1? As in the distance a car would have to drive in a grid system.

  • @tcratius1748 · 4 years ago (+1)

    How do you prove an L0 solution if you cannot compute it? Maths is weird, yet the L1 is handy for a project I'm doing, so I should really be asking: how did you read my mind? 👻 PS: is it useful for imbalanced classes?

    • @snippletrap · 4 years ago (+1)

      It's computable, just not convex or differentiable. Later in the video he conflates "computable" with "tractable", but that is a mistake.

    • @tcratius1748 · 4 years ago

      @snippletrap Hmm, as a fairly poor mathematician studying data science at uni, all this computability stuff takes a whole new and incredibly sparse vocabulary that seems to differ depending on the author's preference of symbols. Oh well, just keep on keeping on. :)

    • @snippletrap · 4 years ago

      @tcratius1748 Among computer scientists these terms are not ambiguous. "Computable" means it can be computed. Not everything can be computed (see the Halting Problem, for instance). "Tractable" generally means that computation time is not exponential in the size of the input. Hard problems like non-convex optimization are computable but not tractable: a computer will eventually solve them, but it may take centuries to do so.

    • @tcratius1748 · 4 years ago

      @snippletrap Yeah, I guess it didn't help that I did biology as my bachelor's. Oh well, enjoy; if you really wish to chat more I can give you my LinkedIn, otherwise cheers for the chat.

  • @alexeyl22 · 3 years ago

    "there are infinitely many asses" 😂😂😂. You said it two times and I concur

  • @enigmabloom · 4 years ago

    Shoot, this guy's from UW? I'm doing the master's for EEs there right now lol

  • @DerekWoolverton · 4 years ago

    Now I have to go searching for the formula for the l_{2/3} norm. I can guess what it is, but it seems oddly out of place compared to all the integer orders, so I'm curious where it originally came up.

  • @MrAlextorex · 4 years ago (+1)

    He writes backwards. How impressive!

    • @davidmadsen2761 · 4 years ago

      Probably mirrored in editing, so this is not actually how Steve looks in person.

  • @goodgame7474 · 3 years ago

    This comment is literally literal!

  • @zakaryaelkhiyati8563 · 4 years ago

    neat

  • @MrAlextorex · 4 years ago

    He tells a nice story, but his explanation requires a lot of background knowledge, so in the end I do not feel very satisfied. It would be useful to also add more detailed tutorial links for further study.