Sparse LLMs at inference: 6x faster transformers! | DEJAVU paper explained

  • Published on 24 Nov 2024

Comments • 29

  • @Ben_D.
    @Ben_D. 8 months ago

    I don't know why the algorithm has neglected showing me your content for so long. It is right up my alley. I hope you keep bringing us the latest news from AI. It is the biggest thing to happen to humanity since ever, and people still don't react to it. I like your vids. You are smart and the animations are fun. And god, the lipstick. Holy shit. 🙃😍

  • @AM-yk5yd
    @AM-yk5yd 9 months ago +4

    Nice animations.
    It feels like we are going in circles: this paper (and ReLU Strikes Back) reintroduces ReLU; S6, RWKV, and RetNet reintroduce RNNs. Flip a coin on which piece of the past comes back next - residual-free models or AI Winter.

  • @poketopa1234
    @poketopa1234 9 months ago +3

    What a crazy video. I learned so much, thank you for making this!

  • @zerotwo7319
    @zerotwo7319 9 months ago +4

    Great explanation! Sparsity is good.

  • @Aca99100
    @Aca99100 9 months ago +5

    This channel is a real gem! Can we continue to expect 2 videos a month?

    • @Aca99100
      @Aca99100 9 months ago +3

      Btw I already love the "can we run larger models now" comments. Seems like every time there is a breakthrough to make NNs more efficient it's just used to make them bigger :D

    •  9 months ago +3

      @@Aca99100 I read this comment just after I wrote my own "can we run larger models now". LOL

    • @AICoffeeBreak
      @AICoffeeBreak  9 months ago +4

      @Aca99100 Yes, the more efficient, the larger, because the larger, the better (so far). 😅
      I try to keep up the pace, but it will be hard since I will be submitting my thesis in a few months. I am working on a MAMBA video now, but it is hard to find the time these days. :(

    •  9 months ago +2

      @@AICoffeeBreak Same here. I'm also submitting my thesis in a few months. I really appreciate how you can still keep up making quality videos.

    • @AICoffeeBreak
      @AICoffeeBreak  9 months ago +3

      Omg, then good luck to us both!
      What topic are you working on? (So what's the title?)

  • @DerPylz
    @DerPylz 9 months ago +8

    I never thought that what I needed in life was Ms. Coffee Bean telling me to "sit down". Now I know.

  •  9 months ago +4

    This was a fantastic and concise explanation!! I'll read the paper in more detail; however, is this method also effective when combined with quantization? I want to run large models on reasonably priced hardware just for inference.

    • @AICoffeeBreak
      @AICoffeeBreak  9 months ago +2

      Yes, it is compatible with quantization. The paper has ablations on this: "Furthermore, we show several ablations on different components of DEJAVU and its compatibility with quantization techniques."

  • @lelouch1722
    @lelouch1722 9 months ago +6

    Isn't GELU already enforcing "input-dependent" sparsity?

    • @AICoffeeBreak
      @AICoffeeBreak  9 months ago +6

      Exactly. You do not need to watch the end of the video. 😅
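
To make the point in this exchange concrete, here is a tiny numpy sketch of my own (not from the video or the DEJAVU paper): it pushes two random inputs through a toy FFN up-projection and counts, for ReLU and GELU, how many activations come out exactly zero versus merely close to zero. The weights, the sizes, and the 1e-3 threshold are arbitrary; only the qualitative behaviour matters.

```python
# Illustration only: what "input-dependent sparsity" looks like per activation
# function. ReLU zeroes activations out exactly (work that can be skipped),
# GELU only pushes many of them close to zero, and the pattern changes per input.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

W = rng.normal(size=(512, 2048)) / np.sqrt(512)   # toy FFN up-projection

for name, act in [("ReLU", relu), ("GELU", gelu)]:
    for i in range(2):                            # two different "tokens"
        x = rng.normal(size=512)
        h = act(x @ W)
        exact = np.mean(h == 0.0)                 # exactly zero -> multiplications that could be skipped
        near = np.mean(np.abs(h) < 1e-3)          # merely close to zero
        print(f"{name}, token {i}: exact zeros {exact:.1%}, |h| < 1e-3 {near:.1%}")
```

Exact zeros are the ones whose downstream multiplications can be skipped outright, which is why work on exploitable sparsity (e.g. the ReLU Strikes Back paper mentioned above) cares about the activation function.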

  • @laurentbruere9708
    @laurentbruere9708 7 months ago +1

    Thanks!

    • @AICoffeeBreak
      @AICoffeeBreak  7 months ago +1

      Thank you so much! I'll go get a coffee with this money now.

  • @ramkumarr1725
    @ramkumarr1725 a month ago

    IT majors in India used to recruit for general intelligence and then make it sparse in the profession, focusing on specialized, repetitive tasks rather than broad skill development.

  • @ew3995
    @ew3995 9 months ago +7

    Does this mean we can run larger models on smaller GPUs?

    • @AICoffeeBreak
      @AICoffeeBreak  9 months ago +8

      Yes, that is exactly what it means. I think this would be most beneficial for mobile devices: since the sparsity is input-dependent, it makes sense to use it when you only need to load an LLM-powered app to run a few prompts with it. 02:33 (There is a small sketch of the idea at the end of this thread.)

    • @TheSlyMouse
      @TheSlyMouse 9 months ago

      @@AICoffeeBreak Does that mean that with longer prompt sessions you have fewer zeros in the matrix, or something like that, so this does not work as well? Or am I missing something?
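
Since this thread is about how input-dependent (contextual) sparsity saves compute, here is a minimal numpy sketch of the general idea, under my own assumptions: the predictor P and the sizes d, d_ff, and k are made up, and a random linear map stands in for the small trained predictors that DEJAVU actually uses.

```python
# Illustration only: a toy version of the *idea* of contextual sparsity, not the
# DEJAVU implementation. A cheap predictor looks at the current hidden state,
# guesses which FFN neurons will matter for this token, and the layer computes
# only those. Different tokens end up using different neurons.
import numpy as np

rng = np.random.default_rng(0)
d, d_ff, k = 512, 2048, 256          # hidden size, FFN width, neurons kept per token

W1 = rng.normal(size=(d, d_ff)) / np.sqrt(d)      # up-projection
W2 = rng.normal(size=(d_ff, d)) / np.sqrt(d_ff)   # down-projection
P  = rng.normal(size=(d, d_ff)) / np.sqrt(d)      # random stand-in for a small trained predictor

def predicted_neurons(x):
    return np.argsort(-(x @ P))[:k]               # k neurons predicted as relevant for this token

def sparse_ffn(x):
    idx = predicted_neurons(x)
    h = np.maximum(x @ W1[:, idx], 0.0)           # compute only the selected neurons (ReLU)
    return h @ W2[idx, :]                         # and only the matching rows of W2

x1, x2 = rng.normal(size=d), rng.normal(size=d)   # two different "tokens"
print("output shape:", sparse_ffn(x1).shape)      # uses k of d_ff neurons -> less work per token
shared = len(set(predicted_neurons(x1)) & set(predicted_neurons(x2)))
print(f"neurons shared by the two tokens: {shared} of {k}")
```

In this toy version the neuron selection is redone for every token and only the MLP block is shown; the paper also applies contextual sparsity to attention heads.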

  • @Schaelpy
    @Schaelpy 9 months ago

    Awesome summary! Thank you so much

  • @ramkumarr1725
    @ramkumarr1725 a month ago

    Yes, when a person refers to the human brain in comparison to AI, they generally mean the collective intelligence of humanity, rather than the capabilities of an individual brain.

  • @Artifactorfiction
    @Artifactorfiction 9 months ago +2

    GPT makers could take a share price hit.