xLSTM: The Sequel To The Legendary LSTM

  • Published on 28 May 2024
  • Join the waitlist now for exclusive access to the OnDemand platform: on-demand.io/contact
    The LSTM sequel by Sepp Hochreiter is finally here, surpassing its own predecessor by 300x and even beating Mamba architectures. This architecture blends scalar and matrix LSTMs (sLSTM and mLSTM), offering a large performance increase in benchmarks and language modeling tasks. This could be big??
    Check out my newsletter mail.bycloud.ai/
    LSTM (1997)
    [Paper] deeplearning.cs.cmu.edu/F23/d...
    xLSTM (2024)
    [Paper] arxiv.org/abs/2405.04517
    [unofficial resource collection] github.com/AI-Guru/xlstm-reso...
    This video is supported by the kind Patrons & YouTube Members:
    🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth
    [Discord] / discord
    [Twitter] / bycloudai
    [Patreon] / bycloud
    [Music] massobeats - noon
    [Profile & Banner Art] / pygm7
    [Video Editor] Silas
  • Science & Technology

Comments • 84

  • @bycloudAI
    @bycloudAI 28 days ago +12

    Join the waitlist now for exclusive access to the OnDemand platform: on-demand.io/contact
    let me know if you like these kinds of research breakdown too!
    my newsletter: mail.bycloud.ai/

  • @khla.mp4
    @khla.mp4 28 days ago +330

    I'm glad we have someone to translate research language into memes and references for us

    • @animalnt
      @animalnt 28 days ago +13

      Just need the soundboard and temple run in the corner and we skibidi

    • @Wagner-uv6yp
      @Wagner-uv6yp 28 days ago +1

      @@animalnt Yeah like what is this guy's background lol he sounds like a college kid not a PhD person.

    • @animalnt
      @animalnt 28 days ago +2

      @@Wagner-uv6yp "not a PhD person" 💀

    • @thebrownfrog
      @thebrownfrog 27 days ago +3

      Good thing I'm 15. Perfect channel for me

    • @revimfadli4666
      @revimfadli4666 24 days ago

      Other than bycloud and fireship (and maybe 2 minute papers and yannic kilcher), is there another?

  • @SmallLanguageModel
    @SmallLanguageModel 28 days ago +93

    I heard people in the RWKV and EleutherAI discord complain that they used wrong hyperparameters for some other architectures, while they used the most optimal hyperparameters for xLSTM. So the results are not entirely honest and they try to hype up their own architecture, but what else is new...

    • @bycloudAI
      @bycloudAI 28 days ago +38

      I was a tiny bit suspicious when xLSTM didn't publish code, but damn okay

    • @whannabi
      @whannabi 28 days ago +7

      When money ruins research... But hey, it's been a thing since research exists.

    • @adamrak7560
      @adamrak7560 28 days ago +13

      The context extension comparison is practically wrong too, and highly misleading.
      If you train a transformer on a small context length without any extension tricks, it will perform really badly on large contexts, by design. Comparing against that is simply not useful, and it is misleading for most readers/viewers. There are context extension tricks that avoid the perplexity explosion for transformers; those would change the graph very significantly.

    • @nnnik3595
      @nnnik3595 28 days ago +2

      Also when you actually look at the graph at 3:54 in detail you can see how the presented combination of both [1, 1] is worse than either of the previously existing methods in almost all metrics.

  • @benjamineidam
    @benjamineidam 28 days ago +11

    (> complex topic = > memes / s) = AWESOME!

  • @ij9375
    @ij9375 28 days ago +30

    I am a simple man, I see King Baldwin, I read AI, I click thumbnail😂

    • @sanderbos4243
      @sanderbos4243 28 days ago +2

      Kingdom of Heaven, my love

  • @DrW1ne
    @DrW1ne 28 days ago +8

    Every video of yours is like my own Christmas! Keep it up.

  • @user-cg7gd5pw5b
    @user-cg7gd5pw5b 28 days ago +25

    So many memes per second, my brain can't process everything.

  • @bobspianosbffl
    @bobspianosbffl 28 days ago +4

    This video was so good what the hell
    Thank you for delivering me this complex info in an easily consumable way
    Subscribed

  • @OnDemandAI
    @OnDemandAI 13 days ago

    This OnDemand looks like quite the hoot!

  • @literailly
    @literailly 28 days ago

    Editing is amazing lol. Nice job, bycloud

  • @cdkw2
    @cdkw2 28 days ago

    The newsletter is fire though bro, thanks for that

  • @aykutakguen3498
    @aykutakguen3498 28 days ago +1

    Interesting, thanks for the le vid

  • @whoami6107
    @whoami6107 28 days ago +5

    Really like the way you make these videos, i couldn't have understood these things otherwise

  • @Meleeman011
    @Meleeman011 28 days ago +1

    I have first-hand experience with LSTMs and they are amazing. I can't wait to try this or architect a solution with it

  • @tom-et-jerry
    @tom-et-jerry 27 days ago

    Very nice music !

  • @Ludifant
    @Ludifant 24 days ago

    We have seen time and again that modeling stuff on what we see in humans helps. Now we are venturing into macroscopic structures like memory, where we actually have working theories of how people do it, which can give clues to smart architecture. GPT is brute forcing, LSTM is social engineering. Neither is guaranteed to get you in, but it is good to have both tools in the toolbelt, or both kinds of expertise and viewpoints in your team. As the landscape keeps changing, the winds of fashion favour one or the other. When motion pictures came out, they didn't stop painting or sculpting. But when talkies came out, silent movies were left behind. We need to get past this obsession with 'better' and see that as long as they differ enough, these are all valid strategies and the choice of tool depends on the nature of the problem. It helps if you understand your tools and your problem.

  • @jonatan01i
    @jonatan01i 28 days ago +34

    outperforms mamba AND TRANSFORMERS?!?!?!

    • @monad_tcp
      @monad_tcp 28 days ago +14

      nothing outperforms Decepticons thou

  • @alkeryn1700
    @alkeryn1700 28 days ago +3

    All these cool architectures are never actually used in open models.

  • @ccash3290
    @ccash3290 28 days ago +5

    Put some clouds in the background or something so people aren't confused by the similar thumbnails to other creators

    • @gualcasas528
      @gualcasas528 28 days ago +4

      that is the goal, for people to get confused and click on the video

  • @nathanpotter1334
    @nathanpotter1334 28 days ago +2

    Back from 2017

  • @mrrespected5948
    @mrrespected5948 28 days ago +1

    Nice

  • @KW-jj9uy
    @KW-jj9uy 28 days ago +2

    I wonder if this is good for real time robotics, where you get fast real time data in tiny chunks, and you need a fast model with memory

  • @ZenchantLive
    @ZenchantLive 28 days ago

    Love your B Rolls lol

  • @mawungeteye6609
    @mawungeteye6609 28 days ago +1

    Up next xlstm-mamba hybrid

  • @AMA14700
    @AMA14700 28 days ago +5

    Well, at this point companies should stop training their AIs for some time in order to wait for the best architecture 😂

    • @whannabi
      @whannabi 28 days ago

      Reminds me of web development where there's always a new best framework and everything becomes "obsolete" every 2 months

  • @initialsjd5867
    @initialsjd5867 28 days ago +3

    I find this all extremely interesting, but am having a hard time finding the right way into understanding these topics, does anyone have a suggestion on where to start?

    • @Detril2000
      @Detril2000 28 days ago

      Start by a Neural Networks introduction course. Besides that, currently your only options would be to then study this on your own or enter a Master's program in Computer Science, as all "courses" on LLMs are currently extremely dumbed down and mostly just go over how to type on ChatGPT.

  • @beyse101
    @beyse101 28 days ago +1

    I believe I just got Schmidhubered

  • @mfpears
    @mfpears 25 days ago

    I wish I had time to research this stuff myself. This is exactly what I would be researching. Transformers seem limited in some fundamental way to me. I want to see something more recursive and dynamic. But these are impressions only since I haven't had money/time to really dive in yet.

  • @andyizawsome
    @andyizawsome 28 days ago +3

    i know some of these words

  • @Alpha_GameDev-wq5cc
    @Alpha_GameDev-wq5cc 27 days ago

    LSTMs were all the hype before the Transformers dropped in 2017… Love to see the prodigal son return

  • @agenticmark
    @agenticmark 28 days ago

    funny. subbed

  • @Embassy_of_Jupiter
    @Embassy_of_Jupiter 28 days ago +1

    I was wondering how you could just cram tens of thousands of tokens into a vector. But actually LLama-3-8B has 128,256 embedding dims, each of them 16-bit, so you could actually encode 2^(128,256*16) = 2^2,052,096 or about 10^617742 numbers with it. That is kind of a lot.
    If you can figure out a way to pack meaning into it most efficiently, you have a lot of space in such a vector.
    I wonder how close we are to the theoretical limit of how much meaning we can cram in a vector.
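The arithmetic in the comment above can be checked with a short sketch. It only counts distinct bit patterns, which is an upper bound on what a vector can encode and says nothing about usable "meaning". One caveat, stated as an assumption rather than something taken from this thread: as far as I know, 128,256 is the vocabulary size of Llama 3, and the hidden (embedding) size of the 8B model is 4,096, so the second figure below is probably the more relevant one. The helper name `capacity_log10` is made up for this example.

```python
import math


def capacity_log10(dims: int, bits_per_dim: int) -> float:
    """Return log10 of the number of distinct bit patterns in a vector.

    A vector with `dims` entries of `bits_per_dim` bits each has
    2 ** (dims * bits_per_dim) possible bit patterns.
    """
    total_bits = dims * bits_per_dim
    return total_bits * math.log10(2)


# Figure used in the comment: 128,256 dims at 16 bits each.
comment_bits = 128_256 * 16
comment_log10 = capacity_log10(128_256, 16)

# Assumption (not from the thread): 4,096 is the hidden size of Llama 3 8B.
hidden_log10 = capacity_log10(4_096, 16)

print(f"total bits in the comment's vector: {comment_bits}")
print(f"comment's vector: about 10^{math.floor(comment_log10)} patterns")
print(f"4,096-dim vector: about 10^{math.floor(hidden_log10)} patterns")
```

Either way the count is astronomically large, so the practical limit is how well a model can learn to use that space, not the raw number of patterns.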

  • @cdkw2
    @cdkw2 28 days ago +1

    With enough copium we sure can 🔥🔥🔥

  • @SamArmstrong-DrSammyD
    @SamArmstrong-DrSammyD 28 days ago

    Hell yeah, sigmoid FTW!

  • @butterbee2384
    @butterbee2384 24 days ago

    02:40 "beated" lmao

  • @olegpetrov2624
    @olegpetrov2624 28 days ago

    Superior meme taste. Thanks bycloud.

  • @catoleg
    @catoleg 27 days ago

    This amount of memes per second reminds me of the Bad Gear channel

  • @TheGiovany82
    @TheGiovany82 26 days ago

    My brain went out of Ram with this one 😂😂😂😂

  • @FaultyTwo
    @FaultyTwo 26 days ago

    Ah yes. LSTM: EXTREME EDITION! 🔥🔥🔥
    HIDE YO KIDS, HIDE YO DATA. THIS RNN IS OUT THERE FOR BLOOD 🩸🩸

  • @user-cw7go6kj7c
    @user-cw7go6kj7c 28 days ago +1

    bro the memes got me crying 💀😆

  • @pallharaldsson9015
    @pallharaldsson9015 28 days ago

    3:33 the first line there is slightly wrong: the former sLSTM should apparently be mLSTM (no harm done, since it is 0). More importantly, I guess the ratio could be e.g. [2:3], similar to hybrid Mamba/transformer models that mix different numbers of Mamba layers and transformer/attention layers. It would be interesting to know the optimal [x:y] ratio, and whether it makes sense to also mix in more, e.g. Mamba. As seen at 3:28, none of the ratios seem optimal. mLSTM has a d x d matrix, and I suppose d could be tuned as well.
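To make the "d x d matrix" remark concrete, here is a minimal pure-Python sketch of an mLSTM-style matrix memory, based on the update rule as I understand it from the xLSTM paper (arxiv.org/abs/2405.04517): the memory is updated as C_t = f_t * C_{t-1} + i_t * v_t k_t^T and read out with a query, normalized by max(|n_t^T q_t|, 1). It is a simplification, not the authors' implementation: the gates are passed in as plain scalars (the paper computes them from the input, with exponential gating and a stabilizer), and key scaling and the output gate are omitted. The function name `mlstm_step` and all variable names are made up for this example.

```python
def mlstm_step(C, n, q, k, v, i_gate, f_gate):
    """One simplified mLSTM-style memory update followed by a readout.

    C is the d x d matrix memory (list of rows), n is the length-d
    normalizer state, q/k/v are length-d query, key and value vectors,
    and i_gate/f_gate are scalar input and forget gate values.
    """
    d = len(k)
    # Matrix memory update: C_t = f * C_{t-1} + i * (v k^T)
    C_new = [[f_gate * C[r][c] + i_gate * v[r] * k[c] for c in range(d)]
             for r in range(d)]
    # Normalizer update: n_t = f * n_{t-1} + i * k
    n_new = [f_gate * n[c] + i_gate * k[c] for c in range(d)]
    # Readout: h = C_t q / max(|n_t . q|, 1)
    numerator = [sum(C_new[r][c] * q[c] for c in range(d)) for r in range(d)]
    denominator = max(abs(sum(n_new[c] * q[c] for c in range(d))), 1.0)
    h = [x / denominator for x in numerator]
    return C_new, n_new, h


d = 2
C0 = [[0.0] * d for _ in range(d)]  # empty d x d memory
n0 = [0.0] * d
key = [1.0, 0.0]
value = [3.0, 4.0]

# Store (key, value), then query with the same key: the value comes back.
C1, n1, h_match = mlstm_step(C0, n0, q=key, k=key, v=value,
                             i_gate=1.0, f_gate=1.0)
# Query with an orthogonal vector: nothing is retrieved.
_, _, h_miss = mlstm_step(C0, n0, q=[0.0, 1.0], k=key, v=value,
                          i_gate=1.0, f_gate=1.0)

print(h_match)
print(h_miss)
```

The memory holds d * d numbers, so its size and the cost of each update grow quadratically with d, which is one reason d is a meaningful knob to tune.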

  • @renanmonteirobarbosa8129
    @renanmonteirobarbosa8129 28 days ago

    Hopfield NN is the real GOAT

  • @crackwitz
    @crackwitz 28 days ago

    Hoch-rei-ter
    Hoooch-reeeiii-teeer

  • @prenomnom2686
    @prenomnom2686 27 days ago

    Pleeeeeease talk about YOCO 😢

  • @Kenneth_James
    @Kenneth_James 28 days ago

    What happened to the videos that showed examples of the best AI-generated images and video, and gave a dummy like me an idea of what could be done with the current 'best'?

  • @user-io4sr7vg1v
    @user-io4sr7vg1v 28 days ago

    A person really can't 'shill' their own thing. That's not the right word.

  • @anshulsingh8326
    @anshulsingh8326 26 days ago +1

    Ok, so how to get started with AI development

    • @BoHorror
      @BoHorror 19 days ago

      Try and predict the stock market

  • @fmmmtmm
    @fmmmtmm 28 days ago +1

    What's an "architecture"?

    • @drdca8263
      @drdca8263 28 days ago

      Structure of the network

  • @Ludifant
    @Ludifant 24 days ago

    Hoch-reiter means High Rider... Sooo he might be high on copium..

  • @AntoshaPushkin
    @AntoshaPushkin 27 days ago +5

    "xLSTM" sounds so lame. Should have been "2 LS 2 TM" or maybe "LSTM: Tokyo Drift"

  • @MartinDxt
    @MartinDxt 28 days ago

    to whoever thought or is thinking that the AI revolution is dead: this is just the tip of the iceberg :D

  • @lulboiking5806
    @lulboiking5806 28 days ago

    Noice!👍

  • @ucngominh3354
    @ucngominh3354 26 days ago

    hi

  • @8jhj345gg
    @8jhj345gg 28 days ago

    Fireship clone?

  • @jasonhemphill8525
    @jasonhemphill8525 28 days ago

    2nd

  • @Amasglobulaires
    @Amasglobulaires 28 days ago

    good video but the music is badly mixed with the music

  • @newyorthtimes4496
    @newyorthtimes4496 28 days ago

    the way I filter papers nowadays is through code. If they come with a github repo, I check. If not, just toss to the bin

  • @Pepcen
    @Pepcen 28 days ago

    Hmmm I sure wonder where he got the idea for this thumbnail format

  • @jonnylukejs
    @jonnylukejs 28 days ago +1

    wtf thats mine i made the block matrix lstm thats lame they made a paper and just claim it

  • @TwiceVisible
    @TwiceVisible 28 days ago +1

    Everyone copying the Fireship thumbnail style these days

    • @Adventure1844
      @Adventure1844 28 days ago

      The entire social media world is based solely on copy & paste style.

  • @JosephJair97
    @JosephJair97 28 days ago

    xd

  • @newbie8051
    @newbie8051 26 days ago

    Fucking hate Language Models, they have single handedly shitted on the entire community making everyone focus on chatbots.
    Gone are the days when ppl used to showcase their work on some image-data.
    Meh, ig I just don't like Language Models that much. Transformers as an idea are really amazing, but images are more intuitive for me; I can't grasp much from "embeddings for words". I'll try xLSTMs as regressors, that will definitely make a good project, thanks for the video buddy 💖💖

  • @JAD3N
    @JAD3N 26 days ago

    This guy, copying Fireship thumbnails 😂

  • @andiiacob1627
    @andiiacob1627 19 days ago

    That off key minecraft sounding music in the background is really distracting

  • @ScottSummerill
    @ScottSummerill 28 days ago

    Not watchable with all the ridiculous movie clips. Leaving.

  • @monstercameron
    @monstercameron 27 days ago

    xLSTM BYTE when?