xLSTM: The Sequel To The Legendary LSTM
- Published 28 May 2024
- Join the waitlist now for exclusive access to the OnDemand platform: on-demand.io/contact
The LSTM sequel by Sepp Hochreiter is finally here, surpassing its former self by 300x and even taking on Mamba architectures. This innovative architecture blends scalar and matrix LSTMs, offering a strong performance increase on benchmarks and language modeling tasks. This could be big??
Check out my newsletter mail.bycloud.ai/
LSTM (1997)
[Paper] deeplearning.cs.cmu.edu/F23/d...
xLSTM (2024)
[Paper] arxiv.org/abs/2405.04517
[unofficial resource collection] github.com/AI-Guru/xlstm-reso...
This video is supported by the kind Patrons & TH-cam Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Music] massobeats - noon
[Profile & Banner Art] / pygm7
[Video Editor] Silas
let me know if you like these kinds of research breakdown too!
my newsletter: mail.bycloud.ai/
I'm glad we have someone to translate research language into memes and references for us
Just need the soundboard and temple run in the corner and we skibidi
@@animalnt Yeah like what is this guy's background lol he sounds like a college kid not a PhD person.
@@Wagner-uv6yp "not a PhD person" 💀
Good thing I'm 15. Perfect channel for me
Other than bycloud and fireship (and maybe 2 minute papers and yannic kilcher), is there another?
I heard people in the RWKV and EleutherAI Discords complain that they used the wrong hyperparameters for some of the other architectures, while using the most optimal hyperparameters for xLSTM. So the results are not entirely honest and they're trying to hype up their own architecture, but what else is new...
I was a tiny bit suspicious when xLSTM didn't publish code, but damn okay
When money ruins research... But hey, it's been a thing since research exists.
The context extension comparison is practically wrong too, and highly misleading.
If you train a transformer for a small context length without any extra extension tricks, it will perform really badly at large context, by design. Comparing against that is simply not useful, and it's misleading for most readers/viewers. There are context-extension tricks that avoid the perplexity explosion for transformers, and those would change the graph very significantly.
Also, when you actually look at the graph at 3:54 in detail, you can see that the presented [1:1] combination of both is worse than either of the previously existing methods on almost all metrics.
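The context-extension tricks mentioned above can be sketched concretely. One well-known example is position interpolation for RoPE-based transformers: positions beyond the trained context are scaled down so their rotary angles fall back inside the trained range. A minimal sketch, not from the xLSTM paper or any specific library (function name and dimensions are illustrative):

```python
def rope_angles(pos, dim=8, base=10000.0, scale=1.0):
    # Rotary position embedding angles for a single position.
    # `scale` < 1 implements position interpolation: positions are
    # squeezed into the range the model was trained on.
    return [(pos * scale) / (base ** (2 * i / dim)) for i in range(dim // 2)]

# A model trained on 2048 tokens has never seen position 8000...
plain = rope_angles(8000)
# ...but with scale = 2048/8192, position 8000 produces exactly the
# same angles as the in-range position 2000.
interp = rope_angles(8000, scale=2048 / 8192)
assert interp == rope_angles(2000)
```

This is why comparing a vanilla short-context transformer against other architectures at long context says little: with a trick like this, the transformer's long-context perplexity curve looks very different.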
(> complex topic = > memes / s) = AWESOME!
I am a simple man, I see King Baldwin, I read AI, I click thumbnail😂
Kingdom of Heaven, my love
Every video of yours is like my own Christmas! Keep it up.
So many memes per second, my brain can't process everything.
This video was so good what the hell
Thank you for delivering me this complex info in an easily consumable way
Subscribed
This OnDemand looks like quite the hoot!
Editing is amazing lol. Nice job, bycloud
The newsletter is fire though bro, thanks for that
Interesting, thanks for the le vid
Really like the way you make these videos, i couldn't have understood these things otherwise
i have first hand exp with lstms and they are amazing. i can't wait to try this architecture or implement a solution with it
Very nice music !
We have seen time and again that modeling stuff on what we see in humans helps. Now we are venturing into macroscopic structures like memory, where we actually have working theories of how people do it, which can give clues for smart architecture. GPT is brute forcing; LSTM is social engineering. Neither is guaranteed to get you in, but it is good to have both tools in the toolbelt, or both expertises and viewpoints on your team. As the landscape keeps changing, the winds of fashion favour one or the other.

When motion pictures came out, they didn't stop painting or sculpting. But when talkies came out, silent movies were left behind. We need to get past this obsession with 'better' and see that, as long as they differ enough, these are all valid strategies, and the choice of tool depends on the nature of the problem. It helps if you understand your tools and your problem.
outperforms mamba AND TRANSFORMERS?!?!?!
nothing outperforms Decepticons thou
all these cool architectures never actually get used in open models.
Put some clouds in the background or something so people aren't confused by the similar thumbnails to other creators
that is the goal, for people to get confused and click on the video
Back from 2017
Nice
I wonder if this is good for real time robotics, where you get fast real time data in tiny chunks, and you need a fast model with memory
Love your B Rolls lol
Up next xlstm-mamba hybrid
Well at this point companies should just stop training their AIs for some time and wait for the best architecture 😂
Reminds me of web development where there's always a new best framework and everything becomes "obsolete" every 2 months
I find this all extremely interesting, but am having a hard time finding the right way into understanding these topics, does anyone have a suggestion on where to start?
Start with a Neural Networks introduction course. Beyond that, currently your only options are to study this on your own or enter a Master's program in Computer Science, as all "courses" on LLMs are currently extremely dumbed down and mostly just go over how to type into ChatGPT.
I believe I just got Schmiedhubered
I wish I had time to research this stuff myself. This is exactly what I would be researching. Transformers seem limited in some fundamental way to me. I want to see something more recursive and dynamic. But these are impressions only since I haven't had money/time to really dive in yet.
i know some of these words
LSTMs were all the hype before the Transformers dropped in 2017… Love to see the prodigal son return
funny. subbed
I was wondering how you could cram tens of thousands of tokens into a vector. But LLama-3-8B actually has 128,256 embedding dims, each of them 16-bit, so you could encode 2^(128,256*16) = 2^2,052,096, or about 10^617,742, numbers with it. That is kind of a lot.
If you can figure out a way to pack meaning into it efficiently, you have a lot of space in such a vector.
I wonder how close we are to the theoretical limit of how much meaning we can cram in a vector.
With enough copium we sure can 🔥🔥🔥
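A caveat on the figures in the thread above: 128,256 is Llama-3's vocabulary size; the embedding (hidden) dimension of Llama-3-8B is 4096. The counting argument still goes through, just with a smaller exponent. A quick sketch, under that assumption:

```python
import math

def representable_states(dims, bits=16):
    # Each dimension has 2**bits settings, so a vector of `dims`
    # such numbers can take (2**bits)**dims = 2**(bits*dims) values.
    return 2 ** (bits * dims)

# Llama-3-8B: hidden size 4096, 16-bit weights/activations.
n = representable_states(4096)

# Use log10 to get the order of magnitude without printing a huge int.
digits = 16 * 4096 * math.log10(2)
print(f"2^{16 * 4096} ≈ 10^{digits:.0f}")  # roughly 10^19728
```

Still an astronomically large space, so the thread's conclusion (a lot of room to pack meaning into one vector) holds either way.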
Hell yeah, sigmoid FTW!
02:40 "beated" lmao
Superior meme taste. Thanks bycloud.
This amount of memes per second remind me of the Bad Gear channel
My brain went out of Ram with this one 😂😂😂😂
Ah yes. LSTM: EXTREME EDITION! 🔥🔥🔥
HIDE YO KIDS, HIDE YO DATA. THIS RNN IS OUT THERE FOR BLOOD 🩸🩸
bro othe memes got me crying 💀😆
3:33 the first line there is slightly wrong: the first sLSTM should apparently be mLSTM (no harm done, since it's 0). More importantly, the ratio could be e.g. [2:3], similar to how hybrid Mamba/transformer models mix different numbers of Mamba layers and attention layers. It would be interesting to know the optimal [x:y] ratio, and whether it even makes sense to also mix in e.g. Mamba layers... As seen at 3:28, none of the ratios seem optimal. mLSTM has a d x d matrix, and that d could also be tuned, I suppose.
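The [x:y] ratio idea above amounts to a repeating layer schedule. A hedged sketch of how such a stack could be laid out (block names are placeholders for illustration, not the paper's official implementation):

```python
def block_schedule(num_layers, ratio=(7, 1)):
    # Interleave block types in an [m:s] ratio: within each period of
    # m + s layers, the first m are mLSTM blocks and the rest sLSTM.
    m, s = ratio
    period = m + s
    return ["mLSTM" if i % period < m else "sLSTM" for i in range(num_layers)]

# An xLSTM[7:1]-style stack of 8 layers: 7 mLSTM blocks, then 1 sLSTM block.
print(block_schedule(8, ratio=(7, 1)))

# A hypothetical [2:3] mix over 10 layers, as the comment suggests trying.
print(block_schedule(10, ratio=(2, 3)))
```

Sweeping `ratio` in such a schedule is how one would search for the optimal mix the comment asks about; extending the type list with a third block (e.g. Mamba) would be a straightforward variation.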
Hopfield NN is the real GOAT
Hoch-rei-ter
Hoooch-reeeiii-teeer
Pleeeeeease talk about YOCO 😢
What happened to the videos that showed examples of where the best AI generated images and video and gave a dummy like me an idea of what could be done with the current 'best'
A person really can't 'shill' their own thing. That's not the right word.
Ok, so how to get started with AI development
Try and predict the stock market
What's an "architecture"?
Structure of the network
Hoch-reiter means High Rider... Sooo he might be high on copium..
"xLSTM" sounds so lame. Should have been "2 LS 2 TM" or maybe "LSTM: Tokyo Drift"
to whoever thought or is thinking that the ai revolution is dead, this is just the tip of the iceberg :D
Noice!👍
hi
Fireship clone?
2nd
good video but the voice is badly mixed with the music
the way I filter papers nowadays is through code. If they come with a github repo, I check. If not, just toss to the bin
Hmmm I sure wonder where he got the idea for this thumbnail format
wtf thats mine i made the block matrix lstm thats lame they made a paper and just claim it
Everyone copying the Fireship thumbnail style these days
The entire social media world is based solely on copy & paste style.
xd
Fucking hate Language Models, they have single handedly shitted on the entire community making everyone focus on chatbots.
Gone are the days when ppl used to showcase their work on some image-data.
Meh, ig I just don't like Language Models that much. Transformers as an idea are really amazing; images are intuitive for me, but I can't grasp much from "embeddings for words". I'll try xLSTMs as regressors, they'll definitely make good projects. Thanks for the video buddy 💖💖
This guy, copying Fireship thumbnails 😂
That off key minecraft sounding music in the background is really distracting
Not watchable with all the ridiculous movie clips. Leaving.
xLSTM BYTE when?