To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/bycloud/. You'll also get 20% off an annual premium subscription!
Like this comment if you wanna see more MoE-related content, I have quite a good list for a video ;)
You should do a video on virtual humans and cognitive AI. Look at all the non-player character technology we have in Red Dead Redemption and The Sims. Throw a chatbot into one of those and we have a great virtual human.
Thanks for linking to all papers in the description.
Imagine assembling 1 million PhD students together to discuss someone's request like "write a poem about cooking eggs with C++". That's MoE irl
i'm tellin chatgpt this now.
Enjoy:
In a kitchen lit by screens,
Where code and cuisine intertwine,
A programmer dreams of breakfast scenes,
With a syntax so divine.
Int main() begins the day,
With ingredients lined up neat.
Eggs and spices on display,
Ready for a code-gourmet feat.
int eggs = 2; // Declare the count,
Double click on the pan.
Heat it up, and don’t discount,
Precision’s the plan.
std::cout
hahahahaha LMAO
AI: Reasonable request sir
And MoME is getting 1 million 5th graders to teach a baby to PhD level only on how to write a poem about cooking eggs with c++
It's crazy how Meta's 8B parameter Llama 3 model has nearly the same performance as the original GPT-4, which reportedly had 1.8T parameters.
That's a 225x reduction in parameter count (1.8T / 8B) in just 2 years.
The only thing in my mind is "MoE moe kyuuuuun!!!"
Intentional naming fr.
to some extent this seems closer to how brains work
neurons
Yeah, kind of like how spiking networks work, but more discrete/blocky and less efficient.
I think this concept should be applied to the basic MLP layers, so you can increase model performance without decreasing speed or RAM usage. The only sacrifice is storage, which is easily scalable. IMO this is the future
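Roughly what that would look like, as a minimal numpy sketch (all sizes and names here are made up for illustration, not from the video or any paper): per-token compute is fixed by top_k, while the total parameter count, i.e. storage, grows with num_experts.

import numpy as np

rng = np.random.default_rng(0)
d_model, num_experts, top_k = 64, 1024, 4               # hypothetical sizes
W_in = rng.standard_normal((num_experts, d_model, 1)) * 0.02   # each expert: d_model -> 1
W_out = rng.standard_normal((num_experts, 1, d_model)) * 0.02  # ... -> d_model
router = rng.standard_normal((d_model, num_experts)) * 0.02

def sparse_mlp(x):                        # x: (d_model,) one token's hidden state
    scores = x @ router                   # score every expert (one cheap matvec)
    chosen = np.argsort(scores)[-top_k:]  # keep only the top_k experts
    gate = np.exp(scores[chosen]); gate /= gate.sum()   # softmax over the chosen ones
    out = np.zeros_like(x)
    for g, e in zip(gate, chosen):        # only top_k tiny matmuls actually run
        out += g * ((x @ W_in[e]) @ W_out[e])
    return out

print(sparse_mlp(rng.standard_normal(d_model)).shape)   # (64,)

Scaling num_experts only grows the weight tables; the loop still runs the same top_k tiny matmuls per token.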
Jeff Hawkins approves this message
I think this is how almost any informational system works. From molecules to galaxies, there are specialized units that use and process information individually in the system. An agentic expert approach was a long time coming and is certainly the future of AI. Even individual ants have specialized jobs in the colony.
@@johndoe-j7z That's how perceptrons worked right from the start
i see what you did there with "catastrophic forgetting" lmao 🤣
troll emoji
The format of these videos is GOLD 🏆 such specific and nerdy topics produced as memes 😄
Now I really am excited for an 800B model with fine-grained MoE to surface that I can run on basically any device.
You would still need a lot of storage though, but that is easier than downloading VRAM 😋
I watch you so that I feel smart, it really works!
3:37 wasn't it just yesterday that they released their model 😭
In a very real sense, the MoME concept is similar to diffusion networks. On their own, the tiny expert units are but grains of noise in an ocean of noise, and the routing itself is the thing being trained. Whether or not it's more efficient than having a monolithic neural net with simpler computation units (neurons), I dunno. I suspect, like most things ML, there is probably a point of diminishing returns.
Yo dog, I heard you liked AI so we put an AI inside your AI which has an AI in the AI which can AI another AI so that you can AI while you AI.
Idk if this was intended just as entertainment, but I used it as education
Like I needed to understand MoE/MMoE on a high level for my research and this video totally helped me. It will be easier to dive deeper into one of the papers now
Thank u for linking the papers in the description ❤
I watch your videos yet I have no idea what you are explaining 99% of the time. 🙃
I will try better next time 😭
@@bycloudAI Personally I watch your content because you elaborate on academic papers and their relevancy very well. Do hope you continue with content like this. But I can see something like a Fireship-style code report for LLMs being digestible.
@@bycloudAI I liked the video, but to their point, it might help to give a brief overview of what things are, i.e. parameters, feed forward, etc., the exact same way you briefly explained what hybridity and redundancy are. This is a good video if you're already familiar with LLMs and how they work, but it can probably be pretty confusing if you aren't.
I lost track at 8:24
Damn.. You blew my mind on the 1 million experts and Forever learning thing
How far are we from just having a virtual block of solid computronium with inference result simply being the exit points of virtual Lichtenberg figures forming thru it, with most of the volume of the block remaining dark?
it's about the distance between you and the inside of your skull
What is the 3D animation around 1:45 ?
yea, want to know this too
Blender
What resource is this at 2:01? Seems useful for teaching
Actually really cool idea, I liked the DeepSeek MoE version too, it's so clever
my go to channel to understand ai
I'm telling you: Just do it like the brain. Have every expert/node be a router, choosing who to send to.
And, have every node be a RL agent.
Thank you. I think I understand the impact of MoE.
00:01 Exploring the concept of fine-grained MoE in AI expertise.
01:35 Mixtral has unique Feed-Forward Network blocks in its architecture.
03:11 Sparse MoE method has historical roots and popularity due to successful models.
04:46 Introducing the Fine-Grained MoE method for AI model training
06:16 Increasing experts can enhance accuracy and knowledge acquisition
07:52 Efficient expert retrieval mechanism using the PEER layer technique
09:29 Large number of experts enables lifelong learning and addresses catastrophic forgetting
11:01 Brilliant offers interactive lessons for personal and professional growth
Crafted by Merlin AI.
hey where are the 3d visualisations of the transformer blocks from?
What is the source of your 3D transformer layer demonstration? Plz tell me
4:13 nice editing here🤣
What tool was used for the Transformer visualization starting at 2:01 ?
Today I saw a video about the paper "Exponentially Faster Language Modelling" and I feel like the approach is just better than MoE, and I wonder why not more work has been done on top of it (although I think it's possible that's how GPT-4o mini was made, but who knows)
Mixture of a million experts just sounds like a sarcastic description of Reddit
Great Video once again
how is the visualization in 2:01 made
I'd imagine in a month someone will come with MoE responsible for choosing the best MoE to choose the best MoE out of billions of experts
I would love a model with the performance of an 8B model, i.e. practical performance like GPT-3.5, but with much smaller active parameters so it can run on anything super lightweight.
Current 8B beats GPT 3.5 on most metrics, we've come a long way.
@@4.0.4 yeah, but metrics are not everything, and from my experience, GPT-3.5 still beats Llama 3 8B (or at least 8B quantized) in terms of interpolation/generalization/flexibility, meaning while it can mess up on difficult, specific or confusing tasks, it doesn't get overly lost/confused.
Metrics are good at simple, well-defined one-shot questions, which I'd agree it is better at
@@redthunder6183 remember not to run 8b at q4 (default in ollama for example, but BAD, use q8)
@@redthunder6183 true but make sure you're using 8-bit quant, not 4-bit - it matters for those small LLMs
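If anyone wants to try that, a quick sketch with the ollama Python client (the exact model tag is an assumption on my part, check the model library for the current q8_0 tag):

import ollama

model = "llama3:8b-instruct-q8_0"   # assumed tag for the 8-bit quant, instead of the default q4
ollama.pull(model)                  # roughly twice the download/RAM of the q4 build

reply = ollama.chat(
    model=model,
    messages=[{"role": "user", "content": "Summarize mixture of experts in two sentences."}],
)
print(reply["message"]["content"])  # or reply.message.content on newer client versions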
@@redthunder6183 llama 3 8b? That model is so outdated already... who is even using that ancient model?
Was hoping someone would make a video on this! Thank you! Would love to see you cover Google's new Diffusion Augmented Agents paper.
I have no idea what you just said but I'm glad they didn't just stubbornly stick to increasing training data and nothing else, like everyone seemed to assume they would. 🙂
I Like Your Funny Words, Magic Man
great video!
YES!!! NEW BYCLOUD VIDEO!!!
Where did you get the clips of attention mechanism visualization from?
Thanks! Incredibly useful to keep up.
1991... We are standing on the shoulders of giants.
I love these rabbit holes!
if 13B is ~8GB (q4) then why does ollama load the entire 47B (26GB) model into memory?
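Assuming this is about Mixtral 8x7B: only ~13B parameters are active per token, but the router can pick a different pair of experts at every layer and every token, so all ~47B weights have to stay resident. Back-of-the-envelope (bits per weight is a rough figure for a q4-style quant, not an exact number):

total_params  = 46.7e9    # all 8 experts across all layers
active_params = 12.9e9    # top-2 experts per layer actually run for a given token
bits_per_weight = 4.5     # rough average for a q4_K-style quant (assumption)

to_gb = lambda n: n * bits_per_weight / 8 / 1e9
print(f"kept in memory : {to_gb(total_params):.1f} GB")   # ~26 GB, what ollama loads
print(f"read per token : {to_gb(active_params):.1f} GB")  # ~7 GB, why inference is still fast-ish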
Damn, you're finally catching up. You should try NeMo and Megatron-LM, they have the best MoE framework
Can you maybe make a video explaining how Llama 3.1 8B is able to have a 128k context window while still fitting in an average computer's RAM?
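Part of the answer is that the 128k window is a maximum, not something paid for up front: the KV cache only grows with the tokens you actually keep in context, and Llama 3.1 8B uses grouped-query attention with just 8 KV heads. Rough math, assuming an fp16 cache:

layers, kv_heads, head_dim = 32, 8, 128   # Llama 3.1 8B config
bytes_per_value = 2                        # fp16 KV cache
per_token = 2 * layers * kv_heads * head_dim * bytes_per_value   # K and V per token

print(per_token / 1024, "KiB per token")             # 128 KiB
print(per_token * 8_000 / 1e9, "GB at 8k context")   # ~1 GB, fine on a laptop
print(per_token * 128_000 / 1e9, "GB at full 128k")  # ~17 GB, why you'd quantize the cache or shrink the window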
1:05 Brilliant pays youtubers $20000-50000 per sponsored video!?
Ngl, I wish we got more videos about video generators making anime waifus like in the old days, but it seems like development on that front is slowing down at the moment. Hopefully you'll cover any new breakthroughs in the future.
So if these Millions of Experts are cute...
Should we call them...
Moe MoE?
I did a semester of ML the first half of this year, and I don't understand half of what you post lmao. Do you have any recommended resources to learn from? It is very hard to learn.
Dude! Ty❤
wow, top quality video
Can you cover DeepMind's recent breakthrough on winning the math olympiad? Does that mean RL is the way forward when it comes to reasoning? Because as of right now, as far as I know, LLMs can't actually 'reason', they are just guessing the next token, and reasoning does not work like that.
5k views after 3h is a shame, you deserve much more, go go go algorithm
Bro is good
You lost me when that guy pointed at the gravesite of his brother
0:42
Undrinkable water my favorite :v
My feeling is that after all these methods we'll eventually end up back at essentially a single monolithic model 😂😂😂
i didn't understand anything but it sounded cool
meanwhile Meta having no MoE
We might be onto something here... 👀
PEER doesn't scale, I've tried it multiple times
it should be MMoE ... massive mixture of experts XD
The best base language models (multilingual) + LoRAs are enough.
Seems as if the greatest optimisation for practical AI tech is dynamic mechanisms.
Lifelong memory plus continuous learning would become game changers in the space.
At this rate humanity will be able to leave behind machines that can recall our biological era. At least something will be able to carry on our legacy for at least hundreds of thousands of years.
what about 1T experts
Yes we need more MOM-eis 💀💀
Your thumbnails are a bit too similar to Fireship
also the entire composition of his videos, a little more than just taking inspiration lol
1 millions beer
more like a mixture of a million toddlers
wait till they use genetic programming with Monte Carlo tree search and UTP and other stuff on the router
MoME ? Nah. MOMMY ✅🤤
😂😂😂as a behavioral scientist.. i think this one is going straight to the crapper.. mark my words.😂😂😂
too many cooks 🎶
Bro, but did you read about Lory? It merges models with soft merging, building on several papers. Lory is new paint on a method developed for vision AI to make soft merging possible for LLMs. ❤
What's really key about Lory is backpropagation to update its own weights, it's fine-tuning itself at inference. It's also compatible with Transformers, Mamba, or Mamba-2. In addition, it looks like Test-Time Training could be used with all these methods for even more context awareness.
Bot
The thing about lifelong learning really reminds me of our human brains. Basically, for every different thought or key combination it sounds like it's building a separate new model with all the required experts for said task. So basically like all relevant neurons we trained working on one thought to solve it, with the possibility of changing and adding new neurons. I can't see it going well if we keep increasing the number of experts forever though, as the expert picking will become more and more fragmented. I think being able to forget certain things would probably be useful too.
I'm no scientist but I really do wonder how close this comes to the actual way our brain works.
Eat your heart out Limitless, we're making AI smarter by having them use less of their "brain" at a time
I'm a big fan of
i was like
schizophrenic AI
but then they went further...
anyway finally they are optimizing instead of making them bigger
I'll call it Moe (moe ehh) instead of em oh ih
That million expert strategy sounds super cool. I'm not too knowledgeable, though it does sound like it allows for a more liquid neural network by using an attention-like mechanism to literally pick which neurons get used. I feel like this will be the future of NNs.
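If it helps, the retrieval trick in the million-experts paper (PEER) is product keys: split the query in half, score each half against only sqrt(N) sub-keys, and combine the best candidates, so you never score all N experts directly. A rough numpy sketch with made-up sizes (the real thing also runs multiple retrieval heads and feeds the picked single-neuron experts through a softmax-weighted sum):

import numpy as np

rng = np.random.default_rng(0)
d, n, top_k = 64, 256, 8                 # n*n = 65,536 experts in total
K1 = rng.standard_normal((n, d // 2))    # sub-keys for the first query half
K2 = rng.standard_normal((n, d // 2))    # sub-keys for the second query half

def product_key_topk(q):
    q1, q2 = q[: d // 2], q[d // 2 :]
    s1, s2 = q1 @ K1.T, q2 @ K2.T                   # 2*n scores instead of n*n
    i1 = np.argsort(s1)[-top_k:]                    # best sub-keys on each side
    i2 = np.argsort(s2)[-top_k:]
    cand = [(s1[a] + s2[b], a * n + b) for a in i1 for b in i2]   # only top_k*top_k pairs
    cand.sort(reverse=True)
    return [idx for _, idx in cand[:top_k]]         # ids of the chosen experts

print(product_key_topk(rng.standard_normal(d)))     # top_k ids out of 65,536, never scored individually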
fireship clone
always bet on owens
Shared expert isolation seems to be doing something similar to the value output in dueling networks: collecting the gradients for shared information so other subnets only need to account for the small tweaks. This means the shared information is learned faster, which in turn speeds up the learning of the tweaks
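Roughly, yeah. A tiny sketch of the shared-expert idea (DeepSeekMoE-style, all sizes invented): the always-on shared expert gets gradients from every token and absorbs the common features, so the routed experts only have to learn the residual tweaks.

import numpy as np

rng = np.random.default_rng(0)
d, n_routed, top_k = 32, 16, 2
shared = rng.standard_normal((d, d)) * 0.02             # always-on shared expert
routed = rng.standard_normal((n_routed, d, d)) * 0.02   # specialised experts
router = rng.standard_normal((d, n_routed)) * 0.02

def moe_block(x):
    out = x @ shared                          # shared expert sees every token, so it gets every gradient
    scores = x @ router
    chosen = np.argsort(scores)[-top_k:]
    gate = np.exp(scores[chosen]); gate /= gate.sum()
    for g, e in zip(gate, chosen):
        out += g * (x @ routed[e])            # routed experts only add small corrections on top
    return out

print(moe_block(rng.standard_normal(d)).shape)   # (32,)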
Oh yeah "acidentally" added something to a graph they intended to show. Not just builing hype to inflate the bubble of nothing that is this whole business?
Good news: Digitalism is killing capitalism. A novel perspective, first in the world! Where is capitalism going? Digitalism vs. Capitalism: The New Ecumenical World Order: The Dimensions of State in Digitalism by Veysel Batmaz is available for sale on Internet.
why all comments before this one bots???
It's possible that it's because YouTube shadow-banned all the real comments.
@@OnTheThirdDay But they can't ban bots...
@@cesarsantos854 I don't know why bots (and I mean, obvious bots) do not always get banned but half of my comments that I write out myself do.
Don't see any bots 3 hours after this comment. Gj YouTube 👍
Honestly I think your old moe video was better.
I agree. Definitely more understandable and this one would be harder to follow without seeing that first.
Temu fireship…oh I’ll watch it tho.
This channel seems to go into more detail and is more AI focused.
Tf are you talking about
Dude Wake Up, AI is just a Stupid Buzzword! There is no AI.
I've made my own transformer model before, as shitty as it was, it sorta worked. I agree that the term "AI" is misleading as it's not sentient or anything like that. It's just a really fancy autocomplete generator that understands surprisingly abstract and complex connections, relations, and context. But these models are real and aren't just a million Indians typing your essay for you. You can download models like Llama to try it out locally
I knew it, your content so mid bro has to redeem it
It's useless and wastes a lot of resources.
Using MoE is an admission of failure. It means that they are unable to make a "smarter" model and have to rely on arbitrary gimmicks.
Not really, they are testing if it makes models smarter without having to do much more work
I don't see it as a problem. if you think about it, all things in machine learning are just arbitrary gimmicks that happen to work out
@@a_soulspark As a human, if you understand N new disciplines you become N^2 more powerful, because you can apply ideas from one field to any other. This is why you want a monolith, not MoE. They chose MoE because they ran into a wall; they can't improve the fundamentals so they have to use ad-hoc measures just to boost the numbers.
RLHF seems gimmicky but it worked. MoE might seem gimmicky, but it works. Multimodality might seem gimmicky but it works.
@@zrakonthekrakon494 Nobody would even bother with MoE if they hadn't run into the wall. They did.
MOE onichan
We needed someone to say this, so thank you for sacrificing your dignity for us.
get ready to call your MOME as well now
bro fell off
Because there's no bot comments after 24m?
This channel is a nice copy of Fireship