MAMBA AI (S6): Better than Transformers?

  • Published 15 Sep 2024
  • MAMBA (S6) is a simplified neural network architecture that integrates selective state space models (SSMs) for sequence modelling. It is designed to be a more efficient and powerful alternative to Transformer models (like current LLMs, VLMs, ...), particularly for long sequences, and is an evolution of the classical S4 models.
    By making the SSM parameters input-dependent, MAMBA can selectively focus on relevant information in a sequence, enhancing its modelling capability (a minimal code sketch of this selective mechanism follows the description).
    Does it have the potential to disrupt the transformer architecture, on which almost all current AI systems are based?
    #aieducation
    #insights
    #newtechnology
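
    To make the "input-dependent SSM parameters" idea concrete, here is a minimal NumPy sketch of a selective scan for a single input channel. It is an illustrative toy, not the paper's hardware-aware implementation: the function and parameter names (selective_ssm_scan, W_B, W_C, w_delta) are invented for this example, and the discretization of B is simplified.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def selective_ssm_scan(x, A, W_B, W_C, w_delta, b_delta):
    """Toy selective SSM recurrence for a single input channel.

    x        : (T,)  scalar input sequence
    A        : (N,)  fixed diagonal state matrix (negative entries for stability)
    W_B, W_C : (N,)  projections that make B_t and C_t depend on the input x[t]
    w_delta, b_delta : scalars controlling the input-dependent step size
    """
    T, N = x.shape[0], A.shape[0]
    h = np.zeros(N)                                  # hidden state carried across time
    y = np.zeros(T)
    for t in range(T):
        delta = softplus(w_delta * x[t] + b_delta)   # step size chosen per token
        B_t = W_B * x[t]                             # input-dependent input projection
        C_t = W_C * x[t]                             # input-dependent readout projection
        A_bar = np.exp(delta * A)                    # discretized diagonal state matrix
        B_bar = delta * B_t                          # simplified (Euler) discretization
        h = A_bar * h + B_bar * x[t]                 # selective state update
        y[t] = C_t @ h                               # output for this time step
    return y

# Tiny usage example: 64 random time steps, a 16-dimensional state.
rng = np.random.default_rng(0)
T, N = 64, 16
y = selective_ssm_scan(rng.standard_normal(T),
                       -np.abs(rng.standard_normal(N)),
                       rng.standard_normal(N),
                       rng.standard_normal(N),
                       0.5, 0.0)
print(y.shape)  # (64,)
```

    In S4, the analogues of B_t, C_t and delta are fixed per layer; making them functions of x[t] is what lets the model decide, token by token, what to write into and read out of its state.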

Comments • 42

  • @mjp152
    9 months ago +23

    Another interesting architecture is the Tolman-Eichenbaum Machine which is inspired by the hippocampus and lends some interesting abilities to infer latent relationships in the data.

  • @mike-q2f4f
    9 months ago +8

    It's clear transformers can be improved. Excited to see this proposal play out. Thanks for the update!

  • @_tnk_
    8 months ago +2

    First video I've watched from you and I'm very impressed! Looking forward to watching more.

  • @sadface7457
    9 months ago +24

    The way you say hello community is a ray of sunshine 🌞 😊

    • @code4AI
      9 months ago +5

      Big smile.

    • @planorama.design
      8 months ago

      That's the truth! Always love the enthusiastic hellos!

    • @davidreagan1287
      8 months ago +1

      Best way to learn

  • @StephenRayner
    9 months ago +31

    Just as they start etching the transformer architecture onto silicon ha!

    • @jumpstar9000
      8 months ago

      that also made me chuckle

    • @vinvin8971
      8 months ago

      Just bullshit...

  • @lizardy2867
    9 months ago +4

    One of the problems I face when trying to implement simple models which utilize a latent space is the volatility of their input and output sizes. Never should a model require truncation, nor should a model allow inaccuracies. How, for example, shall you model a compression algorithm (encode-decode) for any and all data that can exist? You are required to build the latent space before the model, so it effectively becomes part of the preprocessing step.
    This is, of course, expected and within reason.
    I am one to think the solution to this problem is one which would upend most of the field.

  • @kevon217
    9 months ago +2

    Excellent high level overview.

  • @laurinwagner8127
    9 months ago +5

    The GPT family of models is a decoder-only architecture, which is not covered by the patent.

    • @code4AI
      9 months ago

      GPT (Generative Pretrained Transformer)

    • @therainman7777
      9 months ago +4

      @code4AI Yes, GPT models are transformers. But they are not the type of transformer architecture covered by Google's patent. Google's patent is for the original encoder-decoder architecture only. GPT models are decoder-only, which is a different type of architecture.

  • @planorama.design
    8 months ago +2

    Great coverage, and thanks once again. One issue I am grappling with is attention, which is managed at "run-time" (i.e. inference) on the prompt for transformers, whereas Mamba seems to capture this entirely during training. No need for an attention matrix, as with transformers. Very long context windows, improved access to early information from the stream, and faster performance. Love all this.
    My concern / reasoning: removing the "run-time" attention at inference means we're relying on statistical understandings of language from training. For prompts that differ substantially from the training data, can Mamba LLMs excel at activities that aim for creativity and brainstorming?
    It also seems to me that training Mamba LLMs on multiple languages may degrade predictability in any one language, since the "attention" (conceptually) is calculated at training time. But I am still pondering this; I certainly may be wrong as I wrap my head around it!
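
    A rough NumPy sketch of the asymmetry this comment points at, under simplifying assumptions: untrained random weights, a fixed step size for the SSM, and illustrative sizes. It only shows what has to be materialized at inference time, not model quality.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, N = 512, 32, 16            # prompt length, embedding width, SSM state size
prompt = rng.standard_normal((T, d))

# Transformer-style inference: every token attends over the whole prompt, so a
# T x T score matrix (or a KV cache that grows with T) is built at run time.
Wq, Wk = rng.standard_normal((d, d)), rng.standard_normal((d, d))
scores = (prompt @ Wq) @ (prompt @ Wk).T / np.sqrt(d)
print(scores.shape)              # (512, 512) -- grows quadratically with the prompt

# SSM-style inference: the prompt is folded into a fixed-size state, one token
# at a time; what is kept around does not grow with the prompt length.
A = -np.abs(rng.standard_normal(N))   # diagonal state matrix (negative = stable)
B = rng.standard_normal((d, N))       # input projection; fixed step size 0.1 here
h = np.zeros(N)
for x_t in prompt:
    h = np.exp(0.1 * A) * h + 0.1 * (x_t @ B)
print(h.shape)                   # (16,) -- constant regardless of prompt length
```

    Note that in Mamba itself the step size, B and C are additionally computed from each prompt token at inference, so the state update is still shaped by the prompt even though no attention matrix is formed.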

    • @code4AI
      8 months ago +3

      Like your question. I am struggling to find benchmarks on the in-context learning (ICL) performance of Mamba-like systems. Actual performance data in direct comparison with current-generation LLMs are also missing. And some authors hint that the few-shot example ability might be associated with the self-attention mechanism itself. That would pose some serious limitations for state space systems, linear RNNs and the like, if I lose the ability to inject new data and information in my prompt and have the system understand the new semantic configuration and its semantic correlations (e.g. for reasoning).
      But I trust the open source community to come up with advanced solutions ...

  • @davidreagan1287
    8 months ago

    Conceptually, this is brilliant: savoir-faire for accuracy and precision.
    However, a deeper understanding of the non-matrix mathematics and the challenges of serial hardware engineering would be greatly appreciated.

  • @javiergimenezmoya86
    9 months ago +1

    My intuition: Transformers for capturing tightly linked concepts and words within each chapter and its summarization, and Mamba for the union and interconnection of all the summarized ideas (linking not individual words but groups of ideas that are very disperse and distributed among chapters).

    • @kevon217
      8 months ago

      That sounds like a cool combo

  • @shekhinah5985
    5 months ago

    What's stored in the real space if not the position? Isn't the example phase space storing an even bigger vector, because it now stores not only the position of the center of mass but also the velocity?
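
    For readers following this question: the state in phase space is indeed a larger vector than the observed output. A tiny sketch of that distinction, assuming the kind of mass-on-a-spring example the comment seems to refer to; the values and the Euler integration are purely illustrative.

```python
import numpy as np

# State-space view of a mass on a spring: the internal (phase-space) state holds
# position AND velocity, while the observed output exposes only the position.
k, m, dt = 1.0, 1.0, 0.01
A = np.array([[0.0, 1.0],        # d(position)/dt = velocity
              [-k / m, 0.0]])    # d(velocity)/dt = -(k/m) * position
C = np.array([1.0, 0.0])         # readout: position only

x = np.array([1.0, 0.0])         # state vector: [position, velocity] -> 2 numbers
observations = []
for _ in range(1000):
    x = x + dt * (A @ x)         # crude Euler step of x' = A x
    observations.append(C @ x)   # observed "real space" signal: 1 number per step

print(x.shape, len(observations))  # (2,) internal state vs 1000 scalar observations
```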

  • @EkShunya
    9 months ago +2

    Can you make more content on state space models?

  • @letsplaionline
    8 months ago

    Thank you!

  • @juancarlospizarromendez3954
    8 months ago

    I do think that an artificial brain would plug into many engines step by step, such as an arithmetic calculator, a logical reasoner, a theorem prover, etc., turning it into something cyborg-like.

  • @renanmonteirobarbosa8129
    9 months ago +2

    I appreciate your attempt at simplifying and introducing how state spaces are used in a very particular application in dynamical systems. However, I am afraid you are missing quite a lot and are, perhaps, confused about the mathematics.

    • @TiaguinhouGFX
      9 months ago +1

      How so?

    • @dhnguyen68
      9 months ago +4

      Please further elaborate your claim.

    • @code4AI
      8 months ago +2

      Your comment made it into my next video on BEYOND MAMBA (th-cam.com/video/C2fFL8pVX2M/w-d-xo.html) and provided a beautiful transition from the origin of state space mathematics to overcoming the limitations of the current S4 and S6 state space models. I hope the new video explains your mistake and shows that interdisciplinarity (from physics to statistics and time series) is something beautiful. Thanks for your comment.

  • @oryxchannel
    8 months ago

    I’m a new fan.

  • @oguzhanylmaz4586
    9 months ago +1

    Hi, I am developing an offline chatbot with RAG. Should I use Llama 7B as the LLM? Or should I choose the Zephyr 7B model? It needs to work locally without internet.

    • @qwertydump4720
      9 months ago +7

      I don't know the details of your project, but I had the best experience with dolphin-mistral 7b 2.2.1.

    • @oguzhanylmaz4586
      9 months ago

      @qwertydump4720 I am doing an offline chatbot as a graduation project, so I may need a lot of information about the model I'm going to use.

  • @davidjohnston4240
    8 months ago

    I came to see a new better means of AC voltage conversion. I was disappointed.

  • @qwertasd7
    9 months ago

    Interesting
    (and also all the replies here; there doesn't seem to be a place anymore where thinkers can exchange ideas).
    Do you know of a model using this concept (to try out in LM Studio or in a Jupyter notebook)?
    Personally I think the way LLMs work/are trained is not the way to go.
    Too many useless facts inside them; for facts they should just use a callout to Wikipedia or other sites.
    An LLM's "world domain" should be language: no politics, no famous people, but theoretical skills, translations, medicine, law, math, physics, coding, etc. Not who Trump or JF Kennedy or Madonna was. Those gigs should be removed.

  • @remsee1608
    9 months ago +5

    This is not good for that startup that is building transformer chips.

    • @thechatgpttutor
      9 months ago

      exactly what I thought

    • @redone823
      9 months ago

      Rip startup. Died before birth.

    • @planorama.design
      8 months ago

      I think the jury is still out. There isn't enough real-world usage [yet] to say how well the Mamba architecture really performs against Transformers. Over the past couple of years, we (at large) have been able to evaluate a variety of business use cases for Transformer-based LLMs. We have no idea how the Mamba architecture will compare in those same use cases.

  • @csmrfx
    9 months ago

    Just the use of the term "AI" implies zero real understanding. It doesn't really mean anything; it's "marketing speak".

    • @nabereon
      8 months ago +1

      Can you please elaborate?

    • @acasualviewer5861
      8 months ago +1

      AI has a very specific meaning in computer science. But it probably isn't what the general public thinks it is.