Mamba, SSMs & S4s Explained in 16 Minutes
- Published on Jan 11, 2024
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
arxiv.org/abs/2312.00752
Resources & Figures Used:
- Beyond Transformers with Mamba (patmcguinness.substack.com/p/...)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (gonzoml.substack.com/p/mamba-...)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces - Arxiv Dives (blog.oxen.ai/mamba-linear-tim...)
- NumByNum :: Mamba - Linear Time Sequence Modeling with Selective State Spaces (Gu et al., 2023) Reviewed (/numbynum-mamba-linear-...)
- Decoding Mamba: The Next Big Leap in AI Sequence Modeling (/decoding-mamba-the-nex...)
- Attention is not Exactly What you Need. Introducing Mamba! (/attention-exactly-what...)
- Practical ML Dive - How to train Mamba for Question Answering (/practical-ml-dive-how-...)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Mamba: Linear-Time Seq...)
thanks for this
Thanks for breaking it down, Vivian.
Thank you! Very informative and well presented.
Thank you for explaining clearly. My goal this year is to create my own chatbot from scratch!
This is excellent
What does the video say about "How to Train Your HiPPO"?
🤖 beep boop
I understand this better now. I hope to see more content like this in the future!
So, at 11:00 you lost me. I see that S6/Mamba uses a selection mechanism; the question is how it works, or at least what its basics are. Otherwise this comparison is of little benefit. Could you please explain in a bit more depth what kind of selection is used? How does it select?
What Mamba adds to make it faster is that it keeps the computation in the GPU's fast SRAM: it loads the SSM parameters from high-bandwidth memory (HBM) into SRAM, does the discretization and the recurrence there, and writes only the final outputs back to HBM instead of moving every intermediate state back and forth. GPUs are fast at computing but comparatively slow at moving data between memory levels. If you look at the GPU memory hierarchy, it will make sense.
@Patrick-wn6uj, absolutely. But you are explaining what it is doing, and I am asking for a bit more detail on how it does it: namely, what the selection does (what exactly it selects and how) and how it is implemented.
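For anyone stuck on the same question: in the Mamba paper, "selection" means that the step size Δ and the matrices B and C are no longer fixed, as they are in S4, but are computed from the current input token. Δ then controls how much of the old state each token keeps versus how much of itself it writes in. Below is a minimal pure-Python sketch of that recurrence for a single input channel, assuming a diagonal, negative A and the zero-order-hold discretization from the paper. The parameter names `w_B`, `w_C`, `w_delta` and `b_delta` are hypothetical toy stand-ins for the learned projections, and this is a slow sequential loop, not the fused, hardware-aware parallel scan the real kernel runs in SRAM.

```python
import math


def softplus(z: float) -> float:
    """Numerically safe softplus, log(1 + e^z); keeps the step size delta positive."""
    return z if z > 20.0 else math.log1p(math.exp(z))


def selective_scan(x, A, w_B, w_C, w_delta, b_delta):
    """Toy selective SSM recurrence for ONE input channel (a sketch, not Mamba's kernel).

    x        : list of scalar inputs x_t
    A        : list of N negative reals (diagonal of the state matrix, input-independent)
    w_B, w_C : lists of N weights; hypothetical stand-ins for the linear projections
               that make B_t and C_t functions of the current input
    w_delta, b_delta : hypothetical scalars giving delta_t = softplus(w_delta * x_t + b_delta)
    """
    n = len(A)
    h = [0.0] * n  # hidden state h_{t-1}, starts at zero
    ys = []
    for x_t in x:
        # Selection: delta, B and C are recomputed from x_t at every step.
        delta = softplus(w_delta * x_t + b_delta)
        B_t = [wb * x_t for wb in w_B]
        C_t = [wc * x_t for wc in w_C]
        for i in range(n):
            # Zero-order-hold discretization of the diagonal system:
            # A_bar = exp(delta * A), B_bar = (exp(delta * A) - 1) / A * B
            a_bar = math.exp(delta * A[i])
            b_bar = math.expm1(delta * A[i]) / A[i] * B_t[i]
            h[i] = a_bar * h[i] + b_bar * x_t  # h_t = A_bar * h_{t-1} + B_bar * x_t
        ys.append(sum(c * hi for c, hi in zip(C_t, h)))  # y_t = C_t . h_t
    return ys
```

With a very large `b_delta`, Ā ≈ 0 and the state is essentially overwritten by the current token; with a very negative `b_delta`, Ā ≈ 1 and B̄ ≈ 0, so the token is effectively ignored. Because `w_delta` multiplies `x_t`, a nonzero value lets that keep-or-overwrite choice differ from token to token, which S4's time-invariant parameters do not allow.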
Roko's Basilisk