Mamba, SSMs & S4s Explained in 16 Minutes

  • Published on Jan 11, 2024
  • Mamba: Linear-Time Sequence Modeling with Selective State Spaces
    arxiv.org/abs/2312.00752
    Resources & Figures Used:
    - Beyond Transformers with Mamba (patmcguinness.substack.com/p/...)
    - Mamba: Linear-Time Sequence Modeling with Selective State Spaces (gonzoml.substack.com/p/mamba-...)
    - Mamba: Linear-Time Sequence Modeling with Selective State Spaces - Arxiv Dives (blog.oxen.ai/mamba-linear-tim...)
    - NumByNum :: Mamba - Linear Time Sequence Modeling with Selective State Spaces (Gu et al., 2023) Reviewed ( / numbynum-mamba-linear-... )
    - Decoding Mamba: The Next Big Leap in AI Sequence Modeling ( / decoding-mamba-the-nex... )
    - Attention is not Exactly What you Need. Introducing Mamba! ( / attention-exactly-what... )
    - Practical ML Dive - How to train Mamba for Question Answering ( / practical-ml-dive-how-... )
    - Mamba: Linear-Time Sequence Modeling with Selective State Spaces ( • Mamba: Linear-Time Seq... )

Comments • 11

  • @s8x. 2 days ago

    thanks for this

  • @wafflebutsad 4 months ago +1

    thanks for breaking it down vivian

  • @jeangenest5359 several months ago

    Thank you! Very informative and well presented.

  • @user-wg6rk5hw8g 4 months ago +1

    Thank you for explaining clearly. My goal this year is to create my own chatbot from scratch!

  • @user-mr7dd5ye8e several months ago

    This is excellent

  • @boi4367 several months ago

    What is said about How to Train your HiPPO?

  • @BooleanDisorder 3 months ago +1

    🤖 beep boop
    I understand this better now. I hope to see more of similar stuff in the future!

  • @doctorshadow2482 14 days ago

    So, at 11:00 you lost me. I see that S6/Mamba uses a selection mechanism; the question is how it works and, at the very least, what its basics are. Otherwise this comparison is of little benefit. Can you please explain in a bit more depth what kind of selection is used? How does it select?

    • @Patrick-wn6uj 4 days ago

      What Mamba adds to make it faster is that it moves all the calculation into the GPU's SRAM instead of loading intermediate results back into high-bandwidth memory (HBM), because GPUs are fast at calculating but slow at moving data around internally. If you check the GPU's memory hierarchy, it will make sense.

    • @doctorshadow2482 4 days ago

      @Patrick-wn6uj, absolutely. But you are explaining what it is doing, and I am asking for a bit more detail on how it does this: namely, what the selection does (what exactly it selects and how) and how it is implemented.
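
      On the "how does it select" question in this thread, here is a minimal sketch, assuming the formulation in the Mamba paper linked in the description: the step size Δ and the matrices B and C are computed from the current input instead of being fixed, so each token decides how much old state to keep and how much of itself to write. The names `selective_scan`, `w_delta`, `b_delta`, `w_b`, and `w_c` are hypothetical, and the single-channel scalar projections are a simplification. The paper's layer projects the full token vector and runs a fused parallel scan on the GPU, not a Python loop.

```python
import math


def softplus(v):
    # Numerically stable softplus; keeps the step size delta positive.
    return math.log1p(math.exp(-abs(v))) + max(v, 0.0)


def selective_scan(xs, a, w_delta, b_delta, w_b, w_c):
    """Toy single-channel selective SSM (illustrative, not the official kernel).

    xs       : list of floats, the input sequence for one channel
    a        : list of N negative floats, the diagonal of the state matrix A
    w_delta,
    b_delta  : scalars that turn the input into a step size (assumed form)
    w_b, w_c : length-N lists that turn the input into B_t and C_t (assumed form)
    Returns (outputs, final_state).
    """
    n = len(a)
    h = [0.0] * n  # hidden state, carried across time steps
    ys = []
    for x in xs:
        # "Selection": delta, B and C all depend on the current input x.
        delta = softplus(w_delta * x + b_delta)
        b_t = [w * x for w in w_b]
        c_t = [w * x for w in w_c]
        for i in range(n):
            a_bar = math.exp(delta * a[i])  # large delta -> forget old state
            b_bar = delta * b_t[i]          # large delta -> write more input
            h[i] = a_bar * h[i] + b_bar * x
        ys.append(sum(c_t[i] * h[i] for i in range(n)))
    return ys, h


if __name__ == "__main__":
    ys, h = selective_scan([1.0, 0.0, 2.0], [-1.0, -0.5], 1.0, 0.0,
                           [1.0, 0.5], [1.0, 1.0])
    print(ys, h)
```

      In a non-selective SSM such as S4, `delta`, `b_t`, and `c_t` would be the same at every step, so every token would be forgotten and written at a fixed rate. Making them functions of `x` is what lets the model ignore some tokens and hold on to others.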

  • @eugene-bright 3 months ago

    Roko's Basilisk