Mamba: Linear-Time Sequence Modeling with Selective State Spaces

  • Published Nov 10, 2024

Comments • 14

  • @gabrielmongaras
    @gabrielmongaras  11 months ago +10

    I forgot to mention that this model is trained like a normal transformer: since everything is causal, you should be able to train with the same efficient parallel technique transformers use, a single forward pass over an entire sequence of data.
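
    A minimal sketch of this one-pass view (illustrative names and sizes, not the authors' code): for a time-invariant linear SSM, the causal recurrence unrolls into a convolution, so a whole sequence can be processed in a single forward pass. Mamba's selective, input-dependent variant replaces the convolution with a parallel scan, but the same causal, one-pass training idea applies.

        # Recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t, unrolled as the causal
        # convolution y_t = sum_k K_k x_{t-k} with kernel K_k = C A^k B.
        import numpy as np

        np.random.seed(0)
        L, N = 8, 4                                      # sequence length, state size
        A = np.diag(np.random.uniform(-0.9, -0.1, N))    # stable diagonal state matrix (assumed)
        B = np.random.randn(N, 1)
        C = np.random.randn(1, N)
        x = np.random.randn(L)

        # Recurrent view: O(L) sequential steps
        h = np.zeros((N, 1))
        y_rec = []
        for t in range(L):
            h = A @ h + B * x[t]
            y_rec.append((C @ h).item())

        # Convolutional view: one causal pass over the whole sequence
        K = [(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(L)]
        y_conv = [sum(K[k] * x[t - k] for k in range(t + 1)) for t in range(L)]

        print(np.allclose(y_rec, y_conv))                # True: both views agree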

  • @berkk1993
    @berkk1993 11 months ago +6

    I just opened your channel to ask you for a Mamba video, and here I see this video. You are awesome, dude. I can't express how much you contribute to my life. Thank you many times!!!

  • @Anonn724
    @Anonn724 11 months ago +4

    Please don't stop making these videos. They are extremely useful to go through with you. Much love

  • @orrimoch5226
    @orrimoch5226 10 months ago +1

    Wow Gabriel, great job!
    I like your calm attitude and simple way of explaining this complex subject!
    As an electrical engineer and a data scientist, I highly appreciate your content!

  • @AM-yk5yd
    @AM-yk5yd 11 months ago +8

    19:50 I think A is DxN because they use a diagonal matrix. They mention S4D, and that paper also has an example of a linear initialization: "A = -0.5 + 1j * np.pi * np.arange(N//2) # S4D-Lin initialization". It's structured after all.
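
    A small, self-contained expansion of the S4D-Lin line quoted above (the D and N values are illustrative): A is stored as a vector of complex eigenvalues, i.e. a diagonal matrix, and giving each of D channels its own diagonal yields the DxN shape the comment refers to.

        import numpy as np

        N = 8                                            # state size per channel
        D = 3                                            # number of channels (illustrative)
        # S4D-Lin: fixed decay -0.5, linearly spaced frequencies; the N//2
        # complex conjugate pairs stand in for N real state dimensions.
        A_diag = -0.5 + 1j * np.pi * np.arange(N // 2)
        A = np.tile(A_diag, (D, 1))                      # one diagonal per channel
        print(A.shape)                                   # (3, 4): a bank of D diagonals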

  • @marshallmcluhan33
    @marshallmcluhan33 11 months ago +3

    Thanks for the vid. I can't wait to see if it's overhyped or not, hehe. Tri Dao knows his attention mechanisms.

  • @MatterExplained
    @MatterExplained 11 months ago +3

    Thanks for covering this paper, I was a bit lost on state space models

    • @acasualviewer5861
      @acasualviewer5861 11 months ago +1

      I was a bit lost.. now I'm more lost. ;)

    • @MatterExplained
      @MatterExplained 11 months ago

      @acasualviewer5861 Haha, I did watch some lectures by the first author though

  • @grimsk
    @grimsk 10 months ago +1

    It feels like this is getting closer and closer to concepts from physics.. 🙂

  • @ml-ok3xq
    @ml-ok3xq 11 months ago +2

    I think it's independent because you can diagonalize the state transition matrix and then each value only interacts with itself.
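
    A quick check of that diagonalization argument (illustrative sizes; assumes A is diagonalizable): writing A = V diag(lam) V^{-1} and changing to the eigenbasis turns the recurrence into an elementwise update, so each state coordinate evolves independently.

        import numpy as np

        np.random.seed(1)
        N, L = 4, 6
        A = np.random.randn(N, N) * 0.3                  # generic (diagonalizable) state matrix
        lam, V = np.linalg.eig(A)                        # A = V diag(lam) V^{-1}
        B = np.random.randn(N)
        x = np.random.randn(L)

        # Full-matrix recurrence h_t = A h_{t-1} + B x_t
        h = np.zeros(N, dtype=complex)
        for t in range(L):
            h = A @ h + B * x[t]

        # Same recurrence in the eigenbasis: purely elementwise, no cross terms
        B_tilde = np.linalg.solve(V, B)                  # V^{-1} B
        g = np.zeros(N, dtype=complex)
        for t in range(L):
            g = lam * g + B_tilde * x[t]                 # each coordinate updates on its own

        print(np.allclose(h, V @ g))                     # True: the two views agree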

  • @saculzemog
    @saculzemog 10 months ago +3

    Shouldn't A, B, and C at 24:28 be LxN, not LxD?

  • @yccui
    @yccui 7 months ago

    If all the matrices are learnable, I wonder why the authors use the HiPPO matrix to initialize A. What's the point?

    • @gabrielmongaras
      @gabrielmongaras  7 months ago

      I was actually wrong about the HiPPO "A" matrix being learnable. I think this matrix is actually static, which makes sense as it adds some basic structure to the model.
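
      For context, a short sketch of the HiPPO-LegS matrix being discussed, following the formula given in the HiPPO/S4 papers (illustrative code, not the Mamba repo's implementation): a fixed lower-triangular matrix that gives the state a principled memory-of-the-past structure at initialization.

          import numpy as np

          def hippo_legs(N: int) -> np.ndarray:
              """Build the N x N HiPPO-LegS matrix A."""
              A = np.zeros((N, N))
              for n in range(N):
                  for k in range(N):
                      if n > k:
                          A[n, k] = -np.sqrt((2 * n + 1) * (2 * k + 1))
                      elif n == k:
                          A[n, k] = -(n + 1)
              return A

          print(hippo_legs(4))                 # lower-triangular, negative diagonal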