xLSTM: Extended Long Short-Term Memory

  • Published Nov 10, 2024

Comments • 5

  • @gabrielmongaras  5 months ago  +1

    Forgot to mention, you just stack sLSTM/mLSTM layers similar to a transformer, like usual 😏
    The sLSTM uses a transformer-like block and the mLSTM uses an SSM-like block, as described in section 2.4.
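
    A minimal sketch of that stacking pattern, assuming a PyTorch-style setup. PlaceholderBlock is a stand-in, not the paper's actual sLSTM/mLSTM cells; only the transformer-style residual stacking is illustrated, and all names here (PlaceholderBlock, xLSTMStack) are made up for this example:

    ```python
    import torch
    import torch.nn as nn

    class PlaceholderBlock(nn.Module):
        """Stand-in for an sLSTM-style (transformer-like) or mLSTM-style (SSM-like) block."""
        def __init__(self, d_model: int):
            super().__init__()
            self.norm = nn.LayerNorm(d_model)
            self.mix = nn.Linear(d_model, d_model)  # placeholder for the recurrent cell

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Pre-norm residual block, stacked the same way transformer layers are.
            return x + self.mix(self.norm(x))

    class xLSTMStack(nn.Module):
        """Stack the blocks residually, one after another, like transformer layers."""
        def __init__(self, d_model: int, n_layers: int):
            super().__init__()
            self.layers = nn.ModuleList(PlaceholderBlock(d_model) for _ in range(n_layers))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            for layer in self.layers:
                x = layer(x)
            return x

    # Usage: a batch of 2 sequences, length 16, model width 64.
    out = xLSTMStack(d_model=64, n_layers=4)(torch.randn(2, 16, 64))
    ```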

  • @acasualviewer5861  5 months ago

    Is it slow to train like LSTMs and RNNs are? A major benefit of Transformers is faster, parallelized training. I would assume xLSTMs would be constrained by their sequential nature.

    • @gabrielmongaras  5 months ago

      Yep, should still be slow to train. I don't see any way to make one of the cells into something parallel like a transformer since the cells are so complicated.
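
      To make the sequential constraint concrete, here is a heavily simplified, sLSTM-flavoured scan, assuming made-up weight names (w_*, r_*) and omitting the paper's stabilizer state. The point is only that the gates at step t read h from step t-1 through recurrent weights, so the time loop cannot be flattened into one big parallel matrix product the way attention can:

      ```python
      import torch

      def slstm_like_scan(x, w_i, w_f, w_o, w_z, r_i, r_f, r_o, r_z):
          # x: (T, d) sequence; all weight matrices: (d, d). Illustrative only.
          T, d = x.shape
          h = torch.zeros(d)
          c = torch.zeros(d)
          n = torch.zeros(d)
          outs = []
          for t in range(T):  # step-by-step loop: h feeds the next step's gates
              i = torch.exp(x[t] @ w_i + h @ r_i)      # exponential input gate
              f = torch.sigmoid(x[t] @ w_f + h @ r_f)  # forget gate
              o = torch.sigmoid(x[t] @ w_o + h @ r_o)  # output gate
              z = torch.tanh(x[t] @ w_z + h @ r_z)     # cell input
              c = f * c + i * z                        # cell state
              n = f * n + i                            # normalizer state
              h = o * (c / n)                          # normalized hidden state
              outs.append(h)
          return torch.stack(outs)

      # Usage with random weights, T=8 steps, width d=4.
      d = 4
      weights = [0.1 * torch.randn(d, d) for _ in range(8)]
      ys = slstm_like_scan(torch.randn(8, d), *weights)
      ```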

  • @-slt  5 months ago  +1

    The constant movement of the screen makes my head (and surely many others') explode. Please move around a little less, and zoom in and out less. It helps the viewer focus on the text and your explanation. Thanks. :)

    • @gabrielmongaras  5 months ago

      Thanks for the feedback! Will keep this in mind next time I'm recording.