Forgot to mention: you just stack sLSTM/mLSTM blocks, similar to Transformer layers, like usual 😏
The sLSTM sits in a Transformer-like (post-up-projection) block and the mLSTM in an SSM-like (pre-up-projection) block, as shown in Section 2.4.
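Roughly what I mean, as a PyTorch sketch (not the authors' code: the cell classes and projection factors below are just placeholders, assumed for illustration). Each cell sits inside a residual block, and the blocks are stacked like Transformer layers:

```python
import torch
import torch.nn as nn

class SLSTMCell(nn.Module):          # stand-in for a real sLSTM cell
    def __init__(self, d):
        super().__init__()
        self.lin = nn.Linear(d, d)   # placeholder so the sketch runs
    def forward(self, x):
        return self.lin(x)

class MLSTMCell(nn.Module):          # stand-in for a real mLSTM cell
    def __init__(self, d):
        super().__init__()
        self.lin = nn.Linear(d, d)   # placeholder so the sketch runs
    def forward(self, x):
        return self.lin(x)

class SLSTMBlock(nn.Module):
    """Post-up-projection (Transformer-like): cell at model dim, then an MLP."""
    def __init__(self, d, proj_factor=4):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.cell = SLSTMCell(d)
        self.mlp = nn.Sequential(nn.Linear(d, proj_factor * d), nn.GELU(),
                                 nn.Linear(proj_factor * d, d))
    def forward(self, x):
        x = x + self.cell(self.norm1(x))
        return x + self.mlp(self.norm2(x))

class MLSTMBlock(nn.Module):
    """Pre-up-projection (SSM-like): project up, run the cell wide, project down."""
    def __init__(self, d, proj_factor=2):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.up = nn.Linear(d, proj_factor * d)
        self.cell = MLSTMCell(proj_factor * d)
        self.down = nn.Linear(proj_factor * d, d)
    def forward(self, x):
        return x + self.down(self.cell(self.up(self.norm(x))))

# Stack the residual blocks like Transformer layers, e.g. alternating types.
d_model, n_layers = 64, 4
model = nn.Sequential(*[MLSTMBlock(d_model) if i % 2 == 0 else SLSTMBlock(d_model)
                        for i in range(n_layers)])
x = torch.randn(2, 16, d_model)   # (batch, seq_len, d_model)
print(model(x).shape)             # torch.Size([2, 16, 64])
```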
Is it slow to train like LSTMs and RNNs are? A major benefit of Transformers is fast, parallelized training, and I'd assume xLSTMs are constrained by their sequential nature.
Partly. The sLSTM keeps memory mixing (hidden-to-hidden recurrence), so it still has to run sequentially; the authors provide fused CUDA kernels to speed it up. The mLSTM drops memory mixing entirely, though, so the paper claims its recurrence can be computed in parallel, much like attention. Without highly optimized kernels it's still slower to train than a Transformer in practice.
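On the mLSTM part, here's a toy check of why dropping memory mixing makes it parallelizable. This is my own simplification (scalar gates, no normalizer state, no exponential-gate stabilization from the paper): because the gates don't depend on the previous hidden state, the recurrence unrolls into a causal weighted sum that can be computed for all timesteps at once.

```python
import torch

T, d = 6, 4
torch.manual_seed(0)
q, k, v = (torch.randn(T, d) for _ in range(3))
i_gate = torch.sigmoid(torch.randn(T))   # input gates in (0, 1)
f_gate = torch.sigmoid(torch.randn(T))   # forget gates in (0, 1)

# Sequential recurrence: C_t = f_t * C_{t-1} + i_t * v_t k_t^T,  h_t = C_t q_t
C = torch.zeros(d, d)
h_seq = []
for t in range(T):
    C = f_gate[t] * C + i_gate[t] * torch.outer(v[t], k[t])
    h_seq.append(C @ q[t])
h_seq = torch.stack(h_seq)

# Parallel form: the weight of step s at step t is i_s * prod_{r=s+1..t} f_r
logf_cum = torch.cumsum(torch.log(f_gate), dim=0)
D = torch.exp(logf_cum[:, None] - logf_cum[None, :]) * i_gate[None, :]
D = torch.tril(D)                      # causal mask (s <= t)
h_par = (D * (q @ k.T)) @ v            # every timestep computed at once

print(torch.allclose(h_seq, h_par, atol=1e-5))   # -> True
```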
The constant movement of the screen makes my head (and I'm sure many others') explode. Please move around and zoom in and out less; it helps the viewer focus on the text and your explanation. Thanks. :)
Thanks for the feedback! Will keep this in mind next time I'm recording