Forgot to mention: you just stack sLSTM/mLSTM blocks, similar to Transformer layers, like usual 😏
The sLSTM sits in a Transformer-like (post-up-projection) block and the mLSTM in an SSM-like (pre-up-projection) block, as shown in Section 2.4.
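Roughly what I mean, as a PyTorch sketch (not the authors' code: the cell classes and projection factors below are just placeholders, assumed for illustration). Each cell sits inside a residual block, and the blocks are stacked like Transformer layers:

```python
import torch
import torch.nn as nn

class SLSTMCell(nn.Module):          # stand-in for a real sLSTM cell
    def __init__(self, d):
        super().__init__()
        self.lin = nn.Linear(d, d)   # placeholder so the sketch runs
    def forward(self, x):
        return self.lin(x)

class MLSTMCell(nn.Module):          # stand-in for a real mLSTM cell
    def __init__(self, d):
        super().__init__()
        self.lin = nn.Linear(d, d)   # placeholder so the sketch runs
    def forward(self, x):
        return self.lin(x)

class SLSTMBlock(nn.Module):
    """Post-up-projection (Transformer-like): cell at model dim, then an MLP."""
    def __init__(self, d, proj_factor=4):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.cell = SLSTMCell(d)
        self.mlp = nn.Sequential(nn.Linear(d, proj_factor * d), nn.GELU(),
                                 nn.Linear(proj_factor * d, d))
    def forward(self, x):
        x = x + self.cell(self.norm1(x))
        return x + self.mlp(self.norm2(x))

class MLSTMBlock(nn.Module):
    """Pre-up-projection (SSM-like): project up, run the cell wide, project down."""
    def __init__(self, d, proj_factor=2):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.up = nn.Linear(d, proj_factor * d)
        self.cell = MLSTMCell(proj_factor * d)
        self.down = nn.Linear(proj_factor * d, d)
    def forward(self, x):
        return x + self.down(self.cell(self.up(self.norm(x))))

# Stack the residual blocks like Transformer layers, e.g. alternating types.
d_model, n_layers = 64, 4
model = nn.Sequential(*[MLSTMBlock(d_model) if i % 2 == 0 else SLSTMBlock(d_model)
                        for i in range(n_layers)])
x = torch.randn(2, 16, d_model)   # (batch, seq_len, d_model)
print(model(x).shape)             # torch.Size([2, 16, 64])
```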
Is it slow to train like LSTMs and RNNs are? A major benefit of Transformers is fast, parallelized training, and I'd assume xLSTMs are constrained by their sequential nature.
Partly. The sLSTM keeps memory mixing (hidden-to-hidden recurrence), so it still has to run sequentially; the authors provide fused CUDA kernels to speed it up. The mLSTM drops memory mixing entirely, though, so the paper claims its recurrence can be computed in parallel, much like attention. Without highly optimized kernels it's still slower to train than a Transformer in practice.
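On the mLSTM part, here's a toy check of why dropping memory mixing makes it parallelizable. This is my own simplification (scalar gates, no normalizer state, no exponential-gate stabilization from the paper): because the gates don't depend on the previous hidden state, the recurrence unrolls into a causal weighted sum that can be computed for all timesteps at once.

```python
import torch

T, d = 6, 4
torch.manual_seed(0)
q, k, v = (torch.randn(T, d) for _ in range(3))
i_gate = torch.sigmoid(torch.randn(T))   # input gates in (0, 1)
f_gate = torch.sigmoid(torch.randn(T))   # forget gates in (0, 1)

# Sequential recurrence: C_t = f_t * C_{t-1} + i_t * v_t k_t^T,  h_t = C_t q_t
C = torch.zeros(d, d)
h_seq = []
for t in range(T):
    C = f_gate[t] * C + i_gate[t] * torch.outer(v[t], k[t])
    h_seq.append(C @ q[t])
h_seq = torch.stack(h_seq)

# Parallel form: the weight of step s at step t is i_s * prod_{r=s+1..t} f_r
logf_cum = torch.cumsum(torch.log(f_gate), dim=0)
D = torch.exp(logf_cum[:, None] - logf_cum[None, :]) * i_gate[None, :]
D = torch.tril(D)                      # causal mask (s <= t)
h_par = (D * (q @ k.T)) @ v            # every timestep computed at once

print(torch.allclose(h_seq, h_par, atol=1e-5))   # -> True
```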
The constant movement of the screen makes my head (and I'm sure many others') explode. Please move around and zoom in and out less; it helps the viewer focus on the text and your explanation. Thanks. :)
Thanks for the feedback! Will keep this in mind next time I'm recording