Thank you for sharing this insightful video. In the introduction of Mamba, it says "parallelizable training"; can you explain how parallel training is possible in an autoregressive model?
I think you might be looking for the "selective scan" part of Mamba. In section 3.3.2 of the paper arxiv.org/ftp/arxiv/papers/2312/2312.00752.pdf, they say "To avoid the sequential recurrence, we observe that despite not being linear it can still be parallelized with a work-efficient parallel scan algorithm (Blelloch 1990; Martin and Cundy 2018; Smith, Warrington, and Linderman 2023)". In short, they use a well-known parallel-algorithm trick for computing prefix sums. See en.wikipedia.org/wiki/Prefix_sum#Parallel_algorithms and you'll notice the similarity. Hope this helps!
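To make the trick concrete, here is a minimal sketch (my own toy code, not the paper's implementation) for a scalar recurrence h[t] = a[t]*h[t-1] + b[t]. The key observation is that composing two affine maps h -> a*h + b is associative, which is exactly what lets a parallel scan replace the sequential loop:

# Toy illustration of the prefix-scan idea behind the "selective scan".
# Assumes a scalar recurrence h[t] = a[t]*h[t-1] + b[t]; the real model
# uses matrices and vectors, but the associativity argument is the same.
import math

def combine(left, right):
    # Compose two affine maps h -> a*h + b (apply left first, then right).
    # This operator is associative, so a scan can evaluate it in parallel.
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def parallel_scan(elems):
    # Simple O(n log n) doubling (Hillis-Steele) inclusive scan; each of
    # the log2(n) rounds could run in parallel across all positions t.
    elems = list(elems)
    n = len(elems)
    step = 1
    while step < n:
        new = list(elems)
        for t in range(step, n):
            new[t] = combine(elems[t - step], elems[t])
        elems = new
        step *= 2
    return elems

# Example: h[t] = 0.5*h[t-1] + x[t] with h[-1] = 0.
xs = [1.0, 2.0, 3.0, 4.0]
scanned = parallel_scan([(0.5, x) for x in xs])
hs = [b for (_, b) in scanned]  # with h[-1] = 0, h[t] is the b-component

# Check against the plain sequential recurrence.
h, ref = 0.0, []
for x in xs:
    h = 0.5 * h + x
    ref.append(h)
assert all(math.isclose(p, q) for p, q in zip(hs, ref))

The actual implementation uses a work-efficient Blelloch-style scan on GPU rather than this doubling variant, but the reason parallelization is possible at all is the same associativity.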
Can you share the slides?
Teacher forcing: during training, the ground-truth tokens themselves are fed as inputs, so the model's prediction at every position can be computed in one parallel forward pass instead of being generated step by step.
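Here is a tiny self-contained sketch of what that means in practice (a toy numpy "model"; all names are made up for illustration). The inputs are the ground-truth tokens shifted by one position, so the loss at every position is computed at once, with no sequential generation during training:

# Toy teacher-forcing example: train on all positions simultaneously.
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, T = 10, 16, 8
tokens = rng.integers(0, vocab, size=T)     # one training sequence

E = rng.normal(size=(vocab, dim))           # toy embedding table
W = rng.normal(size=(dim, vocab))           # toy output projection

inputs, targets = tokens[:-1], tokens[1:]   # shift by one: predict next token
logits = E[inputs] @ W                      # all T-1 positions in one pass

# Cross-entropy over every position at once; nothing here is sequential.
probs = np.exp(logits - logits.max(-1, keepdims=True))
probs /= probs.sum(-1, keepdims=True)
loss = -np.log(probs[np.arange(T - 1), targets]).mean()
print(loss)

Note that inference is still sequential; the parallelism from teacher forcing is a training-time property.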
Follow this video and you'll get a hands-on understanding of why an AR model can be trained in parallel: th-cam.com/video/kCc8FmEb1nY/w-d-xo.html
... some kind of revelations from an ML junior