DeepSpeed: All the tricks to scale to gigantic models

Mark Saroufim

20,056 views

Published on 4 Nov 2024

Comments • 19

@Emily-p8e5q 11 months ago ⁺¹
Thanks, Mark! You have been helping me understand concepts better.
@mekaneeky 1 year ago ⁺³
Thanks Mark! Quite a thorough and useful explanation.
@darrenbrien 3 years ago ⁺⁵
Thanks Mark, great vid. Good update on SOTA in distributed training since Horovod.
@sandraviknander7898 3 years ago ⁺³
If you just add a pair of aviator sunglasses then this is a Yannic Kilcher video. Instant 100k sub upgrade.
Jokes aside, this was a great explanation of a great library!
@randolphzeng6051 1 year ago ⁺²
Thanks for such an inspiring and insightful video. What a knowledge feast to enjoy!
@saratbhargavachinni5544 1 year ago ⁺¹
Great video, Mark! A few corrections: the A100 is available in 40 GB and 80 GB variants.
@adriangabriel3219 2 years ago ⁺³
Hi Mark, great vid. Could you make a video on how to fine-tune large transformer models (e.g. T5-11B) without running into CUDA errors?
@marksaroufim 2 years ago ⁺⁴
Great suggestion! Yes, I'll do it.
@adriangabriel3219 2 years ago ⁺¹
@marksaroufim Great! There is a lot of information about fine-tuning T5-base, but not about fine-tuning models larger than T5-base.
@JordanArsenaultYT 1 year ago
@adriangabriel3219 Did you ever get t5-11b working?
@vini8123 several months ago
I tried to train a model whose embedding layer has a vocab size of 100 million and an embedding dim of 128 on three A100 80 GiB GPUs with DeepSpeed (ZeRO stage 3, offloading parameters and optimizer state to CPU), but it fails with a CUDA out-of-memory error 😢
@limitlesslife7536 1 year ago
amazing!
@user-wp8yx 1 year ago
Nice explanation, but how to do in ooba?
@Georgesbarsukov 1 year ago
You're looking at RAM, not VRAM, btw.
@AndersOland 1 year ago
A 2080ti with 30 gigs? 🤭 If only my 4090 had that much RAM 😅
@juliusvalentinas several months ago
An A100 GPU is 30k USD; is this offloading all theoretical nonsense? Where are the apps that let you run actual Llama 3.1 on one or two 3090s? Offloading unused stuff to an NVMe SSD?
