Lecture 32: Unsloth

  • Published Oct 22, 2024

Comments • 18

  • @danielhanchen
    @danielhanchen 1 day ago +4

    Thanks for inviting me! If anyone has any questions, feel free to comment below or ask on the GPU MODE or Unsloth server!

  • @robertjalanda
    @robertjalanda 1 day ago +4

    Daniel is such a gem and Unsloth is the best. Would not be able to afford or do proper fine-tuning without unsloth

  • @mfc1190
    @mfc1190 1 day ago +6

    This dude is awesome.

  • @waynelau3256
    @waynelau3256 2 days ago +3

    WAS WAITING FOR THIS THANKS🎉

  • @nvbkdw
    @nvbkdw 1 day ago +2

    heroes!

  • @NoorR-ox5im
    @NoorR-ox5im 1 day ago +2

    Wait, around 1:16:00 I thought the question was about expectation, as in minibatches working because of linearity of expectation? That should be correct as far as I know, but this variable-input-length issue maybe should be looked into w.r.t. minibatches as well! Also, it was fairly standard to do full-batch training until it became impossible :)

    • @NoorR-ox5im
      @NoorR-ox5im 1 day ago

      Also about the next question, perhaps muP resolves some of those concerns?

    • @danielhanchen
      @danielhanchen 1 day ago

      Oh I was a bit unsure on the exact question so I thought it was related to the grad accum bug

  • @TheQu3tzalify
    @TheQu3tzalify 1 day ago

    Gradient accumulation IS mathematically equivalent to full batch training. My implementations have always returned the same results for both (as everyone should see if they did testing!). The original bug comes from a poor quality implementation of gradient accumulation.

    • @danielhanchen
      @danielhanchen 1 day ago +2

      For non-sequence models, it's fine. For LLMs where all the sequence lengths are the same, it's also fine - both cases can use the generally accepted GA formulas. The blog post we wrote up proved mathematically that the old GA formulation was incorrect, especially for padded LLM finetuning and pretraining.
      This is also not an issue of grad accumulation being implemented poorly in one particular trainer - the problem exists in nearly all trainers that use grad accum.

    • @TheQu3tzalify
      @TheQu3tzalify 23 hours ago

      ​@@danielhanchen How do you go from:
      L = 1/m_bar * L1 + 1/m_bar * L2 + 1/m_bar * L3 + 1/m_bar * L4 = 1/m_bar * (L1 + L2 + L3 + L4)
      to:
      L = G * 1/m_bar * (L1 + L2 + L3 + L4) ?
      It seems like when you wrote "Let's first set them to the mean length of the entire document to make our calculations easier", it actually hid the problem.
      Then in the "Extra - mathematical proof" section what prevents you from having precalculated the proper denominator m1 + m2 + m3 + m4 and then doing (L1 + L2) / sum + (L3 + L4) / sum? Because that's the original and proper formulation of gradient accumulation for padded sequences.

    • @danielhanchen
      @danielhanchen 22 hours ago

      @@TheQu3tzalify Oh actually you're correct - it's a mistake in the formulation. I forgot to write that it's not L1/m_bar + L2/m_bar + L3/m_bar + L4/m_bar (which gets you (L1+L2+L3+L4)/m_bar); rather, we also use the average loss L_bar, i.e. L_bar/m_bar + L_bar/m_bar + L_bar/m_bar + L_bar/m_bar = 4 * L_bar/m_bar = G * L_bar/m_bar, and so we divide by G in gradient accumulation to get back L_bar/m_bar.
      (1/n * sum(Li)) / (1/n * sum(mi)) cancels the 1/n, leaving mean(L)/mean(m).
      The incorrect version gets G * mean(L)/mean(m), and so we have to divide by G.
      So the *G is still there - I just skipped some steps and should have explained better. huggingface.co/docs/accelerate/en/usage_guides/gradient_accumulation has more details on why there's a *G and a division by G to fix it up.

    • @danielhanchen
      @danielhanchen 21 hours ago +1

      @@TheQu3tzalify Sorry, I skipped some steps - I updated the blog to make the first part clearer.
      Yes, your second point is correct - that's what most trainers should do, but nearly all do not. Most implementations call torch.nn.CrossEntropyLoss directly with the mean reduction, which means you can't pre-derive the denominator (which you have to, as you mentioned). Instead you set the reduction to sum, then, as you said, compute the denominator manually and divide at the end.
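The fix discussed in this thread can be sketched in a few lines of PyTorch. This is an illustrative example only (not Unsloth's or any trainer's actual code; the function names are made up): `naive_ga_loss` averages per-micro-batch mean losses, which is the buggy formulation when micro-batches have different numbers of non-padded tokens, while `fixed_ga_loss` uses a sum reduction and divides once by the total token count across all micro-batches, matching the full-batch loss.

```python
# Sketch of the gradient-accumulation normalization issue (hypothetical
# helper names; assumes HF-style label padding with ignore index -100).
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # padded label positions excluded from the loss

def naive_ga_loss(logits_list, labels_list):
    """Buggy formulation: mean loss per micro-batch, then average of means."""
    losses = [
        F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1),
                        ignore_index=IGNORE_INDEX, reduction="mean")
        for logits, labels in zip(logits_list, labels_list)
    ]
    return sum(losses) / len(losses)

def fixed_ga_loss(logits_list, labels_list):
    """Correct formulation: sum all losses, divide once by total valid tokens."""
    total_loss = 0.0
    total_tokens = 0
    for logits, labels in zip(logits_list, labels_list):
        total_loss = total_loss + F.cross_entropy(
            logits.view(-1, logits.size(-1)), labels.view(-1),
            ignore_index=IGNORE_INDEX, reduction="sum")
        total_tokens += (labels != IGNORE_INDEX).sum().item()
    return total_loss / total_tokens
```

The two formulations agree only when every micro-batch contributes the same number of non-padded tokens; with variable-length padded sequences, the naive version weights short sequences' tokens more heavily than long ones'.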

  • @tomtyiu
    @tomtyiu 1 day ago +2

    Can we have Vision support?