The Wrong Batch Size Will Ruin Your Model

  • Published Jan 25, 2025

Comments • 34

  • @ErlendDavidson · 2 years ago +25

    If you scale the learning rate by the batch size (i.e. lr=(batch_size/32.)*0.01), then stochastic gradient descent looks sort of okay here.

    • @underfitted · 2 years ago

      Interesting :)

    • @jasdeepsinghgrover2470 · 2 years ago +2

      I completely agree... because the number of updates depends on the batch size, and so does the size of each update. So if the learning rate is scaled linearly with the batch size, the model can perform very well even with much smaller batches.
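A minimal sketch of the linear scaling rule this thread describes: grow the learning rate in proportion to the batch size. The base values (batch size 32, learning rate 0.01) come from the comment above; they are a starting point, not a universal constant.

```python
# Linear scaling rule: learning rate grows in proportion to batch size.
# Base values taken from the comment above (batch 32 -> lr 0.01).

BASE_BATCH_SIZE = 32
BASE_LR = 0.01

def scaled_lr(batch_size, base_batch_size=BASE_BATCH_SIZE, base_lr=BASE_LR):
    """Scale the learning rate linearly with the batch size."""
    return (batch_size / base_batch_size) * base_lr

for bs in (8, 32, 64, 256):
    print(bs, scaled_lr(bs))
```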

  • @agenticmark · 7 days ago

    For me, batch size and gradient accumulation make the biggest difference, along with LR scaling.

  • @OliverHennhoefer · 2 years ago +4

    Really like the videos. However, I want to warn against the general statement that a batch size of one is not recommended. It really depends on the problem/data. So don't simply dismiss stochastic gradient descent, try it!

    • @underfitted · 2 years ago +2

      I think that’s fair. I’ve never used it in any of the problems I’ve worked on, but you are right.

  • @Metryk · 1 year ago +2

    Hi! Maybe you can help me with this one: if I want to test an already pre-trained image classifier, how do I proceed regarding the number of images used? The test set contains 100k images, and I guess it wouldn't make any sense to load them all at once, so how do I proceed? Thanks!

    • @LucasTheTopG1 · 21 days ago

      Could you just load some at a time? Like the first 50, then while that batch is processing, request the next 50 and discard the first 50. Then just repeat? Assuming you can fetch them via a request.

  • @lakeguy65616 · 2 years ago +3

    so, what is the optimal batch size?

    • @underfitted · 2 years ago +1

      It depends. Start with 32 and experiment from there.

    • @lakeguy65616 · 2 years ago +1

      @underfitted Does the amount of main memory (RAM) or GPU RAM make a difference? (Great videos!)

    • @underfitted · 2 years ago +2

      It does! Your batch has to fit in memory, or it won't work. When you are working with images, for example, you'll quickly find that your batch size can't be too large if you want to fit it in the GPU's memory.
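A back-of-the-envelope check of the point above. This is only the memory for the raw input batch; real GPU usage is much higher once model weights, activations, and gradients are counted. The image dimensions below are an illustrative assumption (224x224 RGB float32).

```python
# Rough lower bound on memory for one input batch of float32 images.
# Ignores weights, activations, and gradients, which dominate in practice.

def batch_bytes(batch_size, height, width, channels, bytes_per_elem=4):
    """Bytes needed just to hold one batch of float32 images."""
    return batch_size * height * width * channels * bytes_per_elem

# 128 images at 224x224x3 already take ~77 MB before anything else:
mb = batch_bytes(128, 224, 224, 3) / 1e6
print(f"{mb:.0f} MB")
```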

  • @ErlendDavidson · 2 years ago +5

    What do you think of (artificially) adding noise to the learning rate? I feel like it used to be more popular, but I almost never see it these days.

    • @underfitted · 2 years ago +2

      Yeah… never seen that honestly. I’ve used schedules to decrease the learning rate over time, but never read about adding noise to it.
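For readers curious what the question is getting at, here is one possible interpretation, sketched as a guess: jitter the learning rate around a decaying schedule. Neither the video nor the replies endorse this; the decay form, noise scale, and function name are all assumptions for illustration.

```python
import random

# Hypothetical "noisy learning rate": an inverse-time-decay schedule
# with multiplicative uniform jitter. Purely illustrative.

def noisy_lr(base_lr, step, decay=0.001, noise_scale=0.1, rng=None):
    """Decayed learning rate with +/- noise_scale relative jitter."""
    rng = rng or random.Random(0)
    lr = base_lr / (1.0 + decay * step)
    jitter = 1.0 + rng.uniform(-noise_scale, noise_scale)
    return lr * jitter

for step in (0, 500, 1000):
    print(step, noisy_lr(0.1, step))
```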

  • @edmundfreeman7203 · 2 years ago +2

    This is the kind of thing I hate about deep learning. A single parameter in the optimization method can completely change the results. Batches should be small but not too small. How small? That's left to heuristics, and it will change across data sets.

  • @johnmoustakas8897 · 2 years ago +2

    Good work, I hope your channel gets more attention.

    • @underfitted · 2 years ago

      Thanks, John! It takes time and work but I’ll make it happen.

  • @Agrover112 · 2 years ago +2

    Hey, love this video! I was losing touch with the basics!

    • @underfitted · 2 years ago +1

      Glad it was helpful!

  • @axelanderson2030 · 2 years ago

    If you generate a dummy dataset and set a static learning rate, then smaller batch sizes work better? wtf?

  • @OmarBoukchana · 1 year ago

    I haven't seen a video as helpful as this one on the entire internet, thank you ♥

  • @Levy957 · 2 years ago +1

    Amazing!!
    Do you know why the batch size is always 32, 64, 128?

    • @underfitted · 2 years ago +2

      I read somewhere about the ability to fit batches in a GPU... can't remember where exactly. That being said, I've seen experiments that show that it really doesn't matter much (if at all.)

    • @MrAleksander59 · 2 years ago +1

      It's better for memory usage. GPUs, CPUs, hard drives, SSDs, and other hardware built on binary logic use memory blocks with power-of-two sizes: 2^5 = 32, 2^6 = 64, 2^7 = 128, etc. You always want maximum usage of memory. For example, if you have an array of floats, each float takes 32 bits, so the block is at least divisible by 32.

  • @Darkraak · 3 months ago

    Great video man 👏

  • @muhammadtalmeez3276 · 2 years ago

    Your videos are amazing. Thank you so much for this great knowledge and beautiful videos.

    • @underfitted · 2 years ago +1

      Glad you like them!

  • @ziquaftynny9285 · 2 years ago

    I love your presentation style! Very energetic :)

  • @akshay0072 · 8 months ago +1

    Good content. Try improving your way of teaching; learning should be in a relaxed tone.

    • @underfitted · 8 months ago +1

      Thanks! This was an old video. I’ve tried to improve in the latest few.

  • @michaelsprinzl9045 · 9 months ago +1

    A new cat video. Cute.

  • @sarahpeterson2702 · 1 year ago

    The question is: if you train with batches and reach a global minimum, is your model functionally equivalent to one trained without batching? Are the weights identical? No, they aren't. If your model is generative, you don't have equivalence between batch and non-batch training.