Diffusion Models - Live Coding Tutorial

dtransposed

20,709 views

Published on 15 Jun 2024
This is my live (for the most part) coding video, where I implement from scratch a diffusion model that generates 32 x 32 RGB images. The tutorial assumes a basic knowledge of deep learning and Python.
Links:
- The Jupyter notebook built in this video: github.com/dtransposed/code_v...
- My website: dtransposed.github.io
- My Twitter: / dtransposed
Sources:
- Lil' Log - What are Diffusion Models: lilianweng.github.io/posts/20...
- Understanding Diffusion Models: A Unified Perspective: arxiv.org/abs/2208.11970
- Denoising Diffusion Probabilistic Models: arxiv.org/abs/2006.11239
Timestamps:
0:00 Introduction
0:32 Theoretical background
13:13 Live Coding - Forward diffusion
41:29 Live Coding - Training loop
1:00:05 Live Coding - Overfitting one batch
1:03:36 Live Coding - Reverse diffusion
1:13:40 Live Coding - Training on CIFAR-10 dataset
1:17:24 Live Coding - Result evaluation
1:19:40 (Bonus) Quick explanation of the UNet architecture used in the tutorial
Science & Technology

Comments • 35

@outroutono4937 26 days ago ⁺¹
I have looked at almost every video on this subject and this is by far the best approach. It's simple enough to be well understood, but it gives all the tools to build more advanced models. I wish you could do a remake of this one, because sometimes the code snippet is out of frame and sometimes it's hard to read because of the font size. Thanks a lot for this upload!
@dtransposed79 24 days ago
Thank you!
@danielfirebanks4973 10 months ago ⁺¹³
Good tutorial, just wished that we could see the screen while you're coding, as most of the new lines you added were off-screen :/ Keep it up!
@adeolaogunleye7965 1 year ago ⁺²
Thanks man, I really appreciate your work
@bbbaaa9421 1 year ago
Thanks for sharing your work with us. Appreciated!
@777chichi 3 months ago
Thanks a lot! I really appreciate it.
This tutorial explains things clearly. Awesome!
Hope to see more tutorial videos on your YouTube channel, thanks.
@heera_ai 1 year ago ⁺¹
A great tutorial to start with!!!
@kanakraj3198 11 months ago ⁺¹
Great tutorial. Thanks for sharing.
Please make slightly more advanced tutorials, like conditional (image or text) generation of images using diffusion.
I see that there are very few advanced tutorials by any YouTuber.
@VitaliyHAN 1 year ago ⁺⁴
Better font, but I still can't read it, not only on the phone, which is the main content-consuming device, but even on my 13-inch MacBook. Thank God I have a 55-inch TV I can watch on. Even with such struggles I will continue to watch such a diamond of a video!
Thanks for the video! Great content!
@dtransposed79 1 year ago
Thank you for your comment!
@dontitube1394 1 year ago
@@dtransposed79 One more tip: at 34:36, and at some later points, I cannot read the code you were writing. I mean it is literally not in the video. But a very informative video.
@dtransposed79 1 year ago
@@dontitube1394 Yeah. I think I will not be changing it now. A bit of a hiccup, but you can always look the code up in the attached notebook. Sorry for that tho.
@dontitube1394 1 year ago ⁺¹
@@dtransposed79 Yeah, no worries, it was more meant as a tip for future videos.
@TD_Dev 6 months ago
Thanks a lot for your tutorial!
@paneercheeseparatha 9 months ago ⁺¹
You should have zoomed in on the screen more so that it's properly visible. Still appreciate your efforts! Nice vid.
@thepresistence5935 1 year ago ⁺¹
Awesome one!
@aleksandrrybnikov8701 1 year ago
Hi, Damian! Nice videos!
@dtransposed79 1 year ago
Thanks for dropping by Sasha!
@duyquangnguyen2664 6 months ago
Thanks for this video. Can you make a video about applying this project to high-resolution images?
@anshumansinha5874 1 year ago
Hi, thanks for the video. But can you explain how you introduce the positional encoding to the network? Also, can this model work with a feed-forward neural network rather than a U-Net?
@dtransposed79 1 year ago ⁺¹
Positional encodings in this paper directly mimic those introduced in the "Attention Is All You Need" paper. There are plenty of resources online that explain how that works.
In terms of the architecture, in theory, you could probably use any encoder-decoder architecture I think. But for images, UNet is the most fitting.
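For readers who want something concrete, here is a minimal sketch of a sinusoidal timestep embedding in the style of "Attention Is All You Need". The function name `sinusoidal_embedding`, the `max_period` default of 10000, and the sines-then-cosines layout are illustrative assumptions, not taken from the notebook.

```python
import math


def sinusoidal_embedding(t, dim, max_period=10000.0):
    """Return a list of `dim` floats encoding the integer timestep `t`.

    Hypothetical helper for illustration; the notebook may lay this out differently.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometrically spaced frequencies, starting at 1 and decaying toward 1/max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    # First half holds sines, second half holds cosines (the ordering is a convention).
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


emb = sinusoidal_embedding(t=10, dim=8)
print(len(emb))
```

A common way to feed such a vector into a UNet is to pass it through a small MLP and add the result to the feature maps of each block. Whether the tutorial does exactly that is best checked in the attached notebook.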
@user-jf6li8mn3l 1 year ago ⁺¹
Thanks for the tutorial.
Why posterior_variance_t = betas_t? Shouldn't it be equal to betas_t*(1 - alphas_cumprod_t_minus_1)/(1 - alphas_cumprod_t) according to [Lil' Log]?
@dtransposed79 1 year ago
Excellent question. Please refer to the original paper: arxiv.org/abs/2006.11239 Section 3.2. The short version: those two are the extreme values that we can set the posterior variance to. The choice will depend on the assumptions on x_0. My choice assumes that x_0 is sampled from a Gaussian ~ N(0, I), while the other choice is optimal for x_0 deterministically set to one point.
@user-jf6li8mn3l 1 year ago
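To make the two options tangible, here is a small sketch that computes both variance choices for every timestep. The linear schedule from 1e-4 to 0.02 over 1000 steps follows the DDPM paper and is an assumption about the notebook; `linear_beta_schedule` and `posterior_variances` are hypothetical helper names.

```python
def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced betas (assumed schedule, as in the DDPM paper)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def posterior_variances(betas):
    """Return (upper, lower): beta_t and beta_t * (1 - abar_{t-1}) / (1 - abar_t)."""
    upper, lower = [], []
    alpha_cumprod_prev = 1.0  # convention: the cumulative product before step 1 is 1
    for beta in betas:
        alpha_cumprod = alpha_cumprod_prev * (1.0 - beta)
        upper.append(beta)
        lower.append(beta * (1.0 - alpha_cumprod_prev) / (1.0 - alpha_cumprod))
        alpha_cumprod_prev = alpha_cumprod
    return upper, lower


betas = linear_beta_schedule()
upper, lower = posterior_variances(betas)
print(upper[0], lower[0])    # the two choices differ most at the first step
print(upper[-1], lower[-1])  # and nearly coincide at the last step
```

The paper reports that both choices gave similar results experimentally, so using betas_t directly is a reasonable simplification.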
@@dtransposed79 Yes, it's clear now. Thanks for the detailed answer.
@nqvst 1 year ago ⁺¹
Coming from a programming background, I always find it very strange to name variables with generic Greek letters or just X, Y. I am not criticizing your video specifically; it is a very widespread pattern. But for example, you are naming the first parameter to the forward_diffusion function "x0". Is it to save space? Is it because you think it is easier to reference it from the mathematical formulas?
In my mind it would be much more clear if "x0" would be named "image". or am I misunderstanding your explanation maybe.
As I mentioned, I don't think your video is bad. I'm just curious as to why it is so common that code related to machine learning is generally so generically named.
@dtransposed79 1 year ago ⁺¹
Interesting comment. I agree - some people indeed use more "mathematical" names, and others use more generic ones. Using the "mathematical" names comes from the fact that much of the ML code you can find online implements logic showcased in a research paper. Since ML research borrows from mathematical notation, it is often convenient for the code to use the same notation, as long as they share the same context (read the paper, understand what the symbols mean). If you are confused, I would advise you to read the paper, and even if some concept confuses you, just try to grasp the high-level meaning of the symbols. This would definitely help you with reading (and writing) your ML code in the future!
@nqvst 1 year ago
@@dtransposed79 Thanks for the reply! Yes, it makes sense. If you understand the concept from reading the equations, it is more convenient to reuse the notation in the code.
And while following along with this video I realized that some of the variable names get really long if they are to be considered "good" variable names.
betas -> noise_amount
alphas -> preserved_image_data
alpha_hat_t -> cumulative_preserved_image_data_at_step
I think I'm just frustrated over not being fluent in the math language.
Anyway, thanks for the video!
@brunokemmer 1 year ago ⁺¹
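As a rough illustration of how the paper's symbols map onto code, here is a sketch of a closed-form forward diffusion step in PyTorch, with the descriptive names from the comment above shown next to the mathematical ones. The function signature, the schedule values, and the tensor shapes are assumptions and may differ from the notebook.

```python
import torch


def forward_diffusion(x0, t, alphas_cumprod):
    """Sample x_t from q(x_t | x_0) in one step.

    x0: clean images, shape (B, C, H, W); t: integer timesteps, shape (B,).
    """
    noise = torch.randn_like(x0)
    # alpha_hat_t ("cumulative_preserved_image_data_at_step"), reshaped to broadcast over C, H, W.
    alpha_hat_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = alpha_hat_t.sqrt() * x0 + (1.0 - alpha_hat_t).sqrt() * noise
    return x_t, noise


betas = torch.linspace(1e-4, 0.02, 1000)       # "noise_amount" per step (assumed schedule)
alphas = 1.0 - betas                           # "preserved_image_data" per step
alphas_cumprod = torch.cumprod(alphas, dim=0)  # alpha_hat for every step

x0 = torch.zeros(4, 3, 32, 32)                 # stand-in batch of 32 x 32 RGB images
t = torch.tensor([0, 10, 500, 999])
x_t, noise = forward_diffusion(x0, t, alphas_cumprod)
print(x_t.shape)
```

Keeping the short names makes each line easy to check against the corresponding equation, while the comments carry the plain-language meaning.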
Is there a difference between `result = alpha_hat.gather(-1, t)` and `result = alpha_hat[t]` ?
@dontitube1394 1 year ago ⁺¹
No, there is not, at least for this kind of case. But for more information you can look at the documentation of torch.gather, which even states the equivalent array indexing.
@dtransposed79 1 year ago
@@dontitube1394 Yeah, that's right. Nevertheless, I suggest learning and using torch.gather. It is a really useful, powerful and efficient function.
@chiscoduran9517 1 year ago
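A quick way to check the equivalence yourself is the sketch below. The tensor values are made up for illustration; only `torch.gather` and standard tensor indexing are used.

```python
import torch

# 1-D case, as in the question: both forms look up alpha_hat at the timesteps in t.
alpha_hat = torch.linspace(1.0, 0.1, steps=10)  # made-up values
t = torch.tensor([0, 3, 9])                     # index tensor must be int64
via_gather = alpha_hat.gather(-1, t)
via_index = alpha_hat[t]
print(torch.equal(via_gather, via_index))

# 2-D case: gather picks one element per row along dim=1.
table = torch.arange(12).view(3, 4)
cols = torch.tensor([[0], [2], [3]])
picked = table.gather(1, cols)                            # shape (3, 1)
same_by_indexing = table[torch.arange(3), cols.squeeze(1)]  # shape (3,)
print(picked.squeeze(1), same_by_indexing)
```

For a 1-D schedule, plain indexing is the shorter spelling. gather becomes more convenient once the lookup has to run along a chosen dimension of a multi-dimensional tensor.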
Can you make a Image to Image tutorial?
@dtransposed79 1 year ago
Could you be more concrete? Image-to-Image can mean multiple things.
@chiscoduran9517 1 year ago
@@dtransposed79 For example, a model capable of changing the colors of certain objects in an image, where the input is an image and the output is the same image with changes.
@playmaker2404 8 months ago
Can you say why the output was not that impressive, and what can be done from here to make the output clearer? @dtransposed79
@user-ut2xu8eb7c 11 months ago
Thanks man, I really appreciate your work
