Denoising Diffusion Probabilistic Models Code | DDPM Pytorch Implementation

ExplainingAI

มุมมอง 16 897

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 21 ก.ย. 2024

ความคิดเห็น • 35

@Explaining-AI 9 หลายเดือนก่อน
*Github Code* - github.com/explainingai-code/DDPM-Pytorch
*DDPM Math Explanation Video* - th-cam.com/video/H45lF4sUgiE/w-d-xo.html
@zhuangzhuanghe530 6 หลายเดือนก่อน ⁺¹
I am very thankful for your nice video; it's the best explanation of the diffusion model I have seen!
@Explaining-AI 6 หลายเดือนก่อน
Thank you so much for your encouraging words!
@prathameshdinkar2966 4 หลายเดือนก่อน
Nicely explained! Keep the good work going! 😁
@ShuchengLiu-l7z 11 วันที่ผ่านมา
Unbelievable！
@_RMSG_ หลายเดือนก่อน
This is incredible
@purnavindhya27 5 หลายเดือนก่อน
Hi, amazing explanation! Thanks for all the efforts you put into making the video.
Can you please share the details of the UNet model that you've used (maybe a link to a paper/blog)? Thank you!
@Explaining-AI 5 หลายเดือนก่อน ⁺¹
Thank you for the appreciation! For the UNet model, I just mimicked the architecture from the huggingface Unet2DModel class in diffusers library (huggingface.co/docs/diffusers/en/api/models/unet2d) with minor changes(at what point concatenation and upsampling happens in upblock). The diffusers Unet2DModel class (which itself is based on unet paper arxiv.org/abs/1505.04597 ) and this comment thread (th-cam.com/video/vu6eKteJWew/w-d-xo.html&lc=UgzBFfe4anyDf4txEZx4AaABAg) should give you all the necessary information regarding the Unet Model. Do let me know if that ends up not being the case.
@takihasan8310 6 หลายเดือนก่อน
Thank you so much for the video. It was amazing and your video explained many things that I couldn't understand anywhere. Though I have a question regarding the up channels. You have given down channels as [32, 64, 128, 256]. As per your code the channels for the first upsample will be (256, 64) but after concatenating from the last down layer the number of channels for the first convolution of the resnet layer should be 128 + 256 = 384 but as per your code it is 256. The same thing will happen for each upblock. In second case 128 + 64 should be the in channels but as per your code 128, and the third upsample layer should have in channels 64 + 32 = 96 but as per your code it is 64. I think there is little miscalculation.
@Explaining-AI 6 หลายเดือนก่อน ⁺¹
Hello, according to the code the first down layer to be concatenated is not the last down layer but the second last down layer. Its a bit easier to explain with a diagram so can you take a look at the below text representing whats happening and let me know if you have any issues still.
Downblocks Upblocks
32 ---------------------------64->16
|down |upsample(&concat)
64 ------------------128->32
|down |upsample(&concat)
128------------256->64
|down |upsample(&concat)
256----256---128
@takihasan8310 6 หลายเดือนก่อน
@Explaining-AI Sorry, my mistake. I got it. You are saving the feature tensors before passing it through the down block hence the math works out if we consider that. But isn't normally we concatenate the feature tensor obtained after passing through the downblock? in my brief experience with unets I have seen that normally. That's why I thought there is mistake.
@Explaining-AI 6 หลายเดือนก่อน
@@takihasan8310 yes you are right. That way is indeed closer to the "official unet" implementation. After spending limited amount of time on this, I found this way enabled me to write simpler code. So went with this only. And as long as the network has layers of downsampling followed by layers of upsampling together with concatenation of downblock feature maps, I would say it still qualifies as a unet per say. But yes, definitely not the official paper's unet implementation.
@ahmetberkegokmen2342 2 หลายเดือนก่อน
Amazing.
@efstathiasoufleri6881 5 หลายเดือนก่อน
Thank you so much!
@binyaminramati3010 9 หลายเดือนก่อน ⁺¹
Hi there, thanks for the video, may I ask a question: to my understanding, the multi-headed attention first applies 3 ff networks for key, query, and value, and in this model, you applied multiheaded attention on the image where channels play as sequence length and flattened image plays as the token_length that should mean that the query network for example should be a Linear(token_length/4,token_length/4) which means its parameter count should be (token_length*token_length/16) = ((h*w)**2)/16 which is huge, or am I wrong?
@Explaining-AI 9 หลายเดือนก่อน
Thank you! @binyaminramati3010
So the channel dimension here is the embedding dimension and the H*W is the sequence length.
If you notice before attention, we do a transpose this is to make the channel dimension as the embedding dimension.
Assuming the feature map is 128x7x7 (CxHxW) and lets assume we only have one head.
So that means we have a sequence of 49 tokens(feature map cells) each of 128 dimensions.
Q/K/V will be 128*128
(QKT) attention weights will be 49x49
Weighted Values will 49x128
So no huge computation as such required right? Or am I not understanding your question correctly ?
@binyaminramati3010 9 หลายเดือนก่อน
@@Explaining-AIThank you, I missed the transpose. and again, applause for the impressive content👏
@muhammadawais2173 9 หลายเดือนก่อน
very well explained. what changes would we need to make if we used our own dataset? specifically greyscale
@Explaining-AI 9 หลายเดือนก่อน
Thank you. Have replied on github regarding this.
@muhammadawais2173 9 หลายเดือนก่อน
yeah it was me@@Explaining-AI
@muhammadawais2173 8 หลายเดือนก่อน
hi Sir, i would like to request you kindly make changing in the stable diffusion model repository regarding size of the images because this repository is not supporting high image size and required very high GPU memory like for 256 size images its required almost 200Gb which is high cost effective. also if possible include few evaluation metrics for quantitative analysis between the original and the generated images. waiting for the next video!
@Explaining-AI 8 หลายเดือนก่อน ⁺¹
Hi @muhammadawais2173, I will next start working on the Stable diffusion video but unfortunately it would take me a month to get it up with code and video. Sorry but its going to take that long given my other works. In case you are really blocked because of this might I suggest using the hugging face diffusers library . They will anyway have much more efficient implementation than me :)
@muhammadawais2173 8 หลายเดือนก่อน
@@Explaining-AI Thank you so much. I will go through it. Infect, i already went through many diffusion model implementation but you explained very well and an easiest way also your model give satisfactory results as compared to others.
@colder4163 2 หลายเดือนก่อน
Amazing explaination. But i have a question that i want to train on my custom rgb data with the shapr 128x128 or 256x256, buy i always gave the results of outofmemory, but the training params is inly about 10m params. Can you help with that?
@colder4163 2 หลายเดือนก่อน
Moreover, i set the config params that the batchsize is 1, and i trained on the gpu t4
@Explaining-AI 2 หลายเดือนก่อน
@@colder4163 Its most likely because of image size. Can you try with 64x64 . Have responded on what changes need to be made for this here - github.com/explainingai-code/DDPM-Pytorch/issues/1#issuecomment-2236651773
@colder4163 2 หลายเดือนก่อน
@@Explaining-AI oh i see, thank you so much
@colder4163 2 หลายเดือนก่อน
@@Explaining-AI if i want to restore the blured image like motion blur of exposure blur, what should i do, could you give me some advises?
@paramthakkar4658 6 หลายเดือนก่อน
I am getting a Cuda out of memory error when used on my own dataset. The dataset consists of .npy files
@Explaining-AI 6 หลายเดือนก่อน
Hello, If you have already tried reducing the batch size and are still getting this error, could you take a look at this github.com/explainingai-code/DDPM-Pytorch/issues/1 specifically this comment - github.com/explainingai-code/DDPM-Pytorch/issues/1#issuecomment-1862244458 and see if that helps getting rid of the out of memory error.
@xdhanav5449 7 หลายเดือนก่อน
Thanks for the very informative video! I am having trouble with using my own dataset in this. I'm doing this on a macbook in google colab. Currently, I have mounted my drive to the colab and pulled in my dataset from my drive, through the default.yaml. However, I am getting an error, saying that num_samples should be positive, and not 0. I am not sure what you mean by "Put the image files in a folder created within the repo root (example: data/images/*.png ).". What is this repo root and where can I find it? Is it local on my computer? Could you help with this? Thank you in advance!
@Explaining-AI 7 หลายเดือนก่อน
You are welcome! So the path in config can either be the relative path from the "DDPM-Pytorch" directory or the absolute path. So currently the config assumes inside DDPM-Pytorch directory there would be a data/images folder which will have all image files.
@takihasan8310 6 หลายเดือนก่อน
@Explaining-AI
Sorry to bother you but I don't know why but whenever I am training on any dataset, I tried mnist, cifar10 etc but mse loss is always nan. Is this expected, I checked my transformation. It is correct, first transforms.ToTensor(), and transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]). All the losses are nan values, will the model learn anything meaningful?
@Explaining-AI 6 หลายเดือนก่อน
Were you able to get rid if this issue? Is it possible for you send me a link to your repo in case you have changed any part of code or parameters of training.
@PoojaSharma-ms5jf 5 หลายเดือนก่อน
Amazing.

ต่อไป

เล่นอัตโนมัติ

Stable Diffusion from Scratch in PyTorch | Unconditional Latent Diffusion Models