Denoising Diffusion Probabilistic Models Code | DDPM Pytorch Implementation

  • Published on 18 Jun 2024
  • In this video I get into the Denoising Diffusion Probabilistic Models (DDPM) implementation and walk through the complete Denoising Diffusion Probabilistic Models code in PyTorch.
    I give a quick overview of the math behind diffusion models before getting into the DDPM implementation.
    I cover the denoising diffusion probabilistic models PyTorch implementation in five parts:
    1. Noise scheduler in DDPM - coding the forward and reverse process of DDPM in PyTorch (a minimal sketch of this follows the list below)
    2. Model architecture for denoising diffusion probabilistic models - UNet
    3. Implementing the UNet, which can be reused in any diffusion model code
    4. Training and sampling code for DDPM
    5. Results of training DDPM
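    For quick reference, below is a minimal sketch of the forward (noising) and reverse (denoising) steps covered in part 1, assuming a linear beta schedule; the repository's actual noise scheduler may differ in its details:

        import torch

        T = 1000
        betas = torch.linspace(1e-4, 0.02, T)        # linear noise schedule
        alphas = 1.0 - betas
        alpha_bars = torch.cumprod(alphas, dim=0)    # cumulative product of alphas

        def add_noise(x0, t, noise):
            # forward process q(x_t | x_0): sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
            a_bar = alpha_bars[t].view(-1, 1, 1, 1)  # broadcast over (B, C, H, W)
            return torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise

        def reverse_step(xt, noise_pred, t):
            # one reverse step p(x_{t-1} | x_t), given the model's predicted noise at scalar timestep t
            mean = (xt - (betas[t] / torch.sqrt(1.0 - alpha_bars[t])) * noise_pred) / torch.sqrt(alphas[t])
            if t == 0:
                return mean
            return mean + torch.sqrt(betas[t]) * torch.randn_like(xt)

        # during training, the model is asked to predict `noise` from add_noise(x0, t, noise) at random t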
    Timestamps:
    00:00 Intro
    00:30 Denoising Diffusion Probabilistic Models Math Review
    03:15 Noise Scheduler for DDPM
    04:30 Noise Scheduler Pytorch Code for DDPM
    07:10 Denoising Diffusion Probabilistic Models Architecture
    08:10 Time embedding Block for DDPM Implementation
    08:54 Overview of Unet Architecture for DDPM
    09:49 Downblock of DDPM Unet
    11:34 Midblock and Upblock for DDPM Unet
    12:40 Code for Positional Embedding in DDPM in Pytorch
    14:07 Code for Downblock in DDPM Unet
    16:42 Code for Mid and Upblock in DDPM Unet
    18:53 Unet class for DDPM
    22:04 Code for Diffusion Model training
    22:47 Code for Sampling in Denoising Diffusion Probabilistic Model
    23:24 Configurable Code
    24:15 Dataset for training
    24:56 Results after DDPM training
    25:42 Thank you
    📄 Code Repository:
    Access the full implementation, along with detailed comments and explanations, from the GitHub repository - github.com/explainingai-code/.... Feel free to explore, experiment, and adapt the code to suit your specific needs.
    🔔 Subscribe :
    tinyurl.com/exai-channel-link
    Background Track - Fruits of Life by Jimena Contreras
    Email - explainingai.official@gmail.com
    🔗 Related Tags:
    #DDPM #DiffusionModels #DDPMImplementation #GenerativeAI

Comments • 27

  • @Explaining-AI
    @Explaining-AI  6 months ago

    *Github Code* - github.com/explainingai-code/DDPM-Pytorch
    *DDPM Math Explanation Video* - th-cam.com/video/H45lF4sUgiE/w-d-xo.html

  • @prathameshdinkar2966
    @prathameshdinkar2966 26 days ago

    Nicely explained! Keep up the good work! 😁

  • @zhuangzhuanghe530
    @zhuangzhuanghe530 3 months ago +1

    I am very thankful for your nice video; it's the best explanation of the diffusion model I have seen!

    • @Explaining-AI
      @Explaining-AI  3 months ago

      Thank you so much for your encouraging words!

  • @efstathiasoufleri6881
    @efstathiasoufleri6881 2 months ago

    Thank you so much!

  • @PoojaSharma-ms5jf
    @PoojaSharma-ms5jf 2 months ago

    Amazing.

  • @purnavindhya27
    @purnavindhya27 2 months ago

    Hi, amazing explanation! Thanks for all the efforts you put into making the video.
    Can you please share the details of the UNet model that you've used (maybe a link to a paper/blog)? Thank you!

    • @Explaining-AI
      @Explaining-AI  2 months ago +1

      Thank you for the appreciation! For the UNet model, I just mimicked the architecture of the Hugging Face Unet2DModel class in the diffusers library (huggingface.co/docs/diffusers/en/api/models/unet2d) with minor changes (where concatenation and upsampling happen in the upblock). The diffusers Unet2DModel class (which itself is based on the UNet paper arxiv.org/abs/1505.04597) and this comment thread (th-cam.com/video/vu6eKteJWew/w-d-xo.html&lc=UgzBFfe4anyDf4txEZx4AaABAg) should give you all the necessary information regarding the UNet model. Do let me know if that ends up not being the case.

  • @muhammadawais2173
    @muhammadawais2173 6 months ago

    Very well explained. What changes would we need to make to use our own dataset, specifically greyscale images?

    • @Explaining-AI
      @Explaining-AI  6 months ago

      Thank you. I have replied on GitHub regarding this.

    • @muhammadawais2173
      @muhammadawais2173 6 months ago

      Yeah, it was me @@Explaining-AI

  • @binyaminramati3010
    @binyaminramati3010 6 months ago +1

    Hi there, thanks for the video. May I ask a question: to my understanding, multi-headed attention first applies three feed-forward networks for key, query, and value. In this model you apply multi-headed attention to the image where, as I understand it, channels play the role of sequence length and the flattened image plays the role of token_length. That should mean the query network, for example, is a Linear(token_length/4, token_length/4), so its parameter count would be (token_length*token_length/16) = ((h*w)**2)/16, which is huge. Or am I wrong?

    • @Explaining-AI
      @Explaining-AI  6 months ago

      Thank you! @binyaminramati3010
      So the channel dimension here is the embedding dimension and H*W is the sequence length.
      If you notice, before attention we do a transpose; this is to make the channel dimension the embedding dimension.
      Assuming the feature map is 128x7x7 (CxHxW), and assuming we only have one head:
      we have a sequence of 49 tokens (feature map cells), each of 128 dimensions,
      the Q/K/V projection matrices will each be 128x128,
      the (QK^T) attention weights will be 49x49,
      and the weighted values will be 49x128.
      So no huge computation as such is required, right? Or am I not understanding your question correctly?
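      For readers following along, here is a minimal sketch of the shape handling described above, using a plain nn.MultiheadAttention; the GroupNorm and the residual connection are assumptions, and this is not necessarily the exact repo code:

          import torch
          import torch.nn as nn

          batch, channels, h, w = 1, 128, 7, 7                 # feature map of 128x7x7 (CxHxW)
          feat = torch.randn(batch, channels, h, w)

          norm = nn.GroupNorm(8, channels)
          attn = nn.MultiheadAttention(embed_dim=channels, num_heads=1, batch_first=True)

          x = norm(feat)
          x = x.reshape(batch, channels, h * w)                # (B, C, H*W)
          x = x.transpose(1, 2)                                # (B, H*W, C): 49 tokens of 128 dims
          out, _ = attn(x, x, x)                               # Q/K/V projections are each 128x128
          out = out.transpose(1, 2).reshape(batch, channels, h, w)
          out = out + feat                                     # residual connection
          print(out.shape)                                     # torch.Size([1, 128, 7, 7])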

    • @binyaminramati3010
      @binyaminramati3010 6 months ago

      @@Explaining-AI Thank you, I missed the transpose. And again, applause for the impressive content 👏

  • @takihasan8310
    @takihasan8310 3 months ago

    Thank you so much for the video. It was amazing, and it explained many things that I couldn't understand anywhere else. Though I have a question regarding the up channels. You have given down channels as [32, 64, 128, 256]. As per your code, the channels for the first upsample will be (256, 64), but after concatenating from the last down layer, the number of channels for the first convolution of the resnet layer should be 128 + 256 = 384, yet as per your code it is 256. The same thing happens for each upblock: in the second case 128 + 64 should be the in channels but as per your code it is 128, and the third upsample layer should have in channels 64 + 32 = 96 but as per your code it is 64. I think there is a little miscalculation.

    • @Explaining-AI
      @Explaining-AI  3 months ago +1

      Hello, according to the code the first down layer to be concatenated is not the last down layer but the second-last down layer. It's a bit easier to explain with a diagram, so can you take a look at the text below representing what's happening and let me know if you still have any issues.
      Downblocks                      Upblocks
      32  -------------------------> 64 -> 16
       | down                         | upsample (& concat)
      64  -------------------------> 128 -> 32
       | down                         | upsample (& concat)
      128 -------------------------> 256 -> 64
       | down                         | upsample (& concat)
      256 -------- 256 -------- 128
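      A minimal sketch of the bookkeeping in this diagram, with hypothetical down_blocks, mid_block, and up_blocks modules (not the exact repo code), just to show where the skip tensors are saved and consumed:

          def unet_forward(x, down_blocks, mid_block, up_blocks):
              skips = []
              for down in down_blocks:
                  skips.append(x)      # feature map saved BEFORE it goes through the down block
                  x = down(x)          # 32 -> 64 -> 128 -> 256
              x = mid_block(x)         # bottleneck at the lowest resolution
              for up in up_blocks:
                  skip = skips.pop()   # 128, then 64, then 32
                  x = up(x, skip)      # upsample x, concat with the saved skip, then the resnet convs
              return x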

    • @takihasan8310
      @takihasan8310 3 months ago

      @Explaining-AI Sorry, my mistake, I got it. You are saving the feature tensors before passing them through the down block, hence the math works out if we consider that. But don't we normally concatenate the feature tensor obtained after passing through the downblock? In my brief experience with UNets I have usually seen that. That's why I thought there was a mistake.

    • @Explaining-AI
      @Explaining-AI  3 months ago

      @@takihasan8310 Yes, you are right. That way is indeed closer to the "official" UNet implementation. After spending a limited amount of time on this, I found that this way let me write simpler code, so I went with it. And as long as the network has layers of downsampling followed by layers of upsampling, together with concatenation of downblock feature maps, I would say it still qualifies as a UNet per se. But yes, it is definitely not the official paper's UNet implementation.

  • @xdhanav5449
    @xdhanav5449 4 months ago

    Thanks for the very informative video! I am having trouble using my own dataset with this. I'm doing this on a MacBook in Google Colab. Currently, I have mounted my drive in Colab and pulled in my dataset from my drive through the default.yaml. However, I am getting an error saying that num_samples should be positive, not 0. I am not sure what you mean by "Put the image files in a folder created within the repo root (example: data/images/*.png)". What is this repo root and where can I find it? Is it local on my computer? Could you help with this? Thank you in advance!

    • @Explaining-AI
      @Explaining-AI  4 months ago

      You are welcome! The path in the config can be either the relative path from the "DDPM-Pytorch" directory or an absolute path. So currently the config assumes that inside the DDPM-Pytorch directory there is a data/images folder which holds all the image files.
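      For anyone hitting the same "num_samples should be positive" error, it usually means the dataset found zero files at the configured path. A quick check like the one below (the paths are just examples for a Colab setup, not taken from the repo) shows whether the path in default.yaml actually resolves to your images:

          import glob
          import os

          repo_root = "/content/DDPM-Pytorch"               # wherever the repo was cloned in Colab
          im_path = os.path.join(repo_root, "data/images")  # the folder the default config expects
          files = glob.glob(os.path.join(im_path, "*.png"))
          print(len(files), "images found")                 # 0 here would explain the error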

  • @takihasan8310
    @takihasan8310 3 months ago

    @Explaining-AI
    Sorry to bother you, but whenever I train on any dataset (I tried MNIST, CIFAR-10, etc.) the MSE loss is always NaN. Is this expected? I checked my transformation and it is correct: first transforms.ToTensor(), then transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]). All the losses are NaN values; will the model learn anything meaningful?

    • @Explaining-AI
      @Explaining-AI  3 months ago

      Were you able to get rid of this issue? Is it possible for you to send me a link to your repo, in case you have changed any part of the code or the training parameters?
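      For anyone debugging the same NaN-loss issue, a small sanity check like the one below (assuming torchvision and CIFAR-10, which has 3 channels) can rule out the data pipeline; if the inputs are finite and roughly in [-1, 1], the problem is more likely elsewhere (for example the learning rate or a modified loss):

          import torch
          from torchvision import datasets, transforms

          transform = transforms.Compose([
              transforms.ToTensor(),
              transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
          ])
          dataset = datasets.CIFAR10(root="data", train=True, download=True, transform=transform)
          loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

          ims, _ = next(iter(loader))
          assert torch.isfinite(ims).all(), "non-finite values already present in the inputs"
          print(ims.min().item(), ims.max().item())   # should be roughly -1.0 and 1.0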

  • @muhammadawais2173
    @muhammadawais2173 5 months ago

    Hi sir, I would like to request that you kindly make a change in the stable diffusion model repository regarding the image size, because that repository does not support large image sizes and requires very high GPU memory; for 256-sized images it requires almost 200 GB, which is very costly. Also, if possible, include a few evaluation metrics for quantitative analysis between the original and the generated images. Waiting for the next video!

    • @Explaining-AI
      @Explaining-AI  5 months ago +1

      Hi @muhammadawais2173, I will start working on the Stable Diffusion video next, but unfortunately it will take me a month to get it up with code and video. Sorry, but it is going to take that long given my other work. In case you are really blocked because of this, might I suggest using the Hugging Face diffusers library? They will have a much more efficient implementation than mine anyway :)

    • @muhammadawais2173
      @muhammadawais2173 5 months ago

      @@Explaining-AI Thank you so much. I will go through it. In fact, I already went through many diffusion model implementations, but you explained it very well and in the easiest way, and your model also gives satisfactory results compared to the others.

  • @paramthakkar4658
    @paramthakkar4658 3 months ago

    I am getting a CUDA out of memory error when using my own dataset. The dataset consists of .npy files.

    • @Explaining-AI
      @Explaining-AI  3 months ago

      Hello, if you have already tried reducing the batch size and are still getting this error, could you take a look at github.com/explainingai-code/DDPM-Pytorch/issues/1, specifically this comment - github.com/explainingai-code/DDPM-Pytorch/issues/1#issuecomment-1862244458, and see if that helps get rid of the out of memory error.