DDPM - Diffusion Models Beat GANs on Image Synthesis (Machine Learning Research Paper Explained)

  • Published on 22 May 2024
  • #ddpm #diffusionmodels #openai
    GANs have dominated the image generation space for the majority of the last decade. This paper shows, for the first time, how a non-GAN model, a DDPM, can be improved to overtake GANs on standard evaluation metrics for image generation. The produced samples look amazing and, unlike GANs, the new model has a formal probabilistic foundation. Is there a future for GANs, or are diffusion models going to overtake them for good?
    OUTLINE:
    0:00 - Intro & Overview
    4:10 - Denoising Diffusion Probabilistic Models
    11:30 - Formal derivation of the training loss
    23:00 - Training in practice
    27:55 - Learning the covariance
    31:25 - Improving the noise schedule
    33:35 - Reducing the loss gradient noise
    40:35 - Classifier guidance
    52:50 - Experimental Results
    Paper (this): arxiv.org/abs/2105.05233
    Paper (previous): arxiv.org/abs/2102.09672
    Code: github.com/openai/guided-diff...
    Abstract:
    We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for sample quality using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet 256×256, and 7.72 on ImageNet 512×512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.85 on ImageNet 512×512. We release our code at this https URL
    Authors: Alex Nichol, Prafulla Dhariwal
    Links:
    TabNine Code Completion (Referral): bit.ly/tabnine-yannick
    YouTube: / yannickilcher
    Twitter: / ykilcher
    Discord: / discord
    BitChute: www.bitchute.com/channel/yann...
    Minds: www.minds.com/ykilcher
    Parler: parler.com/profile/YannicKilcher
    LinkedIn: / yannic-kilcher-488534136
    BiliBili: space.bilibili.com/1824646584
    If you want to support me, the best thing to do is to share out the content :)
    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: www.subscribestar.com/yannick...
    Patreon: / yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
  • Science & Technology

Comments • 127

  • @YannicKilcher
    @YannicKilcher  3 years ago +39

    OUTLINE:
    0:00 - Intro & Overview
    4:10 - Denoising Diffusion Probabilistic Models
    11:30 - Formal derivation of the training loss
    23:00 - Training in practice
    27:55 - Learning the covariance
    31:25 - Improving the noise schedule
    33:35 - Reducing the loss gradient noise
    40:35 - Classifier guidance
    52:50 - Experimental Results

    • @TechyBen
      @TechyBen 3 years ago +1

      Will you cover Nvidia's or Intel's "AI photorealism" work on turning game images photorealistic? IIRC a new paper was just released on it. Still early work, but it is making better progress, as it no longer fails on the temporal or hallucination (artifacts/errors) problems.

  • @ahmedalshenoudy1766
    @ahmedalshenoudy1766 2 years ago +27

    Thanks a lot for the thorough explanation!
    It's helping me figure out a topic for my master's degree.
    Much much appreciated ^^

  • @videowatching9576
    @videowatching9576 2 years ago +1

    Fascinating, incredible video! Really appreciate the walkthrough! Such as the cosine vs linear approach to make sure each step in diffusion is useful - very interesting!

  • @sshatabda
    @sshatabda 3 years ago +2

    Great video! I was surprised to see this after the latest paper just a few days back! Thanks for the great explanations!

  • @underlecht
    @underlecht 3 years ago +10

    Amazing review. Please do more like this; very interesting, and thank you for sharing. Subscribed!

  • @impromptu3155
    @impromptu3155 2 years ago

    Just amazing. I guess I would have spent another whole day reading this paper if I had missed your video. Grateful!

  • @CosmiaNebula
    @CosmiaNebula 2 years ago +34

    Summary: self-supervised learning. Given a dataset of good images, keep adding Gaussian noise to create sequences of increasingly noisy images. Let the network learn to denoise images based on that. Then the network can "denoise" completely Gaussian random pictures into real pictures.
    To do: learn a latent space (like a VAE-GAN does) so that it can smoothly interpolate between generated pictures and create nightmare art.
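    A rough sketch of that summary in PyTorch-style code. The linear schedule values, tensor shapes, and the `model(x, t)` noise-prediction interface are my own illustrative assumptions, not the paper's exact configuration:

    ```python
    import torch
    import torch.nn.functional as F

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)       # illustrative linear noise schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)    # cumulative signal level alpha_bar_t

    def ddpm_training_loss(model, x0):
        """One training step: corrupt x0 to a random step t, ask the model for the added noise."""
        b = x0.shape[0]
        t = torch.randint(0, T, (b,), device=x0.device)
        eps = torch.randn_like(x0)
        ab = alpha_bar.to(x0.device)[t].view(b, 1, 1, 1)
        x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps   # closed-form q(x_t | x_0)
        return F.mse_loss(model(x_t, t), eps)            # "simple" epsilon-prediction objective
    ```

    At sampling time, the same network is applied repeatedly, starting from pure Gaussian noise and removing a little of the predicted noise at each step (see the sampling sketch further down in the comments).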

  • @andrewcarr3703
    @andrewcarr3703 3 years ago +3

    Love it!! It's called the "number line" in English. Keep up the great work.

  • @kxdy8yg8
    @kxdy8yg8 2 years ago

    Great material! Honestly, I really enjoy your content!! Keep it up 👏👏

  • @Galinator9000
    @Galinator9000 1 year ago

    Your videos are amazing Yannic, keep it up. Much love

  • @mariafernandadavila8332
    @mariafernandadavila8332 7 months ago

    Amazing explanation. Saved me a lot of time!! Thank you!

  • @scottmiller2591
    @scottmiller2591 3 years ago +9

    That notation \mathcal{N}(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t \mathbf{I}) sets my teeth on edge. Doing this with P, a general PDF, is fine, but I would always write x_t ~ \mathcal{N}(\sqrt{1-\beta_t}\, x_{t-1}, \beta_t \mathbf{I}), since \mathcal{N} is the Gaussian _distribution_ with a defined parameterization. BTW, the reason for \sqrt{1-\beta_t}\, x_{t-1} is to keep the energy of x_{t-1} approximately the same as the energy of x_t; otherwise, the image would explode to a variance of T\beta after T iterations. It's probably a good idea to keep the neural network inputs in about the same range every time.
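    For concreteness, a sketch of that energy argument, using the forward step as the paper defines it:

    ```latex
    x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\,\epsilon,
    \qquad \epsilon \sim \mathcal{N}(0, \mathbf{I})
    \\[4pt]
    \operatorname{Var}[x_t] = (1-\beta_t)\operatorname{Var}[x_{t-1}] + \beta_t \mathbf{I}
    = \mathbf{I} \quad \text{if } \operatorname{Var}[x_{t-1}] = \mathbf{I},
    \\[4pt]
    \text{whereas dropping the } \sqrt{1-\beta_t} \text{ factor would give }
    \operatorname{Var}[x_T] \approx \operatorname{Var}[x_0] + \Big(\textstyle\sum_t \beta_t\Big)\mathbf{I}.
    ```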

    • @cedricvillani8502
      @cedricvillani8502 2 years ago

      Don’t forget to edit your text next time you paste it in😮

  • @ChunkyToChic
    @ChunkyToChic 3 years ago +252

    My boyfriend wrote these papers. Go Alex Nichol!

    • @taylan5376
      @taylan5376 3 years ago +9

      And I already felt sorry for your bf

    • @LatinDanceVideos
      @LatinDanceVideos 2 years ago +19

      You’ll have to compete for his attention with all the coding fanbois. Either way, lucky girl. Hold onto that guy.

    • @luisfable
      @luisfable 2 years ago +3

      With every great person, there is a great partner

    • @TheRyulord
      @TheRyulord 2 years ago +10

      @@LatinDanceVideos YouTube says her name is Samantha Nichol now, so I guess she took your advice.

    • @cedricvillani8502
      @cedricvillani8502 2 years ago

      Lose 10 pounds by cutting your head off??? 😂😂

  • @binjianxin7830
    @binjianxin7830 1 year ago +1

    18:46 I guess it's very likely related to Shannon's sampling theorem: reconstructing the data distribution by sampling with the well-defined normal distribution. The number of time steps and beta are closely related to the bandwidth of the data distribution.

  • @chaerinkong5303
    @chaerinkong5303 3 years ago +1

    Thanks a lot for this awesome video. I really needed it

  • @pedrogorilla483
    @pedrogorilla483 5 months ago

    Historic video! Fun to see it now and compare it to the current state of image generation. I’ll check it again in two years to see how far we’ve got.

  • @princeofexcess
    @princeofexcess 3 years ago +1

    Great video. Could you possibly up the volume level for the next video? I noticed this video is much quieter than other videos I watch.

  • @luke2642
    @luke2642 3 years ago +3

    I wonder if multiscale noise would work better. It'd fit more with convolutions. Instead of 0% to 100% noise, it could disturb from pixels to the whole image.

  • @jakubsvehla9698
    @jakubsvehla9698 2 years ago

    awesome video, thanks!

  • @MrBOB-hj8jq
    @MrBOB-hj8jq 3 years ago +8

    Can you please make a video about SNNs and the latest research on SNNs?

  • @G12GilbertProduction
    @G12GilbertProduction 3 years ago

    Diffusing noise with forward sampling is really more entropic in the context of accumulating shared data by the transformer, but visual autoencoders are thin for this Gaussian (or Bayes-Gauss mixture) without one transformer per layer.
    EDIT: I meant only the prescriptive sense of the statement above, not more.

  • @TechyBen
    @TechyBen 3 years ago +1

    Detecting signal inside the noise. Wow. It's like a super cheat for cheat sheets. And it works! :D

  • @linminhtoo
    @linminhtoo 3 years ago +49

    yannic, thanks for the video. the audio is a little soft even at max volume (unless I'm wearing my headphones). is it possible to make it a bit louder?

    • @YannicKilcher
      @YannicKilcher  3 years ago +20

      Thanks a lot! Can't change this one, but I'll pay attention in the future

    • @abdalazizrashid
      @abdalazizrashid 2 years ago +1

      Yup, correct, most of your videos have quite low volume

    • @JurekOK
      @JurekOK 2 years ago +1

      Maybe this is just correct -- it's a regular hi-fi audiophile loudness level. Here, there is no need for hyper-compression filters like in commercials and cheap music videos.

    • @ShawnFumo
      @ShawnFumo 1 year ago

      @@JurekOK Maybe, but in practice, using my laptop speakers with Windows and YouTube volumes maxed out, it is still pretty low volume. I had to put subtitles on to make sure I didn't miss things here and there, and this was in a fairly quiet room.

  • @mohamedrashad7845
    @mohamedrashad7845 3 years ago +1

    What software and hardware do you use to make these videos (drawing tablet, Adobe Reader, other tools)?

  • @natanielruiz818
    @natanielruiz818 2 years ago

    Amazing video.

  • @JamesAwokeKnowing
    @JamesAwokeKnowing 3 years ago +6

    This makes me think that instead of super-resolution from a lower-res image, it could be even more effective to store a sparse pixel array (with high-res positioning). You could even have another net 'learn' a way of choosing, e.g., which 1000 pixels of a high-res image to store (the pixels providing the most information for reconstruction).

    • @vidret
      @vidret 3 years ago

      yes... yeeeeeesssssssssssss

    • @Champignon1000
      @Champignon1000 2 years ago

      Wow, that's a really great idea actually!

  • @bertobertoberto242
    @bertobertoberto242 1 year ago

    I would say that the sqrt(1-β) is used to converge to an N(0, σ), mainly in its "mu"; otherwise adding Gaussian noise would just (in expectation) keep x_0 as the mean, instead of 0

  • @bg2junge
    @bg2junge 3 years ago +12

    Any results (images) from generative models should be accompanied by the nearest neighbor (VGG latent, etc.) from the training dataset. I am going to train it on MNIST 🏋

    • @alexnichol3138
      @alexnichol3138 3 years ago +8

      There are nearest neighbors in the beginning of the appendix!

    • @bg2junge
      @bg2junge 3 years ago +2

      @@alexnichol3138 I retract my statement.

    • @48956l
      @48956l 2 years ago

      @@bg2junge I demand seppuku

  • @nisargshah467
    @nisargshah467 3 years ago +2

    I was waiting for this, so I haven't read the paper. Thanks Yannic!

  • @Kerrosene
    @Kerrosene 3 years ago +1

    Reminds me of normalising flows... the direction of the flow leads to a normal form through multiple invertible transformations.

    • @PlancksOcean
      @PlancksOcean 2 years ago +1

      It looks like it, but the transformation (adding some noise) is stochastic and non-invertible

  • @stephanebeauregard4083
    @stephanebeauregard4083 3 years ago +9

    I've only listened to 11 minutes so far but DDPMs remind me a lot of Compressed (or Compressive) Sensing ...

  • @proinn2593
    @proinn2593 3 years ago +3

    There is this stepwise generation in GANs, not based on steps from noise to image, but based on the size of the image, like in ProGAN and MSG-GAN. In these models you have discriminators for different sizes of the image, kind of.

    • @gustavboye6691
      @gustavboye6691 2 years ago

      Yes, that should be the same, right?

    • @cedricvillani8502
      @cedricvillani8502 2 years ago

      Are you saying it’s not the size of your GAN that matters, but how You use it? 😂

  • @easyBob100
    @easyBob100 2 years ago +1

    Another question. If the network is predicting the noise added to a noisy image, what do you then do with that prediction? Subtract it from the noisy image? Do you then run it back through the network to again predict noise?
    When you train this network, do you train it to only predict the small amount of noise added to the image between the forward-process steps? Or does it try to predict all the noise added to the image up to that point?
    Or maybe it's more like the forward process? Starting with the latent x_T as input to the network, the network gives you an 'image' that it thinks is on the manifold (x_{T-1}). At this point, it most likely isn't, but you can move 1/T towards it, like we did moving towards the Gaussian noise to get to x_T. Then, repeat...?
    More examples and less math always helps...

    • @furrry6056
      @furrry6056 1 year ago

      Yes, it's a step-by-step approach. Thus, when 'destroying' the image, the image at T_i = image at T_{i-1} + noise step. You just keep adding / stacking noise, adding a bit more noise (to the previous noise) at each new step. It isn't really 'constant' though. The variance / amount of noise added depends on the time step and the schedule. A linear schedule would be constant (adding the same amount of noise at each T_i), but if you look at the images (de)generated doing so, you get a quite long tail of images that contain nearly only noise. Therefore a cosine schedule is used, meaning the variance differs per T_i, and you also end up with more information left in the images at the later time steps.
      The timestep is actually encoded into the model. Thus, the parameters that are learned to predict the noise 'shift' depending on T. (At least, in my understanding / words. I'm just a dumb linguist - I don't know any maths either 😅.) Perhaps a better way to explain it is to imagine that at small T_i, the model can depend on all kinds of visual features (edges, corners, etc.) learned to predict noise. At large T, those features / params get less informative, so you rely on other features to estimate where the noise is. (Thus it's probably not the features that shift depending on T, but their weights.)
      When generating a new image, you start at T_max. Thus, pure random noise only. The model first reconstructs to T_max - 1, removing a little noise. Then, taking this image, you again remove a bit more noise, etc. It's an iterative process.
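      A small sketch of the two schedules described above. The cosine form follows the formula from Nichol & Dhariwal's "Improved DDPM" paper; the constants (T, s = 0.008, the linear endpoints) are the commonly quoted values but should be treated as illustrative:

      ```python
      import numpy as np

      def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
          """Cumulative signal level alpha_bar_t for a linear beta schedule."""
          betas = np.linspace(beta_start, beta_end, T)
          return np.cumprod(1.0 - betas)

      def cosine_alpha_bar(T=1000, s=0.008):
          """Cosine schedule: alpha_bar_t defined directly via a squared cosine."""
          t = np.arange(T + 1) / T
          f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
          return (f / f[0])[1:]

      # The linear schedule reaches near-zero signal long before t = T (the "long tail"
      # of nearly pure noise mentioned above); the cosine schedule degrades more evenly.
      print(np.round(linear_alpha_bar()[::200], 4))
      print(np.round(cosine_alpha_bar()[::200], 4))
      ```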

  • @arnabdey7019
    @arnabdey7019 13 days ago

    Please make explanation videos on Yang Song's papers too

  • @johongo
    @johongo 3 years ago

    This paper is really well-written.

  • @zephyrsails5871
    @zephyrsails5871 3 years ago +1

    Thank you Yannic for the video. QQ: why does adding Gaussian noise to an image require a multivariate Gaussian instead of just a 1D Gaussian? Are the extra dimensions used for the different color channels?

    • @PlancksOcean
      @PlancksOcean 2 years ago +1

      1 dimension per pixel 🙂

  • @JTchen-sq6gs
    @JTchen-sq6gs 2 years ago +1

    It seems you used a tool to concatenate two paper PDFs together? That's cool; would you mind telling me which tool?

    • @erniechu3254
      @erniechu3254 2 years ago +1

      If you're on Mac, there's a native script for that. /System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py. Or you can just use Preview.app lol

  • @samernoureddine
    @samernoureddine 3 years ago +8

    Lightning

  • @easyBob100
    @easyBob100 2 years ago +1

    Can someone explain the noising process with some pseudocode? Is the noise constantly added (based on t) or blended (based on a percentage of T)? And of course, does it make a difference and why?
    EDIT: Never mind. I always figure it out after asking. :) (I generate some noise and either blend or lerp towards it, as they are the same.)
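    Since the comment asks for pseudocode, here is a sketch of both views of the forward process; they agree in distribution, and all names and schedule values are my own illustrations:

    ```python
    import numpy as np

    def noise_stepwise(x0, betas, rng):
        """Add a little noise at every step: x_t = sqrt(1-beta_t)*x_{t-1} + sqrt(beta_t)*eps."""
        x = x0
        for beta in betas:
            x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
        return x

    def noise_closed_form(x0, betas, rng):
        """Jump straight to the end: a blend of the clean image and one big noise draw."""
        alpha_bar = np.prod(1.0 - np.asarray(betas))
        return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * rng.standard_normal(x0.shape)

    rng = np.random.default_rng(0)
    x0 = rng.standard_normal((32, 32))        # stand-in "image"
    betas = np.linspace(1e-4, 0.02, 500)
    # Both functions sample from the same distribution q(x_t | x_0),
    # which is exactly the "blend / lerp towards noise" picture from the EDIT above.
    ```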

  • @soumyanasipuri
    @soumyanasipuri 1 year ago

    Can anyone tell me what we mean by x_0 ~ q(x_0)? In terms of pictures, what is x_0 and what is the data distribution?
    Thank you.

  • @nomadow2423
    @nomadow2423 8 months ago

    16:55 Denoising depends on the entire data distribution because adding random noise in one step can be done independently of all previous steps; just add a bit of noise wherever you like. But removing noise (the reverse) has to assume there was noise added over some number of previous steps. Thus, in the example of denoising a small child's drawing, it's not that we're removing ALL the noise. Instead, the dependence problem arises simply in taking a single step towards a denoised picture.
    Can anyone clarify/confirm?

  • @romagluskin5133
    @romagluskin5133 10 months ago

    50:24 "Distribution shmistribution" 🤩

  • @daniilchesakov6010
    @daniilchesakov6010 2 years ago +1

    Hi! Amazing video, thank you a lot!
    But I'm a bit confused by one detail and have a stupid question. Since we train our model to predict epsilon using x_t and t, and we also have the formula x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
    we can get that x_0 = (x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon) / \sqrt{\bar{\alpha}_t}.
    And here we know the alphas because they are constants, we also know x_t (just some noise), and we know \epsilon as it is the output of our model -- so why can't we calculate the answer in just one step?
    Would be very grateful for an answer!

    • @idenemmy
      @idenemmy 2 years ago

      I have the same question. My hypothesis is that such an x_0 would be very bad. Have you found the answer to this question?
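      For what it's worth (my reading, not something stated in this thread): the one-shot estimate in the question is

      ```latex
      \hat{x}_0(x_t, t) = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\;\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}},
      ```

      and samplers do compute it as an intermediate quantity. But when x_t is close to pure noise, \epsilon_\theta can only predict an average over all the noise patterns consistent with x_t, so the one-shot \hat{x}_0 tends to be a blurry mean image. Taking many small reverse steps lets the model re-estimate the noise after each step and commit to details gradually.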

  • @CristianGarcia
    @CristianGarcia 3 years ago +5

    This is me being lazy and not looking it up, but if they predict the noise instead of the image, to actually get the image they subtract the predicted noise from the noisy image iteratively until they get a clean image?

    • @YannicKilcher
      @YannicKilcher  3 years ago +7

      Yes, pretty much, except doing this in a probabilistic way where you try to keep track of the distribution of the less and less noisy images.
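      A sketch of that iterative procedure: ancestral sampling with the fixed variance choice sigma_t^2 = beta_t, assuming a `model(x, t)` that predicts the noise and a 1-D tensor `betas` (both placeholders here):

      ```python
      import torch

      @torch.no_grad()
      def ddpm_sample(model, shape, betas):
          """Start from pure noise x_T and take len(betas) small denoising steps."""
          alphas = 1.0 - betas
          alpha_bar = torch.cumprod(alphas, dim=0)
          x = torch.randn(shape)                                   # x_T ~ N(0, I)
          for t in reversed(range(len(betas))):
              eps = model(x, torch.full((shape[0],), t, dtype=torch.long))
              # Posterior mean: remove the appropriate fraction of the predicted noise.
              mean = (x - betas[t] / (1.0 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
              # Except at the last step, add back fresh noise (the "probabilistic" part).
              x = mean if t == 0 else mean + betas[t].sqrt() * torch.randn_like(x)
          return x
      ```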

  • @mikegro3138
    @mikegro3138 2 years ago

    Hi, I watched the video, but this is not a topic I am familiar with. Could anyone please describe in a few sentences how this works? Especially how Disco Diffusion works. Where does it get the graphical elements for the images? How does it connect keywords from the prompt with the artists, the style, etc.? It seems I can use every keyword I want, but if there is a database, it should be limited. Is it trained somehow to learn what the different styles look like? What if I pick an uncommon keyword? So many questions to understand this incredible software. Thanks

  • @nahakuma
    @nahakuma 3 years ago +5

    By the way, these DDPM models seem very related (a practical simplification?) to neural autoregressive flows, where each layer is invertible and performs a small distribution perturbation which vanishes with enough layers

    • @gooblepls3985
      @gooblepls3985 2 years ago

      True! I think the important difference (implementational simplification) is that you have no a-priori restrictions on the DNN architecture here, i.e., the layers do not need to be invertible, and the idea is almost agnostic to what exact DNN architecture you use

  • @ProfessionalTycoons
    @ProfessionalTycoons 3 years ago

    super dope

  • @johnsnow9925
    @johnsnow9925 3 years ago +23

    It wouldn't be OpenAI if they actually released their pretrained models

    • @PaulanerStudios
      @PaulanerStudios 3 years ago +14

      ClosedAI

    • @herp_derpingson
      @herp_derpingson 3 years ago +1

      @@PaulanerStudios BURN

    • @ShawnFumo
      @ShawnFumo 1 year ago

      Well it's a bit of a moot point now that Stable Diffusion has released theirs. Maybe it isn't matching DALL-E 2 in all areas yet, but is coming pretty close, especially the 1.5 model (already on DreamStudio, though not available for download quite yet).

  • @JTMoustache
    @JTMoustache 3 years ago +3

    44:14:
    p(a|b,c) = p(a,c|b) / p(c|b) = p(a|b) * p(c|a,b) / p(c|b) = Z * p(a|b) * p(c|a,b), with Z = 1 / p(c|b).
    If c is independent of b given a, this becomes
    Z * p(a|b) * p(c|a).
    So, given that c is independent of b given a: p(a|b,c) = p(a|b) * p(c|a) / p(c|b).
    Here a = x_t, b = x_{t+1}, c = y, Z = 1 / p(y|x_{t+1}).
    Then they presumably consider y independent of x_{t+1} given x_t.
    Problem is, if they consider y independent of x_{t+1} given x_t, they should probably also consider y independent of x_t given x_{t+1}, which would basically say p(x_t|x_{t+1},y) = p(x_t|x_{t+1}).
    But I guess the whole point is to say that actually no, x_t contains more information about y than x_{t+1}, so y is not independent of x_t given a noisier version of x_t (i.e. x_{t+1}).

    • @nahakuma
      @nahakuma 3 years ago

      I think it is more natural to do your derivation with a = x_t, b = y, c = x_{t+1}. In this way, a fitting probabilistic graphical model would be y -> x_t -> x_{t+1}. So the class label y clearly determines the distribution of your image at any step, but given the current image x_t you already have a well-defined noise process that tells you how x_{t+1} will be obtained from x_t, and the label then becomes irrelevant.
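      In code, the practical upshot of this derivation, as used for classifier guidance, is a shift of the reverse-step mean by the scaled classifier gradient. A sketch with all names mine; `classifier` is assumed to be trained on noisy images:

      ```python
      import torch

      def classifier_guided_mean(mean, variance, x_t, y, classifier, scale=1.0):
          """Shift the reverse-step mean by scale * Sigma * grad_x log p(y | x_t)."""
          x = x_t.detach().requires_grad_(True)
          log_probs = torch.log_softmax(classifier(x), dim=-1)
          selected = log_probs[torch.arange(len(y)), y].sum()   # log p(y | x_t) per sample
          grad = torch.autograd.grad(selected, x)[0]
          return mean + scale * variance * grad                 # larger scale: less diversity, higher fidelity
      ```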

  • @herp_derpingson
    @herp_derpingson 3 years ago

    The audio is a bit quiet in this video.
    .
    0:00 I didn't realize any of these were generated. Totally fooled my brain's discriminator.
    .
    29:00 How can the noise be less than the accumulated noise up to that point? Are we taking into account that some noise added later might undo the previously added noise?
    .
    50:00 I am not sure how to carry the learnings from diffusion models over to GANs. The only thing I can think of is pre-training the discriminator with real images and noised real images, but that sounds so obvious I am sure hundreds of papers have already done it.
    .
    All in all I would love to see more papers which make the neural networks output weird things like probability distributions instead of simple images or word tokens.

  • @brandomiranda6703
    @brandomiranda6703 3 years ago +7

    What is the main take away?

    • @YannicKilcher
      @YannicKilcher  3 years ago +21

      make data into noise, learn to revert that process

    • @herp_derpingson
      @herp_derpingson 3 years ago +3

      Train a denoiser but don't add or remove all the noise in one step.

  • @CristianGarcia
    @CristianGarcia 3 years ago +1

    Can you use this technique to erase adversarial attacks?

    • @herp_derpingson
      @herp_derpingson 3 years ago +1

      That's an interesting idea. Although I think we will have to train the network specifically on adversarial noise. Might not, though. Not sure, but good idea regardless.

    • @jg9193
      @jg9193 2 years ago

      You'd have to be careful, because this technique relies on neural networks that can potentially be attacked

  • @nahakuma
    @nahakuma 3 years ago

    I wonder why they state that the undefined norm ||.|| of the covariance tends to 0. Doesn't it tend to whatever is the norm of a uniform covariance matrix?

    • @herp_derpingson
      @herp_derpingson 3 years ago

      Isn't the norm of a uniform cov matrix, with mean=0, std=1, zero?

    • @nahakuma
      @nahakuma 3 years ago

      @@herp_derpingson As far as I know, the norm of a matrix A is typically defined as the maximum of x^T A x over vectors x with x^T x = 1. In the case of a standard normal distribution (identity covariance) you would have x^T A x = 1 for any such x, and so the norm of the covariance would be 1. Am I wrong?

    • @herp_derpingson
      @herp_derpingson 3 years ago +1

      @@nahakuma Nah, I am a bit out of touch with math. You are probably right.

  • @user-lj8ic6zt1x
    @user-lj8ic6zt1x 3 years ago

    22:31 Can someone explain how eq. 12 is obtained?

    • @hudewei7166
      @hudewei7166 3 years ago

      It comes from Bayes' rule applied to the two Gaussian distributions q(x_t|x_{t-1}) and q(x_{t-1}|x_0). Applying the chain rule to eq. (12) gives q(x_{t-1}|x_t,x_0) = q(x_t|x_{t-1},x_0) q(x_{t-1}|x_0) / q(x_t|x_0). Since the forward process is Markov, q(x_t|x_{t-1},x_0) = q(x_t|x_{t-1}). Working out the product of the Gaussians (the normalization takes care of itself), you get the expressions in (10) and (11).
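      Written out, that Bayes step yields the standard DDPM posterior:

      ```latex
      q(x_{t-1} \mid x_t, x_0)
        = \frac{q(x_t \mid x_{t-1})\, q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)}
        = \mathcal{N}\!\big(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t \mathbf{I}\big),
      \\[4pt]
      \tilde{\mu}_t(x_t, x_0)
        = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\, x_0
        + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, x_t,
      \qquad
      \tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t .
      ```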

  • @user-lb1bs8iv3f
    @user-lb1bs8iv3f 2 years ago

    How about explaining the code of this paper?

  • @user-vj2nw1ny5e
    @user-vj2nw1ny5e 3 years ago +1

    Hi! Please do something with your mic, because the video is so quiet

  • @Farhad6th
    @Farhad6th 3 years ago

    The audio has a problem; it is very low. Please fix that in the next videos. Great video. Thank you.

  • @lllcinematography
    @lllcinematography 1 year ago

    Your audio recording volume is too low; I have to increase my volume like 4x compared to other videos. Thanks for the content.

  • @akashchadha6388
    @akashchadha6388 3 years ago

    Schmidhuber enters the chat.

  • @piotr780
    @piotr780 11 months ago

    But this random image at the end does not contain any information!

  • @oleksandrskurzhanskyi2233
    @oleksandrskurzhanskyi2233 2 years ago

    And now these models are used in DALL·E 2

  • @donfeto7636
    @donfeto7636 1 year ago

    The paper has become so complex to read, with all the math, after the updates

  • @bertchristiaens6355
    @bertchristiaens6355 3 years ago +2

    If you add noise from a standard normal distribution thousands of times, isn't the average noise (expected value) added close to zero, resulting in the same image?

    • @samernoureddine
      @samernoureddine 3 years ago +3

      Even if they were using standard Gaussians (they aren't), the sum of just two standard Gaussians X and Y is not a standard Gaussian (the variances add up)

    • @trevoryap7558
      @trevoryap7558 3 years ago

      But the variance will increase so significantly that it will be just noise. (Assuming that the noise terms are all independent.)

    • @bertchristiaens6355
      @bertchristiaens6355 3 years ago

      @@samernoureddine Thank you! I assumed that it was equivalent to sampling 1000 times (for example) from the same distribution N(0, var). Since these samples approximate the distribution N(0, var), I thought the mean of these values would be 0. But I should rather see it as a sample from N(0, var+var+...+var), right? (since we add up the samples)

    • @samernoureddine
      @samernoureddine 3 years ago

      @@bertchristiaens6355 that would be right if they just wanted the noise distribution at some time t (and if the mean were zero: it isn't). But they want the noise distribution to evolve with time, and so the total noise at time t+1 is not independent from the total noise at time t

    • @cerebralm
      @cerebralm 3 years ago +1

      It's like a random walk, the more random choices you make, the further you get from where you started (but unpredictably so)
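      A tiny numerical check of the point about variances adding up (purely illustrative):

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      steps = rng.standard_normal((1000, 10_000))   # 1000 independent N(0, 1) noise draws per trajectory
      total = steps.sum(axis=0)                     # accumulated noise after 1000 additions
      print(total.mean(), total.var())              # mean ~ 0, but variance ~ 1000: nowhere near N(0, 1)
      ```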

  • @peterthegreat7125
    @peterthegreat7125 1 year ago

    Still confused about the math theory

  • @shynie4986
    @shynie4986 1 year ago

    What's the purpose of the covariance matrix (or covariance), and why is it important to us?

  • @XX-vu5jo
    @XX-vu5jo 3 years ago +2

    The problem with these solutions is their computing cost. I think they should focus more on that instead; they also rely too much on data.

  • @amansinghal5908
    @amansinghal5908 7 months ago

    Why even do it when you'd do it in such a hand-wavy manner?

  • @fast_harmonic_psychedelic
    @fast_harmonic_psychedelic 3 years ago

    Why don't they just use CLIP as a classifier? Does nobody know about this? lol

  • @twobob
    @twobob 1 year ago

    Not too long

  • @PeterIsza
    @PeterIsza 3 years ago

    Video starts at 4:28.

  • @cedricvillani8502
    @cedricvillani8502 2 years ago

    The better you are at detecting bullshit, the better you are at creating bullshit 😂 None of my work would ever be public-facing until I was sure I could always identify it and manipulate it, and I'm sure that's true for any company or skilled researcher. ❤😢

  • @XX-vu5jo
    @XX-vu5jo 3 years ago +1

    You can’t explain the equations. 🙄

  • @DanFrederiksen
    @DanFrederiksen 3 years ago +1

    This seemed much too long. For instance you don't need to labor the notion of denoising for minutes. Noise reduction should be in people's vocabulary at this level. I'd suggest going directly to what diffusion models are and try to prepare succinct explanations instead of just going for an hour.

    • @nahakuma
      @nahakuma 3 years ago +7

      Or you could simply skip the parts you already understand ;)

    • @frankd1156
      @frankd1156 3 years ago +4

      This is free knowledge... so try to criticize nicely, or move on to another resource

    • @banknote501
      @banknote501 3 years ago +3

      Ok, if it is so easy, just do a video yourself. We need videos about AI topics for viewers of all skill levels.

    • @mgostIH
      @mgostIH 3 years ago +2

      This seems like fair criticism, I don't see why they are being hostile with you

    • @DanFrederiksen
      @DanFrederiksen 3 years ago

      @@mgostIH I understand that some feel defensive, but it wasn't meant as an attack but as an empowering observation. Communication is vastly more potent the more concise and clear it is.