After 2 years I made a new video explaining diffusion models from a different angle. I find this approach much better to understand: th-cam.com/video/B4oHJpEJBAA/w-d-xo.html
Are diffusion models really so hard to understand?
@@Тима-щ2ю you tell me
After spending 2 hours taking notes and understanding the 30-min video, I scrolled down to find this x))
Explaining the notations is a game changer... more educational content channels should do this.
Understanding the math is easier than the notation it's written in 😂😂😂😂😂
For those who are confused about the recursive expansion at 13:13 (like I was), it's "a property of Gaussian distributions, where the variance of the sum of two independent Gaussian variables is the sum of their variances."
I'm confused about the notation q(Xt|Xt-1) and p(Xt-1|Xt).
Never seen the result of a function presented as part of the argument before.
Not even sure I understood which is which from his prose.
Seems to follow from uncorrelated noise variables at different steps, using the formula var(X1+X2) = var(X1) + var(X2) + 2cov(X1,X2) where cov(X1,X2) = 0. The variance formula itself doesn't need normality, though normality is what lets the merged term be written as a single Gaussian epsilon again.
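Spelling that out in the video's notation (my reconstruction, so the symbols may differ slightly from the slides):

```latex
% One forward step, then substituting x_{t-1} = \sqrt{\alpha_{t-1}} x_{t-2} + \sqrt{1-\alpha_{t-1}} \epsilon_2:
x_t = \sqrt{\alpha_t}\,x_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_1
    = \sqrt{\alpha_t \alpha_{t-1}}\,x_{t-2}
      + \underbrace{\sqrt{\alpha_t(1-\alpha_{t-1})}\,\epsilon_2 + \sqrt{1-\alpha_t}\,\epsilon_1}_{\text{sum of two independent Gaussian terms}}

% Variance of that sum: \alpha_t(1-\alpha_{t-1}) + (1-\alpha_t) = 1 - \alpha_t\alpha_{t-1},
% and a sum of independent Gaussians is again Gaussian, so it collapses to
x_t = \sqrt{\alpha_t \alpha_{t-1}}\,x_{t-2} + \sqrt{1-\alpha_t \alpha_{t-1}}\,\bar{\epsilon},
\qquad \bar{\epsilon} \sim \mathcal{N}(0, I)
```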
This was the best ML paper review I have ever seen. You stopped making videos, but I would really love to see you go through more research in the field, man! Hats off to you.
music and ML >>>
Wow, this is absolutely brilliant. Massive kudos for making quite the complex topic significantly more digestible!
I've watched a bunch of videos trying to understand Diffusion (Ari Seff, Assembly AI etc) and this one taught me the most by far. Please keep making videos!
This is incredible! Did not see a video with the math explanations of diffusion models yet. And you animated it in manim! Just great. 😎
thank you so much! actually it's not even animated with manim. It's all done in Premiere Pro haha. But I'll definitely do those things in manim in future videos....
@@outliier Thanks for sharing, but how do people not get bored and frustrated during the math part, even if you are a math genius? And if you don't think of the weird step of taking out the first term of the sum, can't you still reach the same goal? So why do that at all?
So satisfied to know that we just need to predict the noise!!! After so many formulas... 🙏🙏🙏
After going through 4 different YT videos, yours was the only one that was clear enough for me to understand. Thank you very much!
Hey, thanks very much for making this wonderful video! I just want to appreciate the fact that all notations are clearly explained before going into the math part. That helps a lot! Great work!
I really like your video because instead of using a bloated set of terminologies like conditional, marginal, prior, posterior blah blah, you just nailed it down to "function". You're like the p function that denoises this confusing diffusion jargon :))
@@BritskNguyen Thank you! Take a look at my latest video. I think this approach to diffusion models is even better
Here is the implementation video in PyTorch: th-cam.com/video/TBCRlnwJtZU/w-d-xo.html
Hello, How did you make the animations in your video?
Superb work.
1. Goes through the history of diffusion models by explaining all the previous papers.
2. Gives an intuition of the whole idea.
3. Explains the math behind it.
4. Also covers future prospects.
16:24 I don't understand how you rewrote the KL divergence as the log ratio. Specifically, I don't understand how D_KL (q || p) = log(q / p). This is different from the definition of the KL divergence, which would suggest that D_KL (q || p) = integral q * log(q / p). Could someone please explain why D_KL (q || p) = log(q / p) in this case? Thank you! This was a fantastic video and your efforts are greatly appreciated!
You are right! To be precise, he should be talking about the expected value of the log ratio.
See the original paper arxiv.org/pdf/2006.11239.pdf page 2. The objective is to minimize the "expected" negative log likelihood. Since the expectation is calculated as an integral over x_1...T rather than x_0, it'll be 1. You can think of everything the video talks about as happening inside the E_q[ ... ] bracket.
Hey, did you understand why it was done? I have the same question. Could you please share it if you found out?
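Writing out what I think the replies above are saying (a sketch, following the DDPM paper's notation):

```latex
% Definition of the KL divergence as an expectation of the log ratio:
D_{\mathrm{KL}}(q \,\|\, p) = \mathbb{E}_{x \sim q}\!\left[\log \frac{q(x)}{p(x)}\right]
                            = \int q(x)\,\log \frac{q(x)}{p(x)}\,dx

% The video's log-ratio terms sit inside an outer expectation over q, e.g.
\mathbb{E}_{q(x_{1:T}\mid x_0)}\!\left[\log \frac{q(x_{t-1}\mid x_t, x_0)}{p_\theta(x_{t-1}\mid x_t)}\right]
 = \mathbb{E}_{q(x_t\mid x_0)}\Big[D_{\mathrm{KL}}\big(q(x_{t-1}\mid x_t, x_0)\,\|\,p_\theta(x_{t-1}\mid x_t)\big)\Big]
```

So dropping the E_q[...] on screen is a notational shortcut, not a different definition of the KL divergence.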
Nicely explained. Most people leave out these derivations thinking it would make the tutorial boring, but without these derivations we don't understand how the methodology evolved. Great job researching and explaining.
Very well explained! You made sure to include a lot of important points others either omit or simply skim over. Thank you very much.
Your explanations are simply great! I do recommend you return to TH-cam to cover the latest papers in this field :)
This is the first source I was able to find that explained the math behind diffusion models in a comprehensible way instead of glossing over it. Thanks a lot, you have earned my like and subscribe with just this video alone!
What an amazing video!! I looked everywhere for a comprehensible video about Diffusion Models and yours was simply the best… Please keep up the effort and the great content :)
Thank you so much for delving deep into the math. I'm an engineer (not software) and self-learning AI. The papers are unfortunately not written in the most explainable way, and even though I've taken high level math courses for my degree, the notation and terminology in the papers make it pretty inaccessible and frustrating to follow. Thanks for going through this paper, I hope you continue to make more videos.
Most videos do not go into the mathematics, or explain it in a dry slideshow manner. This is really something else.
Absolute king! Your work is such an important part of this community
This is the best explanation I have found so far. Thank you.
Thank u for the detailed explanation, looking forward to your PyTorch implementation video!
Thank you so much. I actually just recently worked out a lot of this math a couple weeks ago for a model I'm building and this video would've saved me so much time. Very clear. Thank you 🙏
This video is amazing. I think the format of your video was incredible: you went over the literature and told us how we got there, you gave the high-level explanation, then got into the nitty-gritty detail, and then just in case we missed something you gave an amazing recap. This is how all videos on deep learning should be, especially as we're getting into more niche topics.
Brilliant approach of lining up equations into a story, great work, thanks!
Wow……. Haven’t read math in a while, this was explained excellently. I have a masters degree in physics but don’t do much math anymore since my degree in 2017.
I really like how much detail you went into with the derivations and the pausing to ground what we are doing with some intuition. Well done man 🎉
Wow! Amazing job explaining diffusion models and why they use the math they do.
Best video on diffusion model right now because of the math derivation of everything. Thank you!
Explaining the mathematical reasoning and formulas behind the model in such a detailed fashion is amazing, keep up your good work
This is an amazing video. I've gone through many videos to get the intuition behind the diffusion model, but nothing ever helped. You did a great job simplifying the entire process.
You're the GOAT man, very great summary of diffusion
Thanks for the video. Can someone explain why we can do the KL divergence step at 19:55? To me you haven't taken the integral of the expression across all samples and there's no q(x_T|x_0) in front of the first term for example, so why can we do this?
Best explanation I've seen so far. Though notation in math derivation section is still poorly explained... I understand every step in derivation, but don't always understand what each term logically means.
Can you give some examples? :3
Thanks for the simple but detailed explanation! I wouldn't be able to understand the topic without your video.
The most clear explanation I’ve seen on YT. Much more clear than that from MIT lectures lol
Many thanks
I started reading articles and looking for learning content on diffusion modelling and the notation seemed a bit difficult. However, I am only half way through this video and I can assure you that this video is a must watch. Very clear explanation, I will recommend it to anyone interested in exploring this field, congratulations on your work!
This is my first time leaving a comment under an ML tutorial YT channel. The explanation is amazingly intuitive, thanks for sharing your knowledge and creating this video!
So nice to hear that thank you!
Easily the best video on Diffusion models. Great work!
i just watched like 5 of these videos on this subject, specifically the math. This was the best one by far. You should teach.
what a wonderful and thoughtful way to deliver the whole landscape of the diffusion model! Nice video! 👍
Appreciate the effort you put into this. You definitely can teach. If only I have a brain to understand math... still got some bits here and there. Thanks
This is the first ever video of you that I get to see. Congrats, truly amazing. I believe you are among the first people on YT to dig into the math equations of ML papers like this, and I believe it's truly valuable. Keep it up!
I really liked that you showed the derivation in an understandable way
Kudos to you. Hats off for explaining such a topic with so much ease even though the math equations look scary at first. You made it real easy. Great work
Would have upvoted several times. Yours is the first video I found that actually goes into the math. Others just slap it onto the screen as fact, dazzling and confusing the viewer.
just the best explanation by far I have seen in days of searching. congrats
The explanation of the loss function, especially the part about KL divergence, is amazing! I love your video!
You are the Outlier we cannot miss! Real gem. Thanks for the explanation man!
Excellent presentation. Great balance between depth and succinctness. Thank you!
this is by far the best video on diffusion models that explains the math clearly, great job!
This is one of the rare videos I wanted to like twice. I'm learning this in uni but I'm struggling so hard; I think I am a mathy person, but all those unexplained choices and variables, calculation steps without knowing why... it made it so hard to understand the material more deeply.
But your video is just perfect, referencing the same papers, but now it's all child's play and fun to stop and follow. It's almost sad you only have so few videos, but at least the quality is through the roof.
Great video!! Keep them coming thank so much! I’m curious what’s your background?
Thank you so much! I'm currently in my bachelor's studying AI (it's a real major in Germany). Apart from that, I started 4 years ago and have been mostly active in the generative field since then.
This is the best video I have ever watched that can explain diffusion models so clear even to someone like me :P
Just the video that I needed, thanks so much!!!
Amazing video; thanks a lot for going in depth on the math with simplified animations!
Really great video. We need more videos like this. Helped me understand cryptic papers which can be very frustrating...
Thanks for the video, very neat explanation. May I suggest: when you explain the forward process, the second equation at 13:02 should be q(x_{t}|x_{t-2}), ... up to q(x_{t}|x_{0}) for the final formula. Also, the derivation of the chain rule is not entirely obvious; it took me some time to find the answer. The answer is that the variance of the sum of two normal Gaussians is equal to the sum of their variances. This is how you get rid of the square root, and the sum of variances gives the expected result of 1 - a_{t}a_{t-1}...a_{1}.
Man, this is incredible. When I saw these equations in the paper and other sources I was like "no way I am gonna understand that".. but with this video it all makes sense. Brilliantly done, thank you so much for your work. Instant subscribe and I am going to check other content on your channel :D
Excellent video! Very clear derivation, and good animation. You are a good teacher with loads of patience, and guided us step by step!
I just watched your video on diffusion models, and I am incredibly impressed with the depth of information you provided. Your explanation was clear, concise, and immensely helpful. Thank you for sharing your knowledge on this topic. I learned a lot from your video and I truly appreciate your efforts in creating such valuable content.
Thank you for making such a high quality video explaining the math. Often, other channels do not emphasize the math, and this video perfectly highlights how exactly the math fits into diffusion models. Thank you for your amazing work. Please, make more such content!
You have a superpower of explaining math. Really enjoyed it.
Thank you. Your explanation has been profoundly enlightening and exceptionally lucid, providing me with a comprehensive understanding.
Greatly explained the papers and the topics they depend on 👏👏👏
Video is really well made. You did well to summarize to keep things simple and explanatory.
I remember coming to this video a month ago to understand diffusion models and getting overwhelmed and lost by the scary tons of math formulae. Now, after reviewing the necessary math concepts, I realize how beautifully you've put it all together... Amazing
OMG, this is the most insanely complex thing I've learned yet in ML/AI, and though I see I still gotta spend some time on it, kudos, you've done a super amazing job!
Thank you so much, super happy the video helped you!!!
@@outliier Brother, there's a slight confusion. In Algorithm 2, we already sampled a random noise x_t and remove a predicted noise to obtain x_t-1, so why do we add another random noise z, and what is that z even for?
@@curiousseeker3784 When you have x_t and you predict the noise, you get an approximation of x0. This however doesn't look so good, that's why you add noise again to get x_t-1 and then repeat the process. So you have an iterative sampling process.
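If it helps, here is a minimal sketch of that sampling loop in the style of Algorithm 2; `model`, `betas`, the model call signature, and the simple sigma_t = sqrt(beta_t) choice are all assumptions, not the video's exact code:

```python
import torch

@torch.no_grad()
def sample(model, shape, betas):
    # Assumed: model(x, t) predicts the noise, betas is a 1-D tensor of length T.
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)        # \bar{alpha}_t
    T = len(betas)

    x = torch.randn(shape)                           # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps_pred = model(x, torch.full((shape[0],), t))       # predicted noise
        coef = (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t])
        mean = (x - coef * eps_pred) / torch.sqrt(alphas[t])  # mean of p(x_{t-1} | x_t)
        if t > 0:
            z = torch.randn_like(x)                  # the extra noise z in question
            x = mean + torch.sqrt(betas[t]) * z      # keeps x_{t-1} a sample, not just the mean
        else:
            x = mean                                 # final step: no noise added
    return x
```

So z is not the noise being removed; it is fresh noise that makes each x_{t-1} a sample from the learned Gaussian rather than just its mean.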
@19:57 How do we get the KL divergence terms? Isn't there supposed to be an expectation integral/sum somewhere?
A few comments below, Outlier posted:
```
Im going to cite a friend here: "During training we sample a batch of data from a distribution with probability p. So the global function to be optimized is a summation over the dataset p*log(p/q), which is an expectation of log(p/q) by definition."
```
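A slightly more formal version of that quote (using its convention of p for the data distribution and q for the model):

```latex
\sum_{x} p(x)\,\log\frac{p(x)}{q(x)}
  \;=\; \mathbb{E}_{x \sim p}\!\left[\log\frac{p(x)}{q(x)}\right]
  \;\approx\; \frac{1}{N}\sum_{i=1}^{N}\log\frac{p\big(x^{(i)}\big)}{q\big(x^{(i)}\big)},
  \qquad x^{(i)} \sim p
```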
Could you elaborate more on the chaining of alphas in the forward process, q(x_t|x_{t - 1}), from 13:13 onwards?
Nice explanation of the math. Rarely see such a detailed diffusion model explanation video. Good job and thanks
Amazing! The visualization is great and easy to follow.
I really like your math part! Please keep up the amazing work!
One of the best explanations here on TH-cam - thank you very much! 🥳
Thanks, the video was really helpful, it gave me such a great time in understanding diffusion models, kudos and keep on making such quality content!
I highly recommend his videos. He has a KISS-style presentation. KISS = keep it simple and straightforward.
Great explanation, but at 16:27 the integral over q is missing when rewriting KL(q || p)! Same at 16:57
As a PhD student who also struggles with notations, THANK YOU!!
The math derivation part was amazing. really good.
If I could give just one note, I wish you had spoken a bit slower, just a tiny bit.
But truly great work, much appreciated and waiting for more content.
Wow this is such a fantastic explanation. I love how you describe the intuitions behind the authors' mathematical choices.
Would love to see more good explanations of other models, your explanation is so good
I was just using those tools to generate images, but thanks to this video I got a lot more interested in understanding how they work. I hope you keep making this kind of video.
This is a really great video, thanks for your big effort explaining!
Thank you for this amazing and helpful video! It was a good entry point for me on my way to move from GANs to Diffusion Models for my future research during my PhD.
I love to hear that! Good luck with your PhD!
Great Video! Hands down the best explanation of DDPM’s math
Many thanks for this. I'm an artist with very limited math skills, and though I can't say I understood the whole thing, your teaching gave me a solid basis and an understanding of this I've been wanting. You have another fan.
this video is *by far* the best video on diffusion models i've seen on youtube. this was very pleasant to watch and you made everything really clear. brilliant!! i subscribed and turned on notifications :)
have an amazing day :)
Great video! However, I think one more reason at 18:23 is that the conditioning on x0 should have been there from the start, even without the following derivation. It was somehow dropped since 17:23.
I think at 18:23 the conditioning on x0 in the Bayesian formula holds for q(xt|xt-1, x0) in general. However, it is probably the case that, by the Markov property, q(xt|xt-1, x0) = q(xt|xt-1).
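For reference, the step both comments seem to be describing:

```latex
% Bayes' rule with everything conditioned on x_0:
q(x_{t-1} \mid x_t, x_0)
  = \frac{q(x_t \mid x_{t-1}, x_0)\,q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)}

% Markov property of the forward process: given x_{t-1}, x_t is independent of x_0, so
q(x_t \mid x_{t-1}, x_0) = q(x_t \mid x_{t-1})
```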
I appreciate your effort
It will pay you back one day
The video is perfect! Thank you so much. You helped me to understand better all the formulation! Thanks again!!
Just want to say thank you. I believe this is one of the highest-quality videos I have ever seen on diffusion models! Keep it going. I have subscribed!
thank you so much!
You should create more of these videos... they are just so good... It must have been time-consuming. Maybe consider covering some smaller topics or splitting one big topic into more videos. AMAZING JOB. I believe a high schooler can get the main points from this! GJ!
Thank you so much! The next video is on the way!
Hoping for more great content.
Thanks for such an amazing illustration of diffusion. One question is about the equation in the slide @ 13:16: how do you get to t-2 and t-3?
x_t=sqrt(a_t)*x_t-1+sqrt(1-a_t)*e
x_t-1=sqrt(a_t-1)*x_t-2+sqrt(1-a_t-1)*e
x_t=sqrt(a_t)*[sqrt(a_t-1)*x_t-2+sqrt(1-a_t-1)*e]+sqrt(1-a_t)*e=sqrt(a_t*a_t-1)*x_t-2+[sqrt(a_t-a_t*a_t-1)+sqrt(1-a_t)]*e
The rightmost term doesn't equal or close to sqrt(1-a_t*a_t-1)*e
Did I misunderstand something? Thanks again. @Outlier
The detailed explanation is mindblowing. I learned a lot today. Thank You.❣
A slight mistake at 12:40: it should say alpha_s instead of a_s. Thank you for this nice video.
Dang, this video is priceless, beating all the guru's wordy explanatory videos.
Hey, really awesome question! subscribed! But I have a problem I can't wrap my head around: at 13:10, when we go from x_t-1 to x_t-2, I understand the left hand side of the equation, but can someone explain to me why the right hand side is sqrt(1 - alpha_t * alpha_t-1) * epsilon? If you just substitute x_t-1 in the equation above, I thought we would end up with (sqrt(1 - alpha_t) * epsilon + sqrt(1 - alpha_t-1) * epsilon).
I understand that it's supposed to "merge" the variance of two Gaussian distributions, but I just don't understand how you end up with the right hand side. If anyone could explain this to me I would be so thankful!!!
In this part, I also tried to derive the formula but couldn't get it either.
My derivation of the right hand side (the epsilon part) ended up as
(sqrt(alpha_t - alpha_t alpha_{t-1}) + sqrt(1-alpha_t)) epsilon.
Unless sqrt(a)+sqrt(b) = sqrt(a+b) (which is not true), I also can't get the sqrt(1-alpha_t alpha_{t-1}).
I wonder what I am missing
@@marcella.astrid This is the first time for me having a discussion over math on youtube. I will try to look into it. I found some rule empirically that shows that this actually is true: if you merge two Gaussians, the second expectation is just sampled from the first Gaussian with a certain factor, and then the factor goes into the variance of the new distribution. I actually made a jupyter notebook to try it with all kinds of values; I could send it to you if you want, but I still haven't found the underlying rule that explains it. I asked a lot of math students in real life, but either they are too busy or don't know this rule either.
@@glatteraal2678 Is this derivation from the original paper? Cause it seems odd, if not wrong
I have explained it in another comment. The thing is that the epsilons are different normal distributions and cannot be treated as the same. You have to use some properties of the normal distribution to end up with the formula.
@@marcella.astrid Recall that when we merge two Gaussians with different variances, e.g. sigma1^2 and sigma2^2, the variance of the new distribution is (sigma1^2 + sigma2^2).
In this example, the right hand side equals
sqrt(alpha_t - alpha_t alpha_{t-1}) epsilon + sqrt(1-alpha_t) epsilon, which are two Gaussians merged together. The new variance is therefore alpha_t - alpha_t alpha_{t-1} + 1 - alpha_t = 1 - alpha_t alpha_{t-1}
Awesome! Right what I was looking for. Thank you for the explanation !)
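For anyone who wants to check this numerically (in the spirit of the notebook mentioned above), a tiny NumPy sketch with arbitrarily chosen alpha values:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_t, alpha_tm1 = 0.9, 0.95        # arbitrary example values
n = 1_000_000

# Two *independent* standard normal noise variables (the key point of this thread):
eps1 = rng.standard_normal(n)
eps2 = rng.standard_normal(n)

# Noise term obtained after substituting x_{t-1} into the equation for x_t:
merged = np.sqrt(alpha_t * (1 - alpha_tm1)) * eps1 + np.sqrt(1 - alpha_t) * eps2

print(merged.var())                   # ~0.145, matching the predicted variance
print(1 - alpha_t * alpha_tm1)        # 1 - alpha_t * alpha_{t-1} ≈ 0.145
```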
Thank you so much for making this video! It was very clear and I really appreciate how you walked through the math and the reasoning for how they went from the initial loss to writing it in terms of predicting the noise. Everything was well made. I look forward to watching your other videos!