Thank you Jonathan! This is so amazing. Really appreciate the way you break it down into easy-to-digest pieces. The walkthrough is truly badass black magic.
This is a great dive in after Jeremy's live whiteboard drawings to convey the key ideas, thanks. I am starting to "grok" things a bit better after two sessions. Fascinating.
This was indeed a run-through haha. I think I'll need to walk through this notebook line by line :) Nevertheless, this video and especially the sampling portion helped a lot with my intuition behind some of the concepts, so thank you!
Errata: there should be some scaling done to the model inputs for the unet demo in cell 49 (19 minutes in) - see scheduler.scale_model_input in all the loops for the code that is missing.
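A rough sketch of what that scaling does, assuming a sigma-parameterised scheduler in which `scale_model_input` divides the noisy sample by sqrt(sigma² + 1) before it is passed to the unet. The helper name `scale_input_sketch` and the list-of-floats input are hypothetical, for illustration only; in the notebook you would call `scheduler.scale_model_input` itself.

```python
import math


def scale_input_sketch(sample, sigma):
    """Hypothetical stand-in for the scaling step (an assumption, not library code).

    Divides every value by sqrt(sigma^2 + 1), so a sample carrying noise of
    standard deviation sigma is brought back to roughly unit scale.
    """
    scale = math.sqrt(sigma ** 2 + 1)
    return [value / scale for value in sample]


noisy_values = [3.0, -4.0, 0.5]

# With sigma = 0 there is no noise, so the scaling leaves the values unchanged.
unchanged = scale_input_sketch(noisy_values, sigma=0.0)

# With sigma = sqrt(3) the divisor is sqrt(3 + 1) = 2.
halved = scale_input_sketch(noisy_values, sigma=math.sqrt(3.0))
```

Skipping this step means the unet sees inputs at a larger scale than it saw in training, which is the missing piece the errata refers to.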
Oh and in the autoencoder part the 'compression' isn't exactly 64 times since there are 4 channels in the latent representation and only 3 in the input :)
I was also thinking that the input would be 3 channels of bytes (24 bit colour), while the latent might use 4 channels of 32 bit floats. That would be 12 times compression overall. Still pretty good, and I think we can optionally use float16 with the latents which would be 24 times compression. I read that JPEG typically achieves 10 times compression.
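The arithmetic behind these figures can be checked with a short sketch. The 512x512 image and 64x64 latent sizes are assumptions chosen for illustration (the ratios only depend on the 8x reduction per side), and `compression_ratio` is a hypothetical helper.

```python
def compression_ratio(image_hw=(512, 512), image_channels=3, image_bytes=1,
                      latent_hw=(64, 64), latent_channels=4, latent_bytes=4):
    """Ratio of raw image size to latent size, both measured in bytes."""
    image_size = image_hw[0] * image_hw[1] * image_channels * image_bytes
    latent_size = latent_hw[0] * latent_hw[1] * latent_channels * latent_bytes
    return image_size / latent_size


# Counting values only (one unit per value on both sides): 3 vs 4 channels.
by_element_count = compression_ratio(image_bytes=1, latent_bytes=1)

# 8-bit image channels against float32 and float16 latents.
float32_ratio = compression_ratio(latent_bytes=4)
float16_ratio = compression_ratio(latent_bytes=2)

print(by_element_count, float32_ratio, float16_ratio)
```

Under these assumptions the element count drops by 48 times rather than 64, and the byte-level ratios come out as 12 and 24, matching the numbers above.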
Outstanding explanation, cleared the mist around the process. However, I am unable to find the roots of q(xₜ|xₜ₋₁) = N(xₜ; sqrt{1-βₜ}xₜ₋₁, βₜI). All discussions start with it as a base, but I would like to understand how the mean and variance were derived. Could you please demystify this formula?
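One common way to see where the mean and variance come from, as a sketch rather than a full derivation: the forward step is usually defined as scaling the previous sample and adding independent Gaussian noise. The assumptions here are that ε is independent of xₜ₋₁ and that xₜ₋₁ has unit variance per component.

```latex
\begin{aligned}
x_t &= \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon,
    \qquad \epsilon \sim \mathcal{N}(0, I) \\
\mathbb{E}[x_t \mid x_{t-1}] &= \sqrt{1-\beta_t}\, x_{t-1} \\
\operatorname{Var}[x_t \mid x_{t-1}] &= \beta_t I \\
\operatorname{Var}[x_t] &= (1-\beta_t)\operatorname{Var}[x_{t-1}] + \beta_t I = I
    \quad \text{if } \operatorname{Var}[x_{t-1}] = I
\end{aligned}
```

So the mean and variance are less derived than chosen: the factor sqrt{1-βₜ} is picked so that adding noise of variance βₜ keeps the overall variance at 1 from step to step.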
Hello, I am confused about one topic in diffusion. Can I get your contact info, Jonathan, like your email ID or any other contact info please? I am working on my final-year project, so anything would help.
WOW, Johno's video is great. This video deserves at least 1 million views.
Wow! Wow!! This was a hell of a walkthrough! I rarely comment on any video, but I just wanted to stop here and say thanks for making this.
Awesome explanations. Great work. Thanks.
Very Good.
When approaching a manifold, what would happen if the approach was aligned with the normal of the manifold surface?
👍
Why does he move the screen every 3 seconds? There's no need to do this.