Great lecture, especially about the different samplers, thank you!
I have some preliminary chapters. Let's see if YouTube lets me add them so that it is easier to improve on them.
Chapters
00:00 - Intro
00:30 - Cosine Schedule (22_cosine)
06:05 - Sampling
09:37 - Summary / Notation
10:42 - Predicting the noise level of noisy Fashion MNIST images (22_noise-pred)
12:57 - Why .logit() when predicting alpha bar t
14:50 - Random baseline
16:40 - mse_loss why .flatten()
17:30 - Model & results
19:03 - Why are we trying to predict the noise level?
20:10 - Training diffusion without t - first attempt
22:58 - Why it isn’t working?
27:02 - Debugging (summary)
29:29 - Bug in ddpm - paper that cast some light on the issue
38:40 - Karras (Elucidating the Design Space of Diffusion-Based Generative Models)
49:47 - Picture of target images
52:48 - Scaling problem - (scalings)
59:42 - Training and predictions of modified model
1:03:49 - Sampling
1:06:05 - Sampling: Problems of composition
1:07:40 - Sampling: Rationale for rho selection
1:09:40 - Sampling: Denoising
1:15:26 - Sampling: Heun’s method fid: 0.972
1:19:00 - Sampling: LMS sampler
1:20:00 - Karras Summary
1:23:00 - Comparison of different approaches
1:25:00 - Next lessons
@6:04 (th-cam.com/video/6Bta1tXRUfM/w-d-xo.html). Just to clarify: the 'denoise' function essentially computes x_0_hat (an unbiased estimate of x_0) at any given timestep, right? That is equivalent to making the best unbiased estimate of x_0 (i.e. the completely denoised image in the original data distribution) in a single denoising step. The multi-step reverse diffusion process, by contrast, is recursive: at each step it re-estimates x_0_hat and takes a weighted average of that estimate and the noisy image x_t. A single denoising step, albeit unbiased, would give a pretty unsatisfactory result that is way off the original data distribution, given the high variance of this approach.
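For anyone following the question above, here is a minimal sketch of what a one-step x_0_hat estimate looks like. It is not the notebook's code: it uses plain Python lists instead of tensors, assumes the cosine schedule has the form abar(t) = cos(t*pi/2)^2, and the helper names `noisify` and `err_gain` are made up for illustration.

```python
import math

def abar(t):
    # Assumed cosine schedule: abar(t) = cos(t * pi / 2) ** 2 for t in [0, 1).
    return math.cos(t * math.pi / 2) ** 2

def noisify(x0, eps, t):
    # Forward process: x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps
    a = abar(t)
    return [math.sqrt(a) * x + math.sqrt(1 - a) * e for x, e in zip(x0, eps)]

def denoise(x_t, eps_hat, t):
    # One-step estimate: x0_hat = (x_t - sqrt(1 - abar) * eps_hat) / sqrt(abar)
    a = abar(t)
    return [(x - math.sqrt(1 - a) * e) / math.sqrt(a) for x, e in zip(x_t, eps_hat)]

def err_gain(t):
    # Factor by which an error in eps_hat is multiplied in x0_hat.
    a = abar(t)
    return math.sqrt(1 - a) / math.sqrt(a)

# Toy "image" and noise (hypothetical values, just to exercise the functions).
x0 = [0.5, -0.25, 0.0]
eps = [1.0, -2.0, 0.3]
x_t = noisify(x0, eps, 0.6)

# With a perfect noise prediction the one-step estimate recovers x0 exactly.
x0_hat = denoise(x_t, eps, 0.6)
```

In this sketch `err_gain(t)` grows as t approaches 1, so any error in the predicted noise is amplified more at the noisy end. That seems consistent with the point that a single step from a very noisy image gives a poor result, while repeated small steps can correct it.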
@th-cam.com/video/6Bta1tXRUfM/w-d-xo.html
Hope someone can shed some light on this. Does the modified model incorporating c_skip look like it is consistently unable to denoise images at the noisy end of the spectrum? Given that the modified model's objective for noisier images emphasizes finding the original image, does that mean it's not really doing what it set out to do?
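In case it helps with the question above, this is a rough sketch of the scalings as given in the Karras et al. paper, written in plain Python rather than with tensors. The `sigma_data=0.5` default and the `target` helper are assumptions for illustration, not the lesson's exact code or values.

```python
import math

def scalings(sigma, sigma_data=0.5):
    # Preconditioning coefficients from Karras et al.;
    # sigma_data here is a placeholder value.
    total_var = sigma ** 2 + sigma_data ** 2
    c_skip = sigma_data ** 2 / total_var
    c_out = sigma * sigma_data / math.sqrt(total_var)
    c_in = 1 / math.sqrt(total_var)
    return c_skip, c_out, c_in

def target(x0, x_noisy, sigma, sigma_data=0.5):
    # What the network is trained to output: (x0 - c_skip * x_noisy) / c_out
    c_skip, c_out, _ = scalings(sigma, sigma_data)
    return [(x - c_skip * xn) / c_out for x, xn in zip(x0, x_noisy)]

# Compare the skip weight at a very noisy and a nearly clean level.
c_skip_noisy, _, _ = scalings(80.0)
c_skip_clean, _, _ = scalings(0.01)
```

At high sigma, c_skip is close to 0, so the target is essentially the clean image (rescaled), and the network gets almost no help from its noisy input. At low sigma, c_skip is close to 1, so the target is essentially the (scaled) noise. So at the noisy end the model is asked to do the hardest version of the task, which may be why its predictions there look weak.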
17:00 flattened mse