Fantastic talk, and a really good paper. Over 400 citations in two years... a lot to sift through! Would love to see a follow-up! I need to dive into the guidance next. This just leaves me wondering why we don't use CLIP to guide the diffusion of a low-resolution semantic map first, one that easily captures the structure, meaning, and long-range patterns (maybe even a coarse depth map too), and then use that to guide the diffusion of a latent, which then gets intelligently upscaled, guided by the semantics.
This video is criminally underrated, thanks for your insights!
Really clear explanation of how the diffusion network works!! Thanks
Thanks for creating this video, it's amazing
Intelligent insights.
Nice talk