Stable Diffusion Deep Dive Notebook Run-through

  • Published 31 Jan 2025
  • Games

Comments • 15

  • @yunijkarki9088 1 year ago +1

    WOW, Johno's video is great. This video deserves at least 1 million views.

  • @reneeliu1648 2 years ago +9

    Thank you Jonathan! This is so amazing. Really appreciate the way you break it down into easy-to-digest pieces. The walkthrough is truly badass black magic.

  • @jean-michelperraud4899 2 years ago +6

    This is a great dive in after Jeremy's live whiteboard drawings to convey the key ideas, thanks. I am starting to "grok" things a bit better after two sessions. Fascinating.

  • @sheikhshafayat6984 1 year ago

    Wow! Wow!! This was a hell of a walkthrough! I rarely comment on any video, but I just wanted to stop here and say thanks for making this.

  • @californiaxfresh 1 year ago +1

    This was indeed a run-through haha. I think I'll need to walk through this notebook line by line :) Nevertheless, this video, and especially the sampling portion, helped a lot with my intuition behind some of the concepts, so thank you!

  • @melonkernel 2 years ago +2

    Awesome explanations. Great work. Thanks.

  • @datasciencecastnet 2 years ago +4

    Errata: there should be some scaling done to the model inputs for the unet demo in cell 49 (19 minutes in) - see scheduler.scale_model_input in all the loops for the code that is missing.
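
As a rough illustration of the scaling this erratum refers to: in the diffusers library, `LMSDiscreteScheduler.scale_model_input` divides the latents by sqrt(sigma² + 1) for the current noise level before they go into the UNet. The sketch below reproduces only that arithmetic on a plain list. The function name `scale_model_input_sketch`, the sigma value, and the toy latents are illustrative assumptions, and the real method takes a timestep and looks up sigma itself.

```python
import math


def scale_model_input_sketch(sample, sigma):
    """Hypothetical stand-in for the scheduler call: divide by sqrt(sigma^2 + 1)."""
    scale = math.sqrt(sigma ** 2 + 1.0)
    return [value / scale for value in sample]


sigma = 14.6  # illustrative noise level for an early sampling step (assumption)
noisy_latents = [sigma * 1.0, -sigma * 0.5, 0.0]  # toy values, not real latents

scaled = scale_model_input_sketch(noisy_latents, sigma)
print(scaled)  # each value is shrunk towards unit scale
```

In a sampling loop this corresponds to calling `scheduler.scale_model_input(latents, t)` on the latents just before the UNet forward pass, which is the step the erratum says is missing from the demo cell.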

    • @datasciencecastnet 2 years ago +2

      Oh and in the autoencoder part the 'compression' isn't exactly 64 times since there are 4 channels in the latent representation and only 3 in the input :)

    • @ssw4m 2 years ago

      I was also thinking that the input would be 3 channels of bytes (24 bit colour), while the latent might use 4 channels of 32 bit floats. That would be 12 times compression overall. Still pretty good, and I think we can optionally use float16 with the latents which would be 24 times compression. I read that JPEG typically achieves 10 times compression.
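
The compression figures in this thread can be checked with a little arithmetic. This is a minimal sketch assuming a 512×512 RGB input and a 4×64×64 latent (the shapes are assumptions, since the comments only mention the channel counts), with uint8 pixels and float32 or float16 latents as in the reply above.

```python
import math

# Assumed shapes (channels, height, width); not stated explicitly in the comments.
image_shape = (3, 512, 512)
latent_shape = (4, 64, 64)

image_values = math.prod(image_shape)    # number of pixel values
latent_values = math.prod(latent_shape)  # number of latent values

# Spatial positions only: this is where the "64 times" figure comes from.
spatial_ratio = (image_shape[1] * image_shape[2]) / (latent_shape[1] * latent_shape[2])

# Counting values, with 3 input channels against 4 latent channels.
value_ratio = image_values / latent_values

# Counting bytes: 1 byte per uint8 pixel value, 4 or 2 bytes per latent value.
float32_ratio = image_values * 1 / (latent_values * 4)
float16_ratio = image_values * 1 / (latent_values * 2)

print(spatial_ratio, value_ratio, float32_ratio, float16_ratio)
```

Under these assumptions the ratio of values is 48 rather than 64, and the byte ratios come out at 12 and 24, matching the numbers in the reply.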

  • @philtoa334 1 year ago

    Very Good.

  • @asheeshmathur 1 year ago +1

    Outstanding explanation, cleared the mist around the process. However, I am unable to find the roots of q(xₜ|xₜ₋₁) = N(xₜ; √(1-βₜ) xₜ₋₁, βₜI). All discussions start with it as a base, but I would like to understand how the mean and variance were derived. Could you please demystify this formula?
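
A note on the formula asked about here: in the usual DDPM-style formulation, the forward step is defined (rather than derived from something deeper) as x_t = sqrt(1 - β_t) · x_{t-1} + sqrt(β_t) · ε with ε ~ N(0, I), so the conditional mean sqrt(1 - β_t) · x_{t-1} and the variance β_t can be read off directly. The sqrt(1 - β_t) factor is chosen so that unit-variance data stays unit-variance after the noise is added. The sketch below checks both properties numerically on scalars. The name `forward_step`, the value of `beta`, and the sample count are illustrative assumptions, not taken from the notebook.

```python
import math
import random
import statistics


def forward_step(x_prev, beta, rng):
    """Sample x_t = sqrt(1 - beta) * x_prev + sqrt(beta) * eps, with eps ~ N(0, 1)."""
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(1.0 - beta) * x_prev + math.sqrt(beta) * eps


rng = random.Random(0)  # fixed seed so the run is repeatable
beta = 0.02             # illustrative noise level (assumption)
n = 100_000

# 1) For a fixed x_prev, the samples should have mean sqrt(1 - beta) * x_prev
#    and variance beta.
x_prev = 2.0
samples = [forward_step(x_prev, beta, rng) for _ in range(n)]
cond_mean = statistics.fmean(samples)
cond_var = statistics.pvariance(samples)

# 2) Unit-variance inputs should stay close to unit variance,
#    because (1 - beta) * 1 + beta = 1.
inputs = [rng.gauss(0.0, 1.0) for _ in range(n)]
outputs = [forward_step(x, beta, rng) for x in inputs]
marginal_var = statistics.pvariance(outputs)

print(cond_mean, cond_var, marginal_var)
```

With any other scaling of x_{t-1}, the variance would drift up or down over many steps, which is one way to see why this particular mean goes with this particular variance.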

  • @seanriley3121 11 months ago

    When approaching a manifold, what would happen if the approach was aligned with the normal to the manifold surface?

  • @zhshen7981 2 years ago

    👍

  • @paperboi__ 2 months ago

    Why does he move the screen every 3 seconds? There's no need to do this.

  • @manindermaan3695 9 months ago

    Hello, I have some confusion about one topic in diffusion. Jonathan, could I get your contact info, such as your email address or anything else? I am working on my final-year project, so anything would help.