I am a fan of your work. I read your "Grokking Machine Learning", and it's awesome. I am totally impressed. I stopped watching other AI videos and now follow you for most of this stuff. Simple and practical explanations. Thanks a lot, and I'm grateful to you for spreading the knowledge.
These videos are always incredibly helpful, informative, and understandable. Very grateful
I am sharing this video with my students here in India. Excellent work, Luis!🎉
Serrano you are a genius bro your channel is so underrated
Really incredible job of stepping through the HELLO WORLD of image generation, especially how the video compresses the key output to a 4x4 pixel grid and clearly hand-computes each step along the way!
Always impressed with how understandable, but detailed your videos are. Thank you!
Amazing, I hope to truly understand the mechanism of stable diffusion through this video!
Arguably the greatest teacher alive
Thank you :)
Superb, so elegant explanation. Big thanks Sir!
excellent explanation - thank you so much
I respect your concise explanation
Amazing!! Thanks for this high level overview. It was really helpful and fun 👍
Really amazing work, easy to understand and grasp. You're doing a great deal for the community. Thanks a lot!
You are the best explainer ever. You are amazing.
Great video, it gives good intuition to deep network architecture. Thanks
So can we just use the diffusion model to denoise low-quality or nighttime shots?
Yes, absolutely, they can be used to denoise already existing images.
thank you for your amazing educational videos!
I have a question though: are there any transformers (plus attention mechanisms) involved in the text-to-image generator (the diffusion model)?
If not, then how are the semantics of the text captured?
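For what it's worth, text-to-image diffusion models like Stable Diffusion typically do inject the text through transformer-style cross-attention: the prompt is encoded into token embeddings, and the image features attend over those tokens at each denoising step. Here is a toy, dependency-free sketch of that one attention computation, using made-up 2-dimensional vectors (real models use hundreds of dimensions and learned projection matrices):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Each image-patch query attends over the text tokens (keys/values)."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Scaled dot-product scores between this query and every text token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Output = attention-weighted average of the token values.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Toy example: 2 image-patch queries, 3 text-token keys/values.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys    = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values  = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

mixed = cross_attention(queries, keys, values)
```

Each query ends up pulling most of its information from the text token it is most similar to, which is how the prompt's semantics steer the image features.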
In the intermediate result, it is said that after the sigmoid we will not get a sharp image of the ball and bat. How can there be fractional pixel values? Since the image is monochromatic, each pixel should be either 0 or 1, right? Rounding off to the nearest integer would give the same result as before the sigmoid. Even if it's not monochrome, pixels can't be fractions, right?
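On the fractional-pixel point: the sigmoid output is usually kept as a grayscale intensity in (0, 1) rather than rounded, which is exactly why the intermediate image looks fuzzy instead of crisp black-and-white. A minimal sketch with hypothetical logit values (the logits here are made up for illustration):

```python
import math

def sigmoid(x):
    """Squash a raw network output (logit) into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw network outputs for four pixels of a tiny image.
logits = [2.0, -1.5, 0.3, -4.0]

# The fractional outputs are read as grayscale intensities
# (0 = fully off, 1 = fully on), not snapped to a binary mask;
# that's why the intermediate image looks fuzzy.
intensities = [round(sigmoid(z), 3) for z in logits]
# intensities == [0.881, 0.182, 0.574, 0.018]

# Rounding to the nearest integer would indeed recover a hard 0/1 image,
# but it throws away the model's confidence in each pixel.
binary = [round(v) for v in intensities]   # [1, 0, 1, 0]
```

So the question is right that rounding reproduces a monochrome image; the point is that the un-rounded values carry extra information that later denoising steps can use.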
Muy BALL-issimo 😄 Loved the puns!!!!!😋😋😋
Thank you for such a wonderful visualization that conveys an overview of complex mathematical concepts.
Can you please do a video detailing the underlying architecture of the neural network that forms the diffusion model?
Also, are Generative Adversarial Networks (GANs) not used anymore for image generation?
Serrano Academy: The art of Understanding
Luis Serrano: The GOD of Understanding
Thank you so much, what an honour! :)
@@SerranoAcademy Thank you, the honour is ours! :)
Amazing deep dismantling of complex structures. That's real ML/AI democratization.
Could it be that the diffusion model is trained to learn the amount of noise that has to be removed from the input image, rather than the image with less noise? That is what I understood from other sources, because they say that is easier for the model. Thank you, and good video, very enlightening.
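The commenter is describing the standard DDPM-style objective: the network is usually trained to predict the noise ε that was added, and the cleaner image then falls out algebraically. A dependency-free sketch under those assumptions, with a hypothetical "perfect" model standing in for the real network:

```python
import math
import random

random.seed(0)

# A clean training "image": a flattened toy 4x4 grid of pixel values.
x0 = [random.random() for _ in range(16)]

# Forward process: blend the clean image with Gaussian noise.
alpha_bar = 0.6  # how much of the original signal survives at this step
eps = [random.gauss(0, 1) for _ in range(16)]  # the noise that was added
x_t = [math.sqrt(alpha_bar) * a + math.sqrt(1 - alpha_bar) * e
       for a, e in zip(x0, eps)]

# Epsilon-prediction: the network learns to output the noise eps itself,
# not the less-noisy image. Simulate a perfect prediction here.
eps_pred = list(eps)
loss = sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)  # MSE

# With the noise predicted, the clean image is recovered algebraically:
x0_rec = [(xt - math.sqrt(1 - alpha_bar) * p) / math.sqrt(alpha_bar)
          for xt, p in zip(x_t, eps_pred)]
```

Predicting ε and predicting the denoised image are related by exactly this algebra, which is part of why the noise-prediction parameterization trains more stably in practice.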
Thanks for teaching Mr Luis! I still remember fondly you teaching me machine learning basics over drinks in SF
Thanks Jon!!! Great to hear from you! How’s it going?
Amazing as always!
Hi Luis. Your videos are very informative and I love them. Thank you so much for sharing your knowledge with us.
I wanted to know if "Fourier Transforms in AI" is in your pipeline. Could you please give some intuition around that in a video? Thanks in advance.
Thanks for the suggestion! It's definitely a great idea. In the meantime, 3blue1brown has great videos on Fourier transformations, take a look!
Hello Serrano, is there a paper for Stable Diffusion like "Attention Is All You Need"?
Good question, I'm not fully sure. There's this, but I'm not 100% sure if it's the original: stability.ai/news/stable-diffusion-public-release
I always use this explanation as a reference; there may be some good leads there: jalammar.github.io/illustrated-stable-diffusion/
thanks @@SerranoAcademy 🙂
(At 17:25), in the image on the right, the baseball and bat should have 3 gray squares, right? Very nice channel, I just subscribed.
Thank you! Yes, the ball and bat should be three gray or black squares. Since these images are not so exact, there could also be dark gray or some variations.
Finally the diffusion penny dropped for me, many thanks
Thank you so much!!!
Thanks ❤
This is wonderful…
Perhaps the best low-level description of the diffusion process I’ve seen….
But discrete images of bats and balls, represented as single pixels, are a long way away from a PHOTO-REALISTIC pirate standing on a ship at sunrise.
What I can't get my head around is how these discrete images (which actually exist in the multi-dimensional dataset space) are combined, really grafted together (parts pulled from each existing image), into a single image with correct composition, scaling, coloring, shadows, etc.
Even if I lay specifically chosen (by the NN) bat and ball pictures over each other to produce a "fuzzy" combined image (composition), and then use another NN to sharpen the fuzzy image into a crisp composition with all the attributes defined in the prompt and pointed to by the embeddings, there's still too much magic inside the DIFFUSION black box which I just don't understand, even after understanding the denoising and self-attention processes.
I guess what I have not been able to determine, after watching maybe 30-35 hours of diffusion videos, is specifically how the black box COMPOSES a complicated scene BEFORE the process begins that "tightens" the image up by removing noise between the given and target in successive passes of the decoder.
I get the fact (one) that the prompts correspond to embeddings, and the embeddings point to some point in multi-dimensional space which contains all sorts of related info and perhaps a close image representation of the prompted request….. or perhaps not.
I get the fact (two) that the diffusion process is able to generate virtually any complicated scene starting from random noise when gently persuaded to a target by the prompt….
What I don’t understand is how the black box builds a complicated FUZZY image once the various “parts” of the composition are identified.
Does the composing process start with a single image if available in the dataset and scale individual attributes to correspond with the prompt…?
-or-
Does the composing process start with segmented attributes, scale all appropriately, and combine into a single image…?
A closer look at how the scene COMPOSITION works would be a great addition to your very helpful library of vids, thnx.
Ok… for those with the same “problem…”
The missing part, at least for me, is the “classifier” portion of the model which I have NOT seen explained in the high-level Diffusion explanation vids.
This tripped me up…
Here is a good vid and corresponding paper which help in understanding the "feature" set extraction within the image convolution process, which ultimately creates an "area/segment-aware" dataset (image) that can be directed to include the visual requirements described in a text prompt.
th-cam.com/video/N15mjfAEPqw/w-d-xo.htmlsi=6sZxibtFvjrVNHeE
In a nutshell… the features extracted from each image are MUCH more descriptive than I had pictured, allowing for much better interpolation, composition, and reconstruction of multiple complex forms in each image.
Of course the cues to build these complex images all happen as the model interpolates its learned data, converging on the visual representation of the text prompt somewhere in the multi-dimensional space which we cannot comprehend… so in a sense it's still all a black box.
I don't pretend to understand it all… but it does give the gist of how certain abstract features within the model's convolutional layers blow themselves up into full-blown objects.
Another good short vid which shows how diffusion accomplishes image COMPOSITION:
th-cam.com/video/xtlxCz349WU/w-d-xo.htmlsi=PJl_vWueiQdZxLn1
Another good vid which gets into composition:
th-cam.com/video/3b7kMvrPZX8/w-d-xo.htmlsi=AwNQJAjABKn-iV4F
Another good set of vids which get into IMAGE COMPOSITION:
th-cam.com/video/vyfq3SgXQyU/w-d-xo.htmlsi=ShiOXaQH_0baU8Z-
Especially helpful is the last vid, URL posted above.
thank you
🙏