Why Does Diffusion Work Better than Auto-Regression?

  • Published 24 Dec 2024

Comments • 395

  • @algorithmicsimplicity
    @algorithmicsimplicity  10 months ago +300

    Next video will be on Mamba/SSM/Linear RNNs!

    • @benjamindilorenzo
      @benjamindilorenzo 9 months ago +2

      Great! Also, maybe think about the trade-off between scaling and incremental improvements, in case your perspective is that LLMs always just approximate the dataset and therefore memorize rather than having any "emergent capabilities", so that ChatGPT also does "only" curve fitting.

    • @harshvardhanv3873
      @harshvardhanv3873 7 months ago +3

      I am a student pursuing a degree in AI, and we want more of your videos on even the simplest of concepts in AI. Trust me, this channel will be a huge deal in the near future. Good luck!!

    • @QuantenMagier
      @QuantenMagier 7 months ago +1

      Well take my subscription then!!1111

    • @atishayjain1141
      @atishayjain1141 6 months ago

      Where did you learn all of this? And did you also try to code it yourself?

  • @doku7335
    @doku7335 7 months ago +599

    At first I thought "oh, another random video explaining the same basics and not adding anything new", but I was so wrong. It's an incredibly clear explanation of diffusion, and the start with the basics makes the full picture much clearer. Thank you for the video!

    • @gonfpv
      @gonfpv 6 months ago +7

      You should check out the rest of his videos. All are of sublime quality.

    • @pvic6959
      @pvic6959 6 months ago +4

      > makes the full picture much clearer
      hehe did it help denoise

    • @MinoriMirari-fans
      @MinoriMirari-fans 6 months ago

      I mean, it's a bit oversimplified...

    • @MinoriMirari-fans
      @MinoriMirari-fans 6 months ago

      Diffusion these days, for example, could implement any number of methods.

    • @MinoriMirari-fans
      @MinoriMirari-fans 6 months ago

      For a more advanced technical perspective, you could join this server, where we research and study all forms of AI, especially generative AI prompting, theoretical ways to run AI neural-network computation, and tandems such as quantum networks. We also help suggest and invent theoretical applications of the AI, and ways to enhance the systems, etc.

  • @Paplu-i5t
    @Paplu-i5t 10 months ago +74

    This genius only makes videos occasionally, and they are not to be missed.

  • @jupiterbjy
    @jupiterbjy 7 months ago +225

    Kinda sorry to my professors and seniors, but this is the single best explanation of the logic behind each model. A dozen-minute video > 2 years of confusion at uni.

    • @talkingbirb2808
      @talkingbirb2808 2 months ago +1

      Yeah, it's great, but you also gotta understand that it's easier to digest such a great video after learning machine learning for some time. I learned machine learning 1.5 years ago, and now that I'm relearning it everything seems so easy, while it was so confusing during my education at uni.

  • @user-my3dd4lu2k
    @user-my3dd4lu2k 8 months ago +175

    Man, I love the fact that you present the fundamental idea with an intuitive approach, and then discuss the optimization.

    • @paperxplane1
      @paperxplane1 4 months ago +1

      I enjoyed the presentation for these aspects as well. My learning experience at university was similar to his approach so it made understanding the content very easy.

  • @rafa_br34
    @rafa_br34 7 months ago +32

    Such an underrated video. I love how you went from basic concepts to complex ones and didn't just explain how it works, but also why other methods are not as good/efficient.
    I will definitely be looking forward to more of your content!

  • @jcorey333
    @jcorey333 10 months ago +9

    This is an amazing quality video! The best conceptual video on diffusion in AI I've ever seen.
    Thanks for making it!
    I'd love to see you cover RNNs.

  • @GianlucaTruda
    @GianlucaTruda 6 months ago +17

    Holy shit, at 11:03 I suddenly realised what you were cooking! I've been trying to find a way to articulate this interesting relationship between autoregression and diffusion for ages (my thesis developed diffusion models for tabular data). This is such a brilliantly-visualised and intuitively explained video! Well done. And the classifier-free guidance explanation you threw in at the end has got to be some of the most high-ROI intuition pumping I've seen on YouTube.

  • @alexandergin8517
    @alexandergin8517 21 days ago +3

    THE best explanation of the motivation behind diffusion models I have ever watched.

  • @RicardoRamirez-dr6gc
    @RicardoRamirez-dr6gc 7 months ago +13

    This is seriously one of the best explainer videos I've ever seen. I've spent a long time trying to understand diffusion models, and not a single video has come close to this one.

  • @yqisq6966
    @yqisq6966 7 months ago +72

    The clearest and most concise explanation of diffusion models I've seen so far. Well done.

  • @pw7225
    @pw7225 7 months ago +32

    The way you tell the story is fantastic! I am surprised that all AI/ML books are so terrible at didactics. We should always start at the intuition, the big picture, the motivation. The math comes later when the intuition is clear.

    • @dustinandrews89019
      @dustinandrews89019 6 months ago +8

      I have seen the "math-first, intuition later or never" approach in a lot of teaching. High school and college math, physics and programming classes are rife with it. I agree it's sub-optimal for most students. I have some vague ideas about why this approach perpetuates itself, and I have seen a lot of gatekeeping around learning in a bottom-up way. It's lovely to see educators like AlgorithmicSimplicity and 3Blue1Brown break things down in a much more intuitive way that then allows us to understand the maths.

    • @fog1257
      @fog1257 6 months ago

      @@dustinandrews89019 I think the main reason is time. Most university courses are 8 weeks in my case, and there simply isn't enough time to explain all the details of the theory behind electronics or math, for example. My learning is terrible when I am just given a formula for a particular problem; it's useless to me. Instead I end up spending days understanding who came up with the formula and why, before I derive it myself, and then I will never forget it since it becomes part of my intuition.
      Another reason I've noticed is, sadly, a lack of deeper understanding from some teachers. They themselves only memorized the solution to the problem, but they don't really fully understand the problem or the solution; in my opinion they are unfit for teaching. A teacher should never be worried about a student asking why.

  • @Gabr1elStark
    @Gabr1elStark 28 days ago +2

    This video really explains diffusion very clearly and the animation is really intuitive.

  • @erfanasgari21
    @erfanasgari21 6 months ago +8

    This is literally the best explanation of diffusion models I have ever seen.

  • @santiagoarce5672
    @santiagoarce5672 5 months ago +3

    This is a beautiful work of explanation. You show why diffusion is better than autoregression by deconstructing autoregression and gradually adding optimisations and ideas to end up with a basic diffusion model. (Which is also meta, as deconstruction and reconstruction is what these networks do to learn, too!)

  • @sobhhi
    @sobhhi 7 months ago +4

    I think it would help to mention that the auto-regressors may be viewing the image as a sequence of pixels (RGB vectors). Overall excellent video, extremely intuitive.

    • @algorithmicsimplicity
      @algorithmicsimplicity  7 months ago +3

      In general, auto-regressors do not view images as a sequence. For example, PixelCNN uses convolutional layers and treats inputs as 2D images. Only sequential models such as recurrent neural networks would view the image as a sequence.
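
      For intuition, here is a toy sketch of the masked-convolution trick PixelCNN uses to keep its convolutions causal (illustrative shapes, not the paper's exact architecture):

        import torch
        import torch.nn as nn

        class MaskedConv2d(nn.Conv2d):
            # Zero out kernel weights at and after the centre, so each output
            # pixel depends only on pixels above it, or to its left in its row.
            def __init__(self, *args, **kwargs):
                super().__init__(*args, **kwargs)
                k_h, k_w = self.kernel_size
                mask = torch.zeros_like(self.weight)
                mask[:, :, :k_h // 2, :] = 1.0         # all rows above the centre
                mask[:, :, k_h // 2, :k_w // 2] = 1.0  # left of the centre pixel
                self.register_buffer("mask", mask)

            def forward(self, x):
                self.weight.data.mul_(self.mask)       # hide the "future" pixels
                return super().forward(x)

        x = torch.randn(1, 3, 28, 28)                  # a dummy RGB image batch
        print(MaskedConv2d(3, 16, kernel_size=5, padding=2)(x).shape)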

    • @sobhhi
      @sobhhi 7 months ago +2

      @@algorithmicsimplicity of course, but I feel mentioning it may help with intuition as you’re walking through pixel by pixel image generation

  • @epiphenomenon
    @epiphenomenon 5 months ago +2

    Great video! One interesting point about diffusion models that I haven't seen discussed enough is that the noising process can be replaced with other (even deterministic!) image degradation transforms. See the 2022 paper by Bansal et al., "Cold Diffusion." For example, they train a model using an "animorph" transform that interpolates between training images and random images from an animal photo dataset. Models trained on these quirky transforms still give very decent results.

    • @algorithmicsimplicity
      @algorithmicsimplicity  5 months ago +1

      Absolutely agreed, that paper is amazing. Also, recently there was a paper using upscaling/downscaling as the information-degrading transformation, and it seemed to achieve very good results ( arxiv.org/abs/2404.02905 ).
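
      For a concrete picture, here is a tiny numpy sketch of one such deterministic information-degrading transform, a downscaling pyramid (sizes made up; a model would be trained to invert each step):

        import numpy as np

        def downscale(img, f=2):
            # Average-pool f x f blocks: deterministic information destruction.
            h, w = img.shape
            return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

        img = np.random.default_rng(0).random((16, 16))  # stand-in training image
        pyramid = [img]
        while pyramid[-1].shape[0] > 1:                  # 16x16 -> 8x8 -> ... -> 1x1
            pyramid.append(downscale(pyramid[-1]))
        print([p.shape for p in pyramid])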

  • @jasdeepsinghgrover2470
    @jasdeepsinghgrover2470 7 months ago +50

    This is a much better explanation than the diffusion paper itself. They just went all around variational inference to get the same result!

  • @jamesking2439
    @jamesking2439 4 months ago +3

    I really appreciate you taking the time to explain the motive for an approach rather than just explaining how it works.

  • @Jack-gl2xw
    @Jack-gl2xw 7 months ago +31

    I have trained my own diffusion models and it required me to do a deep dive of the literature. This is hands down the best video on the subject and covers so much helpful context that makes understanding diffusion models so much easier. I applaud your hard work, you have earned a subscriber!

    • @Real-HumanBeing
      @Real-HumanBeing 6 months ago

      You realize these models contain their dataset, right? And that’s the only way they can work.

  • @leeris19
    @leeris19 3 months ago +2

    This is by far the best explanation out there

  • @riddhimanmoulick3407
    @riddhimanmoulick3407 6 months ago +6

    Kudos for an incredibly intuitive explanation! Really loved the visual representations too!!

  • @Veptis
    @Veptis 7 months ago +9

    This is a great explanation of how image decoders work. I haven't seen this approach and narrative direction before.
    This is now my go-to reference for explaining it to people who have no idea!

  • @poipoi300
    @poipoi300 6 months ago +3

    This is refreshing to watch in a sea of people who don't know what they're talking about and decide to make "educational" videos on the subject anyways. The simplifications are often harmful.

  • @nasseral-bess564
    @nasseral-bess564 6 months ago +3

    This is actually one of the best, if not the best, deep-learning-related videos on YouTube.
    Thanks for your efforts

  • @1.4142
    @1.4142 10 months ago +4

    SoME2 really brought out some good channels

  • @gnorts_mr_alien
    @gnorts_mr_alien 4 months ago +2

    What an amazing explanation! The world needs more "from first principles" explanations for everything, but for that we need people who understand in the first place. You are doing a huge service.

  • @TheParkitny
    @TheParkitny 4 months ago +3

    Great explanation. Please keep making more videos

  • @TTminh-wh8me
    @TTminh-wh8me 5 months ago +2

    Bro casually drops some of the most high-quality machine learning content out there.

  • @chloefourte3413
    @chloefourte3413 5 months ago +2

    Watched this after reading the 2017 Distill blog post on Feature Visualisation. Extremely helpful in filling in the gaps of parts of the process that went over my head. Thank you!

  • @Frdyan
    @Frdyan 7 months ago +7

    I have a graduate degree in this shit and this is by far the clearest explanation of diffusion I've seen. Have you thought about doing a video running over the NN Zoo? I've used that as a starting point for lectures on NN and people seem to really connect with that paradigm

  • @MichaelBrown-gt4qi
    @MichaelBrown-gt4qi 6 months ago +1

    This is a great video. I have watched videos in the past (years ago) talking about auto-regression and, more lately, about diffusion. But it's nice to see why and how there was such a jump between the two. Amazing! However, I feel this video is a little incomplete with no mention of the enhancer model that "cleans up" the final generated image. This enhancing model is able to create a larger image while cleaning up the six fingers gen AI is so famous for. While not technically a part of the diffusion process (because it has no random noise), it is a valuable addition to image gen if anyone is trying to build their own model.

  • @MeriaDuck
    @MeriaDuck 7 months ago +2

    This must be one of the best and most concise explanations I've seen!

  • @alenqquin4509
    @alenqquin4509 6 months ago +3

    Very good job; it has deepened my understanding of generative AI.

  • @vineetgundecha7872
    @vineetgundecha7872 1 month ago

    Insightful video! I'd like to point out that generating images auto-regressively is also a feasible approach and has been used in multiple techniques, most notably in DALL-E 1. However, the auto-regression happens in a compressed latent space instead of in pixel space.
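
    For intuition, here is a generic vector-quantization step that turns continuous latents into the discrete tokens an auto-regressor can model (DALL-E 1 itself used a discrete VAE; the sizes here are made up):

      import numpy as np

      rng = np.random.default_rng(0)
      codebook = rng.normal(size=(512, 64))     # 512 code vectors of dim 64
      latents = rng.normal(size=(16 * 16, 64))  # encoder output, one per patch

      # Each latent becomes the index of its nearest codebook vector:
      d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
      tokens = d2.argmin(axis=1)                # 256 discrete "image words"
      print(tokens[:10])                        # a sequence a transformer can model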

  • @mattshannon5111
    @mattshannon5111 6 months ago +2

    Wow, it requires really deep understanding and a lot of work to make videos this clear that are also so correct and insightful. Very impressive!

  • @arseniykuznetsov1265
    @arseniykuznetsov1265 5 months ago +2

    Very clear and concise explanation, bravo!

  • @yuelinxin3684
    @yuelinxin3684 4 months ago +2

    Best explanation video on diffusion, hats off.

  • @shivamkaushik6637
    @shivamkaushik6637 7 months ago +1

    Never knew YouTube could randomly suggest videos like these. This was mind-blowing. The way you teach is a work of art.

  • @istoleyourfridgecall911
    @istoleyourfridgecall911 6 months ago +1

    Hands down the best video that explains how these models work. I love that you explain these topics in a way that resembles how the researchers created these models. Your video shows the thinking process behind them and, combined with great animated examples, it is so easy to understand. You really went all out. If only YouTube promoted these kinds of videos instead of brainrot low-quality videos made by inexperienced teenagers.

  • @benjamin6729
    @benjamin6729 5 months ago +2

    Such a clear video. I was researching this before it was well documented in videos like these. Liked and subscribed!

  • @cust-qd8kn
    @cust-qd8kn 2 months ago +1

    You answered so many questions I had in my head. That’s the coolest explanation video I’ve ever seen!

  • @londonl.5892
    @londonl.5892 6 months ago +2

    So glad this came across my recommended feed! Fantastic explanation and definitely cleared up a lot of confusion I had around diffusion models.

  • @kaushaljani814
    @kaushaljani814 4 months ago +2

    Nice explanation of the diffusion process, apart from the classic physics-driven intuition. Great work!!!

  • @updated_autopsy_report
    @updated_autopsy_report 6 months ago +2

    I really enjoyed this video!! Took a lot of notes while watching it too. You have a god-tier ability to explain concepts in an easy-to-follow way.

  • @wormjuice7772
    @wormjuice7772 6 months ago +2

    This has helped me so much wrapping my head around this whole subject!
    Thank you for now, and the future!

  • @themodernshoe2466
    @themodernshoe2466 6 months ago +2

    This has been on my watch later for 3 months. Finally got to watching it, glad I did. This is an exceptional explanation of the technologies at play here.

  • @pseudolimao
    @pseudolimao 7 months ago +23

    this is insane. I feel bad for getting this level of content for free

  • @karlnikolasalcala8208
    @karlnikolasalcala8208 7 months ago +5

    This channel is gold, I'm glad I've randomly stumbled across one of your vids

  • @akashmody9954
    @akashmody9954 10 months ago +2

    Great video....already waiting for your next video

  • @project_sayo
    @project_sayo 6 months ago +2

    wow, this is such an amazing resource. I'm glad I stuck around. This is literally the first time this is all making sense to me.

  • @HD-Grand-Scheme-Unfolds
    @HD-Grand-Scheme-Unfolds 7 months ago +9

    You truly understand how to simplify... how to engage our imagination... how to employ naive thoughts and ideas as comparisons that bring across deeper, more core principles and concepts, making the subject far easier to grasp and build an intuition for. Algorithmic Simplicity indeed... thank you for your style of presentation and teaching. Love it, love it... you make me know what question I want to ask but didn't know I wanted to ask. YouTube needs your contribution to ML education. Please don't forget that.

  • @neonelll
    @neonelll 6 months ago +2

    The best explanation I've seen. Great work.

  • @JordanMetroidManiac
    @JordanMetroidManiac 7 months ago +2

    I finally understand how models like Stable Diffusion work now! I tried understanding them before but got lost at the equation (17:50), but this video describes that equation very simply. Thank you!

  • @TheTwober
    @TheTwober 6 months ago +2

    The best explanation I have found on the internet so far. 👍

  • @aloufin
    @aloufin 3 months ago +2

    Audio is mentioned very briefly at 0:24. I would love a video showing how text can be transformed not into pictures but into audio of songs... and somehow get us guitar solos, saxophone, standup comedy routines, etc... I'm thinking of the wild stuff we see on Udio or Suno.

    • @algorithmicsimplicity
      @algorithmicsimplicity  3 months ago +2

      Udio and Suno don't say publicly how their models work, but there are basically 2 approaches to generating audio: 1) use an encoder module to map sound waves into a sequence of discrete tokens, and then train an auto-regressive transformer on those tokens; 2) just apply diffusion to the frequency spectrogram of the audio (use a Fourier transform to convert sound waves into frequency images, then do diffusion in exactly the same way as image diffusion). In either case, the generative mechanism is identical to the auto-regression or diffusion covered in this video, so I don't feel it's worth covering separately. If there's anything unique to audio that you are aware of, I would be interested in hearing it.
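
      To make approach 2) concrete, here is a minimal sketch of turning a waveform into a magnitude spectrogram that could then be diffused like an image (the sample rate and tone are arbitrary):

        import numpy as np
        from scipy.signal import stft

        fs = 16_000                                   # assumed sample rate
        t = np.arange(fs) / fs                        # one second of audio
        wave = np.sin(2 * np.pi * 440 * t)            # a 440 Hz test tone
        freqs, times, Z = stft(wave, fs=fs, nperseg=512)
        spectrogram = np.abs(Z)                       # 2D array: freq x time
        print(spectrogram.shape)                      # roughly (257, 64)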

  • @banana_lemon_melon
    @banana_lemon_melon 7 months ago +2

    Bruh, I love your content. Other channels/videos usually explain general knowledge that can easily be found on the internet, but you go deeper into the intrinsic aspects of how the stuff works. This video, and your video about transformers, are really good.

  • @ecla141
    @ecla141 7 months ago +3

    Awesome video! I would love to see a video about graph neural networks

  • @Matyanson
    @Matyanson 7 months ago +4

    Thank you for the explanation. I already knew a little bit about diffusion, but this is exactly the way I'd hoped to learn: start from the simplest examples (usually historical) and progressively advance, explaining each optimisation!

  • @chocobelly
    @chocobelly 3 months ago +2

    This dude just helped me understand what I couldn't from reading a couple of papers.

  • @justanotherbee7777
    @justanotherbee7777 10 months ago +4

    A person with very little background can understand what he describes here. Commenting so YouTube recommends it to others.
    Wonderful video! Really a good one.

  • @李勇-x2s
    @李勇-x2s 6 months ago +2

    Very good video. I got a straightforward answer to why the diffusion idea emerged and why diffusion is intrinsically better than the autoregression algorithm.

  • @kkordik
    @kkordik 6 months ago +2

    Bro, this is amazing!!! Your explanation is so clear; love it.

  • @Lexxxco1
    @Lexxxco1 17 days ago +1

    Simple and great explanation! It would be interesting to see something on diffusion-transformer architectures like Flux1. Your visualizations are great.

  • @deep.space.12
    @deep.space.12 6 months ago +3

    If there is ever a longer version of this video, it might be worth mentioning VAEs as well.

  • @siliconhawk
    @siliconhawk 4 months ago +2

    subbed 👍👍 keep bringing more technical videos i love em

  • @julienducrey1472
    @julienducrey1472 1 month ago +1

    Excellent video; the explanations are clear and perfectly illustrated. The key concepts and ideas are well introduced and form a fully coherent progression, which really helps you follow along easily. The content is very complete. Thank you, and congratulations again!

  • @mrdr9534
    @mrdr9534 7 months ago +2

    Thanks for taking the time and effort of making and sharing these videos and Your knowledge.
    Kudos and best regards

  • @yk4r2
    @yk4r2 7 months ago +2

    Hey, could you kindly recommend more on causal architectures?

    • @algorithmicsimplicity
      @algorithmicsimplicity  7 months ago

      I haven't seen any material that covers them really well. There are basically 2 types of causal architectures, causal CNNs and causal transformers, with causal transformers being much more widely used in practice now. Causal transformers are also known as "decoder-only transformers" ("encoders" use regular self-attention layers, "decoders" use causal self-attention). If you search for encoder vs decoder-only transformers you should find some resources that explain the difference.
      Basically, to make a self-attention layer causal you mask the attention scores (i.e. set some to 0), so that words can only attend to words that came before them in the input. This makes it so that every word's vector only contains information from before it. This means you can use every word's vector to predict the word that comes after it, and it will be a valid prediction because that word's vector never got to attend to (i.e. see) anything after it. So, it is as if you had applied the transformer to every subsequence of input words, except you only had to apply it once.
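
      A minimal numpy sketch of that masking step (illustrative shapes; masked scores are set to -inf so their softmax weights come out exactly 0):

        import numpy as np

        T = 5                                          # sequence length
        scores = np.random.rand(T, T)                  # raw attention scores
        causal = np.tril(np.ones((T, T), dtype=bool))  # on/below the diagonal
        scores = np.where(causal, scores, -np.inf)     # hide future positions
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)             # row-wise softmax
        print(np.round(w, 2))                          # upper triangle is all 0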

  • @agustinbs
    @agustinbs 7 months ago +2

    This video is better than going to MIT for a machine learning degree. Man, this is gold, thank you so much.

  • @iestynne
    @iestynne 7 months ago +2

    Wow, fantastic video. Such clear explanations. I learned a great deal from this. Thank you so much!

  • @IsaOzer-lx7sn
    @IsaOzer-lx7sn 7 months ago +2

    I want to learn more about the causal architecture idea for auto-regressors, but I can't seem to find anything about them anywhere. Do you know where I can read more about this topic?

    • @algorithmicsimplicity
      @algorithmicsimplicity  7 months ago +2

      I haven't seen any material that covers them really well. There are basically 2 types of causal architectures, causal CNNs and causal transformers, with causal transformers being much more widely used in practice now. Causal transformers are also known as "decoder-only transformers" ("encoders" use regular self-attention layers, "decoders" use causal self-attention). If you search for encoder vs decoder-only transformers you should find some resources that explain the difference.
      Basically, to make a self-attention layer causal you mask the attention scores (i.e. set some to 0), so that words can only attend to words that came before them in the input. This makes it so that every word's vector only contains information from before it. This means you can use every word's vector to predict the word that comes after it, and it will be a valid prediction because that word's vector never got to attend to (i.e. see) anything after it. So, it is as if you had applied the transformer to every subsequence of input words, except you only had to apply it once.

  • @BooBar2521
    @BooBar2521 6 months ago +2

    Boah, what a good explanation. I was always wondering how these big NNs like ChatGPT and DALL-E work. Thank you.

  • @Yala_yala_joonom_yala
    @Yala_yala_joonom_yala 1 month ago +1

    Such a perfect video! Thanks for the good work. Please keep doing it.

  • @Keytotransition
    @Keytotransition 6 months ago +2

    You're him 🙌🏽. Thank you so much. Getting this kind of information, or rather explanation, is not easy with all the "BREAKING AI NEWS!😮‼️" on YouTube now.

  • @snippletrap
    @snippletrap 6 months ago +1

    Fantastic explanation. Very intuitive

  • @vidishapurohit4709
    @vidishapurohit4709 6 months ago +1

    very nice visual explanations

  • @alirezaghazanfary
    @alirezaghazanfary 7 months ago +2

    Thanks for the very good video.
    I have a question:
    Couldn't we make a model that decreases the resolution of a picture (for example, a 4x4 picture to a 2x2 and then to a 1x1 picture) and run it in reverse (generate a 2x2 from the 1x1 and a 4x4 from the 2x2)?
    Would this model work?

    • @algorithmicsimplicity
      @algorithmicsimplicity  7 months ago +2

      Yes, you absolutely could, and according to this paper: arxiv.org/abs/2404.02905v1 it works pretty well.

  • @ComunidadLATAMAI
    @ComunidadLATAMAI 2 months ago +1

    Excellent video and explanation!!

  • @dmitrii.zyrianov
    @dmitrii.zyrianov 6 months ago +1

    Hey! Thanks for the video, it is very informative! I have a question: at 18:17 you say that an average of a bunch of noise is still valid noise. I'm not sure why that is true here. I'd expect the average of a bunch of noise to be just the 0.5 value (if we map RGB values to the 0..1 range).

    • @algorithmicsimplicity
      @algorithmicsimplicity  6 months ago

      Right, the average is just the center of the noise distribution which, if the color values are mapped from -1 to 1, is 0. This average doesn't look like noise (it is just a solid grey image), but if you ask what the probability of this image is under the noise distribution, it actually has the highest probability. The noise distribution is a normal distribution centered at 0, so the all-zero input has the highest probability. So the average image still lies within the noise distribution, as opposed to natural images, where the average moves outside the data distribution.
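
      A quick numpy sanity check of this (toy sizes; the printed value is just what this run happens to give):

        import numpy as np

        rng = np.random.default_rng(0)
        noise = rng.standard_normal((10_000, 8, 8))  # many noise images ~ N(0, 1)
        avg = noise.mean(axis=0)
        print(float(np.abs(avg).max()))              # ~0.03: essentially all zeros
        # The all-zero image is the most probable point under N(0, 1), so the
        # average stays inside the noise distribution, unlike averaged photos.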

    • @dmitrii.zyrianov
      @dmitrii.zyrianov 6 months ago

      Thank you for the reply, I think I got it now

  • @not_a_human_being
    @not_a_human_being 5 months ago +1

    Makes perfect sense! Perfect kind of tutorial! :)

  • @iwaniw55
    @iwaniw55 7 months ago +1

    Hi @algorithmicsimplicity, I am curious which papers/material you referenced for the general autoregressor? I can't seem to find any info on using randomly spaced-out pixels to predict the next batch of pixels. Any help would be appreciated. Also, great videos!!!

    • @algorithmicsimplicity
      @algorithmicsimplicity  7 months ago +1

      It is more widely known as "any-order autoregression", see e.g. this paper arxiv.org/abs/2205.13554
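
      One way a training pair could be sampled under any-order autoregression (an illustrative sketch, not the paper's exact recipe):

        import numpy as np

        rng = np.random.default_rng(0)
        img = rng.random(64)              # a flattened 8x8 "image"
        order = rng.permutation(64)       # fresh random pixel order per example
        k = int(rng.integers(1, 64))      # reveal a random-length prefix
        visible = order[:k]               # context pixels the model may see
        target = order[k]                 # next pixel (in this order) to predict
        context = np.zeros(64)
        context[visible] = img[visible]   # every other position stays masked
        print(k, target)                  # (context, img[target]) -> one pair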

    • @iwaniw55
      @iwaniw55 7 months ago

      @@algorithmicsimplicity Thank you so much! This is exactly what I was missing.

  • @recklessroges
    @recklessroges 7 months ago +1

    Could you explain why the YOLO image classifier is/was so effective? Thank you.

  • @abdelhakkhalil7684
    @abdelhakkhalil7684 7 months ago +1

    This was a good watch, thank you :)

  • @CodeMonkeyNo42
    @CodeMonkeyNo42 7 months ago

    Great video. Love the pacing and how you distilled the material into such an easy-to-watch video. Great job!

  • @ikechianyanwu8993
    @ikechianyanwu8993 2 months ago +1

    I really liked this conclusion

  • @xaidopoulianou6577
    @xaidopoulianou6577 7 months ago +1

    Very nicely and simply explained! Keep it up

  • @jayantdubey3025
    @jayantdubey3025 6 months ago +1

    In your neural network animations, the traveling highlight starts from the image, goes through the neural net, then to the output pixel. I understand this as information traveling forward. When the highlights reverse direction, does this represent backpropagation of the error at the regressed pixel value? Great video, by the way!

    • @algorithmicsimplicity
      @algorithmicsimplicity  6 months ago

      Yep it's just meant to demonstrate the weights in the network changing based on the error in the predicted value.

  • @zephilde
    @zephilde 7 months ago +3

    Great visualisation! Good job!
    Maybe a next video on LoRA or ControlNet?

  • @sanjeev.rao3791
    @sanjeev.rao3791 7 months ago +1

    Wow, that was a fantastic explanation.

  • @PaulG106
    @PaulG106 3 months ago +1

    Thanks!

    • @algorithmicsimplicity
      @algorithmicsimplicity  3 months ago

      Thank you so much!

    • @PaulG106
      @PaulG106 3 months ago +1

      @@algorithmicsimplicity Thank YOU! This is the first video that finally explained this to me. Everywhere else they mostly cover the forward process without explaining why, where it came from, and what the intuition behind it is; instead they have a lot of math with KL divergence, Gaussians, etc. So I usually understand everything until we get to pure noise, and then I completely lose the line of thought. You really broke this down into easy-to-understand pieces. Thanks again!

  • @RobotProctor
    @RobotProctor 7 months ago +3

    I like to think of ML as a funky calculator. Instead of a calculator where you give it inputs and an operation and it gives you an output, you give it inputs and outputs and it gives you an operation.
    You said it's like curve fitting, which is the same thing, but I like thinking the words funky calculator because why not

  • @aaronhandleman7277
    @aaronhandleman7277 6 months ago +1

    A paper about doing autoregression with images that seems to work pretty well dropped after this video - would be interested in your thoughts:
    Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

    • @algorithmicsimplicity
      @algorithmicsimplicity  6 months ago

      Yep, I read that paper recently. Seems like a really solid idea: instead of using noise to remove information, downsample (i.e. blur) the image to remove information. This also has the property that it removes information from everywhere in the image, so it should give a near-optimal compute-vs-quality trade-off, with the advantage that the image size is smaller for the early generation steps. I would wait to see a few more reproductions of it before claiming that it is better than diffusion, though.

  • @benjamindilorenzo
    @benjamindilorenzo 9 months ago +11

    Very good job.
    My suggestion is that you explain more about how it actually works: that the model learns to understand complete sceneries just from text prompts.
    This could fill its own video.
    Also, it would be very nice to have a video about diffusion transformers, like OpenAI's Sora probably is.
    It could also be great to have a video about the paper "Learning in High Dimension Always Amounts to Extrapolation".
    Best wishes

    • @algorithmicsimplicity
      @algorithmicsimplicity  9 months ago +9

      Thanks for the suggestions. I was planning to make a video about why neural networks generalize outside their training set from the perspective of algorithmic complexity. The paper "Learning in High Dimension Always Amounts to Extrapolation" essentially argues that the interpolation-vs-extrapolation distinction is meaningless for high-dimensional data, and I agree; I don't think it is worth talking about interpolation/extrapolation at all when explaining neural network generalization.

    • @benjamindilorenzo
      @benjamindilorenzo 9 months ago +3

      @@algorithmicsimplicity Yes, true. It would also be great because this links back to the LLM discussions about whether scaling up transformers actually brings "emergent capabilities", or whether this is more simply and less magically explainable by extrapolation.
      Or in other words: people tend to believe either that deep learning architectures like transformers only approximate their training data set, or that seemingly unexplainable or unexpected capabilities emerge while scaling.
      I believe that extrapolation alone explains really well why LLMs work so well, especially when scaled up, AND that LLMs "just" approximate their training data (curve fitting). This is why I brought this up ;)

  • @lialkalo4093
    @lialkalo4093 6 months ago +1

    very good explanation

  • @CppExpedition
    @CppExpedition 4 months ago +1

    WONDERFUL EXPLANATION!
    -> PLEASE, PEOPLE, BE PATIENT UNTIL MINUTE 7:40 🤯

  • @arnauds3161
    @arnauds3161 4 months ago +1

    Amazing video! The first time I've seen it explained in such a comprehensible way :D. I was really wondering where the idea of diffusion came from. Thanks for this explanation.
    I'm still not sure how predicting the noise at each step avoids the issue mentioned for auto-regression. Wouldn't the model just output the average noise seen during training, like the auto-regressor outputs the average image?

    • @algorithmicsimplicity
      @algorithmicsimplicity  4 months ago

      The model outputs the average of valid labels for the input. At the early stages of generation, the input is almost entirely noise. At this point, there is only one valid label for the noise (which is essentially just the input itself). Later on, as the image becomes clearer, there is more uncertainty in what the noise label is, so the model will average over possible noise values. But the average of a bunch of different noise samples is just the zero vector (more generally, the center of the normal distribution from which they are sampled). And the zero vector is itself a valid noise input. So when you average a bunch of noise, the result is still within the noise distribution. When you average a bunch of images, you get a blurry mess (which is not part of the valid image distribution).

  • @iancallegariaragao
    @iancallegariaragao 10 months ago +2

    Great video and amazing content quality!

  • @zlatanonkovic2424
    @zlatanonkovic2424 6 months ago +1

    What a great explanation!

  • @anatolyr3589
    @anatolyr3589 8 months ago +2

    Great explanation! 👍👍 I personally would like to see a video surveying all the major types of neural nets with their distinctions, specifics, advantages, disadvantages, etc. The author explains very well 👏👏