Why do Convolutional Neural Networks work so well?

  • Published on Jun 19, 2024
  • While deep learning has existed since the 1970s, it wasn't until 2010 that deep learning exploded in popularity, to the point that deep neural networks are now used ubiquitously for all machine learning tasks. The reason for this explosion is the convolutional neural network. This remarkably simple architecture allowed neural networks to be trained on new kinds of data, something previously thought impossible.
    In this video I discuss what a convolutional neural network is, why it is needed, what it can and cannot do, and why it works so damn well.
    00:00 Intro
    01:18 The curse of dimensionality
    06:39 Convolutional neural networks
    13:09 The spatial structure of images
    15:06 Conclusion

Comments • 90

  • @dradic9452
    @dradic9452 1 year ago +43

    Please make more videos. I've been watching countless neural network videos, and until I saw your two videos I was still lost. You explained it so clearly and concisely. I hope you make more videos.

    • @algorithmicsimplicity
      @algorithmicsimplicity  1 year ago +10

      Thanks for the comment, it's great to hear you found the videos useful. I was unexpectedly busy with my job the past few months, but rest assured I am still working on the transformer video.

  • @ozachar
    @ozachar 9 months ago +11

    As a physicist, I recognize this process as the "real-space renormalization group" procedure from statistical mechanics. Each layer is equivalent to a renormalization step (a coarse-graining), and the renormalization flow is then the gradual flow towards the network's final decision. This makes the whole "magic" very clear conceptually, and it also automatically points the way to less trivial renormalization procedures known in theoretical physics (not just simple real-space coarse-graining). The clarity of videos like yours is so stimulating! Thanks

  • @IllIl
    @IllIl 1 year ago +41

    Dude, your teaching style is absolutely superb! Thank you so much for these. This surpasses any of the explanations I've come across in online courses. Please make more! The way you demystify these concepts is just in a league of its own!

  • @algorithmicsimplicity
    @algorithmicsimplicity  1 year ago +52

    Transformer video coming next! I'm still getting the hang of animating, but the transformer video probably won't take as long to make as this one. I haven't decided what I will do after that, so if you have any suggestions/requests for computer science, mathematics or physics topics let me know.

    • @bassemmansour3163
      @bassemmansour3163 1 year ago +1

      What program are you using for animation? Thanks!

    • @algorithmicsimplicity
      @algorithmicsimplicity  1 year ago +4

      I'm using the Python package manim: github.com/ManimCommunity/manim

    • @davidmurphy563
      @davidmurphy563 1 year ago +6

      I'd say RNNs would probably flow nicely from this [excellent] video. GANs too, I guess. Autoencoders for sure. Oh, LSTMs; the memory problem is a fascinating one. Oh, and Deep Q-Networks.
      Meh, the field is so broad you can't help but hit on something good. I'd say RNNs first, as going from images to text seems a natural progression.

    • @wissemrouin4814
      @wissemrouin4814 1 year ago +2

      @davidmurphy563 Yes please, and I guess RNNs need to be presented even before transformers.

    • @davidmurphy563
      @davidmurphy563 1 year ago +2

      @wissemrouin4814 Yeah, I would agree with you there. RNNs serve as a good introduction to a lot of the approaches you'll see for sequence-to-vector problems, and their drawbacks explain the development of transformers.
      I'd suggest RNNs, then LSTMs, then transformers.
      That said, this channel has done sterling work explaining everything so far, so I'm sure he'll do a great job even if he dives straight into the deep end.

  • @Number_Cruncher
    @Number_Cruncher 1 year ago +9

    That was a very cool twist at the end with the rearranged pixels. Thanks for this nice experiment.

  • @nananou1687
    @nananou1687 1 month ago +2

    This is genuinely one of the best videos I have ever seen, no matter the type of content! You have somehow taken one of the most complicated topics and distilled it down to this. Brilliant!

  • @rohithpokala
    @rohithpokala 6 months ago +2

    Bro, you are a real superman. This video gave so many deep insights in just 15 minutes, providing such a strong foundation. I can confidently say this video single-handedly outclasses thousands of neural network videos on the internet. You've raised the bar so high for others to compete. Thanks.

  • @neilosborne8682
    @neilosborne8682 9 months ago +5

    @4:17 Why are 9^N points required to densely fill N dimensions? Where is the 9 derived from? Is it just for the purpose of the example given, or a more general constraint?

    • @algorithmicsimplicity
      @algorithmicsimplicity  9 months ago +4

      It is a completely arbitrary number, just for demonstration purposes. In general, in order to fill a 1-d interval of length 1 to a desired density d, you need d evenly spaced points. To maintain that density over an n-d volume you need d^n points. I just chose d=9 for the example. And the more densely filled the input space is with training examples, the lower the test error of a model will be.
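
      To make that scaling concrete, here is a minimal sketch (plain Python; d=9 as in the example, and 32*32*3 = 3072 input dimensions as for the CIFAR10 images used in the video) of how the required number of points grows:

      ```python
      # Points needed to fill an n-dimensional unit volume at density d per axis.
      def points_needed(d: int, n: int) -> int:
          return d ** n

      print(points_needed(9, 1))  # 9 points fill a 1-d interval
      print(points_needed(9, 2))  # 81 points fill a unit square
      print(points_needed(9, 3))  # 729 points fill a unit cube
      # A 32x32 RGB image lives in 32*32*3 = 3072 dimensions:
      print(len(str(points_needed(9, 3072))))  # 2932 digits, i.e. ~10^2931 points
      ```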

    • @senurahansaja3287
      @senurahansaja3287 9 months ago

      @algorithmicsimplicity Thank you for your explanation, but here (th-cam.com/video/8iIdWHjleIs/w-d-xo.html) the dimensional points mean the input dimension, right?

    • @algorithmicsimplicity
      @algorithmicsimplicity  9 months ago +1

      @senurahansaja3287 Yes, that's correct.

  • @illeto
    @illeto 1 month ago +3

    Fantastic videos.
    Here before you inevitably hit 100k subscribers.

  • @j.j.maverick9252
    @j.j.maverick9252 1 year ago +7

    Another superb summary and visualisation, thank you!

  • @khoakirokun217
    @khoakirokun217 1 month ago +2

    I love that you point out that we have "superhuman capability" because we are pre-trained with assumptions about spatial information :D TL;DR: "we are sucked" :D

  • @PotatoMan1491
    @PotatoMan1491 14 days ago +1

    The best video I've found for explaining this topic.

  • @Embassy_of_Jupiter
    @Embassy_of_Jupiter 1 year ago +6

    Your videos are extremely good, especially for such a small channel

  • @escesc1
    @escesc1 1 month ago +1

    This channel is top notch quality. Congratulations!

  • @bassemmansour3163
    @bassemmansour3163 1 year ago +6

    The best illustrations on the subject. Thank you for your work!

  • @jollyrogererVF84
    @jollyrogererVF84 9 months ago +1

    A brilliant introduction to the subject. Very clear and informative. A good base for further investigation.👍

  • @jcorey333
    @jcorey333 4 months ago +1

    This is one of the best explanations I've seen! Thanks for making these videos.

  • @djenning90
    @djenning90 9 months ago +2

    Both this and the transformers video are outstanding. I find your teaching style very interesting to learn from. And the visuals and animations you include are very descriptive and illustrative! I’m your newest fan. Thank you!

  • @sergiysergiy8875
    @sergiysergiy8875 8 months ago +1

    This was great. Please continue making content like this.

  • @manthanpatki146
    @manthanpatki146 5 months ago +1

    Man, keep making more videos; this is a brilliant one.

  • @thomassynths
    @thomassynths 7 months ago

    This is by far the best explanation of CNNs I have ever come across. The motivational examples and the presentation are superb.

  • @connorgoosen2468
    @connorgoosen2468 1 year ago +1

    How has the YouTube algorithm not suggested you sooner? This is such a great video; just subscribed and keen to see how the channel explodes!

  • @anangelsdiaries
    @anangelsdiaries 4 months ago +2

    Fam, your videos are absolutely amazing. I finally understand what the heck a CNN is. Thanks a lot!

  • @user-sg4lw7cb6k
    @user-sg4lw7cb6k 9 months ago +1

    Your videos are extremely good, especially for such a small channel. Great video! Can you do one on Recurrent Neural Networks please?

  • @GaryBernstein
    @GaryBernstein 9 months ago

    Can you explain how the NN produces the important-word-pair information scores described after 12:15, for the sentence problem raised at 10:17? And can you recommend any Telegram groups for this question and topic?

  • @5_inchc594
    @5_inchc594 1 year ago +2

    Amazing content, thanks for sharing!

  • @benjamindilorenzo
    @benjamindilorenzo 3 months ago +1

    The best video on CNNs. Please make a video about V-JEPA, the SSL architecture proposed by Yann LeCun.
    It would also be nice to have a deeper look at Diffusion Transformers, or diffusion in general.
    Really, really good work, man!

  • @nadaelnokaly4950
    @nadaelnokaly4950 2 months ago +2

    Wow!! Your channel is a treasure.

  • @jorgesolorio620
    @jorgesolorio620 1 year ago +3

    Great video! Can you do one on Recurrent Neural Networks please 🙏🏽

  • @reubenkuhnert6870
    @reubenkuhnert6870 9 months ago

    Excellent content!

  • @pedromartins9889
    @pedromartins9889 4 months ago

    Great video. You explain things really well. My only complaint is that you don't cite references. Citing references (which can be done simply as a list in the description) makes your less obvious statements more sound, like the claim that the number of significant outputs of a layer is more or less constant and small. I understand it would be very hard to explain that while maintaining the flow of the video, but if the description linked to an explanation, or at least to a practical demonstration, the viewer could understand it better, or at least be more confident that it is really true. Citing references also helps a lot if the viewer wants to study the topic further (and this is fair, since you already did the research for the video, so it costs you far less to show your sources than it costs the viewer to rediscover them). In summary: citing references gives you more credibility (in a digital world filled with so much bullshit) and gives interested viewers a great deal of help in going deeper into the topic. Don't be mistaken, I really like your channel.

  • @terjeoseberg990
    @terjeoseberg990 10 months ago +2

    I believe that the main advantages of convolutional neural networks over fully connected neural networks are the computational savings and the increased training data.
    A convolutional neural network is basically a tiny fully connected network that's trained on every NxN square of every image. This means that a 256x256 image is effectively turned into 254x254 = 64,516 tiny images. If you start with 1 million images in your training data, you now have 64.5 billion 3x3 images on which to train the tiny neural network.
    You can then create 100 of these tiny neural networks for the first layer, another 100 for the second layer, another 100 for the third layer, and so on for 10 to 20 layers.
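
    To illustrate that "tiny fully connected network over patches" view, here is a minimal PyTorch sketch (shapes as above; a hypothetical illustration, not code from the video) showing that a 3x3 convolution is exactly one small linear map applied to every 3x3 patch:

    ```python
    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 256, 256)           # one 256x256 RGB image

    conv = nn.Conv2d(3, 100, kernel_size=3)   # 100 "tiny networks", no padding
    y_conv = conv(x)                          # shape (1, 100, 254, 254)

    # The same computation, written as a tiny linear map over all 3x3 patches:
    patches = nn.Unfold(kernel_size=3)(x)     # (1, 27, 64516): every 3x3x3 patch
    w = conv.weight.view(100, -1)             # (100, 27): the tiny network's weights
    y_fc = (w @ patches + conv.bias.view(1, -1, 1)).view(1, 100, 254, 254)

    print(torch.allclose(y_conv, y_fc, atol=1e-5))  # True: identical outputs
    ```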

    • @algorithmicsimplicity
      @algorithmicsimplicity  10 months ago

      I think these two reasons are the most commonly cited explanations for the success of CNNs (along with translation invariance, which is absolutely incorrect), but I don't think they are sufficient to explain it.
      It is true that a CNN uses much less computation than a fully connected neural network, but there are other ways to make deep neural networks just as computationally efficient as CNNs. For example, using an MLP-Mixer style architecture, in which a linear transform is first applied independently across channels to all spatial locations, and then a linear transform is applied independently across spatial locations to all channels. In fact, this is exactly what I used when making this video! The "deep neural network" I used was precisely this; it would have taken too long to train a deep fully connected neural network. This MLP-Mixer variant uses the same computation as a CNN but allows each layer to see the entire input, which is why it achieves lower accuracy than a CNN.
      As for the increased training data size, it is possible this helps, but even if you multiply your dataset size by 100,000, it is still nowhere near the amount of data you would expect to need to learn in a 256*256-dimensional space. Also, if it were merely the increased training data, then I would expect CNNs to perform better than DNNs even on shuffled data (after all, having more data should still help in this case). But in fact we observe the opposite: CNNs perform worse than DNNs when the spatial structure is destroyed.
      For these reasons, I believe that the fact that each layer sees an input of low effective dimension is necessary and sufficient to explain the success of CNNs.
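
      For reference, a minimal sketch (PyTorch; layer sizes hypothetical, not the exact model from the video) of one such MLP-Mixer-style block:

      ```python
      import torch
      import torch.nn as nn

      class MixerBlock(nn.Module):
          """One linear map across channels, then one across spatial locations."""
          def __init__(self, num_patches: int, channels: int):
              super().__init__()
              self.channel_mix = nn.Linear(channels, channels)        # per location
              self.spatial_mix = nn.Linear(num_patches, num_patches)  # per channel

          def forward(self, x):                 # x: (batch, num_patches, channels)
              x = torch.relu(self.channel_mix(x))
              x = x.transpose(1, 2)             # (batch, channels, num_patches)
              x = torch.relu(self.spatial_mix(x))
              return x.transpose(1, 2)          # back to (batch, num_patches, channels)

      block = MixerBlock(num_patches=64, channels=128)
      print(block(torch.randn(8, 64, 128)).shape)  # torch.Size([8, 64, 128])
      ```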

    • @terjeoseberg990
      @terjeoseberg990 10 months ago

      @algorithmicsimplicity It's a combination of multiplying the dataset size by 64,500 and reducing the network size from 256x256 to 3x3. In fact, it's the reduction of the network size to 3x3 that allows the effective 64,500-fold increase in dataset size. It's not one or the other, but both. Each weight gets a whole lot more training/gradient updates.
      You should do a video on the MLP-Mixer and how it compares to CNNs.

  • @jamespogg
    @jamespogg 9 months ago

    Amazing vid, good job man.

  • @justchary
    @justchary 9 months ago

    I do not know who you are, but please continue! You definitely have a vast knowledge of the subject, because you can explain complex things simply.

  • @joshmouch
    @joshmouch 8 months ago

    Yeah. Jaw dropped. This is an amazing explanation. More please.

  • @user-to4hq2nm1m
    @user-to4hq2nm1m 8 months ago

    Really nice! What tool did you use to do those awesome animations?

    • @algorithmicsimplicity
      @algorithmicsimplicity  8 months ago

      This was done using Manim ( www.manim.community/ )

  • @neithanm
    @neithanm 9 months ago +1

    I feel like I missed a step. The layers on top of the horse looked like a homogeneous color. Where's the information? I was expecting to see features building up from small parts to recognizing the horse, but...

  • @Isaacmellojr
    @Isaacmellojr 5 months ago

    More videos please! You have the gift!!

  • @HD-Grand-Scheme-Unfolds
    @HD-Grand-Scheme-Unfolds 1 year ago

    @AlgorithmicSimplicity Greetings, may I ask: in your video, in what sense do you mean "randomly re-order the pixels" (13:55)? Let me explain the question. I know you mean reshuffling the permutation order of the set of input pixels; by "in what sense" I mean: is it (A) a unique random re-ordering for each training example (as in, for every picture), or (B) the same random re-ordering for every training example?
    If you meant sense A, I would be amazed that the convolutional net can get the 62.9% accuracy you mentioned earlier. That 62.9% would be more believable to me if you meant sense B.

    • @algorithmicsimplicity
      @algorithmicsimplicity  1 year ago +3

      I meant it in the B sense: the same shuffle applied to every image in the dataset (training and test). If it were a different random shuffle for each input, then no machine learning model (or human) would ever get above 10% accuracy. If you have some experience with machine learning: this operation is equivalent to shuffling the columns of a tabular dataset, which of course all standard machine learning algorithms are invariant to.
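
      A minimal sketch (NumPy; dummy arrays standing in for the real CIFAR10 data) of the "B" sense, with one fixed permutation shared by every image:

      ```python
      import numpy as np

      rng = np.random.default_rng(seed=0)
      perm = rng.permutation(32 * 32)  # ONE fixed permutation of the 1024 pixel positions

      def shuffle_pixels(images: np.ndarray) -> np.ndarray:
          """Apply the SAME spatial permutation to every image in the dataset."""
          n, h, w, c = images.shape                        # e.g. (50000, 32, 32, 3)
          return images.reshape(n, h * w, c)[:, perm, :].reshape(n, h, w, c)

      # Dummy stand-ins for the real train/test arrays:
      train_images = rng.random((100, 32, 32, 3))
      test_images = rng.random((20, 32, 32, 3))
      train_shuffled = shuffle_pixels(train_images)        # same perm for train...
      test_shuffled = shuffle_pixels(test_images)          # ...and for test
      ```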

    • @HD-Grand-Scheme-Unfolds
      @HD-Grand-Scheme-Unfolds 1 year ago

      @algorithmicsimplicity Lol, in hindsight your point is now taken 😄🤣. But let me play devil's advocate for entertainment and curiosity's sake: if it were somehow in sense "A", then I'd imagine that would imply a phenomenon we might call pure memorization at its finest.
      To get back on track, I love that you went out of your way to make that clear in your presentation; yours is the second video to mention it, but you were the first to settle the big question (the one I already asked you, thanks again).
      By the way, while I have the opportunity: do you know where a non-programmer might find an intuitive, interactive, GUI-based executable program that simulates and implements recurrent neural networks (ideally a simple RNN; I'd prefer to avoid LSTMs or GRUs but will accept them)? GitHub, for example, mostly accommodates those who meet a coding-knowledge prerequisite. "MemBrain" fits the concept, but its RNN is still puzzling for me to figure out, train, test, etc. (still, it's the most promising one to work with so far); "Neuroph Studio" fits the concept but has no RNN support; and "Knime Analytics Platform" requires what amounts to coding skills, disguised as a GUI with clicks and parameter controls, with rules for arrangements that are too complex and counter-intuitive. IBM Watson Studio seems similar, and MATLAB is a puzzle box too.

    • @algorithmicsimplicity
      @algorithmicsimplicity  1 year ago +1

      I'm afraid I don't know of any GUI programs that simulate RNNs explicitly, but I do know that RNNs are a subset of feedforward NNs. That is, it should be possible to implement an RNN in any of the programs you suggested. All you would need to do is have a bunch of neurons in each layer that copy the input directly (i.e. the i-th copy neuron should have a weight of 1 connected to the i-th input and 0 for all other connections), and then force all neuron weights to be the same in every layer. That will be equivalent to an RNN.
      I would also recommend you just try to program such an app yourself. Even if you have no experience programming, you can ask ChatGPT to write the code for you 😄.
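
      To make the "RNN as a weight-tied feedforward net" point concrete, here is a minimal NumPy sketch (layer sizes hypothetical): each time step is one feedforward layer, and every layer reuses the same weights:

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      W_in = rng.normal(size=(4, 3))  # input -> hidden weights
      W_h = rng.normal(size=(4, 4))   # hidden -> hidden weights, REUSED at every step

      def rnn_unrolled(inputs):
          """One feedforward 'layer' per time step; all layers share W_in and W_h."""
          h = np.zeros(4)
          for x in inputs:                   # copy neurons feed x in at its own layer
              h = np.tanh(W_in @ x + W_h @ h)
          return h

      sequence = [rng.normal(size=3) for _ in range(5)]
      print(rnn_unrolled(sequence))          # final hidden state after 5 steps
      ```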

  • @Emma2-cg5jh
    @Emma2-cg5jh 29 days ago

    Where does the performance value for the rearranged images come from? Did you measure it yourself, or is there a paper for that?

    • @algorithmicsimplicity
      @algorithmicsimplicity  29 days ago

      All of the accuracy scores in this video are from models I trained myself on CIFAR10.

  • @pypypy4228
    @pypypy4228 8 months ago

    It's brilliant!

  • @scarletsence
    @scarletsence 1 year ago

    These are god-like visualizations, thanks.

  • @ThankYouESM
    @ThankYouESM 8 months ago

    Seems like the bag-of-words algorithm can do a faster job at image recognition since it doesn't need to read a pixel more than once.

  • @bobuilder4444
    @bobuilder4444 1 month ago

    13:09 How would you know which numbers to remove?

    • @algorithmicsimplicity
      @algorithmicsimplicity  29 days ago

      You can simply order the weights by absolute value and remove the smallest weights (the ones closest to 0). This probably isn't the best way to prune weights, but it already allows you to prune about 90% of them without any loss in accuracy: arxiv.org/abs/1803.03635
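
      A minimal sketch (NumPy, with a random matrix standing in for a trained layer's weights) of that magnitude-based pruning:

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      weights = rng.normal(size=(256, 256))  # stand-in for trained layer weights

      prune_fraction = 0.9                   # drop the smallest 90% by absolute value
      threshold = np.quantile(np.abs(weights), prune_fraction)
      mask = np.abs(weights) >= threshold    # keep only large-magnitude weights
      pruned = weights * mask

      print(f"fraction of weights kept: {mask.mean():.2f}")  # ~0.10
      ```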

    • @bobuilder4444
      @bobuilder4444 29 days ago

      @algorithmicsimplicity Thank you

  • @uplink-on-yt
    @uplink-on-yt 9 months ago

    12:58 Wait a minute... Did you just describe neural pruning, which has been observed in young human brains?

  • @Walczyk
    @Walczyk 4 months ago

    7:04 this is just like the boost library from microsoft

  • @aydink7739
    @aydink7739 1 year ago +1

    Wow, I finally understand the „magic" behind CNNs. Bravo, please continue 👍🏽

  • @nageswarkv
    @nageswarkv 8 months ago

    Definitely a good video, not a fluff video.

  • @yourfutureself4327
    @yourfutureself4327 9 months ago

    💚

  • @solaokusanya955
    @solaokusanya955 1 year ago +1

    So technically, what the computer sees or doesn't see is highly dependent on whatever we as humans dictate it to be...

  • @peki_ooooooo
    @peki_ooooooo 1 year ago

    Hi, how's the next video?

  • @blonkasnootch7850
    @blonkasnootch7850 8 months ago

    Thank you for the video. I am not sure it is right to say that humans have knowledge of how the world works built into the brain from birth. Accepting visual input for data processing, detecting objects, or separating regions of interest is something every baby clearly has to learn. I have seen that with my children; it is remarkable, but not there from the beginning.

    • @algorithmicsimplicity
      @algorithmicsimplicity  8 months ago +1

      Of course children still need to learn how to do visual processing, but the fact that children can learn to do visual processing implies that the brain already has some structure about the physical world built into it. It is quite literally impossible to learn from visual inputs alone, without any prior knowledge.

  • @lolikobob
    @lolikobob 6 months ago

    Make more good videos!

  • @thechoosen4240
    @thechoosen4240 9 months ago

    Good job bro. JESUS IS COMING BACK VERY SOON; WATCH AND PREPARE