Visualizing Convolutional Neural Networks | Layer by Layer

far1din

มุมมอง 72 746

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 27 ก.ค. 2024
Visualizing convolutional neural networks layer by layer. We are using a model pretrained on the mnist dataset.
► SUPPORT THE CHANNEL
➡ Paypal: www.paypal.com/donate/?hosted...
These videos can take several weeks to make. Any donations towards the channel will be highly appreciated! 😄
► SOCIALS
X: x.com/far1din_
Github: github.com/far1din
Manim code: github.com/far1din/manim#visu...
---------- Content ----------
0:00 - Introduction
00:30 - The Model
02:03 - Input and Convolution | Layer 1
02:46 - Max Pooling | Layer 1
03:30 - Convolution | Layer 2
04:29 - Max Pooling and Flattening | Layer 2
05:10 - The Output Layer (Prediction)
---------- Contributions ----------
Intro Music: ‪@randomtypebeats7348‬
Background music: pixabay.com/users/balancebay-...
#computervision #convolutionalneuralnetwork #ai #neuralnetwork #neuralnetworksformachinelearning #neuralnetworksexplained #neuralnetworkstutorial #neuralnetworksdemystified #computervisionandai #animation #animationvideo

ความคิดเห็น • 87

@r0cketRacoon 7 วันที่ผ่านมา
WWOWW! I need a video visualizing CNN from layer to layer like that, and I encountered ur channel. The best CNN visualization for me now. Tks u!
@sugurufoundation4127 23 วันที่ผ่านมา ⁺¹
Thank you very much for this explanation. I have been struggling to understand how exactly convolutions work, and you explained it perfectly. Thanks keep up the good work bro.
@Purely_Andy ปีที่แล้ว ⁺³⁵
I've been wanting a video like this for a long time, you nailed the balance between showing and explaining. I subscribed and liked the video.
@far1din ปีที่แล้ว ⁺²
Thank you my friend. Glad you got value out of it! 🔥
@terencechengde ปีที่แล้ว ⁺⁶
This is an excellent video explaining how convolutional layers work at a glance! Fantastically well done!
@AnaS-lp4mf 3 หลายเดือนก่อน
Literally thank you so much for your video. You don’t know how much relief you brought me in understanding it! Your voice is so clear and your explanations were just spot on and easy to understand. I’ve spent like weeks trying to figure out how the pieces go together and you have completed the puzzle of confusion in my mind. Thank you so much !!!
@dndr4818 ปีที่แล้ว ⁺¹
One of the best video explaining this topic. What a great job and thank you!
@Democracy_Manifest ปีที่แล้ว ⁺²
Excellent video. We needed this, thank you. Keep it up!
@boramin3077 19 วันที่ผ่านมา
Amazing work! Please continue to make more videos on other ML topics. I find your videos are really helpful to understand the concepts.
@joaogustavo6268 2 หลายเดือนก่อน ⁺¹
This is art! Never delete this video bro!
@maxedin7910 ปีที่แล้ว ⁺¹
Amazing visualization! I wish for more short vids like this!
@lapusikrid475 ปีที่แล้ว ⁺²
Great video! So good and affordable explanation! Thank you, it helps me a lot!)
@RAHUL1181995 ปีที่แล้ว ⁺¹
This was really helpful....Thank you so much for the vizualization...Keep up the good work...Looking forward to your future uploads.
@fdshdsfdsqq 8 หลายเดือนก่อน ⁺²
Man, you're amazing! This is a perfect visualization, thank you God give you health!
@far1din 8 หลายเดือนก่อน
Haha, thank you for the kind words my friend :D
@websciencenl7994 ปีที่แล้ว ⁺¹
Great video, Thank you!
@alfashow2439 ปีที่แล้ว ⁺¹
Great video! you got me interseted!
@user-bx7xh3wy1b 8 หลายเดือนก่อน ⁺²
You are fantastic! I look forward to watching your next videos. please produce mooore videos!
@far1din 8 หลายเดือนก่อน
Thank you brother. Working on it! 😃🚀
@fiya.. ปีที่แล้ว ⁺¹
Nice video! Thank you!
@teofilciobanu 6 หลายเดือนก่อน ⁺²
Hi there! Pretty cool video :D
One suggestion I would have would be to show how the output looks if we change the input.
For example, at 2:50 and 4:30 a 2x10 and 4x10 grid respectively where the ith column is the layer output for the ith digit.
@hackeymabel8296 10 หลายเดือนก่อน ⁺²
thank you really needed this
@khayyamnaeem5601 ปีที่แล้ว ⁺³
This is my favourite channel. I especially like it when you explain everything so nicely. I wish you a lot of success with the channel and happy life.
@far1din ปีที่แล้ว ⁺¹
That’s great to hear. I wish you all the best! 😃
@sivaladeepak1538 ปีที่แล้ว ⁺¹
Great Viz!!!
@_aitalks_ ปีที่แล้ว ⁺²
Nice walkthrough!
@far1din ปีที่แล้ว
I am glad you enjoyed the content!
@emadilabid หลายเดือนก่อน
Bravo dude!
@KhanhTran-qp5pm ปีที่แล้ว
Love your video.
@IvoVacca ปีที่แล้ว ⁺¹
Very cool.
@locngoxuan1329 ปีที่แล้ว ⁺¹
it's really informative for me. Tks bro!!
@far1din ปีที่แล้ว
Thank you brother. I will try to post a more in depth video soon!
@mattialoszach6405 3 หลายเดือนก่อน
Awesome video🫶
@user-zn1hm6tk1o 5 หลายเดือนก่อน ⁺³
thank you from Silicon Valley Startup
@Sowmya_codes 3 หลายเดือนก่อน
Hello user-zn1hm6tk1o Do let me know if you're in need of an intern !
@randomtypebeats7348 ปีที่แล้ว ⁺²
keep em coming brodie!
@far1din ปีที่แล้ว
For sure! :)
@mehdismaeili3743 ปีที่แล้ว ⁺¹
excellent. 3 video and 1.89 k subscriber !!! it's amazing, and you are amazing .
@far1din ปีที่แล้ว
Thank you my friend! 💯
@JetJV ปีที่แล้ว ⁺¹
Wow. Such a high quality content with great effort. U deserve million subscribers.. Hope soon u ll get popular and make more videos, 🙌
@far1din ปีที่แล้ว
Haha, stay tuned! 🚀🔥
@programmingwithchatti4063 ปีที่แล้ว ⁺¹
why is the background music soo soothing xD I want to focus not lost in thoughts xD
@far1din ปีที่แล้ว ⁺¹
Haha, next time myfriend. I will turn down the music 🚀
@EpromLee 4 หลายเดือนก่อน
thank u!!
@AutobahnBuilder 10 หลายเดือนก่อน ⁺⁴
I gotta say your animation is great and the way you explain it is really good. It feels like 3blue 1brown and just as well explained. Keep the good work!
@far1din 9 หลายเดือนก่อน
Thank you my friend 🔥
@Waferdicing ปีที่แล้ว
🧡🖤
@richardfoldesi3980 ปีที่แล้ว ⁺¹
best
@far1din ปีที่แล้ว
🚀
@emotional_stuff ปีที่แล้ว
Love it! Sub+
@vivek65522 ปีที่แล้ว
Great Video, a quick question, does kernel values (in every conv. layer) changes during training of CNN? Does the kernel changes it self so as to only extract exactly the shapes and pattern required for classification ?
@kneehaw8093 ปีที่แล้ว
Convolutional kernels will train their weights such that they “learn” whatever feature detection is most desirable.
I believe that is a “yes” to your question.
@far1din 11 หลายเดือนก่อน
1. Does the kernel values change? It depends. Normally it does, but you can also freeze the network and train for example only the last 3 layers. This is common for transfer learning: th-cam.com/video/FQM13HkEfBk/w-d-xo.html
2. It definetly looks like that is the case if you look at a network post training. Why it extract certain shapes and patterns are still unknown for me.
@ratfuk9340 ปีที่แล้ว ⁺³
4:10 How are the outputs of the two convolutions for each filter combied to get 1 filter map / filter?
@far1din ปีที่แล้ว ⁺²
So, you multiply the values from one filter with the corresponding input layer. Then you add together the outputs + the bias. Then you pass them through an activation funtion (in this case the sigmoid). After this you move the filter one step to the right (since the stride is one) and do the same thing until everything is covered.
This is hard to explain in text. I will do my best to get out an in depth video by this weekend where everything will be explained in detail! 💯
@ratfuk9340 ปีที่แล้ว ⁺²
@@far1din I think I get it. Anyway thanks for making these videos, they help a lot.
@far1din ปีที่แล้ว ⁺¹
@@ratfuk9340 th-cam.com/video/jDe5BAsT2-Y/w-d-xo.html
@ahamedismail1630 8 หลายเดือนก่อน ⁺¹
So we use filters to catch the features, and maxpooling to capture the most active feature? Thus if we use maxpooling of higher dimensions, we loose the other less significant features to reduce the model complexity but results in loss of information?
@far1din 8 หลายเดือนก่อน
That is the idea and it seems to be working in practice. However, people use average-pooling and min-pooling also. It all comes down to what yields the best result in the end! :D
A lot of this is just trial and error until the best performance is acchieved.
@abdullahhashmi654 ปีที่แล้ว ⁺²
For the convolution, you have two filters through which you pass the original image and you get two images which have been passed by the filter. But for the second convolution, you have 4 filters each for the two images, so a total of 8 filters for the two images. 4 for one and 4 for the other. But in the visualization, dissimilar to the first convolution, you run both the filters through both the images in parallel and then combine them. So instead of getting 8 images from the second convolution, we get 4. Why?
@far1din ปีที่แล้ว ⁺⁷
Great question my friend. This is one of the reasons why I felt I had to make this animation.
First of all, in the second convolution, we have one “image” consisting of two channels. You can think of an RGB (colored) image. These images consists of three channels that you stack together. See reference 1 for image. This basically implies that if you have a 28 x 28 pixel RGB image, when you extract the values for the three channels (Red, Green and Blue) you will have three matrices. The dimensions are therefore actually 28 x 28 x 3, where 3 is the number of channels.
Now, when you build a convolutional neural network, you choose how many filters you want in each layer. If you have X amount of channels in the inputted image, each of the filters will have X amount of channels. Examples:
Input = 28 x 28 x 3 and you choose five filters with dimensions 5 x 5, you will technically have five filters that are 5 x 5 x 3. Let’s say you run the convolution. Your outputted activation layer will have the dimensions 24 x 24 x 5. Where 5 is the number of channels.
This implies that if you choose to have a convolutional layer next with two filter with the dimensions 7 x 7, these will actually have the dimensions 7 x 7 x 5.
I hope this made sense and I know it can be a bit confusing. I highly recommend you watch Andrew NG’s video on convolutions over volumes (Reference 2) as he explains the concept very well.
Reference 1: media.geeksforgeeks.org/wp-content/uploads/Pixel.jpg
Reference 2: th-cam.com/video/KTB_OFoAQcc/w-d-xo.html
@nithyaprakash5812 10 หลายเดือนก่อน
Great job.. U got a new subscriber 🎉😊
@Leuel48Fan ปีที่แล้ว
4:24 kind of wild how after this layer there's no resemblance or perceived logic to our eyes how it's characterizing the digit. Can you show maybe 10 examples each of the 10 different digits at this step (just the output)? I wonder if they will look similar in pattern to each other right before the fully connected final layer.
@far1din ปีที่แล้ว
It is very strange how all og this just «works». I will try to do that for one of my next videos! 😊
@triplemmm3 ปีที่แล้ว ⁺²
What do use use for making videos ?
@far1din ปีที่แล้ว ⁺¹
Manim!
www.manim.community
@ritwiksharma4359 ปีที่แล้ว
Why there are 2 parts of each filter in layer 2? Is this just for the different color channels?
@far1din ปีที่แล้ว
Yes my friend. If there where four channels that were convoluted over, then each filter would have had four channels as well.
The amount of channels in the filter will mirror that of the input! 💯
@user-zs9cl2to3x 11 หลายเดือนก่อน
I didn't get why in 1st convolution after passing through 5*5 filter image of size 28*28 we've ended up with image of size 24*24. Stride is 1 hence filter should perfectly marsh till the end of the image with no bypassing certain pixels. Probably I miss smth but what?
@far1din 11 หลายเดือนก่อน
Brother, for example, let’s say you have an image with the size of 7 x 7 and a filter that is 5 x 5. You start at the left, where the left side of the filter touch the left side of the image. You slide it to the end where the filters **right** side touch the right side of the image. Since the filter is 5 x 5, you can only slide it two times to the right. Essentially, 5 «squares» of the image is already covered when you start, and there is only two more left.
I will highly recommend you draw this out and «scale down» the image to 7 x 7 or something to get some clarification. I’ll also link a video that show the process.
Watch the animations in this video for clarification: th-cam.com/video/tQYZaDn_kSg/w-d-xo.html
@user-zs9cl2to3x 11 หลายเดือนก่อน
@@far1din Bro, I've got it !!! Thank You so much for a great clarification and a link. That helped a lot.
@figloalds ปีที่แล้ว
If you start from the last layer with a 1 on the 7 and 0 on the rest then do the reverse process, how would the image look like?
@far1din ปีที่แล้ว
My friend, I do not understand. Could you explain in detail? :)
@figloalds ปีที่แล้ว
@@far1din i mean trying to reverse the process from a given output all the way to a generated image and see a visualization of what the network "understands" of a 7 for example. I was seeing some videos of people reversing AI outputs in this way for deep dream and gpt and they found very weird stuff, like for example very weird tokens in GPT that causes the model to completely break
@Leuel48Fan ปีที่แล้ว ⁺¹
@@figloalds I believe I've seen this tried somewhere and it'll look nothing like a '7'. Basically it can't generate images, you'd just get random gibberish static looking pixels turned on. I also think if you input a smiley face or letter, unless specifically trained for "not a number" scenario, you may get an inconclusive output or falsely confident on one number.
Even though the math of the process checks out, nothing about this whole concept makes any intuitive sense, which is why I find it fascinating tbh lol
@far1din ปีที่แล้ว
@@Leuel48Fan Yes haha. I believe I have seen this been done on a dumbbell. It was very noisy, but you could still see the outer line. However, I was unable to find it now 🤷🏾‍♀️
@far1din ปีที่แล้ว ⁺²
@@figloalds Ahaa, yes I know what you are talking about. This is a great idea. This would be what the computer perceives as the ideal «7» or ideal «1». I will try to make a video on that! 🚀
@paedrufernando2351 ปีที่แล้ว ⁺¹
why use 2nd convolution with 8 filters
@far1din ปีที่แล้ว
Hello my friend, the number of filters are typically choosen. A lot of deep learning is trial and error. I used 4 filters in the second convilution and these filters had two channels. Therefore, technically four filters. There is no underlying reason for this as this video was just meant to visualize the process. I could have gone with 10 filters, but the video would have been impossible to render.
@leesohaeng ปีที่แล้ว ⁺¹
hi, how to make 5*5 kernel? just random?
@far1din ปีที่แล้ว
It is initialized randomly if that is what you were asking! :)
@phanstudio ปีที่แล้ว
How do you create the visualizations
@far1din ปีที่แล้ว
Manim!
docs.manim.community/en/stable/
@gamercatsz5441 ปีที่แล้ว ⁺¹
Then you verify it’s a 7,and the AI algorithm gets better. Does the machine learn what is a 7? or does it learn how to get positive feedback from us. If we use advanced AI to ask advanced questions. Do we actually know for certain, it is not lying to us to get good feedback? Advanced AI like chat gtp 4 which is rapidly advancing will already is much smarter than 99% of humans, potentially is a lying manipulating horrible thing. It’s better to make 100% sure it is truthfull and fully understand intentions before making it smarter and more powerfull, train it woth more data, more computers, ect…
@far1din ปีที่แล้ว
Hello, my friend. Here is my take on this topic. First of all, these kinds of AI models are mathematical models that learn to get the "correct" response. This is called supervised learning, where the model is trained on labelled data. It is hard to define whether or not a model is lying, as you have to define lying. Is it lying if the answers are correct? The concerns are valid, as models can converge towards outputting responses that are not correct but perceived as such. This is something researchers and users should be wary about.
On the other hand, human intelligence and deep neural networks seem to take different paths. Even though deep neural networks are inspired by how the brain works and are capable of many of the "intelligent tasks" we throw at them, the learning process is not the same. Modern deep learning models use backpropagation while humans don't (or at least, that is the perception for now). Although we come to the same answer, the consciousness factor might differ. Therefore, it is hard to tell if a model is lying because lying is something I would like to think of as a conscious act.
You may find it beneficial to watch an interview with Geoffrey Hinton, as he discusses some of the concerns you've raised: th-cam.com/video/qpoRO378qRY/w-d-xo.html
@justinnejbauer หลายเดือนก่อน
@@far1dinI would argue that humans do use back propagation. When you make a mistake in life, how many times have you looked back to ask yourself what you would have done differently to minimize the negative effects.

ต่อไป

เล่นอัตโนมัติ

Convolutional Neural Networks from Scratch | In Depth