Unbelievably clear and succinct explanations
Thanks for the appreciation, Sunny, that's what we strive for! 🙂
Well said
The intro just rocked, as to why CNN. "Humans can do object detection quickly and machines can't" and hence that's where it begins. Amazing... Thanks...
Explained in a very simple way that's easy to understand! Great video!
I have been watching several videos to get a high-level understanding of CNNs, but had no luck. However, this is a very good explanation! It cleared up a lot of doubts in a few minutes. Thank you
Bro this dude just wrote mirrored wth. Also thanks for the video! The concept of CNN is a lot more clear to me now. :))
Glad this was useful to you! 👍 As for writing mirrored, here is how we do it 👉 ibm.co/3jnq1st 😉
In my eyes, the goal of convolution is to make the signal invariant to scaling and translation. It acts as a pre-processor of the raw input signal. You could also pre-process your training set first and store it in a file. Then you can feed this file directly to the deep neural network. You don't need the convolution at training time anymore.
Another way of making your signal (picture) invariant is to first Fourier transform it to make it scaling and translation invariant. Next, you transform the signal from Cartesian to polar coordinates to make it rotationally invariant. Finally, you Fourier transform that signal and end up with a fully invariant signal that you can store as a pre-processed training set.
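A small sketch of the translation part of that idea, under some loud assumptions: it uses a 1-D signal instead of a picture, a circular shift instead of a general translation, and it checks only the magnitude of the Fourier coefficients (the phase does change under a shift). The scaling and rotation steps mentioned above are not shown. The names `dft_magnitude`, `signal` and `shifted` are made up for illustration.

```python
import cmath

def dft_magnitude(signal):
    """Return the magnitude of each discrete Fourier coefficient of a 1-D signal."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

# A toy signal and the same signal circularly shifted by 3 samples.
signal = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0]
shifted = signal[-3:] + signal[:-3]

mag_original = dft_magnitude(signal)
mag_shifted = dft_magnitude(shifted)

# The largest difference between the two magnitude spectra is only rounding error.
max_diff = max(abs(a - b) for a, b in zip(mag_original, mag_shifted))
print(max_diff < 1e-9)
```

The shift only changes the phase of each coefficient, which is why the magnitudes agree.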
Any citations that elaborate on what you said?
But a CNN makes it possible to sequentially apply more abstract filters that fit the specific objects in the image. I'm not sure the transformations you named can do that, that is, take very complex and abstract patterns into account.
man i like how you clearly explain your videos
I was smiling to myself the whole time. So simple and succinct! Thank you
Mans just wrote in perfect handwriting BACKWARDS on the glass and no one is talking about it what the heck
um actually the video is mirrored
The magic of video editing, he’s a wizard
If you look around, you'll find a video they made to address just this question. Everyone who watches IBM videos asks exactly that, I know I did :)
It's an easy solution actually. The video is recorded from the other side of the glass board and then flipped horizontally. You can see that the watch appears to be on his right hand, but it's actually on his left.
This channel has some of the best CompSci explanations ! Never been disappointed!
0:42 I cannot get over the fact that this dude just wrote the term CNN backwards so easily and so fast :O
Or maybe he just flipped the video horizontally in post-production
try looking at the video using a mirror ...
He flipped the video. That's why he appears to be writing with his left hand and wearing his watch on his right arm.
@badbud804 yeah, I also mentioned that, but it would be very impressive if he could actually do that
the knob/button on the watch (which is typically to the right of the dial/screen) is the most unambiguous clue establishing the video is mirrored.
Such a likeable person explaining so well, much appreciated! :)
I was looking to understand how to represent a CNN in a way that clearly shows the difference to just dense neural networks. This really helped! thanks!
This is probably the best explained video i've ever watched, you're a great tutor!!!!!😍😍
Amazing explanation!
Two quick questions:
1. If each layer of a neural network can recognize more complex / abstract objects, does that mean that deeper neural networks (neural networks with more layers) will always be more powerful, or at least have the potential to be more powerful?
2. Could one say the same about the width of neural networks? Would a neural network with more nodes per layer be able to recognize a larger variety of images?
Both those assumptions are valid, with some caveats.
If you have too many nodes in a layer, you're looking for too many features in the data, and you'd virtually memorise the training data after some point, because you're not reducing the dimensionality anymore.
If you use too many layers, you're risking vanishing/exploding gradients, and you're making features needlessly complex, which may also lead to overfitting.
Besides, there need to be sufficiently complex activation functions between layers to leverage the feature-extracting prowess of each node. If the activation functions are too non-linear, the individual weights become less meaningful and harder to train. If the activation function is not sufficiently non-linear, you're essentially obtaining the result of a single matrix multiplication with the computational overhead of multiple operations.
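To make the last point concrete, here is a minimal sketch (toy 2x2 weights picked arbitrarily for illustration, nothing from the video) showing that two layers with no activation in between give exactly the same output as one pre-multiplied matrix, while putting a ReLU between them breaks that equivalence. `matmul`, `relu`, `W1`, `W2` and `x` are hypothetical names.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def relu(m):
    """Element-wise ReLU: negative values become zero."""
    return [[max(0.0, v) for v in row] for row in m]

# Arbitrary toy weights for two layers and one input column vector.
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
x = [[1.0], [2.0]]

two_linear = matmul(W2, matmul(W1, x))       # two layers, no activation
one_matrix = matmul(matmul(W2, W1), x)       # a single merged layer
with_relu = matmul(W2, relu(matmul(W1, x)))  # non-linearity in between

print(two_linear == one_matrix)  # the stacked linear layers collapse into one
print(with_relu != one_matrix)   # the ReLU version cannot be merged this way
```

So without a non-linearity you pay for two multiplications but only get the expressive power of one.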
Dear lord this is perfectly chunked information.
Fantastic explanation! Very pedagogical and easy to follow. Thank you!
Martin, you are a superb teacher. You make learning easy and fun.
there should be a full course on this neural network taught by Martin
This explanation was so good. Currently using CNNs for remote sensing applications.
Hi! Have I assumed correctly that in case of using CNNs for image recognition, the deeper the filters go, the more they zoom out on the image?
Next logical question is - what type of software is used to analyze test cases (e.g. real houses) and create those filters?
The filter is no more than just a matrix. The discrete convolution is performed in each layer (this is where the name CNN comes from). The filter is refined using training data: just like you would train a perceptron, you train the matrix to behave as desired.
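A minimal sketch of "the filter is just a matrix", assuming a tiny 4x4 grayscale image and a hand-picked 2x2 vertical-edge filter (both invented for illustration; in a real CNN the filter values would come out of training). Like most deep learning libraries, it slides the filter without flipping it, which strictly speaking is cross-correlation.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Toy image: dark on the left half, bright on the right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical filter that responds to a dark-to-bright step from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the middle column, which is where the edge sits in the image.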
You are so much better than my college faculty. Thank you for a great explanation 😍
best teacher!! 👏
Nice series Marvin 😁
Fantastic video. Is Martin always writing mirrored? I am fascinated by how your video recording works!
I understood it very well, in case som1 didn't, watch this video after watching 3b1b video on neural networks
Very excellent explanation ❤
Martin, how are the filters for a CNN created? Randomly? Stored in some database? Might there be an advantage to specifying filters yourself, particularly if you have expertise in the domain the images are from?
So I take it the key to building a CNN is how to build the filters? Also, given that the first layer is fragmented, does it mean that the first layer could be of general usage, while the later layers are more application-oriented?
amazing as usual.
This was easy to understand and very concise...Thank you
Hello, thank you for the explanation but I still don't understand how the filters are made.
What would be the difference between the standard convolutional networks and something newer like CLIP?
At last a video that is useful!
I have a question: how are the levels of filters defined?
Very good explanation. Thank you.
Can we implement this CNN to determine micro-level profiles, i.e., micrometer level?
Well if the beer videos ever stop Martin you have a career in IT Vlogging 😁
love this explanation ...
You made it easy to understand. Very helpful. Thanks a lot :)
So, combining this with the other video of yours: at the end of the CNN there will be a discriminator which has been trained to know what a house looks like, what an apartment looks like, what a skyscraper looks like, and therefore tells you that this is a house?
Thanks. Great learning Video.
Right, a curiosity: in the case of twins, or even identical triplets, how does a CNN tell them apart? Another detail regarding the filters: suppose we have objects lying on the lines, for example. How are they identified in this process, with such a vast number of possible images to store?
Hi, I'm a maths student and I need to do a project. The theme is games and sport. I saw your video and thought, why not apply this technique to the world of sports, to discover from an analysis of the players' movements whether one of them is sick? Can you help me apply a CNN and use it well, please?
Don't ask him. His explanation is sloppy and incomplete. The convolution operations with the filters produce matrix channels that build up the tensor. For example, after four convolution operations you should have four matrix channels. The next operation would be a max pooling operation on each matrix channel in the tensor. Please let me know if you have a question.
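The pipeline described in that comment can be sketched roughly like this, assuming a 5x5 toy image, four made-up 2x2 filters, no padding, and 2x2 max pooling with stride 2 (all of these are illustrative choices, not something stated in the thread):

```python
def conv2d(image, kernel):
    """Valid sliding-window filter (no padding, stride 1, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]

def max_pool_2x2(channel):
    """Keep the largest value in each non-overlapping 2x2 block."""
    return [
        [
            max(channel[r][c], channel[r][c + 1], channel[r + 1][c], channel[r + 1][c + 1])
            for c in range(0, len(channel[0]) - 1, 2)
        ]
        for r in range(0, len(channel) - 1, 2)
    ]

# Toy image with a bright square in two opposite corners.
image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Four hypothetical filters: horizontal edge, vertical edge, diagonal, blob.
filters = [
    [[1, 1], [-1, -1]],
    [[1, -1], [1, -1]],
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]

tensor = [conv2d(image, k) for k in filters]   # four 4x4 channels
pooled = [max_pool_2x2(ch) for ch in tensor]   # four 2x2 channels

print(len(tensor), len(tensor[0]), len(tensor[0][0]))
print(pooled[3])  # pooled response of the blob filter
```

Each filter contributes one channel, and pooling shrinks every channel separately without mixing them.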
Very clear and right-to-the-point explanation! Thank you!
This is too low-level and vague for people who need it, and too high-level and complicated for children. I believe you should go more in depth and provide more information, such as how the convolution works, different activation functions, and different types of layers.
It is just an introduction. If one wants to learn the details, they can search for textbooks, I believe there are countless available.
Then actually go and study CNNs. This is a brief overview of how they work.
These videos are for 2 demographics, young adults/teenagers who find AI technology fascinating and want to understand how it works. And for children to spark the flame of the scientist inside them towards AI development when they grow up. The Second reason is the most important.
I genuinely needed a 2 minute explanation of this term and a few others. I guess I'm the target audience.
Damn that was crystal clear.
This guy gives crystal clear explanations. Supremely Clear!
Doesn't it require a lot of manual work to make all those filters? Isn't it better to just run everything through a regular neural network?
That's the neat part - you don't manually make those filters. The filters are learned by the network during training from the annotated training images (for object detection, the annotations are bounding boxes).
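A toy sketch of what "learned" means here, with heavy simplifications: one 1-D filter of length 2, a single short signal, a mean squared error loss, and plain gradient descent. The "true" filter exists only to generate the training targets. Real CNNs learn many 2-D filters across several layers via backpropagation, so treat this as an illustration of the idea, not of any particular library or of the method in the video.

```python
# Toy data (all values are illustrative assumptions).
signal = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
true_filter = [-1.0, 1.0]  # hidden edge filter, used only to create the targets

def apply_filter(weights, xs):
    """Slide a length-2 filter over a 1-D signal (no padding, no flipping)."""
    return [weights[0] * xs[i] + weights[1] * xs[i + 1] for i in range(len(xs) - 1)]

targets = apply_filter(true_filter, signal)

# Start from an all-zero filter and nudge it to reduce the squared error.
weights = [0.0, 0.0]
learning_rate = 0.1
for _ in range(500):
    outputs = apply_filter(weights, signal)
    errors = [o - t for o, t in zip(outputs, targets)]
    n = len(errors)
    # Gradient of the mean squared error with respect to each filter weight.
    grad0 = 2.0 / n * sum(e * signal[i] for i, e in enumerate(errors))
    grad1 = 2.0 / n * sum(e * signal[i + 1] for i, e in enumerate(errors))
    weights = [weights[0] - learning_rate * grad0, weights[1] - learning_rate * grad1]

print([round(w, 3) for w in weights])
```

Nobody typed in the edge filter by hand. The weights drift towards it because that is what makes the outputs match the training targets.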
Utterly well done, our IBM ML specialist!
Is he writing backwards...! impressive
No, obviously.
Identifying, organizing and reaping to thought.
Your tv CAN communicate with you via your neurons producing electromagnetic waves
such an easy, clear and to the point explanation! thanks a lot
is this what the vision pro uses?
this video hits different if you are currently taking digital image processing course. I feel smart lol
Machine learning is truly amazing yet it pales into insignificance when compared to the ability of this chap to write backwards.
I cant tell whether you're joking, but I think the video is flipped horizontally
What kind of board do you use to write on?
See ibm.biz/write-backwards
thanks martin for the clear explanations
you are amazing
The best explanation ever.
Highly insightful
The volume is a bit quiet here.
clearly understandable 🙏🙏🙏
finally ! bravo. clear and concise
Explained this video very well - highly recommend! Thank you
Will the Activation Functions video come?
Great explanation! Great job; thanks!
Thanks really helpful
This explanation is good. Thanks. 😊
can you help me regarding my project "human pose estimation"
Hi Rasel! What sort of help would you need? 🙂
@IBMTechnology I have to do human pose estimation from extracted skeletal data
Oh my god, thank you for the explanation. Easy to understand
Superb explanation
Application of successive Convolutional Filters well presented but at a high level only
great work explaining!
Great video! Thanks 👍🏼
This was so great thank you
Thank you
Perfect explanation. I hate it when people throw difficult terms around. Why can't it always be precise and clear, such as using a house as an analogy? Well done!
Wow such a comprehensive content on CNN!
Very good explanation!
Awesome explanations ! ... thank you for sharing your knowledge ;))
amazing work. thank u!
Waiting to learn more from you
Great video 🔥
Thank you :)
More please ☺️☺️
Definitely what we're planning! 😀 In the meantime, feel free to subscribe to get notified of when we post more videos.
Great content
It's just like how our brain recognises objects. Can we create consciousness using this technique? Probably yes, in the future
thank you :)
All I can think of is... that how good he is in writing everything mirrored....
AWESOME! Thanks :)
Funny guy. Love him
thanks
Thanks a lot!
This man rocks 🤘
clear and concise bigger picture of CNN
Thank you..!!
Wait, that's a house? I thought it was the head of a tin robot.
Master Inventor. Cool :)
that was a simple wow,,,,
Clifton Well