Animated AI
United States
Joined Jun 9, 2022
Demystifying artificial intelligence and neural networks by replacing equations with animations.
Multihead Attention's Impossible Efficiency Explained
If the claims in my last video sound too good to be true, check out this video to see how the Multihead Attention layer can act like a linear layer with far less computation and far fewer parameters.
Patreon: www.patreon.com/Animated_AI
Animations: animatedai.github.io/
Views: 5,196
Videos
What's So Special About Attention? (Neural Networks)
6K views · 7 months ago
Find out why the multihead attention layer is showing up in all kinds of machine learning architectures. What does it do that other layers can't? Patreon: www.patreon.com/Animated_AI Animations: animatedai.github.io/
Pixel Shuffle - Changing Resolution with Style
9K views · 1 year ago
Patreon: www.patreon.com/Animated_AI Animations: animatedai.github.io/#pixel-shuffle
Source of confusion! Neural Nets vs Image Processing Convolution
4.6K views · 1 year ago
Patreon: www.patreon.com/Animated_AI All Convolution Animations are Wrong: th-cam.com/video/w4kNHKcBGzA/w-d-xo.html My Animations: animatedai.github.io/ Intro sound: "Whoosh water x4" by beman87 freesound.org/s/162839/ Bee image: catalyststuff on Freepik www.freepik.com/free-vector/cute-bee-flying-cartoon-vector-icon-illustration-animal-nature-icon-concept-isolated-premium-vector_31641108.htm#q...
Groups, Depthwise, and Depthwise-Separable Convolution (Neural Networks)
37K views · 1 year ago
Patreon: www.patreon.com/Animated_AI Fully animated explanation of the groups option in convolutional neural networks followed by an explanation of depthwise and depthwise-separable convolution in neural networks. Animations: animatedai.github.io/ Intro sound: "Whoosh water x4" by beman87 freesound.org/s/162839/
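The efficiency win of depthwise-separable convolution that this video illustrates can be sketched with a simple parameter count. This is my own illustrative example (the channel counts and kernel size are assumptions, not numbers from the video; biases are omitted):

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard convolution: every filter spans all input channels."""
    return c_out * c_in * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise (one k x k filter per input channel, groups == c_in)
    followed by a 1x1 pointwise convolution that mixes channels."""
    depthwise = c_in * k * k
    pointwise = c_out * c_in * 1 * 1
    return depthwise + pointwise

standard = conv_params(64, 128, 3)                   # 73728
separable = depthwise_separable_params(64, 128, 3)   # 576 + 8192 = 8768
print(standard, separable)
```

For these assumed sizes, the separable version needs roughly 8x fewer weights while keeping the same input and output shapes.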
Stride - Convolution in Neural Networks
8K views · 1 year ago
Patreon: www.patreon.com/Animated_AI A brief introduction to the stride option in neural network convolution followed by some best practices. Intro sound: "Whoosh water x4" by beman87 freesound.org/s/162839/
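The effect of stride on output size follows the standard convolution size formula. A minimal sketch (the example sizes are my own assumptions, not taken from the video):

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - k) // stride + 1

# A 3x3 kernel with stride 2 and padding 1 halves a 224-pixel dimension.
print(conv_output_size(224, 3, stride=2, padding=1))  # 112
```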
Convolution Padding - Neural Networks
9K views · 2 years ago
Patreon: www.patreon.com/Animated_AI A brief introduction to the padding option in neural network convolution followed by an explanation of why the default is named "VALID". Intro sound: "Whoosh water x4" by beman87 freesound.org/s/162839/
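The "VALID" naming the video discusses can be sketched numerically. In TensorFlow's convention, "VALID" means no padding (the kernel only sits at positions where it fits entirely inside the input), while "SAME" pads so the output keeps the input's spatial size at stride 1. The sizes below are my own illustrative numbers:

```python
import math

def valid_output(n, k, stride=1):
    # "VALID": no padding; only fully-overlapping kernel positions count.
    return math.floor((n - k) / stride) + 1

def same_output(n, stride=1):
    # "SAME": enough padding that every input position produces an output.
    return math.ceil(n / stride)

print(valid_output(28, 3))  # 26 -- the output shrinks
print(same_output(28))      # 28 -- the output size is preserved
```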
All Convolution Animations Are Wrong (Neural Networks)
65K views · 2 years ago
Patreon: www.patreon.com/Animated_AI All the neural network 2d convolution animations you've seen are wrong. Check out my animations: animatedai.github.io/
Filter Count - Convolutional Neural Networks
16K views · 2 years ago
Patreon: www.patreon.com/Animated_AI Learn about filter count and the realistic methods of finding the best values. My Udemy course on High-resolution GANs: www.udemy.com/course/high-resolution-generative-adversarial-networks/?referralCode=496CFB7F680D78F02798
Kernel Size and Why Everyone Loves 3x3 - Neural Network Convolution
29K views · 2 years ago
Patreon: www.patreon.com/Animated_AI Find out what the Kernel Size option controls and which values you should use in your neural network.
Fundamental Algorithm of Convolution in Neural Networks
22K views · 2 years ago
Patreon: www.patreon.com/Animated_AI See convolution in action like never before!
This is such an amazing video. Thank you.
straight heat i can't even lie good stuff bro
fye
I feel silly for asking, but the different colored blocks (in the middle) correspond to convolutions over different channels of the original matrix right?
i finally understood why convolution makes more channels. thank you so so much
So the point he is trying to make is that an "image" is represented as a 3D array: height (pixels), width (pixels), and RGB components. When we talk about 2D images we usually mean a grayscale image, which doesn't require the RGB part to be mentioned explicitly, although it is still a 3D array. So for a colored image of 128x128 pixels, the tensor shape would be (128, 128, 3), and for a grayscale image it would be (128, 128, 1).
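The shapes described in that comment can be checked directly. A minimal sketch using NumPy (the 128x128 size is just the comment's example):

```python
import numpy as np

color = np.zeros((128, 128, 3))  # RGB image: height, width, 3 channels
gray = np.zeros((128, 128, 1))   # grayscale: the channel axis kept explicit
print(color.shape, gray.shape)   # (128, 128, 3) (128, 128, 1)
```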
Is feature and filter count the same as PyTorch channel size?
The convolution is the same, you're just using different dimensions and then using the same dimension label for the operation. You're comparing apples to oranges.
These are the best animations I've seen about neural nets. I hope we can get a video as clear as the depthwise-separable convolutions one, but for attention.
An absolute pinnacle of online education materials in the field when it comes to giving a real gut intuition of what the operations look like 🙌 It's a real talent you've got there. Thank you, on behalf of the rest of the internet, for using it well.
This helped so much, you can't understand how thankful I am
game changer for anyone learning neural networks
Well explained.
Nice video, you should post more.
Thanks for that. It was really confusing before your animation came up!
So in the case of a feature-map input, does 2D conv just replicate each 2D filter along the feature dimension and multiply element-wise? In the video, are the 2D filters really just replicated to fill in the number of features, or is each 2D filter in reality a 3D tensor matching the feature dimension?
I have been struggling to mentally visualize convolutions, especially going from one dimension to others. I was reading the book Understanding Deep Learning by Simon Prince and I realized what I thought it looked like was wrong (the 2D-to-2D animations from the beginning). I wish I had stumbled upon yours before having to imagine what was explained in the books XD (Good book tho)
Such a beautiful video
Thank you. Great job
wanna bet e by e would be somehow mathematically optimal?
Coming here after "All convolution animations are wrong", brilliant work! thank you very much!
The other drawings and visuals can't keep up with this! Great content! I love the visualizations!
My second favourite channel, after 3Blue1Brown
Please do a video on backpropagation (since it's another convolution)
thx
Why is your input tensor so many dimensions? Shouldn’t the depth be only 3 (1 for each color channel)?
These videos are outstanding! Finally, true visualisations that get it right. I'm sharing these with my ML Masters students. Thank you for your considerable effort putting these together.
I like it, really, love it! But... I don't see what's wrong with other illustrations, and peculiarly I think yours just reiterates what they already clearly illustrate. I was even expecting CNN representations in XYZ visuals. Am I missing some points here? Honest question, would appreciate any enlightenment! (btw, thank you for sharing your own version of splendid animation with the world!) PS: If you're up for the challenge, do Spiking NN, I'll buy you a beer in Bali!
thx!!
It's funny you mention that the number of kernels is the least exciting part; my thesis was an attempt at finding a systematic way to reduce the number of kernels by correlating them and discarding kernels that "extract roughly the same features". Great video!
jiff
The reason the filter at 3:00 being 2D gets glossed over is that most image signal processing is taught in grayscale.
2:13 how does it stay at the same size? Padding the edges of the original image?
What actually is the 3rd dimension in this context for the source giant cube? Is that multiple colors? A batch of multiple images?
Lol grayscale is a real thing still! Medical and microscopy imaging
this is awesome and is inspiring me to learn blender!
0:16 If filters are stored in a 4-dimensional tensor and one of them represents the number of filters, then what does the depth represent?
It represents the depth of the input tensor.
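That reply can be made concrete with a small sketch. The sizes here are my own assumptions (16 filters over a 3-channel input, 5x5 kernels), and the (out_channels, in_channels, kH, kW) layout is the convention PyTorch uses for Conv2d weights:

```python
import numpy as np

# 4D filter tensor: (filter_count, input_depth, kernel_h, kernel_w)
filters = np.random.rand(16, 3, 5, 5)
patch = np.random.rand(3, 5, 5)  # one input patch, same depth as the input

# One filter applied to one patch: multiply elementwise across the full
# depth and sum, producing a single number in one output channel.
out = (filters[0] * patch).sum()
print(filters.shape, float(out))
```

The filter's second axis has to equal the input's channel count, which is why the depth of the filter tensor tracks the depth of the input tensor.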
I have some hobbyist signal processing experience from a few decades back, and these new methods seem so amateurish compared to what we had in the past: FFT, FHT, DCT, MDCT, FIR filters, IIR filters, FIR design based on frequency response, edge-adapted filters (so no need for smaller outputs), filter banks, biorthogonal filter banks, window functions, wavelets, wavelet transforms, Laplacian pyramids, curvelets, contourlets, non-separable wavelets, multiresolution analysis, compressive sensing, sparse reconstruction, SIFT, SURF, BRISK, FREAK, yadda yadda. Yes, we even had even-length filters, and different filters for analysis than for synthesis.
There are equivalents in AI model development and inference for those, though. Many of these signal processing techniques have analogs or are directly applicable in AI and machine learning:

- **FFT, FHT, DCT, and MDCT:** Used in feature extraction and preprocessing steps for machine learning models, especially in audio and image processing.
- **FIR and IIR Filters:** Used in preprocessing steps to filter and clean data before feeding it into models.
- **Wavelets and Wavelet Transforms:** Applied for feature extraction and data compression, useful in handling time-series data.
- **Compressive Sensing and Sparse Reconstruction:** Important in developing models that can work with limited data and in reducing the dimensionality of data.
- **SIFT, SURF, BRISK, and FREAK:** Feature detection and description techniques that are foundational in computer vision tasks like object recognition and image matching.

In AI, techniques like convolutional neural networks (CNNs) often use concepts from signal processing (like filtering and convolutions) to process data in a way that mimics these traditional methods. Signal processing principles help in designing more efficient algorithms and models, improving performance in tasks such as image recognition, speech processing, and time-series analysis.
Incredibly helpful, keep up the good work!
I'm confused because you make it look like an attention layer could be used as a drop-in replacement for a linear layer but GPT-4o says: "No, an attention layer cannot be used as a direct drop-in replacement for a linear layer due to the fundamental differences in their functionalities and operations."?
That’s correct that an attention layer is not functionally equivalent to a linear layer. This efficiency comes with its own trade-offs. But it’s going to make more sense to talk about those trade-offs a couple more videos down the line in this series, so I didn’t go over them in this video.
@@animatedai Thanks for clearing that up. Also, I ran some quick tests comparing the performance of a PyTorch MultiheadAttention layer with a Linear layer, and the linear layer is significantly faster on CPU and GPU in every test I can run, so I hope that's something you could clarify in a future video. Looking forward to the next one!
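The "efficiency" in the video's title is about parameter count rather than raw wall-clock speed, which may explain the benchmark result above. A rough sketch of my reading of that framing (the sequence length and model width are assumed example numbers; biases are omitted): a single linear layer applied across a whole flattened sequence versus the four d_model x d_model projections (Q, K, V, output) of a standard multihead attention block.

```python
seq_len, d_model = 64, 512

# One giant weight matrix mapping the flattened sequence to itself.
linear_over_sequence = (seq_len * d_model) ** 2

# Standard MHA: Q, K, V, and output projections, shared across positions.
mha_projections = 4 * d_model * d_model

print(linear_over_sequence // mha_projections)  # 1024
```

For these sizes the attention block uses about 1024x fewer parameters to mix information across the sequence, even though each individual forward pass can still be slower than a small per-position Linear layer.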
Sir, Thanks for doing god's work!!! I wonder why this channel has so few viewers; it deserves to be known by more people. Deep Learning is much simpler if learned from this guy. Honestly, I truly admire you for taking the time to research and visualize something so complex, making it easy for everyone to understand.
This is so unfairly underrated, I have never seen such a good video about CNNs.
Which is correct?????
Stride value was 2 pixels
You did not introduce convolution in any informative way, nor define any terms for your argument, and you didn't explain the purpose of 3D convolution or why 2D convolution is inaccurate in the first place. There is also no closing argument for what appears to be your proposition for the proper illustration of CNN. This whole video is completely open ended and thus ambiguous.
Loved the animation thank you!!
Excellent!!
Cool! Even better: adding names to the objects (like "kernel", etc.) would be helpful to new people.
I adore your content, genuinely can’t wait for more videos of your visualizations. Feels like I’m building real intuition about what I’m doing watching you :)
keep it up great videos