How the hell do you have so few views? This is one of the best and most factually correct animations out there.
He'll get there. Quality informational content always picks up slowly, but as long as the quality doesn't decline, its growth is exponential. Up to a limit, obviously, since this information is specialized, but that limit is high.
So as more teachers discover these illustrations and pass them on to their students, it will grow.
These are the best animations I've seen about neural nets. I hope we can get a video about attention that's as clear as the one on depthwise separable convolutions.
The other drawings and visuals can't keep up with this!
Great content! I love the visualizations!
straight heat i can't even lie good stuff bro
Saying that Multihead Attention has fewer parameters than a token-wise linear is true for NLP models but not for ViTs. Additionally, simply creating a mechanism that incorporates the entirety of the features does not explain away the success of attention mechanisms -- looking again at computer vision tasks, MLP-Mixer also incorporates the entirety of the features in its computations, yet is still less successful than attention-based ViTs. Part of the strength of the attention layer is its adaptability -- which you can see the value of in things like GATs. Otherwise, it could just be replaced with a generic low-rank linear layer.
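For anyone who wants to sanity-check the parameter counts being discussed here, a rough PyTorch sketch follows. The sizes are ViT-Base-style placeholders chosen for illustration, not numbers taken from the video or the comment above.

import torch.nn as nn

# Placeholder sizes: embed_dim = 768, 12 heads, 196 patch tokens (ViT-Base-like).
embed_dim, num_heads, seq_len = 768, 12, 196

attn = nn.MultiheadAttention(embed_dim, num_heads)   # Q, K, V and output projections
per_token_linear = nn.Linear(embed_dim, embed_dim)   # same weights reused for every token

count = lambda m: sum(p.numel() for p in m.parameters())
print("multihead attention:", count(attn))            # ~4 * embed_dim**2, about 2.4M
print("per-token linear:", count(per_token_linear))   # ~embed_dim**2, about 0.6M

# A single linear layer that mixed all tokens at once would act on the
# flattened sequence, so its weight matrix alone would need:
print("flattened-sequence linear:", (seq_len * embed_dim) ** 2)  # ~22.7 billion weights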
4:27 I'm definitely judging the animation of the recurrent layer...
I have been waiting for this video! Very much worth the wait!
This is so unfairly underrated, I have never seen such a good video about CNNs.
It would be good to know the rationale behind this way of calculating things, beyond just computational efficiency.
Thanks! great explanation :)
You are a legend keep it up!
I feel like a lot of work has been put into the animations for this series, and that I should come away having learned something, but somehow I am more confused after watching the entire series than before I started.
I'm not sure if this is due to the need to visualize something that can't be represented in 3D space, a knowledge gap created by assumptions made during the explanations, or me simply being too stupid.
I'm confused, because you make it look like an attention layer could be used as a drop-in replacement for a linear layer, but GPT-4o says: "No, an attention layer cannot be used as a direct drop-in replacement for a linear layer due to the fundamental differences in their functionalities and operations."?
You’re correct that an attention layer is not functionally equivalent to a linear layer; its efficiency comes with trade-offs of its own. But it’s going to make more sense to talk about those trade-offs a couple more videos down the line in this series, so I didn’t go over them in this video.
@animatedai Thanks for clearing that up. Also, I ran some quick tests comparing the performance of a PyTorch MultiheadAttention layer with a Linear layer, and the Linear layer is significantly faster on both CPU and GPU in every test I could run, so I hope that's something you could clarify in a future video. Looking forward to the next one!
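For reference, a quick timing test along the lines described above might look like the sketch below. This is a rough placeholder script with arbitrary shapes, not the commenter's actual code.

import time
import torch
import torch.nn as nn

# Arbitrary placeholder sizes; the default MultiheadAttention layout is (L, N, E).
batch, seq_len, embed_dim, num_heads = 8, 512, 768, 12
x = torch.randn(seq_len, batch, embed_dim)

attn = nn.MultiheadAttention(embed_dim, num_heads)
linear = nn.Linear(embed_dim, embed_dim)

def bench(fn, iters=50):
    with torch.no_grad():
        fn()  # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        return (time.perf_counter() - start) / iters

print("attention:", bench(lambda: attn(x, x, x)))  # self-attention: query = key = value = x
print("linear:   ", bench(lambda: linear(x)))      # plain per-token linear of the same width

The gap itself isn't surprising: at the same width, multihead attention runs four linear projections of that size plus the L-by-L attention scores, so it does strictly more work per token than a single Linear layer.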
Computational efficiency is also due to the higher dimensionality, right? You can represent data in a much richer space than with RNNs of similar parameter size and capture more complex features, thanks to the higher-dimensional space that each attention layer enables. That said, I might be being unfair to RNNs, since they have such poor long-range dependency handling and "physically" can't do the same stuff even if they wanted to.
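To put "similar parameter size" in concrete terms, here is a rough comparison of a single attention layer and a single recurrent layer at the same width; the sizes are arbitrary placeholders, not values from the video.

import torch.nn as nn

embed_dim, num_heads = 768, 12
count = lambda m: sum(p.numel() for p in m.parameters())

attn = nn.MultiheadAttention(embed_dim, num_heads)            # one attention layer
lstm = nn.LSTM(input_size=embed_dim, hidden_size=embed_dim)   # one recurrent (LSTM) layer

print("multihead attention:", count(attn))  # about 2.4M parameters
print("LSTM layer:", count(lstm))           # about 4.7M parameters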
FWIW, have you seen the recent business presentation given by Randell L. Mills, who claims to explain the reality of the N-electron atom and to have the solution to EVERYTHING?
😊
hi