Lecture 7 - Deep Learning Foundations: Neural Tangent Kernels

  • Published on Dec 29, 2024

Comments • 30

  • @TheAIEpiphany · 2 years ago · +4

    Cool video, thanks!
    00:00:00 Intro: linear regression
    00:23:55 NTKs start here
    01:01:33 link between NNs and ODEs (ordinary differential equations)

  • @debadeepta · 4 years ago · +17

    Really nice lecture! I was looking to quickly learn NTKs before diving deep into the original papers and this really helped.

    • @zl7460 · 2 years ago

      +1. The most well-explained DL lecture I've seen in a long time

  • @StratosFair · 2 years ago · +2

    Incredibly clear lecture, it allowed me to fill the gaps in my understanding of NTKs. Thank you, professor!

  • @dv019 · 4 years ago · +7

    Great video, thank you! To the student asking about Kernels: the word is overloaded. It is used in linear algebra to mean the set of all vectors mapped to 0 by a linear transformation. Sometimes Green's functions in PDEs are called integral kernels. In general a kernel is "the central or most important part of something". I don't like how overloaded the word is either, but c'est la vie.
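
    To make the overloading concrete, the three usages mentioned in the comment above can be written side by side (standard definitions, stated here for reference rather than taken from the lecture):

      \begin{align*}
      &\text{linear algebra (null space):} && \ker(T) = \{\, v : T v = 0 \,\} \\
      &\text{integral operators / Green's functions:} && (Kf)(x) = \int k(x, y)\, f(y)\, dy \\
      &\text{kernel methods (as in this lecture):} && k(x, x') = \langle \phi(x), \phi(x') \rangle
      \end{align*}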

  • @DarkNinja-24 · 2 years ago · +1

    Beautiful explanation!

  • @weisenjiang9179 · 3 years ago · +2

    Great intro to NTKs, it benefited me a lot

  • @AyushSharma-ie7tj · 1 year ago

    Really nice lecture with a very even pace. Thank you for sharing.

  • @mstislavmaslennikov326 · 2 years ago

    The lecturer is imho doing a great job explaining difficult material!

  • @joonho0 · 4 years ago · +4

    Thanks a lot for sharing this lecture!

  • @sikun7894 · 3 years ago · +2

    Thank you so much for sharing these lectures! Really useful

  • @itachi7243456 · 4 years ago · +4

    These are fantastic, thanks!

  • @nhl8586 · 2 years ago

    Super useful for understanding NTK in 15 mins!

  • @AlexanderGoncharenko-e7o · 3 years ago · +1

    Awesome lesson! Straight and clear!

  • @yuwu7547 · 2 years ago

    Very useful and easy-to-follow lecture. Thanks a lot!

  • @sinaasadiyan · 2 years ago

    Great explanation, just subscribed!

  • @MetaOptimizer · 3 years ago

    41:07 Do we treat the large width (m) in the empirical observations as corresponding to an extremely large network such as GPT-3? In other words, could I interpret "the width of parameters" as "the number of trainable parameters"? Thanks for your valuable lecture :)

  • @yuzhema2506 · 2 years ago

    Thanks for the nice lecture! One question: the bias term in the Taylor approximation seems to depend on x, which means the bias term varies for different inputs x. This differs from the traditional kernel view, where the bias term is the same for every transformed input phi(x). In other words, for the NTK, the inputs in the transformed space do not strictly follow the same linear model. How do we interpret this deviation? Thanks
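
    One standard way to reconcile this with the classical kernel view is to move the x-dependent constant onto the target side. A sketch, using w_0 for the initialization and phi for the feature map (an illustration of the usual argument, not a quote from the lecture):

      \begin{align*}
      f(w, x) &\approx f(w_0, x) + \nabla_w f(w_0, x)^\top (w - w_0), \\
      \phi(x) &:= \nabla_w f(w_0, x), \qquad \tilde{y} := y - f(w_0, x), \\
      \tilde{y} &\approx \phi(x)^\top (w - w_0),
      \end{align*}

    so the residual problem is an ordinary linear model in phi(x), and only the gradient term enters the kernel k(x, x') = <phi(x), phi(x')>.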

  • @meghbhalerao5208 · 2 years ago

    If I understand correctly, the NTK is derived only when we consider the quadratic (MSE) loss, right? Can it be generalized to other loss functions?
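
    A brief sketch of why the squared loss shows up (the standard NTK gradient-flow argument, with K the NTK Gram matrix on the training inputs; an illustration under the linearized-network assumption, not the lecture's exact derivation): the kernel itself is loss-agnostic, but the squared loss makes the function-space dynamics a linear ODE.

      \begin{align*}
      \frac{dw}{dt} &= -\nabla_w L(w)
      \quad\Longrightarrow\quad
      \frac{d}{dt} f(w, x_i) = -\sum_{j=1}^{n} K(x_i, x_j)\, \frac{\partial L}{\partial f(w, x_j)}, \\
      L &= \tfrac{1}{2} \sum_{j} \big(f(w, x_j) - y_j\big)^2
      \quad\Longrightarrow\quad
      \frac{du}{dt} = -K\,(u - y), \qquad u_i := f(w, x_i),
      \end{align*}

    which is solvable in closed form, u(t) = y + e^{-Kt}(u(0) - y); with other differentiable losses the same kernel appears, but the resulting ODE is generally nonlinear.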

  • @sayeedchowdhury11 · 3 years ago

    Thanks for the nice lecture. I have a query: since we're evaluating the gradient at w0, does that mean the kernel is computed from gradients of an untrained NN that has just been initialized? I.e., is f(w, x) a trained NN or just an initialized one?
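
    A minimal sketch of that in code (plain NumPy; the two-layer ReLU parametrization, toy dimensions, and variable names below are illustrative assumptions, not taken from the lecture). The Gram matrix is built purely from gradients at a random initialization w_0, i.e. from an untrained network; in the wide-network regime this matrix is what stays approximately constant during training.

      import numpy as np

      # Width-m two-layer network f(w, x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x),
      # with the empirical NTK computed from gradients at initialization only.
      rng = np.random.default_rng(0)
      d, m, n = 5, 512, 8                    # input dim, hidden width, number of inputs

      X = rng.normal(size=(n, d))            # toy inputs
      W0 = rng.normal(size=(m, d))           # first-layer weights at initialization w_0
      a0 = rng.choice([-1.0, 1.0], size=m)   # second-layer weights at initialization

      def grad_f(x, W, a):
          """Gradient of f(w, x) w.r.t. all trainable parameters, flattened."""
          pre = W @ x                                                # (m,) pre-activations
          act = np.maximum(pre, 0.0)                                 # ReLU activations
          gW = (a * (pre > 0))[:, None] * x[None, :] / np.sqrt(m)    # df/dW, shape (m, d)
          ga = act / np.sqrt(m)                                      # df/da, shape (m,)
          return np.concatenate([gW.ravel(), ga])

      # Empirical NTK at w_0: K[i, j] = <grad f(w_0, x_i), grad f(w_0, x_j)>
      G = np.stack([grad_f(x, W0, a0) for x in X])
      K = G @ G.T                                                    # (n, n), symmetric PSD
      print(np.round(K, 3))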

  • @chongyizheng7758 · 3 years ago · +1

    Question about the first-order Taylor approximation of the neural network: why is the first term f(w_0, x) not included in the kernel function, since it is nonlinear w.r.t. x?

    • @ramanasubramanyam1110 · 3 years ago

      The first derivative is included (and called the NTK) because it resembles the operation of a kernel on an input, i.e., a transformation function mapping to a higher dimension.

    • @chongyizheng7758 · 3 years ago

      @@ramanasubramanyam1110 Thanks for your reply, but that isn't quite what I was asking. Let me clarify: my question is about the constant (first) term f(w_0, x) at 41:16, not the derivative (second) term in the equation. f(w_0, x) also seems to depend nonlinearly on x, so why is it excluded from the definition of the NTK?

    • @hw1451 · 2 years ago · +1

      I think since it's a constant, we can always subtract it from y.

  • @tanchienhao · 2 years ago

    Thanks for the awesome lectures!!

  • @vi5hnupradeep · 2 years ago

    Thank you so much!

  • @chenamora1653 · 3 years ago

    So amazing

  • @da_lime · 2 years ago

    Awesome, thanks!

  • @freerockneverdrop1236 · 4 months ago

    The formula for the neural network in this video should be a two-level summation rather than a one-level one.
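
    For reference, the two-layer form commonly used in NTK analyses is indeed a double summation (the inner sum over input coordinates is often hidden inside a dot product); here m is the hidden width and sigma the activation, so the number of trainable parameters is on the order of m(d+1) rather than m:

      \[
      f(w, x) \;=\; \frac{1}{\sqrt{m}} \sum_{r=1}^{m} a_r\, \sigma\!\Big( \sum_{j=1}^{d} w_{rj}\, x_j \Big)
      \;=\; \frac{1}{\sqrt{m}} \sum_{r=1}^{m} a_r\, \sigma( w_r^\top x ).
      \]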

  • @ihany9061 · 3 years ago

    lifesaver!