CS4780 Transformers (additional lecture 2023)

  • Published on Dec 20, 2024

Comments

  • @itachi4alltime
    @itachi4alltime 1 year ago +10

    He's back

  • @saisitaramanapradyumnamall5408
    @saisitaramanapradyumnamall5408 11 months ago

    Thank you for this amazing series of lectures Professor!

  • @hrithiksingh6931
    @hrithiksingh6931 1 year ago

    Thank you so much, Professor, I learnt a lot from your videos. You are one of the best teachers I have seen in my life. Once again, thank you so much.

  • @AakarshNair
    @AakarshNair 5 months ago

    wow, you are a good teacher!

  • @trantandat2699
    @trantandat2699 1 year ago +1

    The best transformer ever!

  • @coolarun3150
    @coolarun3150 1 year ago +1

    crisp and clear!!!

  • @jonathnalee1790
    @jonathnalee1790 1 year ago

    Missed this one cuz of a final; thanks for uploading this!

  • @mimunzar
    @mimunzar 1 year ago

    Thank you a loooot! :)

  • @lotfullahandishmand4973
    @lotfullahandishmand4973 1 year ago

    Was waiting for this, after that machine learning course.

  • @fierydino9402
    @fierydino9402 1 year ago +4

    Professor, thank you a lot! Do you have a plan to upload a lecture on diffusion models too?

  • @vivi412a8nl
    @vivi412a8nl 10 months ago

    I have a question regarding masked multi-head attention around 53:30. If the outputs are generated one by one, then how can the word 'bananas' know about the word 'cherries' (at the time 'bananas' is generated, 'cherries' has not yet been generated) and be modified by it? I.e., why do we have to worry about 'cherries' modifying 'bananas' (i.e., having information about the future) if 'cherries' doesn't even exist at that point?

    • @kilianweinberger698
      @kilianweinberger698 10 months ago

      In some sense it is really all a speed-up. The moment cherry comes along, it could modify bananas. However, you don't want this, because you want to avoid re-computing all the representations of all the words you have already generated. If you do the masked attention, then you are safe, and you can re-use the representation you computed for bananas when cherry didn't even exist. Does this make sense?

    • @vivi412a8nl
      @vivi412a8nl 10 months ago

      @@kilianweinberger698 Thank you Professor that makes a lot of sense, I never thought about the idea of avoiding recalculation. Thank you again for making these great materials available for free.

    • @anas.2k866
      @anas.2k866 5 months ago

      @@vivi412a8nl I think masked self-attention is also there to keep the model from cheating. More concretely, during training, say you have a sentence of 10 tokens; the model outputs 10 vectors, each of which contributes to the loss for that sentence (the sentence loss is the sum of the per-token losses). If the output at token 5 were an average over all the inputs, it would be trivial for the model to predict token 6. You don't want this, because at inference time you won't have access to token 6. You want the model to model human language, not to cheat by looking at future tokens that it won't have during real-time inference.
      Maybe Professor Weinberger can confirm this?
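
A minimal NumPy sketch (not from the lecture; all names, weights, and dimensions below are made up for illustration) of single-head causal masked self-attention, tying together both points in this thread: with the lower-triangular mask, a position can only attend to earlier positions, so the model cannot peek at future tokens during training, and the representation computed for an earlier token (e.g. "bananas") is unchanged when a later token (e.g. "cherries") is appended, which is why it can be cached and re-used.

```python
import numpy as np

def masked_self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention with a causal (lower-triangular) mask.
    X has shape (seq_len, d_model); weight matrices are (d_model, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions j <= i.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

X3 = rng.standard_normal((3, d))                       # sequence ending in "bananas"
X4 = np.vstack([X3, rng.standard_normal((1, d))])      # "cherries" arrives later

out3 = masked_self_attention(X3, Wq, Wk, Wv)
out4 = masked_self_attention(X4, Wq, Wk, Wv)

# The first 3 output rows are identical: the new token cannot modify
# earlier representations, so they can be cached instead of recomputed.
print(np.allclose(out3, out4[:3]))                     # True
```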

  • @suvkaka
    @suvkaka 10 months ago

    @kilianweinberger698 Sir, how do we ensure that adding the positional encoding does not distort the original embedding too much? Or, how is it that the sums of the embeddings and positional encodings of different tokens do not collide?

    • @kilianweinberger698
      @kilianweinberger698 10 months ago +1

      It can change the encoding a little, and lately people have started developing alternatives. However, in general it isn’t really a big problem, because the positional embedding is always exactly the same for every training sequence, so the network can easily learn to remove it.

    • @suvkaka
      @suvkaka 10 months ago

      @@kilianweinberger698 Thank you professor
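
A minimal NumPy sketch (not from the lecture; sequence length and dimensions are made up) of the standard sinusoidal positional encoding from "Attention Is All You Need", illustrating the answer above: it is a fixed function of position only, so the exact same offset is added at position t in every sequence, its entries are bounded in [-1, 1], and the network can learn to account for (or undo) it.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed (not learned) positional encoding; identical for every sequence."""
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]           # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions
    return pe

seq_len, d_model = 16, 64
rng = np.random.default_rng(0)
token_embeddings = rng.standard_normal((seq_len, d_model))

pe = sinusoidal_positional_encoding(seq_len, d_model)
x = token_embeddings + pe                          # simply added, not concatenated

# Every entry of the encoding lies in [-1, 1], so the perturbation of each
# embedding is bounded, and distinct positions get distinct offset patterns.
print(pe.min() >= -1 and pe.max() <= 1)            # True
print(x.shape)                                     # (16, 64)
```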

  • @peterengel8601
    @peterengel8601 1 year ago +2

    Hi professor

  • @mykun8737
    @mykun8737 1 year ago

    Dear Kilian, the lectures in university classrooms can be quite challenging to follow. Could you please create a specialized course on machine learning and deep learning in the form of short video lectures with accompanying presentation slides? If possible, I would love to see these courses published on platforms like Udemy, for example.

    • @kilianweinberger698
      @kilianweinberger698 1 year ago +5

      I did! ecornell.cornell.edu/certificates/technology/machine-learning/
      Note that Cornell does charge tuition for it, but you will also earn an official certificate.

    • @mykun8737
      @mykun8737 1 year ago

      @@kilianweinberger698 I've visited the website you recommended, but I'm struggling to figure out how to learn there. If possible, could you consider moving the videos you've created to a platform like Udemy? Udemy has a large community of computer science students, and you might attract more students there.

  • @goldencircle4331
    @goldencircle4331 1 year ago

    Hi,
    Is there going to be a recording for Vision Transformers?