Superb explanation. This is the clearest explanation of the concept of weights in self-attention I have ever heard. Thank you so much.
A first-class explanation of self-attention, the best on YouTube.
The intuition buildup was amazing; you clearly explained why we need learnable parameters in the first place and how they can help relate similar words. Thanks for the explanation.
Best explanation of self-attention I've seen so far. This is gold.
This is a brilliant explanation of self-attention! Thank you.
Thank you for sharing.
Thanks for putting these videos together!
Best explanation ever :) Thank you
Grateful forever.
Thank you for the great explanation. I still don't understand how to obtain W_Q and W_K.
Thank you for your explanation. I just didn't understand: how do we choose W_K and W_Q?
These matrices contain learnable parameters: they are not chosen by hand but trained with standard deep-learning techniques (backpropagation and gradient descent), just like the other weights in the network.
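To make that concrete, here is a minimal sketch (PyTorch assumed; the sizes, dummy inputs, and placeholder loss are all illustrative, not taken from the video) of W_Q and W_K defined as trainable matrices that receive gradients through the attention scores and can therefore be updated by ordinary backpropagation:

```python
import torch
import torch.nn as nn

d_model, d_k = 8, 4                         # illustrative sizes
W_Q = nn.Linear(d_model, d_k, bias=False)   # trainable query projection
W_K = nn.Linear(d_model, d_k, bias=False)   # trainable key projection

x = torch.randn(5, d_model)                 # five dummy token embeddings
q, k = W_Q(x), W_K(x)                       # queries and keys
scores = q @ k.T / d_k ** 0.5               # scaled dot-product scores
attn = scores.softmax(dim=-1)               # attention weights

v = torch.randn(5, d_model)                 # dummy values (not trained here)
out = attn @ v                              # attention output
loss = out.pow(2).mean()                    # placeholder loss, only to show the update
loss.backward()                             # gradients flow back into W_Q and W_K
print(W_Q.weight.grad.shape)                # torch.Size([4, 8]); an optimizer step
                                            # (SGD/Adam) would now update the matrix
```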
Thank you
Thank you
Thank you !
Great and clear explanation. One question about W_Q and W_K. Since z1 = k1^T * q3 = x1^T * (W_K^T * W_Q) * x3, and W_K and W_Q are trainable matrices, could we just combine them into a single matrix W_KQ = W_K^T * W_Q to reduce the number of parameters?
What you are suggesting should be possible as long as the matrices are square.
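For anyone curious, here is a small NumPy check (toy sizes; the name W_KQ simply follows the question above) that folding W_K^T * W_Q into a single matrix gives identical scores, together with the parameter counts that determine whether the merge actually saves anything:

```python
import numpy as np

d_model, d_k = 8, 2                          # toy sizes with d_k < d_model
rng = np.random.default_rng(0)
W_Q = rng.standard_normal((d_k, d_model))    # query projection
W_K = rng.standard_normal((d_k, d_model))    # key projection
x1 = rng.standard_normal(d_model)            # embedding of word 1
x3 = rng.standard_normal(d_model)            # embedding of word 3

# Factored form: z1 = k1^T q3 = (W_K x1)^T (W_Q x3)
z_factored = (W_K @ x1) @ (W_Q @ x3)

# Combined form: z1 = x1^T (W_K^T W_Q) x3
W_KQ = W_K.T @ W_Q
z_combined = x1 @ W_KQ @ x3

print(np.isclose(z_factored, z_combined))    # True: both forms give the same score
print(2 * d_k * d_model, d_model * d_model)  # 32 vs 64 parameters in this toy case
# With square matrices (d_k == d_model) the merge would halve the parameter
# count, as the reply above suggests. With the rectangular shapes typically
# used (d_k much smaller than d_model), the two separate matrices form a
# low-rank factorization of W_KQ and already use fewer parameters, which is
# one reason they are kept separate in practice.
```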
Pretty bad example. Even if we have trainable W_Q and W_K, what if there were a new sentence where we had Tom and he? W_Q would still make word 9 point to Emma and she.