Your explanation was very clear and useful. I strongly recommend this video if you want to understand the concept of the Attention mechanism in RNNs.
Shusen Wang, this was extremely beneficial, absolute masterpiece.
Thanks man, I've done my last minute prep for the exam through this video.
Amazing content, bro. Lots of hard work, thank you so much. Please make more AI playlists like NLP, RL, Deep RL, and Meta Learning with these amazing animations.
Extremely clear and easy-to-follow explanation.
You're my hero. Marry me!
Hahaha this is just a comment to let you know that your explanation can easily be the clearest one on YouTube to understand attention. Keep up the good work! Thanks a mil!
The best on YouTube, thank you very much.
Astonishing pedagogic effort, Shusen! That's a lot of work involved to share knowledge. Kudos!
Thank you for the fruitful lecture!
Instead of α_i, using α_{i,j}=align(h_i, s_j) makes the equation easier to see for me.
But it's super helpful for beginners like me, thanks again!
The same notation was already used in the lecture after next.
Sorry for the redundant comment.
So excited. Great supporting material to Goodfellow textbook. I am building my knowledge for the vision Transformer model.
Very beautifully and simply explained, GGs.
7:25 I think the concatenation before the linear layer is from the Luong et al. 2015 paper. In Bahdanau et al., the authors apply the linear layers first (to both h and s) and then concatenate.
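For anyone trying to line up the two orderings mentioned above, here is a minimal NumPy sketch; the names W, W_h, W_s, v and the toy sizes are my own illustration, not the lecture's notation. Splitting W into two blocks also shows why the two orderings give the same score.

```python
import numpy as np

def score_concat_first(h, s, W, v):
    # Concatenate [h; s] first, then one linear layer, tanh, and a dot with v.
    return v @ np.tanh(W @ np.concatenate([h, s]))

def score_linear_first(h, s, W_h, W_s, v):
    # Apply separate linear layers to h and s first, then add, tanh, dot with v.
    return v @ np.tanh(W_h @ h + W_s @ s)

# Toy example: one encoder state h_i and one decoder state s_j, both of size 4.
rng = np.random.default_rng(0)
h, s = rng.normal(size=4), rng.normal(size=4)
W = rng.normal(size=(6, 8))       # acts on the concatenation [h; s] (size 8)
W_h, W_s = W[:, :4], W[:, 4:]     # the same W, split into its two blocks
v = rng.normal(size=6)

# W @ [h; s] == W_h @ h + W_s @ s, so both scores agree here.
print(score_concat_first(h, s, W, v), score_linear_first(h, s, W_h, W_s, v))
```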
Best explanation I've ever seen
Amazing explanation 👏 Just one question: what are A and A' in this video? h corresponds to the hidden states of the encoder at different time steps.
Very good explanations, thank you very much.
I have a question: what is the output of attention and how do you measure the loss?
very well explained ... thank you very much
Explained nicely. Thank you.
Beautifully explained
It would be better if you could address QKV with your notation. I'm new to the attention mechanism and I'm getting confused by some of your notations. But the explanation itself is very clear.
Which slides or time step are you referring to?
This is really good ! Thanks !
In the decoder part of the Seq2Seq with attention model, the decoder uses three inputs. At first it uses c0, s0, and x'1 to predict s1; here s0 is the latent representation from the encoder and x'1 is the start sign, so s0 and x'1 are different. In the next step it uses c1, s1, and x'2 to predict s2. Aren't s1 and x'2 the same here? Because s1 is the previous hidden state and x'2 is the predicted word, which is like a result of a probability distribution based on s1. If I am not wrong, it is supposed to use only one of them, or always use s0. Can someone clarify this?
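To make the three decoder inputs concrete, here is a hedged sketch of one decoder update, assuming a simple tanh RNN cell; A_prime, b, and the toy sizes are illustrative assumptions, not the lecture's exact setup. The new state is computed from the current input x'_t, the previous state s_{t-1}, and the previous context vector c_{t-1} together.

```python
import numpy as np

def decoder_step(x_t, s_prev, c_prev, A_prime, b):
    # One update of the decoder state: all three inputs are concatenated
    # and pushed through a single tanh RNN cell with parameters A_prime, b.
    z = np.concatenate([x_t, s_prev, c_prev])
    return np.tanh(A_prime @ z + b)

# Toy sizes: word embedding 3, decoder state 4, context vector 4.
rng = np.random.default_rng(1)
x1_prime, s0, c0 = rng.normal(size=3), rng.normal(size=4), rng.normal(size=4)
A_prime = rng.normal(size=(4, 3 + 4 + 4))
b = np.zeros(4)

s1 = decoder_step(x1_prime, s0, c0, A_prime, b)  # first decoder state
```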
Best attention video!
You explained it very clearly! Thank you!
Very well explained!
Hi @ShusenWangEng, which template did you use to create these slides?
I searched but could not find any slide template like this on Overleaf. Thanks.
At 19:26, shouldn't the number of weights be m*t+1, or am I getting it wrong? Because we have c0 as well.
I was looking for a way to implement the encoder-decoder with attention model without using the for loop at the decoder stage. Is it possible?
So good! Thanks
Excellent video
You mention at 11:19 that x'1 is the start sign; later (15:22) you mention x'2 as obtained in the previous step, but how? You show clearly how to obtain s1 and c1, but not x'2.
I am also confused about that. Based on my intuition, using s0, c0, and x'1, the hidden state s1 generated at the decoder is used to produce a probability distribution over the vocabulary of possible output tokens, and the likely outcome is x'2. Then x'2 is used together with s1 and c1 to generate s2. My confusion is that x'2 and s1 carry the same information, since x'2 is generated from s1, so I don't see any reason to use both of them.
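One way to see why the two inputs are not interchangeable is to sketch how x'_2 would typically be produced from s_1; the output matrix W_out and embedding table E below are my own hypothetical names. s_1 is a full continuous state vector, while x'_2 is only the embedding of the single word picked from the distribution, so most of s_1's information is discarded before x'_2 is formed.

```python
import numpy as np

def next_input_token(s_t, W_out, E):
    # Project the decoder state to vocabulary scores, pick the most likely
    # word, and return that single word's embedding as the next input x'.
    logits = W_out @ s_t
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # probability over the vocabulary
    word_id = int(np.argmax(probs))           # a single discrete choice
    return E[word_id]

# Toy sizes: decoder state 4, vocabulary of 10 words, embedding size 3.
rng = np.random.default_rng(2)
s1 = rng.normal(size=4)
W_out = rng.normal(size=(10, 4))
E = rng.normal(size=(10, 3))

x2_prime = next_input_token(s1, W_out, E)     # fed into the next step alongside s1 and c1
```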
Very clear lecture. Thank you!
It is not clear to me what the vector v, used for the inner product with tanh of W applied to [h_i, s0], corresponds to.
You can review his previous slides about the basics of RNNs. I guess it is the learnable parameter matrix connecting the inputs to the hidden states.
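For reference, here is a minimal sketch of that align step as the comment above describes it, under the usual reading that both W and v are trainable parameters of the attention itself; the toy dimensions are mine. Each score is v · tanh(W [h_i; s0]), the scores are normalized with softmax, and the context vector is the resulting weighted average of the encoder states.

```python
import numpy as np

def attention_weights(H, s0, W, v):
    # score_i = v . tanh(W [h_i; s0]); alpha = softmax over the scores;
    # c0 = weighted average of the encoder states h_i.
    scores = np.array([v @ np.tanh(W @ np.concatenate([h_i, s0])) for h_i in H])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                      # the m weights sum to 1
    c0 = alpha @ H
    return alpha, c0

# Toy example: m = 5 encoder states of size 4, decoder state s0 of size 4.
rng = np.random.default_rng(3)
H = rng.normal(size=(5, 4))
s0 = rng.normal(size=4)
W = rng.normal(size=(6, 8))                   # maps [h_i; s0] (size 8) to size 6
v = rng.normal(size=6)

alpha, c0 = attention_weights(H, s0, W, v)
```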
What is x'? And where do you get x'1 from?
x'1 is the start sign (like an empty space), x'2 is the first word of the decoder, x'3 the second, and so on.
The video image quality is too poor; you need to fix it.
you need to check your internet quality
How come at about 7:27 s0 is suddenly a vector? In the previous slide you state that s0 = h_m. Useless video...
I am very enlightened. Thank you.