The Position Encoding In Transformers

  • Published on Sep 30, 2024
  • Transformers and self-attention are powerful architectures that enable large language models, but we need a mechanism for them to understand the order of the tokens we input into the model. Position encoding is that mechanism! There are many ways to encode positions, but let me show you the way it was developed in the "Attention is all you need" paper. Let's get into it!

Comments • 5

  • @TemporaryForstudy
    @TemporaryForstudy 2 months ago

    Nice, but I have one doubt: how does adding sine and cosine values ensure that we are encoding the positions? How did the authors come to this conclusion?
    Why not other values?

    • @TheMLTechLead
      @TheMLTechLead  2 months ago +1

      The sine and cosine functions provide smooth, continuous representations, which helps the model learn relative positions: for example, the encodings for positions k and k+1 are similar, reflecting their proximity in the sequence.

      The frequency-based sinusoidal functions also allow the encoding to generalize to sequences of arbitrary length without re-learning positional information for each sequence length, so the model can handle relative positions beyond those seen during training.

      The combination of sine and cosine ensures that each position gets a unique encoding, and the near-orthogonality of these functions helps distinguish positions even in long sequences. The different frequencies let the model capture both short-term and long-term dependencies: higher-frequency components capture local relationships, while lower-frequency components capture global structure.

      Also, sinusoidal functions are differentiable, which is crucial for backpropagation during training, so the model can learn to use the positional encodings effectively through gradient-based optimization.
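
      For anyone who wants to see this concretely, here is a minimal NumPy sketch of that sinusoidal scheme (the function name and the seq_len / d_model values below are just illustrative, not from the video):

      import numpy as np

      def sinusoidal_position_encoding(seq_len, d_model):
          """Build the (seq_len, d_model) matrix of sinusoidal position encodings."""
          pos = np.arange(seq_len)[:, np.newaxis]       # positions 0 .. seq_len-1
          i = np.arange(d_model // 2)[np.newaxis, :]    # pair index i = 0 .. d_model/2 - 1
          angles = pos / 10000 ** (2 * i / d_model)
          pe = np.zeros((seq_len, d_model))
          pe[:, 0::2] = np.sin(angles)   # even columns 2i get the sine
          pe[:, 1::2] = np.cos(angles)   # odd columns 2i+1 get the cosine
          return pe

      pe = sinusoidal_position_encoding(seq_len=50, d_model=16)
      # Neighbouring positions get similar encodings, distant ones less so:
      print(pe[10] @ pe[11], pe[10] @ pe[40])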

  • @math_in_cantonese
    @math_in_cantonese 2 months ago

    I have a question: for pos=0 and "horizontal_index"=2, shouldn't it be PE(pos,2) = sin(pos/10000^(2/d_model))?
    I believe you used the same symbol "i" for two different ways of indexing, right?
    7:56

    • @TheMLTechLead
      @TheMLTechLead  2 months ago

      Yeah, you are right; I realized I made that mistake. I need to reshoot it.

    • @AlainDrolet-e4z
      @AlainDrolet-e4z 13 days ago

      Thank you Damien, and math_in_cantonese
      I'm in the middle of writing a short article discussing position encoding.
      Damien, feel proud that you are the first reference I quote in the article!
      I was just going crazy trying to nail the exact meaning of "i".
      In Damien's video it is clear that he means "i" to be the dimension index, and the values shown with sin/cos match.
      But then I could not reconcile this understanding with the equation formulation below:
      PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
      PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
      If we see this as PE(pos, 0) referring to the first column (column zero)
      and, say, PE(pos,5) as referring to the sixth column (column 5), with 5 = 2i+1 => i = (5-1)/2 = 2.
      So "i" is more like the index of a (sin,cos) pair of dimensions. Its range is d_model/2.
      The original sin (😄, pun intended) is in the "Attention is all you need" paper.
      There they simply state:
      > where pos is the position and i is the dimension
      This seems wrong: 2i and 2i+1 are the dimensions.
      In any case, a big thank you, Damien; I have watched many of your videos.
      They are quite useful in ramping me up on LLMs and the rest.
      Merci beaucoup
      Alain
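
      To make that indexing concrete, here is a small standalone sketch (the pos and d_model values are just illustrative); for column 2 it reproduces sin(pos / 10000^(2/d_model)), i.e. the pair index i = 1:

      import math

      d_model = 8
      pos = 3

      # i indexes (sin, cos) pairs of columns, so it runs over 0 .. d_model/2 - 1.
      for i in range(d_model // 2):
          angle = pos / 10000 ** (2 * i / d_model)
          print(f"column {2 * i}: PE({pos},{2 * i}) = sin(pos / 10000^({2 * i}/{d_model})) = {math.sin(angle):.5f}")
          print(f"column {2 * i + 1}: PE({pos},{2 * i + 1}) = cos(pos / 10000^({2 * i}/{d_model})) = {math.cos(angle):.5f}")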