Made it easy to understand, thank you
More to come bro 😀
Thank you sir🥰🙌🏻
❤️😍
Brief and informative 🙌
Thank you💗
useful content 🙌👏
Thank you 😍
How does positional encoding keep track of the ordering of tokens?
Each position in the sequence is assigned a vector of the same dimension as the model's hidden states.
The vector for each position is filled using sine and cosine functions at different frequencies: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), so each dimension of the encoding oscillates at its own wavelength.
The positional encodings are added to the token embeddings. If the token embedding for the token at position pos is E_pos, and the positional encoding for that position is PE_pos, the input to the transformer layer becomes E_pos + PE_pos.
This addition allows the model to incorporate information about the position of each token in the sequence.
Because each dimension uses a different frequency, the combination of sine and cosine values is effectively unique for each position, so even in long sequences the positional encodings stay distinct enough to carry position information (see the sketch below).
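A minimal NumPy sketch of the idea, assuming the standard sinusoidal formulation from the Transformer paper; the function name, the toy shapes, and the random embeddings are illustrative stand-ins, not part of the original comment.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal positional encodings."""
    positions = np.arange(seq_len)[:, np.newaxis]        # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]       # even indices 2i, shape (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions get cosine
    return pe

# Hypothetical token embeddings E of shape (seq_len, d_model);
# the transformer layer then receives E + PE.
seq_len, d_model = 50, 128
E = np.random.randn(seq_len, d_model)   # stand-in for learned token embeddings
PE = sinusoidal_positional_encoding(seq_len, d_model)
transformer_input = E + PE
print(transformer_input.shape)           # (50, 128)
```

Because each row of PE mixes many frequencies, two different positions never produce the same full vector in practice, which is what lets the model recover token order from the sum E + PE.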
@@dsbrain Ok, got it. Thanks.