Decoder-Only Transformer for Next Token Prediction: PyTorch Deep Learning Tutorial
- Published 11 Nov 2024
- In this tutorial video I introduce the Decoder-Only Transformer model to perform next-token prediction!
Donations: help support this work!
www.buymeacoff...
The corresponding code is available here! (Section 14)
github.com/Luk...
Discord Server:
/ discord
Super helpful. Deserves way more views!
Thank you for creating such great content. I have a question though. Does this mean that decoder-only transformer architecture can be implemented with the torch.nn.TransformerEncoder class, passing `is_causal=True` for the forward call?
You'll also need to pass an attention mask.
@@LukeDitria From the PyTorch docs: "is_causal (Optional[bool]) - If specified, applies a causal mask as mask." I think you can also use this flag only with a conventional causal mask. But okay, it's funny that to create a GPT-like (= decoder-only) model we should use the TransformerEncoder class 😄
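To illustrate the point from this thread, here is a minimal sketch (not the video's code, and the model/hyperparameter names are made up for illustration) of a GPT-style decoder-only model built from `nn.TransformerEncoder`, passing both a causal mask and `is_causal=True` to the forward call as discussed above:

```python
import torch
import torch.nn as nn

class DecoderOnlyLM(nn.Module):
    """Decoder-only (GPT-style) LM using nn.TransformerEncoder + causal mask."""

    def __init__(self, vocab_size=1000, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # Causal mask: position i may only attend to positions <= i.
        seq_len = tokens.size(1)
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        x = self.embed(tokens)
        # Pass the mask and the is_causal hint together, per the PyTorch docs.
        x = self.encoder(x, mask=mask, is_causal=True)
        return self.head(x)  # next-token logits, shape (batch, seq, vocab)

model = DecoderOnlyLM()
tokens = torch.randint(0, 1000, (2, 16))  # (batch, seq_len) of token ids
logits = model(tokens)
print(logits.shape)  # torch.Size([2, 16, 1000])
```

The causal mask is what makes an "encoder" class behave as a decoder: `nn.TransformerDecoder` additionally expects cross-attention inputs from an encoder, which a GPT-like model doesn't have.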