Decoder-Only Transformer for Next Token Prediction: PyTorch Deep Learning Tutorial

  • Published 11 Nov 2024
  • In this tutorial video I introduce the Decoder-Only Transformer model to perform next-token prediction!
    Donations, Help Support this work!
    www.buymeacoff...
    The corresponding code is available here! (Section 14)
    github.com/Luk...
    Discord Server:
    / discord
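
    The tutorial's actual code lives in the linked repository (Section 14); as a rough illustration of the idea, here is a minimal decoder-only language model built from PyTorch's `nn.TransformerEncoder` plus a causal attention mask. All class names and hyperparameters here are illustrative, not taken from the video's code.

    ```python
    import torch
    import torch.nn as nn

    class DecoderOnlyLM(nn.Module):
        """Minimal GPT-style next-token predictor (illustrative hyperparameters)."""

        def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=2, max_len=256):
            super().__init__()
            self.tok_emb = nn.Embedding(vocab_size, d_model)
            self.pos_emb = nn.Embedding(max_len, d_model)  # learned positional embeddings
            layer = nn.TransformerEncoderLayer(
                d_model, nhead, dim_feedforward=4 * d_model, batch_first=True
            )
            self.blocks = nn.TransformerEncoder(layer, num_layers)
            self.lm_head = nn.Linear(d_model, vocab_size)  # logits over the vocabulary

        def forward(self, idx):
            # idx: (batch, seq_len) of token ids
            b, t = idx.shape
            pos = torch.arange(t, device=idx.device)
            x = self.tok_emb(idx) + self.pos_emb(pos)
            # Upper-triangular -inf mask so position i only attends to j <= i
            mask = nn.Transformer.generate_square_subsequent_mask(t).to(idx.device)
            x = self.blocks(x, mask=mask, is_causal=True)
            return self.lm_head(x)  # (batch, seq_len, vocab_size)
    ```

    Training then reduces to cross-entropy between the logits at each position and the token one step to the right, which is the "next-token prediction" objective the video describes.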

Comments • 4

  • @RahulMarchand 3 months ago +2

    Super helpful. Deserves way more views!

  • @jhauret 1 month ago

    Thank you for creating such great content. I have a question, though: does this mean the decoder-only transformer architecture can be implemented with the torch.nn.TransformerEncoder class by passing `is_causal=True` to the forward call?

    • @LukeDitria 1 month ago

      You'll need to pass an attention mask as well

    • @jhauret 1 month ago

      @LukeDitria From the PyTorch docs: "is_causal (Optional[bool]) – If specified, applies a causal mask as mask." I think you could also use just this flag for a conventional causal mask. But okay, it's funny that to create a GPT-like (i.e. decoder-only) model we should use the TransformerEncoder class 😄
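
      To my understanding of the PyTorch docs, `is_causal` is a performance hint asserting that the supplied mask is causal; the explicit mask is what reliably enforces the masking, which is why both are passed. A small sketch (dimensions are arbitrary):

      ```python
      import torch
      import torch.nn as nn

      seq_len = 5

      # Standard causal mask: 0.0 on/below the diagonal, -inf above it,
      # so position i can only attend to positions j <= i.
      causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)

      layer = nn.TransformerEncoderLayer(d_model=16, nhead=2, dropout=0.0, batch_first=True)
      encoder = nn.TransformerEncoder(layer, num_layers=1)
      encoder.eval()

      x = torch.randn(1, seq_len, 16)
      with torch.no_grad():
          # Pass the mask explicitly; is_causal=True merely hints that the
          # mask is causal, which can enable a faster attention kernel.
          out = encoder(x, mask=causal_mask, is_causal=True)

          # Sanity check: zeroing the last time step leaves earlier outputs unchanged.
          x2 = x.clone()
          x2[:, -1] = 0.0
          out2 = encoder(x2, mask=causal_mask, is_causal=True)

      assert torch.allclose(out[:, :-1], out2[:, :-1], atol=1e-5)
      ```

      So the "decoder-only" model is really just self-attention with this triangular mask, which is why the TransformerEncoder class (self-attention only, no cross-attention) is the natural building block.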