Use of Long Text Sequences with LLMs Trained on Shorter Sequences, Part 2 (Attention with Linear Biases)

  • Published Oct 5, 2024
  • Contents:
    The Attention with Linear Biases (ALiBi) algorithm (see the sketch after the references below)
    Discussion and future techniques
    References
    1. Su, Jianlin, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. "Roformer: Enhanced transformer with rotary position embedding." Neurocomputing 568 (2024): 127063.
    2. Press, Ofir, Noah A. Smith, and Mike Lewis. "Train short, test long: Attention with linear biases enables input length extrapolation." arXiv preprint arXiv:2108.12409 (2021).
    3. Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." Advances in neural information processing systems 30 (2017).
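As a companion to the ALiBi item above, here is a minimal sketch of the linear attention bias described in Press et al. (2021) [2]: instead of adding position embeddings, a per-head penalty proportional to the query–key distance is added to the attention scores. This is an illustrative reconstruction, not code from the video; the function names are invented, and the slope formula assumes the number of heads is a power of two, as in the paper's default setting.

```python
import math
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric sequence of head-specific slopes from Press et al. (2021).
    # Assumes n_heads is a power of two: slope_h = 2^(-8*(h+1)/n_heads).
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** (h + 1) for h in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # bias[h, i, j] = -slope_h * (i - j) for keys j at or before query i;
    # future positions get 0 here and are removed by the causal mask anyway.
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).clamp(max=0)   # (seq_len, seq_len)
    slopes = alibi_slopes(n_heads)                           # (n_heads,)
    return slopes[:, None, None] * distance[None].float()    # (n_heads, L, L)

def attention_with_alibi(q, k, v, causal_mask=None):
    # q, k, v: (batch, n_heads, seq_len, head_dim)
    n_heads, seq_len, head_dim = q.shape[1], q.shape[2], q.shape[3]
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
    scores = scores + alibi_bias(n_heads, seq_len).to(scores.dtype)
    if causal_mask is not None:
        scores = scores.masked_fill(causal_mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

Because the bias depends only on relative distance, the same trained model can be evaluated on sequences longer than those seen during training, which is the extrapolation property the video discusses.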
