Great paper overall! Benchmarks on NMT or MLM tasks would be appreciated :)
Hello! Thank you for the interesting paper. On the last equation of the "Linear Attention" slide, why do you not cancel out the Φ(Qi)^T in the equation since it's in both the numerator and the denominator?
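For reference, my reading of that equation (reconstructed from the paper, so the notation may differ slightly from the slide) is:

```latex
V'_i = \frac{\phi(Q_i)^T \sum_{j=1}^{N} \phi(K_j) V_j^T}{\phi(Q_i)^T \sum_{j=1}^{N} \phi(K_j)}
```

If that reconstruction is right, the numerator contracts Φ(Q_i) with a d × d_v matrix while the denominator contracts it with a d-dimensional vector, so the two occurrences multiply different quantities, which I assume is why they cannot be cancelled like a common scalar factor. Is that the intended reading?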
Dear Angelos, it's a great method, but I did not find practical examples of using these transformer builders in your repo. Could you please add some examples for an NLP task such as an intent recognizer? Maybe it would be quick to take a ready-made tokenizer like the ones in Hugging Face and combine it with your method? It would also be great to share some routines for training and testing such modules. In the video you mentioned two experiments with MNIST and CIFAR-10, but no such train/test examples exist in the repo.
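To make the request concrete, here is a rough sketch of what I have in mind. The TransformerEncoderBuilder keyword arguments are only my guess from the repo's README and may not match it exactly; the Hugging Face tokenizer, the IntentClassifier wrapper, and the dummy labels are my own assumptions, not code from the repo:

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer                      # Hugging Face tokenizer
from fast_transformers.builders import TransformerEncoderBuilder

NUM_INTENTS = 7          # hypothetical number of intent classes
D_MODEL = 256            # must equal n_heads * query_dimensions below

class IntentClassifier(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, D_MODEL)
        # Linear-attention encoder; kwargs assumed from the README
        self.encoder = TransformerEncoderBuilder.from_kwargs(
            n_layers=2,
            n_heads=4,
            query_dimensions=64,
            value_dimensions=64,
            feed_forward_dimensions=512,
            attention_type="linear",
        ).get()
        self.head = nn.Linear(D_MODEL, NUM_INTENTS)

    def forward(self, token_ids):
        x = self.embed(token_ids)        # (batch, seq_len, D_MODEL)
        x = self.encoder(x)              # same shape out
        return self.head(x.mean(dim=1))  # mean-pool over tokens, then classify

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = IntentClassifier(tokenizer.vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(texts, labels):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    logits = model(batch["input_ids"])
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy usage with made-up intents
loss = train_step(["turn on the lights", "what's the weather"], torch.tensor([0, 1]))
```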
Ingenious idea! Admirable.
Hi, are there pretrained weights we can download, or do we have to train from scratch? Thanks!
I guess you could finetune from a normal softmax-based attention model, continuing from the same qkv weights.
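Something along these lines, assuming the q/k/v projection layers keep the same parameter names and shapes in both models (which I have not verified against the repo):

```python
import torch

def copy_matching_weights(src_model, dst_model):
    """Copy every parameter whose name and shape match between the two models."""
    src_state = src_model.state_dict()
    dst_state = dst_model.state_dict()
    transferred = {
        name: tensor
        for name, tensor in src_state.items()
        if name in dst_state and dst_state[name].shape == tensor.shape
    }
    # strict=False leaves parameters that exist only in the destination untouched
    dst_model.load_state_dict(transferred, strict=False)
    return sorted(transferred)  # names that were actually copied
```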
You are using a custom CUDA kernel and comparing it to non-custom CUDA implementations of the transformer and LSH? That is misleading.
Thanks for the interest in the paper. I would disagree that it is misleading. Our kernel is nowhere near as optimized as the default cuDNN GEMM implementations. We were also as transparent as possible in the paper, and we see the most important contribution as the formulation rather than the CUDA implementation. Finally, the CUDA kernel is only used for training, not for inference, which is implemented with the provided PyTorch operations.
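For anyone curious, here is a minimal sketch (my own, not the repo's code) of how causal linear attention can be written with plain PyTorch ops, using the φ(x) = elu(x) + 1 feature map from the paper. It materializes the per-step key-value sums explicitly, which is the memory cost the optimized kernel is meant to avoid, so treat it as an illustration only:

```python
import torch
import torch.nn.functional as F

def causal_linear_attention(Q, K, V, eps=1e-6):
    # Q, K: (batch, heads, seq_len, d_k), V: (batch, heads, seq_len, d_v)
    phi_Q = F.elu(Q) + 1   # feature map phi(x) = elu(x) + 1
    phi_K = F.elu(K) + 1

    # S_i = sum_{j<=i} phi(K_j) V_j^T, built as a prefix sum of outer products
    # (stores a (seq_len, d_k, d_v) tensor per head, unlike the optimized kernel)
    KV = torch.einsum("bhld,bhlm->bhldm", phi_K, V)
    S = KV.cumsum(dim=2)

    # Z_i = sum_{j<=i} phi(K_j), the normalizer
    Z = phi_K.cumsum(dim=2)

    num = torch.einsum("bhld,bhldm->bhlm", phi_Q, S)              # phi(Q_i)^T S_i
    den = torch.einsum("bhld,bhld->bhl", phi_Q, Z).unsqueeze(-1)  # phi(Q_i)^T Z_i
    return num / (den + eps)
```

At inference time the same running sums can be updated one token at a time, S_i = S_{i-1} + φ(K_i) V_i^T, which is the recurrent view described in the paper.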