Gail Weiss: Thinking Like Transformers

  • Published 19 Oct 2024
  • Paper presented by Gail Weiss to the Neural Sequence Model Theory Discord on the 24th of February 2022.
    Gail's references:
    On Transformers and their components:
    Thinking Like Transformers (Weiss et al, 2021) arxiv.org/abs/... (REPL here: github.com/tec...)
    Attention is All You Need (Vaswani et al, 2017) arxiv.org/abs/...
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al, 2018) arxiv.org/abs/...
    Improving Language Understanding by Generative Pre-Training (Radford et al, 2018) s3-us-west-2.a...
    Are Transformers universal approximators of sequence-to-sequence functions? (Yun et al, 2019) arxiv.org/abs/...
    Theoretical Limitations of Self-Attention in Neural Sequence Models (Hahn, 2019) arxiv.org/abs/...
    On the Ability and Limitations of Transformers to Recognize Formal Languages (Bhattamishra et al, 2020) arxiv.org/abs/...
    Attention is Turing-Complete (Perez et al, 2021) jmlr.org/paper...
    Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers (Wei et al, 2021) arxiv.org/abs/...
    Multilayer feedforward networks are universal approximators (Hornik et al, 1989) www.cs.cmu.edu...
    Deep Residual Learning for Image Recognition (He et al, 2016) www.cv-foundat...
    Universal Transformers (Dehghani et al, 2018) arxiv.org/abs/...
    Improving Transformer Models by Reordering their Sublayers (Press et al, 2019) arxiv.org/abs/...
    On RNNs:
    Explaining Black Boxes on Sequential Data using Weighted Automata (Ayache et al, 2018) arxiv.org/abs/...
    Extraction of rules from discrete-time recurrent neural networks (Omlin and Giles, 1996) www.semanticsc...
    Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples (Weiss et al, 2017) arxiv.org/abs/...
    Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning (Rabusseau et al, 2018) arxiv.org/abs/...
    On the Practical Computational Power of Finite Precision RNNs for Language Recognition (Weiss et al, 2018) aclanthology.o...
    Sequential Neural Networks as Automata (Merrill, 2019) aclanthology.o...
    A Formal Hierarchy of RNN Architectures (Merrill et al, 2020) aclanthology.o...
    Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets (Joulin and Mikolov, 2015) proceedings.ne...
    Learning to Transduce with Unbounded Memory (Grefenstette et al, 2015) proceedings.ne...
    Paper mentioned in discussion at the end:
    Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth (Dong et al, 2021) icml.cc/virtua...

Comments • 10

  • @vandarkholme442 • 3 months ago

    Awesome analogies for really understanding what is happening under the hood. Thanks!

  • @swim3936 • 1 year ago • +2

    fantastic presentation!

  • @alexanderkyte4675 • 1 year ago • +7

    Could I please have the slides? They’re partially obscured by the listeners here. I’d like to use them for a reading group.

    • @formallanguagesandneuralne5578 • 1 year ago • +3

      hey, not managing to respond from my own account so posting from here - the slides are on my website, which is hosted on GitHub - gailweiss dot github dot io

  • @GodofStories • 1 year ago • +2

    This is great

  • @stevenshaw124 • 1 year ago • +2

    this was an excellent presentation! thank you!

  • @haksasseeducation9565 • 2 months ago

    I don't agree with the slide presented at 21:35 about the input of each head. Actually, each head receives the same input: the output of the preceding embedding and positional-encoding layer (see the sketch after the comments).

  • @homeboundrecords6955 • 1 year ago • +1

    I'll bet this reply will not be read, but... isn't the "subject" = "I" and the "object" = "dog"?

    • @LGcommaI • 1 year ago • +1

      Yes, that's correct. The terminology is confusing, though (if one knows Latin): the 'subject' is literally 'that which is (thrown) UNDER', while the 'object' is 'that which is (thrown) on top'. Everyday sensibilities would thus expect the object to be the one that does something and the subject the one that has something done TO it. The standard convention, however, is the OPPOSITE.

    • @RaviAnnaswamy • 1 year ago • +1

      @LGcommaI 'object' generally refers to inert things, and 'subject' is the English word used for persons (the King asked his subjects to pay more tax during the drought years...). This could be the reason English grammar uses 'subject' for the actor and 'object' for the acted-upon (the victim).
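
    A note on the comment above about the slide at 21:35: in the standard transformer of Vaswani et al (2017), every attention head in a layer does receive the same input (the token embeddings plus positional encodings, or the previous layer's output in deeper layers); the heads differ only in their learned Q/K/V projections. The following minimal PyTorch sketch illustrates that structure under those standard assumptions; it is not code from the talk, and all names in it are made up.

        import torch
        import torch.nn.functional as F

        def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
            # x: (seq_len, d_model). The SAME tensor x is fed to every head;
            # it is the output of the embedding + positional-encoding stage
            # (or of the previous transformer layer).
            seq_len, d_model = x.shape
            d_head = d_model // n_heads
            # Heads differ only in their slice of the learned projections.
            q = (x @ Wq).view(seq_len, n_heads, d_head).transpose(0, 1)
            k = (x @ Wk).view(seq_len, n_heads, d_head).transpose(0, 1)
            v = (x @ Wv).view(seq_len, n_heads, d_head).transpose(0, 1)
            scores = q @ k.transpose(-2, -1) / d_head ** 0.5    # (heads, seq, seq)
            out = F.softmax(scores, dim=-1) @ v                 # (heads, seq, d_head)
            # Concatenate the heads and mix them with the output projection.
            return out.transpose(0, 1).reshape(seq_len, d_model) @ Wo

        # Usage with random weights: every head above saw the same x.
        x = torch.randn(5, 8)                                   # seq_len=5, d_model=8
        Wq, Wk, Wv, Wo = (torch.randn(8, 8) for _ in range(4))
        y = multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads=2)  # shape (5, 8)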