Gail Weiss: Thinking Like Transformers

  • Published 19 Oct 2024
  • Paper presented by Gail Weiss to the Neural Sequence Model Theory Discord on the 24th of February 2022.
    Gail's references:
    On Transformers and their components:
    Thinking Like Transformers (Weiss et al, 2021) arxiv.org/abs/... (REPL here: github.com/tec...)
    Attention is All You Need (Vaswani et al, 2017) arxiv.org/abs/...
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al, 2018) arxiv.org/abs/...
    Improving Language Understanding by Generative Pre-Training (Radford et al, 2018) s3-us-west-2.a...
    Are Transformers universal approximators of sequence-to-sequence functions? (Yun et al, 2019) arxiv.org/abs/...
    Theoretical Limitations of Self-Attention in Neural Sequence Models (Hahn, 2019) arxiv.org/abs/...
    On the Ability and Limitations of Transformers to Recognize Formal Languages (Bhattamishra et al, 2020) arxiv.org/abs/...
    Attention is Turing-Complete (Perez et al, 2021) jmlr.org/paper...
    Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers (Wei et al, 2021) arxiv.org/abs/...
    Multilayer feedforward networks are universal approximators (Hornik et al, 1989) www.cs.cmu.edu...
    Deep Residual Learning for Image Recognition (He et al, 2016) www.cv-foundat...
    Universal Transformers (Dehghani et al, 2018) arxiv.org/abs/...
    Improving Transformer Models by Reordering their Sublayers (Press et al, 2019) arxiv.org/abs/...
    On RNNs:
    Explaining Black Boxes on Sequential Data using Weighted Automata (Ayache et al, 2018) arxiv.org/abs/...
    Extraction of rules from discrete-time recurrent neural networks (Omlin and Giles, 1996) www.semanticsc...
    Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples (Weiss et al, 2017) arxiv.org/abs/...
    Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning (Rabusseau et al, 2018) arxiv.org/abs/...
    On the Practical Computational Power of Finite Precision RNNs for Language Recognition (Weiss et al, 2018) aclanthology.o...
    Sequential Neural Networks as Automata (Merrill, 2019) aclanthology.o...
    A Formal Hierarchy of RNN Architectures (Merrill et al, 2020) aclanthology.o...
    Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets (Joulin and Mikolov, 2015) proceedings.ne...
    Learning to Transduce with Unbounded Memory (Grefenstette et al, 2015) proceedings.ne...
    Paper mentioned in discussion at the end:
    Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth (Dong et al, 2021) icml.cc/virtua...

Comments • 10

  • @vandarkholme442 • 3 months ago

    Awesome analogies for really understanding what is happening under the hood. Thanks!

  • @swim3936 • 1 year ago • +2

    fantastic presentation!

  • @alexanderkyte4675 • 1 year ago • +7

    Could I please have the slides? They’re partially obscured by the listeners here. I’d like to use them for a reading group.

    • @formallanguagesandneuralne5578 • 1 year ago • +3

      hey, not managing to respond from my own account so posting from here - the slides are on my website, which is hosted on GitHub - gailweiss dot github dot io

  • @GodofStories • 1 year ago • +2

    This is great

  • @stevenshaw124 • 1 year ago • +2

    this was an excellent presentation! thank you!

  • @haksasseeducation9565 • 2 months ago

    I don't agree with the slide presented at 21:35 about the input of each head. Actually, each head receives the same input: the output of the preceding embedding and positional-encoding layer (see the sketch after the comments).

  • @homeboundrecords6955 • 1 year ago • +1

    I'll bet this reply will not be read, but... isn't the "subject" = "I" and the "object" = "dog"?

    • @LGcommaI • 1 year ago • +1

      Yes, that's correct. The terminology is confusing, though (if one knows Latin): the 'subject' is literally 'that which is (thrown) UNDER', while the 'object' is 'that which is (thrown) on top'. Everyday sensibilities would thus expect the object to be the one that does something and the subject the one that has something done TO it. The standard convention, however, is the OPPOSITE.

    • @RaviAnnaswamy • 1 year ago • +1

      @LGcommaI 'object' generally refers to inert things, and 'subject' is the English word used for persons (the King asked his subjects to pay more tax during the drought years...). This could be the reason English grammar uses 'subject' for the actor and 'object' for the acted-upon (the victim).
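
    A note on the comment above about the slide at 21:35: in the standard transformer of Vaswani et al (2017), every attention head in a layer does receive the same input (the token embeddings plus positional encodings, or the previous layer's output in deeper layers); the heads differ only in their learned Q/K/V projections. The following minimal PyTorch sketch illustrates that structure under those standard assumptions; it is not code from the talk, and all names in it are made up.

        import torch
        import torch.nn.functional as F

        def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
            # x: (seq_len, d_model). The SAME tensor x is fed to every head;
            # it is the output of the embedding + positional-encoding stage
            # (or of the previous transformer layer).
            seq_len, d_model = x.shape
            d_head = d_model // n_heads
            # Heads differ only in their slice of the learned projections.
            q = (x @ Wq).view(seq_len, n_heads, d_head).transpose(0, 1)
            k = (x @ Wk).view(seq_len, n_heads, d_head).transpose(0, 1)
            v = (x @ Wv).view(seq_len, n_heads, d_head).transpose(0, 1)
            scores = q @ k.transpose(-2, -1) / d_head ** 0.5    # (heads, seq, seq)
            out = F.softmax(scores, dim=-1) @ v                 # (heads, seq, d_head)
            # Concatenate the heads and mix them with the output projection.
            return out.transpose(0, 1).reshape(seq_len, d_model) @ Wo

        # Usage with random weights: every head above saw the same x.
        x = torch.randn(5, 8)                                   # seq_len=5, d_model=8
        Wq, Wk, Wv, Wo = (torch.randn(8, 8) for _ in range(4))
        y = multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads=2)  # shape (5, 8)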