True ML Talks #20 | Analysing transformer dynamics as a mental model

  • Published Jan 14, 2025

Comments • 4

  • @Tunadorable • 1 year ago • +1

    I was so excited after reading this paper, thanks for having him on the podcast! Great talk! I'm currently working on fleshing out an architecture idea specifically designed to take advantage of the insights from this paper. The goal is to start coding & training soon 🤞

  • @nitinbarot9481 • 1 year ago

    Sumeet, is there any insight into the embedding size, i.e., which dimension gains the most from the entropy gradient? [I am assuming the embedding dimensions somehow contribute to the entropy of a point in the embedding space.]

  • @nitinbarot9481 • 1 year ago

    Sumeet, in your argument, sampling the next token makes a sort of random choice in the neighbourhood, which in a black-box view gives the illusion of learning.

    • @sumeetssingh8679 • 1 year ago • +1

      The illusion of learning comes from the fact that a context sequence (prompt) induces the Transformer to walk on a (predetermined) path that appears like intelligent behaviour to us humans. The randomness is in choosing one of the predetermined paths. But if all the low-entropy paths appeared like intelligent behaviour to humans, then no matter which path was randomly picked, it would still appear as if the model was saying something intelligent. If I understand you correctly, you're saying that because the model can produce a variety of paths due to random sampling, it appears intelligent to us, as opposed to a parrot regurgitating from memory. Yes, that would be accurate under this viewpoint.
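
For concreteness, here is a minimal sketch (not from the talk or the paper) of the "random choice in the neighbourhood" being discussed: at each step the model's next-token distribution is sampled instead of taking the argmax, so repeated runs from the same prompt follow different, yet individually coherent, paths. The model name ("gpt2"), temperature, and step count are illustrative assumptions.

```python
# Sketch: temperature sampling of next tokens from a causal language model.
# "gpt2" and the sampling hyperparameters are illustrative choices, not
# anything specific to the paper or the talk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sample_continuation(prompt, steps=20, temperature=0.8, seed=None):
    """Greedy decoding always follows the same path; sampling randomly
    picks one of many plausible continuations, one token at a time."""
    if seed is not None:
        torch.manual_seed(seed)
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(steps):
            logits = model(ids).logits[0, -1]                   # next-token logits
            probs = torch.softmax(logits / temperature, dim=-1)  # distribution over vocab
            next_id = torch.multinomial(probs, num_samples=1)    # random pick "in the neighbourhood"
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return tok.decode(ids[0])

# Two runs from the same prompt: different random paths, both coherent.
print(sample_continuation("The transformer moves through embedding space", seed=0))
print(sample_continuation("The transformer moves through embedding space", seed=1))
```

Under the viewpoint above, each run is one of the predetermined low-entropy paths induced by the prompt; the randomness only selects which of those paths gets followed.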