Multi-Head vs Grouped Query Attention: Are Claude AI, Llama-3, and Gemma choosing speed over quality?

  • Published 27 Dec 2024

Comments • 22

  • @makepeace88 · 5 months ago · +3

    I just attended the detailed anatomy-of-an-LLM session... and it's just wow! Nobody else is covering these details. Thanks very much, Chris ❤

    • @chrishayuk · 5 months ago · +2

      Glad it was useful. I skipped a lot of details, as I wanted to keep the focus on MHA vs GQA. I'll probably do some other videos on the rest of the details

  • @trsd8640 · 5 months ago · +2

    Great video! I don't understand it fully and had to watch it again, but I'm getting an idea of what is happening! Thank you!

    • @chrishayuk · 5 months ago · +3

      it was quite a tough one to record: i'm trying to avoid explaining the entire transformer architecture and attention in full (i'll do that in another video), while still showing enough to see how this architectural change affects the models' output. it was a weird balance, and apologies that i never explained it enough
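
For anyone who wants to see the difference concretely, here is a minimal sketch (not code from the video, and assuming PyTorch is available) of how GQA differs from standard multi-head attention: the query heads stay the same, but several of them share a single key/value head, which shrinks the KV cache at a possible cost to output quality.

```python
# Minimal MHA vs GQA sketch, assuming PyTorch; illustrative only, not the video's code.
import torch

def attention(q, k, v):
    # q, k, v: (batch, heads, seq, head_dim)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

batch, seq, n_q_heads, head_dim = 1, 8, 8, 16
q = torch.randn(batch, n_q_heads, seq, head_dim)

# MHA: one key/value head per query head (8 KV heads)
k_mha = torch.randn(batch, n_q_heads, seq, head_dim)
v_mha = torch.randn(batch, n_q_heads, seq, head_dim)
out_mha = attention(q, k_mha, v_mha)

# GQA: only 2 KV heads, each shared by a group of 4 query heads
n_kv_heads = 2
group = n_q_heads // n_kv_heads
k_gqa = torch.randn(batch, n_kv_heads, seq, head_dim)
v_gqa = torch.randn(batch, n_kv_heads, seq, head_dim)
out_gqa = attention(q,
                    k_gqa.repeat_interleave(group, dim=1),
                    v_gqa.repeat_interleave(group, dim=1))

print(out_mha.shape, out_gqa.shape)  # same output shape; GQA computes and caches 4x fewer K/V heads
```

The output shape is identical in both cases; the saving is in how many K/V tensors have to be computed and kept in the cache during generation.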

  • @LombardyKozack · 4 months ago · +2

    LLaMA2-70b uses GQA (only its 7b version used MHA); see the config comparison after this thread.

    • @chrishayuk · 4 months ago · +1

      fair point
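
Both of those head-count claims are easy to check against the models' published configurations. A quick comparison, with values quoted from memory of the Hugging Face config.json files (worth verifying against the originals):

```python
# Head counts from memory of the published Llama-2 configs; verify against the official config.json files.
llama2_7b  = {"num_attention_heads": 32, "num_key_value_heads": 32}  # MHA: each query head has its own K/V head
llama2_70b = {"num_attention_heads": 64, "num_key_value_heads": 8}   # GQA: 8 query heads share each K/V head
```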

  • @everyhandletaken · 5 months ago · +2

    Interesting!
    Claude 3.5 Sonnet is definitely great for code, much better than ChatGPT-4o, and has really helped me solve things that are well beyond my brain capacity in the last few days.

    • @chrishayuk · 5 months ago · +1

      totally agree, much better for code than gpt-4o

  • @ilkkalehto8507 · 4 months ago · +1

    Brilliant!

  • @danielhenderson7050 · 5 months ago · +3

    This was very interesting

    • @chrishayuk · 5 months ago · +1

      Glad you enjoyed, definitely a fun rabbit hole

  • @Leo-ph7ow · 5 months ago · +3

    Excellent content! Thanks!

    • @chrishayuk · 5 months ago · +1

      Glad you liked it!

  • @c0ffe3caf3 · 18 days ago

    Sorry, I watched this all the way through, but I don't think you ever gave much to support your claim that grouped query attention was the cause of what you and your GPT-4 prompt ranked as worse outputs. At best you made a case for a correlation: many of the newer models that adopt techniques like GQA score worse under that metric.
    Even if the correlation is real, how do you demonstrate that the cause is GQA and not other factors those same models have all adopted, such as fine-tuning on synthetic data or instruct tuning (e.g. perhaps the answers you are judging as worse are the result of optimising for LLM benchmark scores)?
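
One way to isolate the variable this comment is asking about (a hypothetical sketch, not an experiment from the video) would be to compare two otherwise identical configurations that differ only in the number of key/value heads, keeping data, tokenizer, and tuning recipe fixed:

```python
# Hypothetical ablation setup; names and values are illustrative, not from the video.
from dataclasses import dataclass

@dataclass
class ModelConfig:
    n_layers: int = 32
    n_q_heads: int = 32
    n_kv_heads: int = 32   # 32 -> MHA; 8 -> GQA (groups of 4 query heads per K/V head)
    d_model: int = 4096

# Everything else (training data, tokenizer, fine-tuning recipe, judge prompts) stays fixed,
# so any quality gap between the two runs could be attributed to the attention variant.
mha_cfg = ModelConfig(n_kv_heads=32)
gqa_cfg = ModelConfig(n_kv_heads=8)
print(mha_cfg, gqa_cfg)
```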

  • @awaisamin3819 · 2 months ago · +1

    Look at me when you talk to me, boy. Look AT ME.
    You shy away too much. Love it.
    Thanks, it really helped with my presentation.

    • @chrishayuk · 2 months ago

      hahaha, yeah, i'm really bad at that sometimes

  • @seanknowles9985 · 5 months ago · +1

    Intel agencies are having their fill first. It's obviously being slowed down so three-letter agencies can get ahead of this.

    • @chrishayuk · 5 months ago · +1

      lol, i'm sure three-letter agencies are having their say, but i suspect it's not about MHA vs GQA. would love to hear that conversation if it were, though

  • @김화겸-y6e · 5 months ago

    I believe 4o's judging is only about 90% reliable

    • @chrishayuk · 5 months ago

      interesting, where did you get that info from?