Are Claude 3.5 Sonnet, Llama-3 and Gemini choosing speed over quality?

  • Published Jul 11, 2024
  • In this video Chris looks at how model providers are trending toward grouped-query attention (GQA) rather than traditional multi-head attention (MHA) in transformer models, and how this is affecting output in areas such as summarization. He shows that you get more coherent output from models such as Llama-2 or Claude 3 Opus than from newer models such as Llama-3, Gemini, or Gemma. In the end, in certain scenarios such as summarization or generative content, GPT-4o still beats Sonnet.
    repo
    github.com/chrishayuk/mha_gqa...
  • Science & Technology
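
For context while reading the comments below, here is a minimal, illustrative sketch (not taken from the linked repo) of the MHA vs GQA difference the video is about: MHA keeps one key/value head per query head, while GQA shares each key/value head across a group of query heads, shrinking the K/V projections and KV cache at a possible cost in output quality. All shapes, names, and head counts here are assumptions for illustration.

```python
# Illustrative sketch (PyTorch): grouped-query attention (GQA) vs multi-head
# attention (MHA). In MHA every query head has its own key/value head; in GQA
# several query heads share one key/value head.
import torch
import torch.nn.functional as F

def attention(q, k, v, n_heads, n_kv_heads):
    """q: (batch, seq, n_heads*head_dim); k, v: (batch, seq, n_kv_heads*head_dim)."""
    b, s, _ = q.shape
    head_dim = q.shape[-1] // n_heads

    # Split into heads: (batch, heads, seq, head_dim).
    q = q.view(b, s, n_heads, head_dim).transpose(1, 2)
    k = k.view(b, s, n_kv_heads, head_dim).transpose(1, 2)
    v = v.view(b, s, n_kv_heads, head_dim).transpose(1, 2)

    # GQA: each K/V head serves a group of query heads, so repeat it per group.
    # When n_kv_heads == n_heads the group size is 1 and this is ordinary MHA.
    group = n_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
    out = F.softmax(scores, dim=-1) @ v
    return out.transpose(1, 2).reshape(b, s, n_heads * head_dim)

# 8 query heads with 8 KV heads (MHA) vs 8 query heads sharing 2 KV heads (GQA).
b, s, d = 1, 4, 64
q = torch.randn(b, s, 8 * d)
kv_mha = torch.randn(b, s, 8 * d)   # full-width K/V projection
kv_gqa = torch.randn(b, s, 2 * d)   # 4x smaller K/V projection and KV cache
print(attention(q, kv_mha, kv_mha, n_heads=8, n_kv_heads=8).shape)  # torch.Size([1, 4, 512])
print(attention(q, kv_gqa, kv_gqa, n_heads=8, n_kv_heads=2).shape)  # torch.Size([1, 4, 512])
```

With n_kv_heads equal to n_heads the same function reduces to plain MHA, which is why providers can swap GQA in for speed and memory savings without changing the rest of the transformer block.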

Comments • 14

  • @makepeace88
    @makepeace88 8 days ago +1

    I just attended a detailed anatomy-of-an-LLM session.. and it’s just wow! Nobody else is explaining these details. Thanks very much Chris ❤

    • @chrishayuk
      @chrishayuk  8 days ago

      Glad it was useful. I skipped a lot of details, as I wanted to keep the focus on MHA vs GQA. I'll probably do some other videos on the other details.

  • @trsd8640
    @trsd8640 9 days ago +1

    Great video! I didn’t understand it fully and had to watch it again, but I‘m getting an idea of what is happening! Thank you!

    • @chrishayuk
      @chrishayuk  9 days ago +2

      It was quite a tough one to record: I was trying to avoid explaining the entire transformer architecture and attention in full (I'll do that in another video), while doing just enough to show how this architectural change affects model output. It was a weird balance, and apologies that I never explained it enough.

  • @danielhenderson7050
    @danielhenderson7050 9 days ago +2

    This was very interesting

    • @chrishayuk
      @chrishayuk  9 days ago

      Glad you enjoyed, definitely a fun rabbit hole

  • @everyhandletaken
    @everyhandletaken 9 days ago +1

    Interesting!
    Claude 3.5 Sonnet is definitely great for code, much better than GPT-4o, and has really helped me solve things that are well beyond my brain capacity in the last few days.

    • @chrishayuk
      @chrishayuk  9 days ago

      totally agree, much better for code than gpt-4o

  • @Leo-ph7ow
    @Leo-ph7ow 10 days ago +2

    Excellent content! Thanks!

    • @chrishayuk
      @chrishayuk  10 days ago

      Glad you liked it!

  • @seanknowles9985
    @seanknowles9985 9 days ago

    Intel agencies are having their fill first. It's obviously being slowed down so three-letter agencies can get ahead of this.

    • @chrishayuk
      @chrishayuk  9 days ago

      lol, I'm sure three-letter agencies are having their say, but I suspect it's not about MHA vs GQA. Would love to hear that conversation if they were.