No More Hallucinations? New Benchmark Released

  • Published Jan 12, 2025

Comments

  • @BrianMosleyUK
    16 hours ago · +1

    NotebookLM uses Gemini 2.0 Flash experimental. Good to know.

  • @Codemanlex
    15 hours ago · +2

    People keep underestimating what AI can do and what it will do.
    I would say it's good not to get caught up in the hype,
    but totally dismissing it or thinking this is just another NFT hype is flawed.

  • @leobeeson1
    3 hours ago

    Somewhat misleading research. It evaluates hallucinations only in simple tasks (summarising, reporting, needle-in-a-haystack, etc.). I signed up for Google's premium package to get access to the latest Gemini models after constantly running out of messages on Claude Pro, and ALL of Gemini's models suck at reasoning. I still prefer building with Claude or GPT and implementing procedures that evaluate the groundedness of the generated answer, rather than using any of the Gemini models. This "research" feels like a "Special Olympics" competition organised by mom and pop.

    • @YaronBeen
      59 minutes ago

      Haha, well said.
      The idea is great.
      The benchmark is much needed.
      But the whole research seems ungrounded ;)