
MoME Reduces LLM Hallucinations by 10X!

  • Published Jun 13, 2024
  • More on the announcement: www.lamini.ai/...
    Check out our upcoming live trainings to learn more about building with LLMs:
    maven.com/dair...
    #ai #machinelearning #engineering #coding

Comments • 15

  • @marinepower • 2 months ago +3

    This is interesting and somewhat aligns with how the brain seems to work. We have general capabilities that we use all the time, but we are also able to retrieve memories even after years of not accessing them. So it implies that we have weights that change, and memories that are more static / MoE-like where we can pull them up at will.

  • @bluetensee • 2 months ago +3

    good job again. thank you for your insightful expertise! and thanks for not clickbaiting!!! hope you'll get a lot more followers soon. you deserve it!

    • @elvissaravia • 2 months ago

      I appreciate that!

  • @valtersilva5386 • 2 months ago

    Loved the tone on the content man, you've got a new subscriber! Great job!

  • @aireddy • 2 months ago +1

    It would be fantastic if it really does reduce hallucinations by 10x. Thank you for sharing your thoughts!!

  • @jeffg4686 • 2 months ago +1

    Nice! That does add a lot more confidence that the answers are correct.
    The "mixture of agents" model architecture is coming in with some good stuff too (not as good as this though - this is big).
    We're not far from some really smart agents...

  • @novantha1 • 2 months ago +4

    So, I think it's a bit misleading, or perhaps just unintuitive, that this technique was labelled "MoE". It's more like S-LoRA, where the model actively swaps out relevant LoRAs at inference time. It's not strictly speaking anything "new" as such, but a series of existing techniques tied together into a simple package.
    I'm not sure how useful it really is to the broader community, particularly given that it's not open source, and that there are existing techniques, like mechanistic interpretability, that should essentially do something quite similar at the end of the day, to say nothing of advancements in reinforcement learning that don't eliminate an LLM's ability to express a lack of confidence (raw LLMs actually have a pretty good internal estimate, before instruction tuning, of how accurate the facts they're stating are; we currently destroy that in fine-tuning by forcing them to answer confidently).
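
    For reference, here is a minimal PyTorch sketch of the S-LoRA-style adapter swapping described in the comment above: a frozen base linear layer, a pool of LoRA adapters acting as "memory experts", and a small router that picks one adapter per query at inference time. The class names, the top-1 routing, and the hyperparameters are illustrative assumptions, not Lamini's MoME implementation or API.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LoRAAdapter(nn.Module):
        """Low-rank update delta(x) = (x @ A^T) @ B^T applied on top of a frozen base layer."""
        def __init__(self, d_in: int, d_out: int, rank: int = 8):
            super().__init__()
            self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
            self.B = nn.Parameter(torch.zeros(d_out, rank))   # zero init: adapter starts as a no-op

        def delta(self, x: torch.Tensor) -> torch.Tensor:
            return F.linear(F.linear(x, self.A), self.B)

    class RoutedLoRALinear(nn.Module):
        """Frozen base weights plus a pool of adapters; a router picks one adapter per query."""
        def __init__(self, d_in: int, d_out: int, num_experts: int, rank: int = 8):
            super().__init__()
            self.base = nn.Linear(d_in, d_out)
            self.base.weight.requires_grad_(False)             # base weights stay frozen
            self.base.bias.requires_grad_(False)
            self.experts = nn.ModuleList(
                [LoRAAdapter(d_in, d_out, rank) for _ in range(num_experts)]
            )
            self.router = nn.Linear(d_in, num_experts)         # tiny routing head

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_in). Route on the mean-pooled query, top-1 (sparse).
            expert_idx = self.router(x.mean(dim=1)).argmax(dim=-1)   # (batch,)
            base_out = self.base(x)
            deltas = torch.stack(
                [self.experts[i].delta(x[b]) for b, i in enumerate(expert_idx.tolist())]
            )
            return base_out + deltas

    # Usage: each query in the batch gets the frozen base plus exactly one swapped-in adapter.
    layer = RoutedLoRALinear(d_in=64, d_out=64, num_experts=16)
    queries = torch.randn(4, 10, 64)        # (batch, seq_len, hidden)
    print(layer(queries).shape)             # torch.Size([4, 10, 64])

    Top-1 routing keeps per-query compute close to the base model; the hard part at scale is paging many adapters in and out of memory, which is the problem S-LoRA focuses on.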

  • @yahm0n • 2 months ago +2

    This seems the same as regular mixture of experts.

  • @mihaitanita • 2 months ago +2

    Hmm. Lots of PR stunts on their blog, so I'm still... skeptical. I really don't get the main trick, and 200 API calls per month is not enough for a proper test run. "Internal memorization. Tuning the weights, not RAG. You can layer them." /via X.

  • @terionname • 2 months ago +7

    not open source =(

  • @williamzhao3885 • 2 months ago +4

    I feel like 95% is hard to believe. Are they really training 1 million models? I'm also not sure how accurate their routing model is.

    • @elvissaravia • 2 months ago +2

      There are a lot of parts to look at more closely. I also wonder how general the approach is across different domains and types of data.

  • @bbrother92 • 2 months ago

    Are you an ML engineer?

  • @pradeepbansal23 • 2 months ago

    But is it right to call this innovation? Can just training a million experts on task-specific facts really be called research?

    • @xt-89907 • 2 months ago +1

      It’s special because it swaps in those experts within a larger architecture. Related research on polysemanticity also suggests that sparsity will enhance explainability and steer ability