Mistral 7B: Smarter Than ChatGPT & Meta AI - AI Paper Explained

  • Published Sep 8, 2024
  • An actually big week in AI: Mistral 7B’s performance demonstrates what small models can do with enough conviction. The new 7B LLaMA killer? Tracking the smallest models scoring above 60% on MMLU is quite instructive: in two years it went from Gopher (280B, DeepMind, 2021), to Chinchilla (70B, DeepMind, 2022), to Llama 2 (34B, Meta, July 2023), and now to Mistral 7B. Mistral 7B is released under the Apache 2.0 license. This AI startup actually "Open"s AI, and it is more efficient than ChatGPT.
    #openai #chatgpt #llama2
    / harrymapodile
    Mistral 7B -
    Report: mistral.ai/new...
    Blog: mistral.ai/new...
    Github: github.com/mis...
    Magnet: x.com/MistralA...
    Longformer: The Long-Document Transformer -
    Paper: arxiv.org/pdf/...
    Mistral AI blows in with a $113M seed -
    TechCrunch: tcrn.ch/3CrOhTC
    Tweets -
    / 1707430998600831424

Comments • 8

  • @coalhater392 • 11 months ago • +15

    Don't really have anything to add so I'm just commenting for the algorithm hope it helps.

    • @harrymapodile • 11 months ago • +3

      Thanks a ton!

  • @jaysonp9426 • 10 months ago

    This is excellent. Please keep making videos. This is extremely high quality. New subscriber 👍

    • @harrymapodile • 9 months ago

      thanks for the support

  • @talroitberg5913 • 11 months ago

    I have a general question on LLM attention. So, if I understand correctly, when an LLM generates a token, it has layers that attend to other words in the context window, and add information about relevant words into the latest token. Say the context window is n tokens, then token n has a lot of info about the rest of the context window, and token n+1 is the output for this iteration.
    To generate token n+2, you give the LLM a new prompt consisting of tokens 2 to n+1 (i.e. ending with what the LLM just generated in the previous step).
    My question is -- does this new prompt keep all the self-attention information from the previous cycle, or do they discard it and start from scratch? Intuitively it seems like you could save a lot of time by caching the results of the last attention layer (or wherever you find the end result of all the work the hidden layers did last time around). Do they do that? If not, do you know why not? Or do some LLMs do it, and others don't?

    • @mohl-bodell2948 • 10 months ago • +1

      Yes, during inference caching is standard: the attention keys and values from earlier steps (the KV cache) are kept around, so each new token only computes its own keys and values. See the sketch below.
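
      A minimal sketch of that KV caching idea (the class and variable names here are illustrative, not from any particular library): keys and values computed for earlier tokens are stored and reused, so each decoding step only runs the projections for the newest token.

import torch
import torch.nn.functional as F

class TinyAttention(torch.nn.Module):
    """Single-head attention that accepts a cache of past keys/values."""
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.q = torch.nn.Linear(d_model, d_model)
        self.k = torch.nn.Linear(d_model, d_model)
        self.v = torch.nn.Linear(d_model, d_model)

    def forward(self, x_new, cache=None):
        # x_new: (batch, 1, d_model) -- embedding of the newest token only.
        q = self.q(x_new)
        k_new, v_new = self.k(x_new), self.v(x_new)
        if cache is not None:
            # Reuse keys/values computed in earlier steps instead of
            # re-running the projections over the whole context.
            k = torch.cat([cache["k"], k_new], dim=1)
            v = torch.cat([cache["v"], v_new], dim=1)
        else:
            k, v = k_new, v_new
        attn = F.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        out = attn @ v
        return out, {"k": k, "v": v}  # updated cache for the next step

# Usage: feed one token at a time, carrying the cache forward.
layer = TinyAttention()
cache = None
for step in range(5):
    x_new = torch.randn(1, 1, 64)     # stand-in for the latest token's embedding
    out, cache = layer(x_new, cache)  # only the new token's K/V work per step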

  • @jamesoukassou7544 • 9 months ago

    lol, it doesn't even compare. Sorry, but Mistral doesn't even respond to the simplest questions. I asked it to parse some information from a given text that was way, way simpler than what I actually gave to ChatGPT, and it failed hard.

    • @harrymapodile • 9 months ago • +1

      Haha yeah, ChatGPT is always updating. GPT-4 Turbo just went live in ChatGPT four days ago. What makes Mistral impressive is its density and how well it quantizes, plus the Apache license, which makes it friendlier than Llama 2. If you're just consuming, ChatGPT is perfect; but as a platform for building and SFT, Mistral is king right now.
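
      To make the "building on Mistral" point concrete, here is a minimal sketch of loading Mistral 7B with 4-bit quantization; it assumes the Hugging Face transformers + bitsandbytes + accelerate stack and is not something shown in the video.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # Apache-2.0-licensed base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # 4-bit weights via bitsandbytes
    device_map="auto",  # place layers on available GPU(s)/CPU
)

prompt = "Explain in one sentence why an Apache 2.0 license matters for builders."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))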