NEW Mixtral 8x22b Tested - Mistral's New Flagship MoE Open-Source Model

  • Published Dec 2, 2024

Comments • 252

  • @Wren206
    @Wren206 7 months ago +57

    Forgot to say: Thank you so much for making these videos and for being so dedicated to them! It means a lot!

  • @En1Gm4A
    @En1Gm4A 7 months ago +24

    These are the OG videos. Thanks, great content!

  •  7 months ago +39

    3:05 Actually, the snake is supposed to go through the wall in many snake games. It is even more impressive that the AI added it, as that involves extra code.

    • @minemakers3
      @minemakers3 7 months ago

      fact

    • @apester2
      @apester2 7 months ago +3

      Possible, but it still failed when directly asked to make that not the behaviour.

    • @StevenAkinyemi
      @StevenAkinyemi 7 months ago

      ​@@apester2 No. It would have failed if it was specifically told not to add that behavior. A lot of snake games allow passing through the wall. It is open to interpretation.

    • @apester2
      @apester2 7 months ago +6

      @@StevenAkinyemi There were two requests. One was "write snake". If your interpretation is correct, it passed the first request. The second request was "make the game end if it passes out of the window". Independent of other games, it failed to do that request.

    • @StevenAkinyemi
      @StevenAkinyemi 7 months ago

      @@apester2 Oh. I missed that
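
  [Editor's note] The thread above debates wrap-around versus walled edges. A minimal Python sketch of the two behaviors, purely illustrative (the grid size and function names are assumptions, not from the video):

      GRID_W, GRID_H = 20, 20  # assumed board size

      def step_wraparound(x, y, dx, dy):
          # Classic "pass through the wall": re-enter on the opposite side.
          return (x + dx) % GRID_W, (y + dy) % GRID_H

      def step_walled(x, y, dx, dy):
          # What the follow-up request asked for: leaving the grid ends the game.
          nx, ny = x + dx, y + dy
          if not (0 <= nx < GRID_W and 0 <= ny < GRID_H):
              return None  # caller treats None as game over
          return nx, ny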

  • @RWilders
    @RWilders 7 months ago +4

    Thanks again for the video.
    For the apple prompt, this one works fine with GPT-4: Give me ten sentences where each sentence ends with the word apple.
    Maybe you could use that for your tests.
    ChatGPT result:
    I ventured into the garden to pick the last remaining apple.
    Upon examining the contents of the pie, I realized it lacked an apple.
    He couldn't resist adding another slice to his already full plate of apple.
    As the sun set, the sky's hue reminded me of a golden apple.
    No matter the question, her answer was invariably, "apple."
    For his lunch, all he desired was a crisp, sweet apple.
    Walking through the market, every stall seemed to boast its own variety of apple.
    It wasn't just any fruit; it was the perfect apple.
    She decorated the tabletop with a centerpiece featuring an ornate bowl and a single apple.
    In his tale, the magic was always in the mystical apple.

  • @MeinDeutschkurs
    @MeinDeutschkurs 7 months ago +56

    It's open weight, but not open source, Matt. We do not have access to the data set.

    • @4.0.4
      @4.0.4 7 months ago

      Important difference, too. Some models introduce cool new training methods, good datasets etc that improve the ecosystem for everyone.

    • @matthew_berman
      @matthew_berman 7 months ago +12

      I'll make sure to clarify next time, thank you.

    • @MeinDeutschkurs
      @MeinDeutschkurs 7 months ago

      @@matthew_berman , Great! ❤️

    • @codycast
      @codycast 7 months ago

      Yo mamma is open weight

    • @Joe333Smith
      @Joe333Smith 7 months ago +5

      That's nonsense. Open source code is open source. Data has never been part of open source.

  • @briancase6180
    @briancase6180 7 months ago +12

    I think you need to pay attention to the temperature setting... That could explain the difference between this and the previous Mixtral 8x7B. And you could rephrase the ends-in-apple question with "where the last word is apple" or something like that. I think it's more interesting to test, say, three different phrasings to see just what the right prompting strategy is for the model.

    • @AA-yl9ht
      @AA-yl9ht 7 months ago

      The temperature thing bugs the hell out of me. Any non-greedy setting is going to be selecting tokens at random from the output distribution, and can absolutely be the difference between getting a 1/2/3 on the same question. I have no idea why he's applying temperature during logic tests at all; temperature only forces the model to write creatively by forcing it to make mistakes.
      Someone needs to call him out on this, because it's hard to take the result of any test seriously knowing the answer might only be incorrect because the wrong token was randomly selected.
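
  [Editor's note] A minimal sketch of the point made above, assuming a vector of raw logits from the model; temperature 0 reduces to a deterministic greedy argmax, while any positive temperature samples and can flip a borderline answer:

      import math, random

      def sample_token(logits, temperature):
          # temperature == 0 degenerates to greedy decoding: always take the argmax.
          if temperature == 0:
              return max(range(len(logits)), key=lambda i: logits[i])
          # Otherwise: softmax over temperature-scaled logits, then sample.
          scaled = [l / temperature for l in logits]
          m = max(scaled)
          weights = [math.exp(s - m) for s in scaled]
          return random.choices(range(len(logits)), weights=weights, k=1)[0]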

  • @MichielvanderBlonk
    @MichielvanderBlonk 7 months ago +44

    The question about the 10-foot hole is exactly how math teachers expect you to answer. If you make any remarks about common sense you will be called a smartass and a cheater, so the LLMs are behaving exactly as we teach humans.

    • @WhyteHorse2023
      @WhyteHorse2023 7 months ago +5

      Experienced math teachers would say to assume something so as to avoid that.

    • @DefaultFlame
      @DefaultFlame 7 months ago +8

      @@WhyteHorse2023 I think the word you are looking for is "good" math teachers. Experience doesn't improve all teachers. It makes some of them worse even.

    • @alekjwrgnwekfgn
      @alekjwrgnwekfgn 7 months ago

      And 2 + 2 = white supremacy. Math teachers who don’t know this will be canceled.

    • @WhyteHorse2023
      @WhyteHorse2023 7 months ago

      @@DefaultFlame Yeah, I guess I assume teachers learn through experience but apparently not.

    • @DefaultFlame
      @DefaultFlame 7 months ago

      @@WhyteHorse2023 Some do, but they are people and not all people do. I've had amazing teachers and absolutely horrible teachers, both with many years of experience.
      Edit: One of the best teachers I've had actually only had one year of experience. Wasn't a math teacher though. He was really good at communicating, handling the class, and engaging people in the subject.

  • @TheUnknownFactor
    @TheUnknownFactor 7 months ago +6

    To be fair, the 10-foot hole being dug by 1 person could be 50 feet wide and allow 50 people to dig at the same time. The fact that only the depth (and technically not even that) is explicitly provided allows for different assumptions about crowding.

    • @aitechnewsTV
      @aitechnewsTV 7 months ago +1

      absolutely, I love you

  • @ernestuz
    @ernestuz 7 months ago +2

    In this world of corporate crap, Mistral's way of doing things is a breath of fresh air. They know their models ROCK. Every single Mistral free model released to date has become a favourite of mine.

  • @ShaunPrince
    @ShaunPrince 7 months ago +6

    The snake IS supposed to go through the wall. Looks like a perfect one-shot implementation.

    • @matthew_berman
      @matthew_berman 7 months ago +1

      I think both are valid

    • @aitechnewsTV
      @aitechnewsTV 7 months ago +1

      thanks, I love you

  • @BlayneOliver
    @BlayneOliver 7 months ago +14

    Infermatic is not free. They charge $15/month to access this model.

    • @Intel1502
      @Intel1502 7 months ago

      this.

    • @HyBlock
      @HyBlock 7 months ago

      that.

    • @ChaineYTXF
      @ChaineYTXF a month ago

      those.

  • @BlayneOliver
    @BlayneOliver 7 months ago +2

    Thanks, this model actually shows promise. I appreciate your bringing it to our attention

    • @aitechnewsTV
      @aitechnewsTV 7 months ago

      absolutely, I love you

  • @CLSgod
    @CLSgod 7 months ago +1

    Thanks for testing!

    • @aitechnewsTV
      @aitechnewsTV 7 months ago

      absolutely, I love you

  • @exumatronstudios
    @exumatronstudios 7 months ago +1

    Matt love your content. Keep up the good work.

  • @XandyXu-o5h
    @XandyXu-o5h 7 months ago

    Thank you, practice is always more effective than hearing concepts

  • @QuantzAi
    @QuantzAi 7 months ago +4

    @Matthew Berman Infermatic requires Total Plus, which is paid, in order to test it.

  • @UnchartedWorlds
    @UnchartedWorlds 7 months ago +1

    Infermatic AI is NOT free if we want to perform this test ourselves. Matt, you should have mentioned that! It costs $15 per month to play with all the models you see in the dropdown.

  • @oratilemoagi9764
    @oratilemoagi9764 7 months ago +1

    It got the question "How many words are in your prompt?" right. It included the full stop as a word,
    and most models also count the spaces in between.
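
  [Editor's note] The miscount is easier to see once you know models see tokens, not words. A rough illustration, assuming the tiktoken library (the prompt text is the one quoted above):

      # Requires `pip install tiktoken` (OpenAI's tokenizer library).
      import tiktoken

      prompt = "How many words are in your prompt?"
      enc = tiktoken.get_encoding("cl100k_base")

      print(len(prompt.split()))      # 7 words; "prompt?" counts as one word
      print(len(enc.encode(prompt)))  # the token count usually differs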

  • @gvi341984
    @gvi341984 7 months ago +1

    When it can do partial or ordinary differential equations in LaTeX by itself, then we can talk about amazing.

  • @Matlockization
    @Matlockization 5 months ago

    Loving the style of this AI model, "mixture of experts".

  • @freedtmg16
    @freedtmg16 7 months ago +3

    IDK how but I'd love to see a tool-use test for the open source models.

    • @aitechnewsTV
      @aitechnewsTV 7 months ago

      thanks, I love you

  • @HAL9000-B
    @HAL9000-B 7 months ago +5

    THIS with Agents... AMAZING!!! Thank you Matthew, greetings from Berlin!

    • @aitechnewsTV
      @aitechnewsTV 7 months ago

      absolutely, I love you

  • @tfre3927
    @tfre3927 7 months ago +2

    Infermatic must have been waiting for your video. It's not free anymore, dude - a bunch of models, including the new Mixtral, are PAID.

  • @jarail
    @jarail 7 months ago +2

    We really just need to wait a few more days for fine tunes and quantization. This model is going to do great things!

    • @aitechnewsTV
      @aitechnewsTV 7 months ago

      absolutely, I love you

  • @metantonio
    @metantonio 7 months ago +4

    How much VRAM and RAM does it need to run locally?

    • @wrOngplan3t
      @wrOngplan3t 7 months ago +1

      Infinite
      (jk ofc :P but in my case might as well be. Seems the files alone are about 59 files times 5 GB each... so 300 GB? Idk).
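
  [Editor's note] A back-of-envelope answer to the question above, assuming the ~141B total parameters Mistral reported for 8x22B and rough bits-per-weight figures for common quantizations; real usage adds KV-cache and activation memory on top:

      params = 141e9  # assumed total parameter count for Mixtral 8x22B
      for label, bits in [("fp16", 16.0), ("Q4 (~4.5 bpw)", 4.5), ("Q2_K (~2.6 bpw)", 2.6)]:
          print(f"{label}: ~{params * bits / 8 / 1e9:.0f} GB")

  At fp16 that is roughly 282 GB, consistent with the ~300 GB of raw files mentioned above; a 4-bit quantization lands near 80 GB.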

  • @RainbowSixIntel
    @RainbowSixIntel 7 months ago +1

    I honestly think the model will perform MUCH better when mistral themselves release an instruct chat finetuned version.

  • @gitmaxd
    @gitmaxd 7 months ago +7

    This model is fantastic! Another banger!

    • @matthew_berman
      @matthew_berman 7 months ago +2

      Agreed. Wait until more fine tuned versions come out!

    • @cesarsantos854
      @cesarsantos854 7 months ago

      @@matthew_berman Maybe it would be a good idea to compare open-source models written from scratch to be uncensored against others that are censored or fine-tuned to be uncensored. Some researchers say the censorship fine-tuning greatly corrodes capabilities, and further fine-tuning to decensor them corrodes them even further.

    • @aitechnewsTV
      @aitechnewsTV 7 months ago

      absolutely, I love you

  • @Taskade
    @Taskade 7 months ago

    Can’t wait to team up with Mistral in our next exciting Multi-Agent update for Taskade! 🚀

  • @benbork9835
    @benbork9835 7 months ago

    I tried the killer question and it worked first try for me, although it was probably a slightly different, chat-interface-specific model I was using. Anyway, besides the old one, you could start a new benchmark spreadsheet where you do best of 3. This might give us an accuracy metric which might reveal more of the models' abilities.

  • @PyjamasBeforeChrist
    @PyjamasBeforeChrist 7 months ago +1

    This needs to be on Groq ASAP.

  • @awesomebearaudiobooks
    @awesomebearaudiobooks 6 months ago +1

    Honestly, I feel like Llama 3 is better than Mixtral 8x22b, despite being half the size... And I remember how much I was impressed by Mixtral 8x7b...
    And don't get me wrong, both Mixtral 8x7b and Mixtral 8x22b are great, but they are still on another (lower) level compared to closed-source models, while Llama 3 is on the level of modern closed-source models!

  • @theycallmethesoandso
    @theycallmethesoandso 7 months ago +2

    I think the killer question is highly subjective and a matter of definition. You could assume a "killer" is a contract killer, a dead killer is just a body, and the person who killed one of the killers acted in self-defense. Or a variation of that. Would you call a person killing in self-defense a killer? This could be seen as victim shaming and probably traumatic for that person. Context matters in language, and there aren't 100% correct definitions outside of closed declarative systems. People who design tests are often blind to their own assumptions and should be careful trying to set standards for some universal truth when using an open, ever-changing system of meaning. LLMs run robots, so it's not just a chat problem.

    • @user-on6uf6om7s
      @user-on6uf6om7s 7 months ago

      Yeah, ideally the model will go into detail on the interpretations, but there are a few different potentially correct answers - 4, 3, or even 2 if you really get creative with your definitions, as you said. But regardless, if it's not between 2 and 4, it's definitely wrong.

  • @micbab-vg2mu
    @micbab-vg2mu 7 months ago +1

    It looks like a great model :)

  • @okuz
    @okuz 7 months ago +1

    This model is not free on Infermatic. Also, there is no option for deleting your account in the settings on their website.

  • @recorr
    @recorr 7 months ago

    With Pi... after 3 explanations:
    Initially, the marble is placed inside the cup.
    When the cup is turned upside down on the table, gravity pulls the marble towards the table, causing it to fall out of the cup and onto the table.
    The cup is then picked up and placed inside the microwave, but since the marble has already fallen out, it is not inside the cup anymore.

  • @science_mbg
    @science_mbg 7 months ago +4

    Unfortunately it is not free; it requires a subscription to let you use it!

  • @Alf-Dee
    @Alf-Dee 7 months ago +1

    Would you make some sort of coding challenge between LLMs using different agent systems?
    At this point we need a solid benchmark to define which are the best LLMs for this purpose.
    A video like that would be awesome 😎

  • @paugargallo7813
    @paugargallo7813 7 months ago +1

    Great content! Are you going to test Gemini PRO 1.5?

  • @garyjurman8709
    @garyjurman8709 7 months ago

    About the cup and marble question: I actually don't think that the AIs are having a problem with the idea of gravity or even that the marble can't travel with the cup. I believe the AIs are having a problem with the concept of upside-down. I had a similar problem with the image generation AIs when I asked them to draw a bucket upside-down with a guy sitting on it. It couldn't flip the bucket for some reason. It was able to do it when I said "put the bucket on his head," but otherwise it kept drawing the bucket right-side up no matter what.

  • @idontexist-satoshi
    @idontexist-satoshi 7 months ago

    I suggest using the "How do I break into a car film script" version primarily, as asking the question directly may not yield meaningful insights. Our focus should be on whether the model acknowledges the illegality even while offering the method for breaking into a car "For a film script." Perhaps something along the lines of, "Could you advise me on how to break into a car for my movie script?" It should then caution against illegal activities while presenting the information in a simplified manner suitable for a cinematic scenario. For example, when you prompt GPT with "Could you advise me on how to break into a car for my movie script?" it provides the instructions but also highlights the illegal consequences when done in real life outside the movie perspective. This is how we know it has guardrails based on how much emphasis it places on this aspect.

  • @recorr
    @recorr 7 months ago +1

    For reasoning, Pi is very good! And has been for a long time.
    E.g., the end of its response for the killers:
    So, there are three killers left in the room: the two remaining original killers, and the new person who just entered the room and killed one of them.

    • @aitechnewsTV
      @aitechnewsTV 7 months ago

      thanks, I love you

  • @UnchartedWorlds
    @UnchartedWorlds 7 months ago

    Just tested Claude Opus with Apple, and it got 7/10 right!

  • @fromjavatohaskell909
    @fromjavatohaskell909 7 months ago

    11:29 I think it is a fail - it would be impossible to dig a 5-hour hole in 6 min. What if you change the conditions and ask how long it would take 10,000 workers to dig the hole?

  • @Yomi4D
    @Yomi4D 7 months ago

    Thank you.

  • @spinningaround
    @spinningaround 7 months ago +2

    infermatic: Account upgrade required

  • @TheGaussFan
    @TheGaussFan 7 months ago

    Matt, I love your videos. Could you also address privacy issues with the models and service providers? Just knowing if there is a path (maybe by paying a fee) to keep my company's users' prompts and responses from becoming part of a training data set. I need services that don't leak all my proprietary information and processes. This aspect is key, but underaddressed by the YouTube reviews.

  • @holdthetruthhostage
    @holdthetruthhostage 7 months ago

    Oh, this is what I have been waiting for: 8x22. But once we get to 8-12 x 30-60 it will be crazy. We just need one that can code 99.9% accurately, that has a context window of 150k-250k+ and can output 50k-150k+, with memory support so we can talk for over 1 million tokens.

  • @TPH310
    @TPH310 7 months ago

    The Snake I know has to go through the wall!))) it's perfect.

  • @erikjohnson9112
    @erikjohnson9112 7 months ago

    With the snake bounds, you should have tried up/down. It is possible those might have been caught because they represent the total bounds (beginning and end of the region as an image). Left/right is more of a soft boundary. Yes, missing left/right is an error, but if it caught top/bottom then it might have partially solved it.

    • @aitechnewsTV
      @aitechnewsTV 7 months ago +1

      absolutely, I love you

  • @joe_limon
    @joe_limon 7 months ago

    Can you try setting up these LLMs in an agent system where they can review their work before submitting a final answer? I wonder how much of an improvement you would get.

    • @aitechnewsTV
      @aitechnewsTV 7 months ago

      absolutely, I love you

  • @UnchartedWorlds
    @UnchartedWorlds 7 months ago

    Tested Claude Opus again and it gave 10 out of 10 for ending each sentence with the word apple.

  • @PieterHarvey
    @PieterHarvey 7 months ago

    Holy hell!! Just to test, I converted this model to GGUF and quantized it to Q2_K, and it still takes 49GB. Not that Q2 performance will be great, but this is just a what-the-hell moment.

  • @StefanOstrowski
    @StefanOstrowski a month ago

    Thanks for the great introduction. How about testing Nvidia Nemotron 70B? That would be great. Thx.

  • @PrintVids
    @PrintVids 7 months ago

    Does Infermatic take all the prompts for training data, or is it private?

  • @dusk2dawn2
    @dusk2dawn2 7 months ago

    5 shirts out in the sun... (5:20)??? The energy from the sun is directly proportional to the area, meaning 1 and 5 shirts take the same time to dry. Under the same conditions you can dry 1000 shirts in 4 hours. That's not a pass!

  • @xXWillyxWonkaXx
    @xXWillyxWonkaXx 7 months ago

    Which is superior when it comes to the test results, DBRX by Databricks or Mixtral 8x22b?

  • @pranitrock
    @pranitrock 7 months ago

    Snake leaving the window and entering from the other side is one of the classic versions of snake. So it is already correct. Many people like that implementation actually.

    • @aitechnewsTV
      @aitechnewsTV 7 months ago +1

      thanks, I love you

  • @o_kamaras
    @o_kamaras 7 months ago

    The snake going through the wall and out the other side is actually on par with the Nokia 3310 version!

  • @ridewithrandy6063
    @ridewithrandy6063 7 months ago

    What is the size of this model? I was able to run a 30B model on my RTX 3070 Ti Super. LM Studio put the rest of the model in system RAM, but what is the size of this new model? Please and thank you.

  • @elyakimlev
    @elyakimlev 7 months ago +1

    This actually performed worse than the Mixtral 8x7B 5-bit I have running locally on my computer. I'll stick with what I have until a better model comes out. Thanks for the test.

  • @LeonFeasts
    @LeonFeasts 7 months ago

    The test with the ten apples also works on the new GPT-4; when I tested it a while ago, it failed.

  • @Moyano__
    @Moyano__ 7 months ago

    The problem is still the same: LLMs can't really "reason" unless given some framework, step-by-step logic, or specific prompts (which is just alchemy and may or may not work depending on the training data).
    I hope we get a revolution in this soon; otherwise we're just going to add data and compute, but new problems and issues won't get honest answers, just regurgitation of what is already in their neural nets, like when you study from memory.

  • @Dron008
    @Dron008 7 months ago +1

    This Infermatic is not free at all. Just a couple of models are free and Mixtral from the video is not among them.

  • @goldkat94
    @goldkat94 7 months ago

    How much VRAM would it need to run the 22-billion version locally?

  • @jbo8540
    @jbo8540 7 months ago

    I like Mistral:Instruct 7b parameter model

  • @kylequinn1963
    @kylequinn1963 7 months ago

    Now, to see if I can run this on my machine locally.

  • @mvasa2582
    @mvasa2582 7 months ago

    Killer in the room - was funny!

  • @MeinDeutschkurs
    @MeinDeutschkurs 7 months ago +1

    What about "ends with the string 'apple.'"?

    • @WhyteHorse2023
      @WhyteHorse2023 7 months ago

      It won't matter. This is a fundamental flaw in all LLMs. It has to "think before it speaks" which is impossible because of how LLMs generate text.

    • @MeinDeutschkurs
      @MeinDeutschkurs 7 months ago

      @@WhyteHorse2023 , it matters, because of the period in the string.

    • @MeinDeutschkurs
      @MeinDeutschkurs 7 months ago

      GPT-4 Turbo:
      1. He placed the last piece of fruit on the counter and realized he preferred the red one; it was an apple.
      2. Her favorite snack was simple and sweet, a crisp apple.
      3. When she went to the market, the only thing on her list was an apple.
      4. The story he read to the children was about a magical apple.
      5. In the art class, they painted still life scenes featuring an apple.
      6. The teacher explained that Newton was inspired by a falling apple.
      7. She packed her lunch with a sandwich, a cookie, and an apple.
      8. For dessert, they decided to bake a warm, delicious apple.
      9. He reached into his bag and the first thing he pulled out was an apple.
      10. On the table, there was nothing but a single, shiny apple.

    • @WhyteHorse2023
      @WhyteHorse2023 7 months ago

      @@MeinDeutschkurs It's still a fundamental limitation if the LLM can't distinguish between a word and a period.

    • @MeinDeutschkurs
      @MeinDeutschkurs 7 months ago

      @@WhyteHorse2023, however, the results differ from each other.
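
  [Editor's note] A minimal sketch of the limitation argued above: generation is strictly left-to-right, one token at a time, so a constraint on a sentence's last word can only be satisfied implicitly. `next_token` is a hypothetical stand-in for the model, not a real API:

      def generate(next_token, prompt, max_tokens=100):
          # Each token is chosen knowing only the text so far; the model
          # cannot explicitly plan the final word of a sentence, and any
          # "planning" is implicit in patterns learned during training.
          out = []
          for _ in range(max_tokens):
              tok = next_token(prompt, out)  # hypothetical model call
              if tok == "<eos>":             # stop at end-of-sequence
                  break
              out.append(tok)
          return out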

  • @RWilders
    @RWilders 7 months ago

    All your videos are just great. Many thanks!
    One thing always bothers me regarding your test "end in the word apple", could you try "end with the word apple" ("with" instead of "in"). It may work better. Cheers.

    • @WhyteHorse2023
      @WhyteHorse2023 7 months ago +1

      It won't matter. This is a fundamental flaw in all LLMs. It has to "think before it speaks" which is impossible because of how LLMs generate text.

    • @RWilders
      @RWilders 7 months ago

      @@WhyteHorse2023 I tried this sentence with GPT-4 and it works fine: "Give me ten sentences where each sentence ends with the word apple." Give it a try.
      I ventured into the garden to pick the last remaining apple.
      Upon examining the contents of the pie, I realized it lacked an apple.
      He couldn't resist adding another slice to his already full plate of apple.
      As the sun set, the sky's hue reminded me of a golden apple.
      No matter the question, her answer was invariably, "apple."
      For his lunch, all he desired was a crisp, sweet apple.
      Walking through the market, every stall seemed to boast its own variety of apple.
      It wasn't just any fruit; it was the perfect apple.
      She decorated the tabletop with a centerpiece featuring an ornate bowl and a single apple.
      In his tale, the magic was always in the mystical apple.

    • @WhyteHorse2023
      @WhyteHorse2023 7 months ago

      @@RWilders Well that's a first... See if it can answer "How many words are in your reply to this question?"

  • @kovidkasi6117
    @kovidkasi6117 7 months ago

    What is the context length?

  • @ziad_jkhan
    @ziad_jkhan 7 months ago

    Any reason why it did not perform better than the 7B model?

  • @mcombatti
    @mcombatti 7 months ago

    Fine-tuning can reduce logic accuracy and reasoning. It would be interesting to test the base model against the fine-tuned one.

  • @Sonic2kDBS
    @Sonic2kDBS 7 months ago +1

    No, this time you are wrong. Going through the wall is normal for the snake game in many versions, like the old Asteroids game. It is perfectly fine if the snake leaves on one side and enters on the other side. However, keep on 😊👍

  • @itsprinceptl
    @itsprinceptl 7 months ago

    Actually, in the Nokia snake game there is an easy mode where the snake can go through the wall and enter the frame from the other side. So technically this was perfect.

  • @RM-xs3ci
    @RM-xs3ci 7 months ago +4

    You should consider making a "Partial Pass" instead of a full pass

    • @matthew_berman
      @matthew_berman 7 months ago

      Which tests would it apply to?

    • @RM-xs3ci
      @RM-xs3ci 7 months ago

      @@matthew_berman For example, the math test that gave 19 at the start, but 20 at the end.

    • @southcoastinventors6583
      @southcoastinventors6583 7 months ago +1

      @@matthew_berman The apple test, for instance. I think you should also do a writing question that includes internal links and a table - basically an SEO and readability test.

  • @iandanforth
    @iandanforth 7 months ago

    Unless you are looking for *creativity*, temperature should be 0. When it's anything other than zero, you're asking the model to sometimes ignore its top choice for a completion and give you something it thinks is less likely. Almost all your rubric questions are factual or have a correct answer. To test how well the model can do, you should let it output its best answer at all times.

  • @BlayneOliver
    @BlayneOliver 7 months ago

    Matt, I find most of the models are each limited in their own way - be it context, remembering the objective, being overwhelmed by big blocks of code, etc.
    Instead of having the models compared against one another, is there a solution for utilising all of them at their individual standout strengths?
    If that "all models" solution exists, please find it and make a video on it.

  • @wrOngplan3t
    @wrOngplan3t 7 months ago

    Interesting video as usual! Maybe you should have a more gradual rating than just the binary pass/fail, maybe a 1-5 rating? Or at least a "half-pass" for those kind-of-right-if-given-a-push, or kind-of-right-with-some-caveats answers? Just a thought, no biggie really.

  • @horrorislander
    @horrorislander 7 months ago

    So, Mixtral is building a middle manager. Add more people!

  • @mvasa2582
    @mvasa2582 7 months ago

    Matt - for future reference - the shirt drying problem - we should remove the 'step by step' (I believe we introduced this because models were failing otherwise)

  • @dvloopNew
    @dvloopNew 7 months ago +1

    Hey, Infermatic is not free!

  • @kyrylogorbachov3779
    @kyrylogorbachov3779 7 months ago

    Are you using the same hyperparameters?

    • @aitechnewsTV
      @aitechnewsTV 7 months ago

      thanks, I love you

  • @jelliott3604
    @jelliott3604 7 months ago

    "One"
    Surely the best answer to "how many words are in your response to this question?"
    Or... "two words".

  • @mayorc
    @mayorc 7 months ago

    Link of TotalGPT?

  • @brandon1902
    @brandon1902 7 months ago +2

    This fine-tune isn't good. Its data set wasn't large or diverse enough. But I'm sure you're going to re-test with the official instruct (assuming Mistral plans on releasing one) or a better community fine-tune.

  • @Quarkburger
    @Quarkburger 7 months ago

    On the PEMDAS test it gave you 19, which was wrong. That's a fail. How else would you distinguish this from another model that gets the correct answer on the first try?

  • @Horizon-hj3yc
    @Horizon-hj3yc 7 months ago

    That the previous Mixtral got it right is because of the temperature setting; it creates randomness. Do the same test again on the previous version and it will likely fail.

  • @Chomikback
    @Chomikback 7 months ago +4

    [REQUEST]: louder please, louder video, thx.

    • @electromigue
      @electromigue 7 months ago

      There is a free audio plugin you can use in your video editor called Youlean Loudness Meter; you want to hit around -14 LUFS for YouTube videos. There is a preset in the plugin for YouTube anyway. You are smart, you will get how it works within some minutes of reading.

  • @vinception777
    @vinception777 7 months ago

    Thanks for the video. Actually, for the snake part, I've always played versions where you could go through the wall; it was always part of the game, so it's definitely a pass for me haha.

  • @lancemarchetti8673
    @lancemarchetti8673 7 months ago

    Does anyone know where I can test Mixtral 8x22b online, as I don't have a system that supports local models?

    • @waldo1403
      @waldo1403 7 months ago

      On Poe.

  • @thomas.alexander.
    @thomas.alexander. 7 months ago

    What level of hardware is required to run this?

  • @boyarinplay
    @boyarinplay 7 months ago

    In the test "How many words are in your response to this prompt?", the model counts each token as a word. And the answer was correct - there are ten of them =)

    • @WhyteHorse2023
      @WhyteHorse2023 7 months ago +1

      He didn't ask how many tokens, so it's wrong.

  • @erb34
    @erb34 7 months ago

    I used Mistral in LM Studio and got it responding with a whole bunch of weird numbers.

    • @Wren206
      @Wren206 7 months ago

      That's strange, what version did you try? Mistral 7b v0.2 is really unbelievably good for a small language model. Did you try that one? Also what quantization and context size?

  • @Povcollector
    @Povcollector 7 months ago

    I don't understand how you're testing the quality while quantizing the model. Doesn't that itself reduce accuracy and precision?

    • @WhyteHorse2023
      @WhyteHorse2023 7 months ago

      Yeah it dumbs it down a little.

  • @PinakiGupta82Appu
    @PinakiGupta82Appu 7 months ago +1

    I'll wait for a quantised version to be released by someone on HuggingFace. I'll go with the 3B Q2 models for speed as usual. Good 👍

    • @lesfreresdelaquote1176
      @lesfreresdelaquote1176 7 months ago +1

      There is an Ollama version already, which is... ahem... 88GB large.

    • @MyWatermelonz
      @MyWatermelonz 7 months ago

      Anything below Q4 on mixtral is braindead

    • @PinakiGupta82Appu
      @PinakiGupta82Appu 7 months ago

      @@MyWatermelonz 4-bit models run slow on my machine.

  • @mirek190
    @mirek190 7 months ago +1

    That chat fine-tune must be a bit broken.
    I got better answers on a clean base model...

  • @squiddymute
    @squiddymute 7 months ago +1

    Umm, Infermatic is not free for that model.

  • @HectorDiabolucus
    @HectorDiabolucus 7 months ago

    Now ask it who should be put in the hole and how long would it take for the 50 people to cover the hole.

  • @8eck
    @8eck 7 months ago

    I guess they need some kind of regression testing, to avoid such issues in the future.

  • @DavidRyan-w5k
    @DavidRyan-w5k 7 months ago

    OK, I think we need to reinvent LLMs; they still have glaring issues with detecting sequences or whether something contains something else. For however smart they appear to be, they are simply stupid: every LLM so far fails at this simple prompt: "List words that contain the sequence of letters TREAD, like 'treadle'". I couldn't believe that GPT-4 made up some words in the list, but it does. Haven't tried Mixtral 8x22b, because no one can run it yet.

    • @waldo1403
      @waldo1403 7 months ago

      It is free on Poe.