Mistral Medium - The Best Alternative To GPT4

  • Published 27 Jul 2024
  • Re-upload; the first one was cut off before the ending. I got access to Mistral Medium, Mistral's prototype model that is available only through API (for now). It performs incredibly well and is a fraction of the price of GPT4. This is a great replacement for developers building most use cases.
    Check out D-ID here: timf.io/3t8Xj73
    Enjoy :)
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewberman.com
    Need AI Consulting? ✅
    forwardfuture.ai/
    Rent a GPU (MassedCompute) 🚀
    bit.ly/matthew-berman-youtube
    USE CODE "MatthewBerman" for 50% discount
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    Media/Sponsorship Inquiries 📈
    bit.ly/44TC45V
    Links:
    Script - gist.github.com/mberman84/4f7...
    Mistral - mistral.ai/
    LLM Rubric Results - bit.ly/3qHV0X7
    Chapters:
    0:00 - About Mistral Medium
    2:28 - Pricing Comparisons
    4:59 - Test Results
  • Science & Technology

Comments • 215

  • @matthew_berman
    @matthew_berman  7 months ago +165

    A new model, SOLAR 10.7B, just reached #1 on the LLM leaderboard. Should I review it?

    • @MikeKasprzak
      @MikeKasprzak 7 months ago +10

      Nah. 😋

    • @pn4960
      @pn4960 7 months ago +20

      10.7B? Is this a case of overfitting via contamination?

    • @dkracingfan2503
      @dkracingfan2503 7 months ago +3

      Sure!

    • @federicocucinotta7772
      @federicocucinotta7772 7 months ago +3

      Of course!

    • @annettewatson1803
      @annettewatson1803 7 months ago +3

      Yessss 🎉

  • @paul1979uk2000
    @paul1979uk2000 7 months ago +24

    What's remarkable is how quickly these models are developing. I remember what the quality was like early this year, and we've seen such massive improvements since then that it's shocking many of us and likely scaring OpenAI, Microsoft and Google. I suspect they didn't expect open-source models to close the gap on them so quickly, and you really do have to wonder how much better they'll get over the next few years.

    • @TimoGraw
      @TimoGraw 7 months ago +4

      I guess you don't remember the "We have no moat" memo

    • @jnevercast
      @jnevercast 7 months ago +1

      Exactly what I thought too: no moat, as written. Even then, though, I thought maybe OpenAI had more than a year's head start. They might not be that lucky @@TimoGraw

    • @leucome
      @leucome 6 months ago +1

      They probably didn't expect it to run on consumer hardware this quickly either. Even I am a bit surprised. I tried Mixtral 4x7B on a 7900 XT. It's a totally coherent AI with a proper ability to hold a conversation in real time, and about as smart as GPT-3.5, running on a gaming GPU... Not only that: it's on an AMD GPU that allegedly isn't supposed to be good for AI. That is, if you completely believe Nvidia's self-congratulatory marketing.

  • @mattbarber6964
    @mattbarber6964 7 months ago +39

    I created a simple Flask front end and have this working as well. I use LLMs all day long for my work, and this is the first time I can honestly say I'm more impressed with something than with GPT-4.
    This is great news for LLMs in general, because now OpenAI has an actual threat.

    • @carlkim2577
      @carlkim2577 7 months ago +2

      Do you mean Mistral small or medium? I thought Medium isn't open source yet?

    • @mattbarber6964
      @mattbarber6964 7 months ago

      @@carlkim2577 I am using Medium through the API. So it's not open source, no, but it's on par with GPT-4, if not better. Plus, as Matt said, it's 10x cheaper

    • @orieporter5299
      @orieporter5299 7 months ago +3

      @@carlkim2577 I'm sure he means a Flask front end for the API

    • @businessemail9825
      @businessemail9825 7 months ago +1

      what is the output token limit?

  • @DihelsonMendonca
    @DihelsonMendonca 7 months ago +11

    ⚠️ Matthew, what about creating a leaderboard of LLMs that could answer all your questions? Then we could track which one is the best to date. Please consider it; it could be done with a single spreadsheet. Thanks. 🎉🎉❤

  • @Dreamslol
    @Dreamslol 7 months ago +5

    BEST OPEN SOURCE MODEL... I hear that in every video lol.

    • @esimpson2751
      @esimpson2751 7 months ago +1

      he makes a video every time a new champion arises

  • @georgesms
    @georgesms 7 months ago +12

    Dude, first: awesome channel! I was thinking that your tests have a serious chance of contaminating the training data of newer models. Therefore, I think adding different questions is always beneficial, even if they aren't harder than the previous ones.

    • @cluelesssoldier
      @cluelesssoldier 7 months ago

      Information sent and received through the API backend typically is not used for training purposes - at least this is how it is with OpenAI.

  • @brandon1902
    @brandon1902 7 months ago +3

    I liked seeing you add a couple unique questions. Consider adding a couple fresh questions to the standard set in each video.

    • @Utoko
      @Utoko 7 months ago +2

      It is nice to be able to compare results, but he should drop the easy questions that no model ever fails and add a couple of hard new ones. The models are just getting better.

  • @miladkhademinori2709
    @miladkhademinori2709 7 months ago

    So excited 😊 thanks Matt 🙏

  • @stephanembatchou5300
    @stephanembatchou5300 7 months ago

    Hi @matthew_berman,
    Thanks for the excellent content like this. I might say that it is better to have new variations of the complex tasks at each evaluation. I am pretty sure the new models are trained or fine-tuned on your current evaluation tasks.

  • @rheale9420
    @rheale9420 7 months ago

    wow, your camera and lighting look amazing!

  • @ossian882
    @ossian882 7 months ago +6

    Please add programming tasks that are a bit more unique. There are thousands of Python snake games online, so that test doesn't show very much. Have it add some twist that forces the model to reason more.

  • @fabiankliebhan
    @fabiankliebhan 7 months ago +7

    Impressive. And you have to consider that even GPT-4 struggles with the marble problem.
    I think if you prompted it a few more times it would get it.
    Mixtral also sometimes doesn't get it, so it could be a coincidence that it performed worse than Mixtral on this question.
    Anyway, a really good model, and great to see this progress.

    • @RainbowSixIntel
      @RainbowSixIntel 7 months ago

      Yep, GPT-4 Turbo both on the API and in ChatGPT gets this wrong consistently.

  • @coldlyanalytical1351
    @coldlyanalytical1351 7 months ago +2

    It's clear that in, say, a year these models will be able to handle most puzzles as well as most humans.
    At this point AGI of a sort could be possible using clusters of models, local memory, and wrapper code.
    These quasi AGIs could be directed to complete tasks autonomously - although they won't be sentient in any way.

  • @toastrecon
    @toastrecon 7 months ago +11

    One thing that I've been thinking about recently: I get that it's good to have a set of standard "tests" for these models, but at what point do they become "overtrained" on those tests? Like, if you wanted to "game the system", you could just have a pre-trained game of snake in Python at the ready to "infer". I'd almost rather it be given something novel like: create the game of snake, but multiplayer and 3D. It is interesting to consider how these models go from plain inference to reasoning engines.

    • @PvMLad
      @PvMLad 7 months ago +2

      Create your own private test and rank them against it 🙂

    • @BienestarMutuo
      @BienestarMutuo 7 months ago +1

      there is a video about that; it is a real problem with no solution as of today (15/12/2023)

    • @joey3070
      @joey3070 7 months ago

      Best we have is OpenAI's data decontamination thing

  • @ylazerson
    @ylazerson 7 months ago

    great video - thanks!

  • @captanblue
    @captanblue 7 months ago +3

    Oooo this is going to be good.

  • @deeplearning5408
    @deeplearning5408 5 months ago

    You are quite famous in the GenAI world already. I would not be surprised if some companies already have your questions and perfect answers in their datasets.

  • @craiganderson6126
    @craiganderson6126 7 months ago +24

    Please redo the Mixtral and Mistral Medium tests with variations of your current questions. I think there's a pretty good chance they have been trained on your questions, especially if they approached you with an API key to test their model.

    • @helbertgascon
      @helbertgascon 7 months ago +4

      But then again, I agree... He should change his questions in every video to avoid having his tests trained on.
      Like that snake game coding test, for example. It's so overused now that it has become one of the one-click prompts on the HuggingFace chat 😂

    • @SmOgER10
      @SmOgER10 7 months ago

      Agree. It seems very relevant for this one especially. With a different set of questions, I personally found Mistral-small to perform BETTER than Mistral-medium.

  • @snow-hk9zw
    @snow-hk9zw 7 months ago

    amazing! I want to use this right now!

  • @AINEET
    @AINEET 7 months ago +3

    Would be nice if you made a compilation video of the current top models lol, there's so much to keep up with

  • @travisporco
    @travisporco 6 months ago

    I like that you actually show people how to get things done...actual code! and not just talk-talk.

  • @aldousd666
    @aldousd666 7 months ago +1

    BTW, I recommend your channel to everybody who asks me how to learn how to use these models and compare them to each other. I'm the principal engineer at my company and a lot of people ask me.

  • @ldsviking
    @ldsviking 7 months ago +2

    I think you should drop the easy questions that everyone gets right, and for the tricky questions, make them regenerate answers to see if they can get it right twice in a row. Or even scale the questions up incrementally to see where the breaking points are. For example, include a fourth runner, then a fifth, etc.

  • @notme222
    @notme222 7 months ago +1

    I've been using the test "Write a palindrome using the word ____". (Where the word is something that reasonably works backwards. Like "wolf".) None have gotten it correct yet but it's interesting to see the failures.
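As it happens, "wolf" does admit a solution: "wolf flow" reads the same backwards once spaces are ignored. A minimal checker (my own sketch, not from the video) for grading such answers:

```python
def is_palindrome(text: str) -> bool:
    # Compare letters only, ignoring spaces, punctuation and case.
    letters = [c.lower() for c in text if c.isalpha()]
    return letters == letters[::-1]

print(is_palindrome("wolf flow"))   # → True
print(is_palindrome("wolf pack"))   # → False
```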

  • @ramoneduardo
    @ramoneduardo 7 months ago

    Yes, please add these more complex word problems to your test set! Thank you!

  • @TheGeneticHouse
    @TheGeneticHouse 7 months ago

    I think this is the first one to get it correct, holy, and the absolute best response to the T-shirts-drying-in-the-sun question

  • @Parisneo
    @Parisneo 7 months ago

    Amazing video. I think Mistral AI is doing a great job. As for the code, you could have used the Python library they provide. That's better than using raw requests, and you can also have it stream the output sequentially.
    Nice video as always.

  • @steveking4878
    @steveking4878 7 months ago +2

    I'm still waiting for the response to the how many words question to be "One." ;)

  • @tomski2671
    @tomski2671 7 months ago +3

    100% / 25m = 4 %/m - fill rate
    100% / 55m = 1.8181... %/m - drain rate
    4 %/m - 1.81 %/m = 2.18 %/m - difference(tank is filling at this rate)
    100 % / 2.18 %/m = 45.8 m (units cancel out)
    I had to think about this one for about 2 minutes 😄
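The rate arithmetic above can be verified in a couple of lines (a sketch; the exact answer is 275/6 ≈ 45.83 minutes, matching the 45.8 above):

```python
# Tank fills in 25 min, drains in 55 min; both open from empty.
fill_rate = 1 / 25    # tanks per minute
drain_rate = 1 / 55   # tanks per minute
net_rate = fill_rate - drain_rate   # net filling rate with both open
minutes = 1 / net_rate              # time to fill one whole tank
print(round(minutes, 1))  # → 45.8
```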

    • @DihelsonMendonca
      @DihelsonMendonca 7 months ago +1

      I couldn't answer that. Mixtral is better than me at reasoning. But hey, a tiny calculator can also do better than any human being. 😅😅

  • @MeinDeutschkurs
    @MeinDeutschkurs 7 months ago

    Very promising!!!!

  • @fenix20075
    @fenix20075 7 months ago

    Great! I love the lack of censorship! Hahahaha... I suggest adding a two-equation test: 2A+3B=12, A+B=5. Try to find A and B's values (A=3 and B=2).
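The suggested pair of equations can be graded by plain substitution; a sketch of the elimination:

```python
# Solve 2A + 3B = 12 and A + B = 5 by elimination:
# subtracting 2*(A + B) = 10 from the first equation leaves B = 2.
B = 12 - 2 * 5   # = 2
A = 5 - B        # = 3
assert 2 * A + 3 * B == 12 and A + B == 5
print(A, B)  # → 3 2
```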

  • @leucome
    @leucome 6 months ago

    There is also a Mixtral 4x7B now. It's small enough to fit on a consumer GPU. This one is really promising for a relatively smart chatbot that talks in real time. The quality of its casual language is nearly as good as the 8x7B.

  • @Parisneo
    @Parisneo 7 months ago

    By the way, LoLLMS now has a mistralai binding to talk to their API, and it is really cool. Their models are solid.

  • @byrdziak-media
    @byrdziak-media 7 months ago +1

    Might be a good idea to create a matrix spreadsheet of which models got which questions right/wrong.
    It can be something as simple as a Google Spreadsheet. If you leave it private, people will need to request access, leaving their email address with you. A great lead magnet, and I'd want to know which models are out there, links to them, and how well they're doing.
    I would also suggest that for every new question you add, you drop an old question that no model has failed. You can still add the old question to the matrix, but we don't need to see Bill Clinton again!

  • @Leto2ndAtreides
    @Leto2ndAtreides 7 months ago +1

    For logic questions like If A > B and B > C, is A > C? ... It may be worth it to not ask bigger models to explain their reasoning. The more advanced models should be able to reason internally, one would hope.

  • @trevoC132
    @trevoC132 7 months ago +3

    What is the best model available to train with your own data (documents) that include proprietary information for users to query in the form of questions?... looking at llama 2 at the moment.

  • @YvesQuemener
    @YvesQuemener 7 months ago +1

    About math problems: it is interesting to see a model's ability to do arithmetic, but I would also give it the opportunity to write a program to solve the problem.

  • @chrisvernon3156
    @chrisvernon3156 7 months ago +1

    These models are already *amazing*. I'd be more interested in how to get the best out of them, how to use them effectively in the real world, on real tasks.
    I sometimes wonder what would happen if there was NO further development for a decade? I feel like we're only scratching the surface of what today's models are capable of. They are pretty much black boxes, there's still so much to learn about *these* before we charge ahead with new models.

    • @leucome
      @leucome 6 months ago

      The other day I was thinking: sure, AI will end up working everywhere, but what use will it be to me? What can I do with it in my actual life? What utility does it have directly for me? It's the kind of question we didn't have before, so there is no obvious quick answer.

  • @marcosbenigno3077
    @marcosbenigno3077 7 months ago +4

    @MatthewBerman Please provide your list of the best tested LLMs!

    • @matthew_berman
      @matthew_berman  7 months ago

      it's in the description! :)

  • @scosee2u
    @scosee2u 7 months ago +3

    Loving the more complex problems!
    What game do you have in mind to replace snake once that is easily done by better and better models?

  • @Mcmeider
    @Mcmeider 7 months ago +3

    Hey Matthew, after watching a lot of your videos, I have the feeling that these questions should be changed.
    Another thing (it could sound weird): given how quickly we got a 7B model run by "experts", when will we be able to run it on a smartphone, or in a Linux kernel?

  • @UnFleshedOne
    @UnFleshedOne 7 months ago +2

    "Confidence: 95%" is another word in the response (also 95% can be counted as 0, 1 or 3 words depending on how you count), so it got it wrong :)

    • @josjos1847
      @josjos1847 7 months ago +1

      No

  • @gidmanone
    @gidmanone 7 months ago

    @matthew_berman How do you know your test cases (or any of their variants) have not made it into the training data ?

  • @8eck
    @8eck 7 months ago +1

    We need to improve our test questions. 😆

  • @davidrose9577
    @davidrose9577 7 months ago

    Love your stuff. Would like to see this mixed with Dolphin, please.

  • @matten_zero
    @matten_zero 7 months ago +1

    @7:10 did you turn off "safe mode"? Apparently that parameter controls how censored the responses are

  • @NoHandleToSpeakOf
    @NoHandleToSpeakOf 7 months ago +2

    We do not control the sampling method and parameters while using the API. It is possible that local models are smarter than they first appear if you tweak them. How about Mirostat 2.0 sampling? It sounds great, but I never see it used.

  • @dr.mikeybee
    @dr.mikeybee 7 months ago

    an inlet pipe can fill a tank in 25 minutes. the drain can empty the tank in 55 minutes. Once the tank was empty, the drain was left open and the tank was filled by the pipe. How long did it take for the tank to fill? use calculus. Let's denote the rate at which the inlet pipe fills the tank as \(F\) and the rate at which the drain empties the tank as \(D\). We know that \(F = \frac{1}{25}\) tanks per minute and \(D = \frac{1}{55}\) tanks per minute.
    When both the inlet pipe and the drain are open, the net rate of filling the tank is \(F - D\).
    We can set up the following integral to represent the total amount of water in the tank over time:
    \[
    \int_0^t (F - D) \, dt = 1
    \]
    Substituting the values of \(F\) and \(D\), we get:
    \[
    \int_0^t \left(\frac{1}{25} - \frac{1}{55}\right) \, dt = 1
    \]
    Solving this integral gives us the time it takes to fill the tank:
    \[
    t = \frac{1}{\frac{1}{25} - \frac{1}{55}} \approx 34.375 \text{ minutes}
    \]
    So, it takes approximately 34.375 minutes to fill the tank when both the inlet pipe and the drain are open. Please note that this is an approximate value and the actual time may vary depending on the specific conditions.

  • @friendofai
    @friendofai 7 months ago

    Do you know if they plan to release this model for local use in the future?

  • @darwinboor1300
    @darwinboor1300 7 months ago

    Keep making the problems more complex over time.
    After each problem, ask the AI if it has seen this type of problem before. You are looking for edge case solutions. You are not looking for previously trained solutions. It is interesting to see how the AIs respond.
    NOTE: To remove ambiguity, the filling task should state that the rates are fixed.
    In our world, rates are dependent upon pressure and resistance. The emptying pressure increases with the height of the fluid column.

  • @dr.mikeybee
    @dr.mikeybee 7 months ago

    Ask the model to use calculus for the tank problem. BTW, I always enjoy your videos. Cheers!

  • @JoePiotti
    @JoePiotti 7 months ago

    The batch answer should consider that the sun doesn’t shine equally all day.

  • @dr.mikeybee
    @dr.mikeybee 7 months ago

    Here's Microsoft Copilot: an inlet pipe can fill a tank in 25 minutes. the drain can empty the tank in 55 minutes. Once the tank was empty, the drain was left open and the tank was filled by the pipe. How long did it take for the tank to fill? use calculus. Let's denote the rate at which the inlet pipe fills the tank as \(F\) and the rate at which the drain empties the tank as \(D\). We know that \(F = \frac{1}{25}\) tanks per minute and \(D = \frac{1}{55}\) tanks per minute.
    When both the inlet pipe and the drain are open, the net rate of filling the tank is \(F - D\).
    We can set up the following integral to represent the total amount of water in the tank over time:
    \[
    \int_0^t (F - D) \, dt = 1
    \]
    Substituting the values of \(F\) and \(D\), we get:
    \[
    \int_0^t \left(\frac{1}{25} - \frac{1}{55}\right) \, dt = 1
    \]
    Solving this integral gives us the time it takes to fill the tank:
    \[
    t = \frac{1}{\frac{1}{25} - \frac{1}{55}} \approx 34.375 \text{ minutes}
    \]
    So, it takes approximately 34.375 minutes to fill the tank when both the inlet pipe and the drain are open. Please note that this is an approximate value and the actual time may vary depending on the specific conditions.

    • @Garbhj
      @Garbhj 7 months ago

      It used the right method, but I guess it just can't do numerical calculations.
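Indeed: the quoted integral setup is fine, but the final arithmetic is not. 1 / (1/25 − 1/55) is exactly 275/6 ≈ 45.83 minutes, not 34.375. A quick exact check (my own sketch):

```python
from fractions import Fraction

# Net rate with both pipe and drain open, then invert for the fill time.
t = 1 / (Fraction(1, 25) - Fraction(1, 55))
print(t, float(t))  # → 275/6 (≈ 45.83 minutes)
```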

  • @businessemail9825
    @businessemail9825 7 months ago

    what is the output token limit on Mistral small and medium? Please talk about this when discussing an LLM; I mean both the input context and the output token limit.

  • @0xDeadC0de1
    @0xDeadC0de1 7 months ago

    I want to be pedantic about "How many words are in your response to this prompt?". Nowhere in your code did I see anything that would print out a 'confidence score', so at minimum it responded with 10 words (`Confidence` is a word). Thoroughly enjoyed watching you test this. Have you tried Microsoft Phi-2 yet? It's neat, but in oobabooga in particular it's nearly 2x slower than Mistral 7B on my 3080.

  • @MikeBtraveling
    @MikeBtraveling 7 months ago

    Of course you need to play with this more, BUT I don't recall seeing you do this with GPT-4...

  • @bobinorlando
    @bobinorlando 7 months ago +4

    I hope you are keeping a separate set of different tests that you haven't published but that you are conducting since it would be rather trivial for any of these AI's to hardcode the correct answers to any of these tests knowing that you are going to use them.

  • @aldousd666
    @aldousd666 7 months ago

    Have you done any more word counts? I'll be interested to see if it gets that right over and over again. That's kind of insane, because it means it has to have some short-term working memory

  • @lucademarco5969
    @lucademarco5969 7 months ago +1

    I suppose the different answer to the marble question is because it assumed that the ball is big enough to be stuck inside the cup. Could it be? In that case, the answer would be correct. Maybe you should rephrase the question, saying that the ball's diameter is smaller than the cup's diameter, or something like that

  • @xdasdaasdasd4787
    @xdasdaasdasd4787 7 months ago

    I do hope they open-source it. Keep the models open source but make access easy for a price

  • @ChairmanHehe
    @ChairmanHehe 7 months ago

    go mistral!!

  • @Dron008
    @Dron008 7 months ago

    "Confidence" is also a word in the response, so there are 10 words ). Anyway, I don't think we can rely on a single answer; it could be a coincidence. More different questions should be asked to make sure it can count words. Usually models are very bad at counting objects. No one could count the seagulls in my photo.

  • @aldoyh
    @aldoyh 7 months ago

    Yes I'd vote for a spin with SOLAR 10.7b! 😍😍

  • @pon1
    @pon1 7 months ago +1

    This is mind-blowing. That it got the word count right for its next answer is a sign of planning ahead, which traditionally AI models can't do, since they just predict the next word. It would first have had to predict, word for word, the next sentence, then counted it, knowing that the number in the sentence wouldn't change the count. Incredible. I wonder how the large model would do.

  • @rawkarpojk
    @rawkarpojk 7 months ago

    How come none of the models have updated their OpenAI SDK code? I keep getting the wrong code.

  • @noomondai
    @noomondai 7 months ago

    Do you think you can run the Mixtral-8x7B model with an AMD Ryzen 7040 if you have enough RAM?

  • @MHTHINK
    @MHTHINK 7 months ago

    I wonder if a subset of the 8 models is censored and not all of them

  • @dinonovak743
    @dinonovak743 7 months ago

    Hi, I noticed some interesting behaviour: when asked, Mixtral can create tables with content in them. Also, when asked to create graphs it gives a positive answer, but I am not sure what is actually returned. Can you check?

  • @neel_aksh
    @neel_aksh 7 months ago

    Mistral 7B Instruct v0.2 and Mixtral Instruct are both fine-tuned on the killers problem

  • @bestemusikken
    @bestemusikken 7 months ago

    Seems you'll have to up your game on your tests. They are catching up quite nicely.

  • @DeepThinker193
    @DeepThinker193 7 months ago +1

    Ugh, this test is basically pointless unless you ask it questions outside of your usual question set, like the SAT question it got wrong. The developers can easily make it answer your specific questions, which is exactly why they would give you an inside scoop.

  • @Fordtruck4sale
    @Fordtruck4sale 6 months ago

    So a 6 bit EXL2 quant would be how much VRAM?

  • @someideas5388
    @someideas5388 7 months ago

    I've tried a couple of models locally and in Colab, but could not get a correct answer for this prompt so far:
    The electricity price is 28p per kWh. A kettle is rated at 2.7 kW, which means it uses 2.7 kilowatt-hours of electricity in one hour. The kettle can boil 1 litre of water in 60 seconds. What is the cost of boiling 1.7 litres of water?
    It uses 2700/3600 = 3/4 = 0.75 Wh per second.
    1 litre => 60 s, so 1.7 litres => 102 s
    1.7 litres: 102 * 0.75 = 76.5 Wh
    1000 Wh => 28p, so 76.5 Wh => 76.5 * 28/1000 = 2.142p

  • @elck3
    @elck3 7 months ago +1

    @matthew_berman Matthew, please try “What is the largest number written out in word form between 1 and 1 million that does not contain the letter 'n'?”
    Even GPT-4 doesn’t get this right. Could you please try this?
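A brute-force check (my own sketch, not from the video) shows why this question trips models up. Every word form from 1,000 upward contains "thousand" or "million", which both contain an 'n', so only 1–999 need spelling out; "hundred" contains an 'n' too, which the search discovers on its own:

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def words(n: int) -> str:
    """Spell out 1 <= n <= 999 in English words."""
    parts = []
    if n >= 100:
        parts += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        parts.append(TENS[n // 10])
        n %= 10
    if n:
        parts.append(ONES[n])
    return " ".join(parts)

# "thousand" and "million" contain 'n', so nothing >= 1000 can qualify.
best = max(n for n in range(1, 1000) if "n" not in words(n))
print(best, "->", words(best))  # → 88 -> eighty eight
```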

  • @gileneusz
    @gileneusz 7 months ago +1

    you need to prepare a more difficult set of questions!

  • @alinayossimouse
    @alinayossimouse 7 months ago

    Can you ask Mistral Medium "How many primes end in 2?" I found that many models fail to answer this even when reasoning about it in steps, including Mixtral 8x7B and Llama 2 70B

    • @errgo2713
      @errgo2713 6 months ago

      I just asked Mistral-Medium, it answered:
      Only one prime number ends in 2, and that is the number 2 itself. All other even numbers greater than 2 are not prime because they can be divided by 2. Primes are numbers that have only two distinct positive divisors: 1 and the number itself. In this case, 2 is the only even prime number.
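That answer is easy to sanity-check by brute force (a sketch over a finite range; the general argument is that any number ending in 2 is even, so 2 is the only candidate):

```python
def is_prime(n: int) -> bool:
    # Trial division up to sqrt(n); fine for small ranges.
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

primes_ending_in_2 = [n for n in range(2, 10_000)
                      if n % 10 == 2 and is_prime(n)]
print(primes_ending_in_2)  # → [2]
```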

  • @dewijones92
    @dewijones92 7 months ago

    More please

  • @adamrak7560
    @adamrak7560 7 months ago

    Their API is invitation only.
    So they announce a new powerful and cheap model and give access to influencers, who can test it and show people that the model is actually quite good. But it turns out that there is no API or model-weight access at all, so people who see the video cannot actually use it. Also, there is no ETA for access.

  • @MacS7n
    @MacS7n 7 months ago

    The team behind Mistral has already watched your videos and test questions, so I'm wondering whether they fine-tuned Mistral Medium on your test questions. You should use never-before-seen questions with the same type of logic and reasoning.

  • @MatrixCodeBreaker88
    @MatrixCodeBreaker88 7 months ago

    When are you gonna cover the new update and features of LM Studio?

  • @saudfata6236
    @saudfata6236 7 months ago

    I suggest postman for working directly with APIs

  • @LukasLiesis
    @LukasLiesis 7 months ago

    3:03 Matthew, I run 7B on a MacBook Air M1 with 16GB RAM without any issue, and the response speed is comparable to or faster than ChatGPT on a paid account. Not sure what the current M1 16GB Air price is.

  • @Kingzt369
    @Kingzt369 7 months ago

    what is the point of paying for tokens when GPT-4 is free in Microsoft Copilot?

  • @kanishak13
    @kanishak13 7 months ago

    why is no model comparing its results with Bard or Gemini?

  • @jgcornell
    @jgcornell 7 months ago

    Does this mean there’s a Mistral Large out soon to blow GPT4 out of the water?!

  • @dennisestenson7820
    @dennisestenson7820 6 months ago

    Um, actually there were 11 words in its response. There were 9 in the original sentence, but then it appended a confidence level, making it technically wrong.

  • @victorwilson1337
    @victorwilson1337 7 months ago

    is it fair to compare the costs of the models without including GPT-4-1106-Preview?

  • @phobes
    @phobes 7 months ago

    This whole time we've been watching AI reproduction of Matthew.

  • @russelllapua4904
    @russelllapua4904 7 months ago

    Wait, wait. Mixtral was only a few days ago, right? And this model already beats it?

  • @highdefinist9697
    @highdefinist9697 6 months ago

    You could also try giving the models nonsensical or contradictory questions, to check if they are able to figure out that there is no correct answer, rather than giving a wrong answer. For example, GPT-4 fails at the following, relatively simple problem:
    I have three apples today, and I ate one of those three apples yesterday. How many apples do I have today? Explain step by step.

  • @DihelsonMendonca
    @DihelsonMendonca 7 months ago

    ⚠️ ChatGPT 4 could NOT answer this question correctly: "My father's father had two sisters, Mary and Jane. Jane had three children, one of them being Peter. Peter had two sons, Paul and Jack. Can you tell me if I have any known cousins and their names?"

  • @Leto2ndAtreides
    @Leto2ndAtreides 7 months ago

    They're being stubborn about giving us access to Mistral Medium... I'm not sure they have enough compute for some reason (what are clouds for???)

  • @Yonatan_1770
    @Yonatan_1770 7 months ago +1

    Yes do more complex word problems

  • @B4zing4
    @B4zing4 7 months ago

    will this run on my 7900xtx?

  • @PerfectArmonic
    @PerfectArmonic 7 months ago

    The Einstein test of intelligence is the best test for LLMs

  • @burnt1ce85
    @burnt1ce85 7 months ago +3

    How come you didn't compare GPT-4 Turbo pricing with Mistral Medium? Mistral Medium is only a bit cheaper than GPT-4 Turbo

    • @tomski2671
      @tomski2671 7 months ago +1

      I'm seeing 3.64 times cheaper on output - that is not insignificant.

    • @robxsiq7744
      @robxsiq7744 7 months ago

      Turbo? you mean GPT3.5 turbo?

    • @redone823
      @redone823 7 months ago

      What's their pricing? I'm on a wait-list and can't see anything.

    • @lio1234234
      @lio1234234 7 months ago

      @@robxsiq7744 No, GPT-4 Turbo, it's on the API

    • @burnt1ce85
      @burnt1ce85 7 months ago

      Oops... After reading the pricing on my desktop (i initially read it on my phone), I realize that I misread the pricing... ignore my dumb comment

  • @3dus
    @3dus 7 months ago

    Is there some contamination going on? It's kind of incredible that two kids leave DeepMind and build a GPT-4 rival in 8 months while Google is full of issues.

  • @fbalco
    @fbalco 7 months ago +1

    Why GPT-4 and not GPT-4 Turbo when comparing pricing? A paid post for Mistral, so you need to make GPT-4 look worse? I would hope you know that GPT-4 Turbo is 25% of the price of GPT-4

  • @tomschuelke7955
    @tomschuelke7955 7 months ago

    Wouldn't it be necessary to repeat the answers several times to get a statistically sound picture of answer quality?

  • @ivancartago7944
    @ivancartago7944 7 months ago

    Did you do the word-counting test multiple times or just once? It might be a fluke.