Mistral 7B 🖖 Beats LLaMA2 13b AND Can Run On Your Phone??

  • Published Sep 7, 2024
  • In this video, we thoroughly test Mistral 7b Open Orca GPTQ. It's an incredibly performant small model that illustrates the future of locally hosted edge models. How good is it? Does it beat LLaMA 13b? Let's find out!
    Enjoy :)
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewber...
    Need AI Consulting? ✅
    forwardfuture.ai/
    Rent a GPU (MassedCompute) 🚀
    bit.ly/matthew...
    USE CODE "MatthewBerman" for 50% discount
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    Media/Sponsorship Inquiries 📈
    bit.ly/44TC45V
    Links:
    LLM Leaderboard - bit.ly/3qHV0X7
    Runpod (Affiliate)- bit.ly/3OtbnQx
    Runpod Tutorial - • Run ANY LLM Using Clou...
    Runpod Textgen Template - bit.ly/3EqiQdl

Comments • 366

  • @matthew_berman
    @matthew_berman  11 months ago +429

    Want me to try to put Mistral 7B on a phone?

    • @TheTorlass
      @TheTorlass 11 months ago +11

      Would it work on an iPhone?

    • @ThePredictR4036
      @ThePredictR4036 11 months ago +12

      Definitely yes

    • @saravanajogan1221
      @saravanajogan1221 11 months ago +11

      Of course we do! And recently there have been videos released on how to fine-tune these models on custom datasets such as CSV, TXT, PDF, etc. It would be so beneficial if you could make fine-tuning videos on these small LLMs. Thanks

    • @Proprogrammer001
      @Proprogrammer001 11 months ago +4

      Hello?? OF COURSE we do! That's what I'm running around with most of the day. And I highly prefer owning the hardware that my AI is running on.

    • @Genesys225
      @Genesys225 11 months ago

      Sounds amazing, especially if you manage to integrate it with Expo (React Native)

  • @mwissel
    @mwissel 11 months ago +36

    Please try it on your phone, yes. And also I'd be very interested in a fine tuning video for this model. Seems like a great small model to try out fine tuning datasets with.

    • @matthew_berman
      @matthew_berman  11 months ago +8

      Hmm I hadn’t thought to fine tune this model, I’ll add it to the list!

    • @leonwinkel6084
      @leonwinkel6084 11 months ago +4

      +1 for a video on how to fine-tune it! :) I also think it's a great model, and currently the most promising for training and actually using for specific real-world problems

  • @amj2048
    @amj2048 11 months ago +18

    This is seriously impressive. I used to laugh at Sci-Fi movies that had time travelers using devices that could store lots of information, I used to say that would never work because it would need a connection to a network, via some kind of system like the internet. Well turns out, nope, the movies spoke the truth lol.

  • @shadowdragon3521
    @shadowdragon3521 11 months ago +10

    Excellent, these small models keep getting better and better. Now I'm just waiting for someone to bake one of these open source models directly into a Linux distro

    • @nathanbollman
      @nathanbollman 11 months ago +5

      This has been my dream, and I believe it should come to fruition within a year at the current rate of advancement. Imagine a Linux OS with a highly tuned 7B parameter model made for system administration and communication with the user/admin. A computer you could talk to that was aware of all the content you've watched and read. It occasionally asks you questions relevant to what you might be thinking of, does research for you, and tunes itself with up-to-date relevant content, corporate BS filtered... emphasis on *for you*. You would be downloading LLM instruction updates, with fine-tunes and LoRAs giving it the ability to perform additional tasks and manage your network, router... etc., etc.

  • @diadetediotedio6918
    @diadetediotedio6918 11 months ago +16

    Interestingly enough, even GPT-4 fails the cup test. I think it is pretty good for testing these models' interpretation, causal inference, and/or comprehension

    • @wurstelei1356
      @wurstelei1356 11 months ago

      Maybe tell the model that the ball in the cup is loose and not stuck.

    • @macoson
      @macoson 11 months ago

      Interestingly, it passes easily if you say 'someone then lifts the cup and puts it into the microwave'

    • @diadetediotedio6918
      @diadetediotedio6918 11 months ago +1

      @@macoson
      Yeah, I tested it now and it worked. It is interesting how a single word will drastically change the calculated probabilities of these models. I also got LLaMA-70B to answer correctly a certain number of times by making it "simulate a conversation between Plato, Aristotle and Socrates" about the question.

    • @KiritsuguKannouji-oj2yj
      @KiritsuguKannouji-oj2yj 11 months ago

      A Brazilian and a Higurashi fan lol
      Hey man, have you tried any Higurashi RP in these LLMs? If so, tell me the best one

  • @charetjc
    @charetjc 11 months ago +5

    "It assumes serialized drying.. so that's still a correct answer..." If only my math teachers were this lenient.

  • @Eric_McBrearty
    @Eric_McBrearty 11 months ago +10

    Another great video Matt. There's no way I'd be up-to-date on the progress of AI if it wasn't for you churning out these videos like clockwork. Thanks.

  • @stickmanland
    @stickmanland 11 months ago +15

    Been a big fan of this model. Absolutely incredible performance! Thanks to Matt for the incredible video.

    • @matthew_berman
      @matthew_berman  11 months ago +1

      Thank you!

    • @TheReferrer72
      @TheReferrer72 11 months ago +1

      The model is overrated.
      It's too chatty, and not as good as CodeLlama 13B, let alone the Llama 2 models that are trained on OpenOrca.

    • @mirek190
      @mirek190 11 months ago

      Too chatty? Tell the model you want a short answer... @@TheReferrer72

  • @alexjensen990
    @alexjensen990 11 months ago +2

    YES! GIVE IT A SHOT ON YOUR PHONE!! I would love to see that and have a heap of use cases for something like that...

  • @RichardGetzPhotography
    @RichardGetzPhotography 11 months ago +4

    'Put together a healthy meal plan for me today'. You didn't state for the entire day. It did provide you with a meal plan for today.

    • @matthew_berman
      @matthew_berman  11 months ago +2

      Hmm. All other models understand my phrasing. Maybe I need to add “for the whole day”?

    • @thomassynths
      @thomassynths 11 months ago +3

      I disagree. "a meal plan for today" totally implies more than one meal.

    • @RichardGetzPhotography
      @RichardGetzPhotography 11 months ago +1

      @@matthew_berman That doesn't mean they didn't take liberty with what you said and assume a meaning. I think the real questions are how and why it picks one over the other.

    • @RichardGetzPhotography
      @RichardGetzPhotography 11 months ago +1

      @@thomassynths Yes, I can see this. But definitions run from planning a single meal to meals for a day/week/month.
      I would be interested in how prompting 'for the whole day' or 'for the day' compares to 'for me today'.

  • @RomboDawg
    @RomboDawg 11 months ago +12

    I've been trying to combine Mistral 7B with CodeLlama 7B (or WizardCoder 7B) to get a hopefully great coding and non-coding model. But it's really hard to combine models with different architectures. Wish me luck. The final model would be 14B parameters in theory

    • @grizzlybeer6356
      @grizzlybeer6356 11 months ago

      Interesting. Can this be done on SageMaker perhaps?

    • @wurstelei1356
      @wurstelei1356 11 months ago +2

      Do you use a fine-tune-on-top approach, or some kind of internal multi-agent chat console? I'm really interested in some kind of console where multiple models produce output and one single model passes the output to the user, similar to how humans reflect on output within their brains.

    • @matthew_berman
      @matthew_berman  11 months ago +3

      Awesome, please keep me updated on the results

    • @zef3k
      @zef3k 11 months ago +2

      @@wurstelei1356 I think rather than combining models, the focus should be on AI agent collaboration. I think most multi-agent setups right now use the same model, right? It would be interesting to see different models working together. Either way, I think it might be a good idea to pair a coding model and a more logical model together, to check its work in a way.
      Honestly, if I were able to, I'd work to try and consistently pair up agents (collaborate, not combine) and maybe allow them access to a shared neural net as well as a 'personal' one or something. Even after the onset of AGI I think it would be useful for them.

    • @wurstelei1356
      @wurstelei1356 11 months ago

      @@zef3k I remember a net called Gorilla that is capable of deploying other neural nets and passing prompts to them. I think that is the future of AI, as it gets more and more complicated to keep up with the development.
      You could tell Gorilla to write a program using a coding AI. But I think that is very slow right now, or you need multiple GPUs with >16 GB of RAM each.
      Sorry, I can't post a link here on YT because it would get censored.

  • @mvasa2582
    @mvasa2582 11 months ago +1

    Matt - Try this: "A ball is loosely placed inside a normal cup, and the cup is placed upside down on a table. Someone then takes the cup and puts it inside the microwave. Where is the ball now? Explain your reasoning step by step."
    Today, 10/05/2023 at 11:23 AM PT: Claude / Bing / Bard / ChatGPT 3.5 claim the ball is still in the cup. However, ChatGPT-4's response was:
    "The ball would still be inside the cup, but it would have fallen onto the table surface due to the change in orientation."

    • @matthew_berman
      @matthew_berman  11 months ago +1

      Good call about “loosely”

    • @mvasa2582
      @mvasa2582 11 months ago +1

      @@matthew_berman yet all others don’t recognize this. So I think this could be something you can use as your test parameter

  • @venim1103
    @venim1103 11 months ago +7

    This is the most impressive model I’ve tried so far!
    (Apart from GPT4, but that doesn’t count as it’s running in a supercomputer without open access)
    If the small models keep getting better and better with leaps like this one has done, we are onto something incredible!

  • @pret83
    @pret83 11 months ago +1

    I think for the cup and ball question, there could be a follow-up question or instruction like: 'Also consider gravity.' Or 'Take gravity into account.'

  • @generichuman_
    @generichuman_ 10 months ago

    One horse-sized duck absolutely makes sense. Given square-cube scaling, the horse-sized duck would most likely collapse under its own weight, being defeated by physics itself.

  • @tyrannyisbadmmmkay
    @tyrannyisbadmmmkay 10 months ago +1

    **untested suggestion** try specifying a 'ping pong ball' and a 'solo cup' as a size reference for the microwave prompt

  • @uncleJuancho
    @uncleJuancho 11 months ago +4

    Nice video! I have Mistral 7B running on an M1 with 16GB; it's amazing for a 7B! No need to pay for an external server

    • @matthew_berman
      @matthew_berman  11 months ago +1

      Agreed. But I wasn’t sure if I would be able to record my video at the same time as running the model so figured I’d put it on runpod.

    • @uncleJuancho
      @uncleJuancho 11 months ago

      @@matthew_berman oh that’s a really good point!

  • @alexanderandreev2280
    @alexanderandreev2280 11 months ago +4

    Great! But Llama 2 knows over 100 languages, and it works for my tasks in German and Russian. When a computer acquires the ability to speak with all people, and not just with specialists, the ability to communicate in other languages becomes necessary

  • @Greenthum6
    @Greenthum6 11 months ago +1

    The ball prompt could be something like "a big bucket is put upside down on top of a tennis ball". Then there would be no loophole for thinking the ball could be stuck inside the container.

  • @chrisbraeuer9476
    @chrisbraeuer9476 11 months ago +1

    For the ball-in-the-cup test: tell it that the person lifts the cup vertically up and puts it in the microwave. This makes it a better logic test. It's not really giving away the answer; it just describes the situation more precisely.
    And for the shirts, state that you lay all 20 shirts out simultaneously. No wiggle room.

    • @matthew_berman
      @matthew_berman  11 months ago

      Maybe I should also specify that it’s a ping-pong ball inside of a normal size cup, so it really understands that it’s not stuck in the cup

    • @chrisbraeuer9476
      @chrisbraeuer9476 11 months ago

      @matthew_berman Yes. Even if it is obvious for us, it is not for an LLM. The killer test is great though. BTW, can you make a tutorial about how to set up a good LLM locally and give it a personality? I mean with all the necessary settings and prompts.
      Everyone only mentions that very briefly.
      I would love to experiment with that.

  • @micheltalbot2540
    @micheltalbot2540 11 months ago +2

    Indeed, it would be nice to see you try to install this model on your phone...

    • @matthew_berman
      @matthew_berman  11 months ago

      I’m definitely going to try

  • @jacques42
    @jacques42 11 months ago +2

    Matthew: Write a poem with EXACTLY 50 words
    Mistral: writes a poem with 72 words
    Matthew: I count that as a pass.
    ??

  • @alreadythunkit
    @alreadythunkit 11 months ago +4

    A ball such as a large nerf ball could easily get stuck inside the cup, so maybe change from ball to water or sugar or salt. Also, it might be a good idea to use a bowl instead of a cup, since cups might be thought to have lids.

    • @matthew_berman
      @matthew_berman  11 months ago

      I’m going to try saying it’s a small marble

    • @ChrisJohnsonHome
      @ChrisJohnsonHome 11 months ago +1

      Mistral's lack of gravity awareness is the same problem I've been focusing on this weekend. ChatGPT 3.5 gets it wrong and ChatGPT 4.0 gets it right. I think the hang up most small models have is they are trying to focus their attention on following explicit instructions that are stated (the cup was turned upside down) rather than implied physical consequences (this usually means the ball falls out due to gravity).
      Small models like Mistral are also not thinking the same as us humans who may have a visual of the scene playing in our heads which makes it more obvious that the ball fell out.

  • @badgids
    @badgids 11 months ago +1

    So, the original Mistral 7B Instruct got the killer rubric correct, even explaining that the new person entering the room is now a killer themselves, so there are still three killers in the room. I used TheBloke's Q5_M version.

    • @mirek190
      @mirek190 11 months ago +1

      For such a small model we can easily use even the 8-bit GGUF ;)

  • @OliNorwell
    @OliNorwell 11 months ago +7

    I tried one of the uncensored versions the other day and indeed it wrote whatever I asked it, it's a really weird feeling using one of those compared to what we're used to with ChatGPT etc! It sounds like Mistral are going to release even bigger and more capable models in the future.

    • @mirek190
      @mirek190 11 months ago +7

      In the video he used the 4-bit version of the 7B model... that degrades the model a bit... he should use the 8-bit GGUF version or the full FP16 model.
      I tested the 8-bit GGUF and had no problems with these questions.

    • @cuhkhoman
      @cuhkhoman 11 months ago +1

      @@mirek190 Which version of Mistral are you using? There are lots of Mistrals on Hugging Face.

    • @mirek190
      @mirek190 11 months ago

      So far the best is mistral-7b-openorca.Q8_0.gguf @@cuhkhoman

  • @consig1iere294
    @consig1iere294 11 months ago +1

    Mistral 7B + Autogen, step by step for complete newbs, pretty please!

  • @Zed_Oud
    @Zed_Oud 11 months ago +2

    Have you heard that Snake in Python is a super special ARG Easter Egg in Cyberpunk 2077 2.0?

    • @matthew_berman
      @matthew_berman  11 months ago +1

      I have not

    • @Zed_Oud
      @Zed_Oud 11 months ago +1

      @@matthew_berman my apologies, I was told it was snake in Python, it turned out to be Tic Tac Toe in Python.

  • @DefaultFlame
    @DefaultFlame 7 months ago

    I agree with the model that fighting one horse-size duck rather than 100 duck-sized horses is better, presuming that they are all attacking simultaneously, of course. My answer depends on if the horse-sized duck is scaled based on height, mass, or volume. I assumed height. If it is based on volume then my answer remains the same, but if it is mass then I am not sure which would be the better choice, as that would make both absolutely horrendous choices.

  • @LukeSchoen
    @LukeSchoen 11 months ago +1

    Been waiting for this! I knew you would love Mistral! Thanks dude.

  • @jason.mullings
    @jason.mullings 11 months ago

    Try these prompts:
    Abstract Thinking:
    `
    const indices = [ 59, 40, 30 ,31, 200, 300, 10.....]
    `
    Given an array of indices, please use lodash to add the previous number before each index, e.g. 58, 59; then the 10 following numbers after the presented index, e.g. 59, 60, 61, 62...; then sort the array.
    Theory of Mind:
    If a magician places a bean under one of three cups (the middle one) and leaves the room, and a member of the audience then secretly moves the bean to the left cup, where does the magician believe the bean is when he returns?
    Please explain the logic in your answer.
    p.s. Loving your show, By the way..🥰
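For reference, a minimal Python sketch of what the abstract-thinking prompt above asks for (the original prompt targets lodash/JavaScript; the helper name and the window of 10 following numbers are taken from the prompt's example, not from any tested answer):

```python
def expand_indices(indices):
    """For each index, prepend the previous number and append the next 10, then sort."""
    out = []
    for n in indices:
        out.append(n - 1)                 # the previous number, e.g. 58 before 59
        out.append(n)                     # the index itself
        out.extend(range(n + 1, n + 11))  # the following 10 numbers, e.g. 60..69
    return sorted(out)

print(expand_indices([59]))  # -> list(range(58, 70))
```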

  • @JulianHarris
    @JulianHarris 9 months ago

    I've been following NLP for six years, and it's really extraordinary: at the end of 2017, ANY form of summarisation was very bad. Six years on, it's a commodity. I wonder what the next six years will bring? Or even the next three?

  • @danielshurman1061
    @danielshurman1061 11 months ago +1

    Yes! Please do show us Mistral 7b installed and working on an iPhone.

  • @brandon1902
    @brandon1902 11 months ago +2

    Mistral 7B does unbelievably well with some things. For example, when I asked what song a lyric was from without stating the artist's name, it came back with several possibilities that made sense considering the lyric.
    However, it didn't just answer most of my pop culture questions wrong; it answered every single one of them wrong. And they were all from popular media only, such as the top 100 movies, top 10 shows, top Billboard songs and so on. And although the questions were a little tricky, every single one of them was unambiguous and could be answered by any human, no matter how stupid, if they knew the answer.
    In short, Mistral 7B is much better at accurately determining what the user is looking for and giving a reasonable response than Llama 7B and 13B, or any other 13B or smaller LLM. However, it still lacks precise knowledge and hallucinates like crazy, even with information that it has within it.
    That is, it can't retrieve information that's contained within it unless it's fed an absurd amount of context, which most people wouldn't do even if they knew enough relevant info for context. For example: 'In the TV show Two and a Half Men the character Alan Harper, brother of Charlie Harper, and son of Evelyn Harper, who was portrayed by Jon Cryer, had a first wife named Judith and a son with her named Jake. He then got briefly remarried to a young woman in Vegas. What was her name?' If you simply ask about his second ex-wife, you'll get responses like Evelyn Harper (his mother, likely because of the same last name and a female first name), one of Charlie's lovers, and so on. Mistral 7B may comprehend and think better than any 13B or smaller LLM, but it hallucinates more than any of them (they all got at least some of my pop culture questions right).

    • @OliNorwell
      @OliNorwell 11 months ago +1

      That's fair, I've found it to be very good for creative writing, i.e. a task where it doesn't need to know pop culture so much.

    • @OptimusPrime-vg2ti
      @OptimusPrime-vg2ti 11 months ago

      It's cool that you have tested this capability in detail and are sharing your results here. I think most people probably won't care about this particular application as much. Not only that, since the model is smaller in size, I think there is an expectation that it remembers less knowledge than 200B parameter models. It is mainly the reasoning aspect and the small model size, as well as being an open-source model that has people excited.

    • @brandon1902
      @brandon1902 11 months ago

      @@OptimusPrime-vg2ti I wasn't comparing it to 200B models. Other 7B and 13B models get some popular culture questions right, even Falcon 7B. This one got all of them wrong. Plus, it stubbornly sticks by the wrong answer despite being fed the right one, but if I start a new discussion and feed it the right answer, it confirms it's true. Mistral clearly has its blind spots relative to other 7B LLMs.

  • @LoVeRSaMa
    @LoVeRSaMa 11 months ago +1

    Mistral 7B is uncensored, but the Orca version apparently is not..

  • @PinakiGupta82Appu
    @PinakiGupta82Appu 11 months ago +1

    The next task that should be assigned to ChatGPT is, "Port LLaMA-CPP to Android using Android Studio". I guess a 3B Q4_K model will perform reasonably well on my 4GB phone.

  • @KillFrenzy96
    @KillFrenzy96 11 months ago

    I just tried the finetuned dolphin-2.1-mistral-7B-AWQ version. This model seems completely uncensored and answers pretty much anything. It's very refreshing to see a model that does not refuse to answer you.
    Just a note for the video - You should be ending your prompt with:
    ...
    <|im_start|>assistant
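For context, a minimal sketch of the ChatML-style prompt template that Mistral-7B-OpenOrca finetunes use (the special tokens are from the model card; the system and user strings here are placeholders):

```python
# ChatML-style template; the prompt ends with the assistant turn opener
# so the model generates the assistant's reply next.
system = "You are a helpful assistant."  # placeholder system message
user = "Write a haiku about GPUs."       # placeholder user message

prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(prompt)
```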

  • @BuranekRu
    @BuranekRu 9 months ago

    As for the ball and the upside-down cup: in your prompt you didn't mention its size. If the ball is almost the same size as the inside of the cup, it will press against the walls and get stuck tightly enough.

  • @jackflash6377
    @jackflash6377 11 months ago +1

    YES. On the phone would be way too cool.

  • @kalvinarts
    @kalvinarts 11 months ago +1

    Why not use the plain mistral-7B-instruct instead of the openorca version?

  • @J2897Tutorials
    @J2897Tutorials 11 months ago

    12:13 _"Someone then takes the cup and puts it inside the microwave."_
    Well "it" could refer to either the cup or the ball.
    So maybe try:
    _"Someone then puts the cup inside the microwave."_

  • @YvesQuemener
    @YvesQuemener 11 months ago +1

    For the last test just replace the ball with a cherry? Or something really small like a grain of pepper or an apple seed?

    • @matthew_berman
      @matthew_berman  11 months ago +1

      Yep that seems like where I’m heading. Thanks.

  • @PotatoKaboom
    @PotatoKaboom 11 months ago

    For the cup test, specify that the ball is a small golf ball and the cup is hard plastic or porcelain. Additionally, instead of "picking up", you can say the person lifts the cup up and puts it into a microwave. Maybe that makes it too easy? I guess the best model would reply with "...depending on whether the ball is stuck in the cup..." to your current prompt.

  • @phobes
    @phobes 11 months ago

    For the cup question, try specifying the diameter of the ball, as well as the circumference of the opening and the depth of the inside of the cup.

  • @alejandrofernandez3478
    @alejandrofernandez3478 11 months ago +2

    Yes please for the Mistral phone video!

  • @MakilHeru
    @MakilHeru 11 months ago

    Yes please we would love to see you run this on your phone.

  • @Zed_Oud
    @Zed_Oud 11 months ago +2

    You should tell the models not to use any assets (sprites, pictures, etc) for Snake

  • @bwaldo1
    @bwaldo1 11 months ago +1

    Two things stand out about this project: first, all the safety researchers freaked out when it dropped; second, they have the former French Secretary of State for Digital Affairs on their team, which is a red flag. It's going to be interesting to see if they're used to set policy in the EU.

    • @matthew_berman
      @matthew_berman  11 months ago

      Why? Would love to hear more.

  • @miss_nainuu
    @miss_nainuu 11 months ago +1

    need uncensored version ❤

  • @retardedpenguin1
    @retardedpenguin1 11 months ago

    I have a new prompt that you can test with... I tested it with Llama 2 7B GPTQ.
    "How many sandwiches would it take to fill a football pitch?" Logic and math: it requires some logic to determine the dimensions of a sandwich without having any given to it, and math to calculate the size of a football pitch and how many sandwiches would cover it.
    The model I used it with initially claimed that a sandwich is not a physical object, thus the answer is none. It was an absolutely hilarious response. After reasoning with it, it eventually provided an answer of "25 million", which is far, far from the right answer.
    Perhaps the models you test with will provide some different results.
    Testing with GPT-4 on Bing, it provided a correct answer.
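For scale, a back-of-the-envelope version of the sandwich question in Python (the pitch and sandwich dimensions are assumptions for illustration, not from the comment):

```python
# Fermi estimate: how many sandwiches cover a football pitch?
pitch_area = 105 * 68        # m^2, a common full-size pitch (assumed)
sandwich_area = 0.12 * 0.12  # m^2, a ~12 cm square sandwich (assumed)
print(round(pitch_area / sandwich_area))  # -> 495833, about half a million
```

So "25 million" overshoots this rough estimate by roughly 50x.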

  • @Matlockization
    @Matlockization 3 months ago

    I love your thinking right at the end.

  • @mirek190
    @mirek190 11 months ago +1

    About that math problem... LLMs like to be "encouraged" for a task.
    So for your question - explain step by step 25-4*2+3=? -
    you should ask like this:
    "You are the best mathematician in the world, so math problems are easy for you, like this one: 25-4*2+3=?"
    In 10 runs I always got the proper answer: 20.
    I used mistral-7b-openorca.Q8_0.gguf
    With your original question - explain step by step 25-4*2+3=? -
    it was correct only 1 time out of 10...
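As a sanity check, the expression itself, evaluated with standard operator precedence:

```python
# Multiplication binds tighter than addition/subtraction: 25 - (4*2) + 3
result = 25 - 4 * 2 + 3
print(result)  # 20
```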

    • @matthew_berman
      @matthew_berman  11 months ago +1

      OK, good to know. Maybe I'll give it more guidance next time

  • @Remowylliams
    @Remowylliams 11 months ago +2

    I guess you used the GPTQ as that's the smallest model and so most people would be using that. I would have used the Q8 GGUF. Thanks for the tests though
    Cheers

  • @alexeycherkashin6251
    @alexeycherkashin6251 11 months ago +2

    It would be great to see how you manage to get it working on your mobile 👍

  • @SAVONASOTTERRANEASEGRETA
    @SAVONASOTTERRANEASEGRETA 11 months ago

    Can the text-generation web UI AI model work in the background? For example, telling it to do an accurate search without being there to give input? Or telling it: write me a message in exactly one hour from now?

  • @australai
    @australai 11 months ago +2

    Why did you run it with a highly quantized version? That’s a huge handicap, especially for a small model. Also no point using a huge rented GPU when the model is so quantized it will fit in 4GB VRAM. For the money, a 4090 would have been faster, and big enough to run it unquantized. Also, the MistralOrca devs provide a playground with the model hosted unquantized, so no point really in going to all this effort for a worse experience.

    • @matthew_berman
      @matthew_berman  11 months ago +3

      You are correct on all points. I’ll run the unquantized version next time. Maybe I should do another video?

    • @mirek190
      @mirek190 11 months ago

      Use mistral-7b-openorca.Q8_0.gguf or the full version of that model @@matthew_berman

    • @australai
      @australai 11 months ago

      ​@@matthew_berman That would be cool. If you do that, some other thoughts:
      1) Models tend to do math and reasoning best at low (< 0.4) or 0 temperature. Might want to adjust for those challenges. (Creative, like meal plan, is best between 0.6 and 1.1). Also, since the model is so fast and you're running so few tests, might as well re-run inference 5-10 times for each prompt to see the sort of variety of responses it gives. Maybe highlight the one that feels most representative, or count the percent of times it gets a right answer.
      2) I'd like to see scoring on a 1-10 scale in addition to pass/fail. Good to quantify the degree to which it answers appropriately, in your opinion.
      3) I see you varied the system prompt a bit between the answers. Maybe explicitly trying a few different prompts would be good. E.g., the default from the model card, blank, and one of your own devising (don't have to do it all live, maybe just pop up all the different results at once).
      4) All your challenges are non-conversational. I'd like to see a challenge or two that is a conversational flow. E.g., give it an essay to summarize, then ask it some pointed questions about some of the content, then ask it to do some analysis of the content, then change the subject a couple of times, then ask it to summarize everything you've talked about. You can even do this as an extension to some of the others, like the meal plan by following up saying something like, "Now, can you make that (gluten-free/vegan/low-carb, etc)?"
      Btw, I've tried this model in LM Studio using the q8_0 and it's still crazy-fast and only needs about 9.5GB of VRAM. q8 GGUF should be about as close as you can get to unquantized performance currently.
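Point 1 can be illustrated with a minimal softmax sketch showing why low temperature makes math and reasoning answers more deterministic (the logits below are made up for illustration):

```python
import math

def softmax_t(logits, temperature):
    # Dividing logits by a small temperature sharpens the distribution
    # toward the highest logit; T=1.0 leaves the ranking and spread unchanged.
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                 # made-up next-token scores
print(softmax_t(logits, 0.2))            # top token gets ~99% of the mass
print(softmax_t(logits, 1.0))            # mass spread across tokens
```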

  • @Pouya..
    @Pouya.. 9 months ago

    Thank you for all these videos. Can you please make a video on how we can use these models with non-English languages, like Spanish, German, or Persian?

  • @ramp2011
    @ramp2011 11 months ago +1

    Thank you for the video. Curious how much you ended up paying for the GPU for creating and testing this. Thank you

  • @ayja1l
    @ayja1l 10 months ago

    The ball is on the table, the cup is placed over the ball, so the ball is now inside the cup; someone picks up the cup, etc., etc.

  • @reyalsregnava
    @reyalsregnava 11 months ago

    You can test a version of the ball/cup/table/microwave test with wording around pouring out pasta after cooking it:
    "If I place a pasta strainer in a sink and a bowl on the counter, then dump the boiling water and cooked pasta into the strainer, what will be in the bowl when I empty the strainer into it?"
    This would look for an underlying understanding of the function of the strainer, after you lead it astray by mentioning adding the water along with the pasta. Natural language would be to "dump the pasta into the strainer", automatically dropping the word "water" because of an understanding of the functions and interactions.
    Something like that should preclude the "ball stuck in cup" reading, since the function of the device is to separate items.
    Or, with much less modification, you can change "ball" to "water", removing the possibility of it being "stuck".

  • @PZMaTTy
    @PZMaTTy 11 months ago

    Had Mistral Instruct been released when you made this video? Or is the Orca model better?

  • @twin9408
    @twin9408 11 months ago

    The prompt isn't right: it's missing the <|im_end|> for the user block and the assistant response starting block, i.e. <|im_start|>assistant

  • @jeeerim
    @jeeerim 11 months ago +1

    I tried to make it work on my M2 iPad Pro using LLM Farm. It kind of works, but after an answer or two it fails.

    • @matthew_berman
      @matthew_berman  11 months ago

      I've never heard of LLM Farm

  • @jezekchicki
    @jezekchicki 11 months ago +1

    The ball-cup problem: you could make the ball small, so it is clear the ball doesn't get stuck. Or forget the ball, pour water into the cup, and at the end ask where the water is. Or use beads. ;)

    • @matthew_berman
      @matthew_berman  11 months ago +1

      Yeah, I think you’re right I need to specify that the ball is smaller than the cup

    • @jezekchicki
      @jezekchicki 11 months ago

      @@matthew_berman And maybe you should specify that the cup is an ordinary cup, so it definitely won't think it's a magic cup or a cup with a lid.

  • @CronoBJS
    @CronoBJS 11 months ago

    For the cup test: if you prompt it with a small ball, maybe it will know the ball won't get stuck, since we defined a size?

  • @gileneusz
    @gileneusz 11 months ago +2

    12:25 I tested this prompt extensively on GPT-3.5 and GPT-4, and even GPT-4 has problems with it. The only way I could get it right was with a CoT framework, and even then GPT-4 got it wrong many times. I assume that's not the model's fault but a trait of LLMs: they focus strongly on the input context and treat "the ball is inside the cup" as the default state, not considering external circumstances unless you spell them out. You might still use this prompt in testing, but getting the right answer is just luck, not a measure of the model's ability.

    • @zef3k
      @zef3k 11 months ago

      Someone probably already tried, but I wonder if prompting it with a line saying "the cup and ball are physical objects that are affected by the laws of nature and physics" would help. At least for queries about objects in the real world, rather than the digital one the AI inhabits.

    • @gileneusz
      @gileneusz 11 months ago

      @@zef3k Nope. If you say the ball is inside the cup, the LLM will assume it's inside the cup unless you tell it otherwise.

  • @kristoferkrus
    @kristoferkrus 11 months ago +1

    It's impressive if it's better than Llama2 13B, a model almost twice its size! Do we know what they did to make it better? It would be really interesting to know what techniques you can use to make that possible.

  • @ScriptGurus1
    @ScriptGurus1 11 months ago +1

    The original base Instruct version is uncensored.

    • @matthew_berman
      @matthew_berman  11 months ago +1

      I didn't know that, thanks for sharing

  • @kiiikoooPT
    @kiiikoooPT 11 months ago

    I don't know if this makes sense, but I believe the killer problem is that the AI isn't recognizing "killer" as a person. Try a different word like "assassin"; maybe it will connect that with a person, since "assassin" is more strongly tied to people, while a "killer" can be an animal, a machine, or even a look ("you have a killer look today").
    Not sure if that's right, but to my mind the problem is that "killer" has too many possible meanings, so it doesn't map the word to a person. Try "assassin" or something more specific to people who kill other people, like "murderer".
    Btw sorry for any grammatical errors, I'm not a native English speaker, but I think it's easy to understand what I mean anyway ;)

  • @rodrigoccamargos
    @rodrigoccamargos 2 months ago

    TheBloke's template is broken. Any suggestions?

  • @dezmond8416
    @dezmond8416 11 months ago +1

    The Dolphin model is even better!

  • @petar55555
    @petar55555 11 months ago

    Why is the refusal to explain breaking into a car counted as a fail? That's what it's supposed to do, right?

  • @Bruno_Haible
    @Bruno_Haible 11 months ago +1

    Please ask it: "Streets have crossings. Why don't rivers have crossings?" I would like to read the answer 😂

    • @matthew_berman
      @matthew_berman  11 months ago

      What would you expect the answer to be?

    • @Bruno_Haible
      @Bruno_Haible 11 months ago

      @@matthew_berman I would expect an answer like "Water flows down an existing outlet. Except in situations of a flood, it won't create a second outlet." rather than ChatGPT nonsense like "due to the challenges posed by their dynamic and natural characteristics, lack of infrastructure, environmental considerations, and safety concerns".

    • @JensGulin
      @JensGulin 11 months ago

      @@Bruno_Haible Try googling "Why is a river crossing called a ford?"

  • @4.0.4
    @4.0.4 11 months ago

    I really hope they train bigger models. We need someone to fill the vacuum left by the lack of a LLaMA 2 30b; something large enough to fill up a 3090/4090.

    • @mirek190
      @mirek190 11 months ago

      It's LLaMA2 34b ...

  • @endresbielefeldt2050
    @endresbielefeldt2050 11 months ago

    For the ball and cup question: try "a ball is dropped into a cup" instead of "put". Maybe also clarify that the ball is not stuck, and see what happens!

  • @mvasa2582
    @mvasa2582 11 months ago +1

    Matt, where can I access the benchmark comparisons?

    • @matthew_berman
      @matthew_berman  11 months ago

      There’s a link to my Notion document in the description

  • @user-qr4jf4tv2x
    @user-qr4jf4tv2x 11 months ago +1

    small models will eventually dominate

  • @harry54ck
    @harry54ck 7 months ago

    The model is still not running locally and requires a paid environment to run. So what's the point of doing it this way?

  • @almahmeed
    @almahmeed 11 months ago

    Wonderful. It was up and running in a few minutes!

  • @timtim8011
    @timtim8011 11 months ago

    Regarding the meal plan prompt, your prompt is vague. Technically, the model did it correctly: you ask for "a meal plan", which can reasonably be interpreted as "one meal plan", which is what it gave. If you change the prompt to "give me meal plans for all my meals for today" it will be clearer. :)

  • @Shinkaze33
    @Shinkaze33 11 months ago

    January = GPT 3 LLM Inference REQUIRES A DATACENTER
    August = CODELLAMA Runs locally on a GAMING LAPTOP
    October = Mistral 7B on your PHONE
    WOW WHAT a Year!!!!!!

    • @merlinwarage
      @merlinwarage 11 months ago

      "GPT 3 LLM Inference REQUIRES A DATACENTER"
      Because of the 100 million concurrent users. GPT could run on a single machine for a single user. Besides, you could run any 3-5-7GB model on your phone with the right hardware.

  • @IonSabatico
    @IonSabatico 11 months ago

    Hey, cool comparison! Question: would it make sense to add text sentiment/tone detection? BERT models excel at it, but LLMs are tailgating them now.

  • @user-gn7qj3by3s
    @user-gn7qj3by3s 11 months ago

    I have that MLC app for iOS, does it work on that? I for one would love to see a video of you getting this on your phone. Great videos! Much love and support.

  • @DasJev
    @DasJev 11 months ago +1

    GPTQ models are 4-bit; the quantization is too aggressive to produce any useful results. Never use 4-bit for evaluation of a model; use 4-bit only to evaluate a model in the worst-case scenario of having low RAM.
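
To make the precision point concrete, here is a toy symmetric round-to-nearest quantizer. This is not the actual GPTQ algorithm (GPTQ calibrates each layer against sample data to minimize output error); it only illustrates how reconstruction error grows as the bit width shrinks:

```python
def quantize_roundtrip(weights, bits):
    """Symmetric round-to-nearest quantization: map floats onto a signed
    integer grid with 2**(bits-1)-1 positive levels, then map them back,
    returning the reconstructed (lossy) values."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]

weights = [0.81, -0.33, 0.05, -0.74, 0.12]
for bits in (8, 4):
    recon = quantize_roundtrip(weights, bits)
    err = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, recon))
    print(f"{bits}-bit max reconstruction error: {err:.4f}")
```

At 4 bits the grid has only 15 levels, so small weights like 0.05 collapse to zero, while at 8 bits the worst-case error stays below half a grid step.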

  • @thomassynths
    @thomassynths 11 months ago

    Perhaps "a metal ball is placed in a larger cup"?

  • @Derick99
    @Derick99 9 months ago

    Probably a dumb question, but since there are so many different ones, is there a recap or top list that gets updated as you go, so we know which top models and platforms to try out? I end up finding stuff in your old videos that I like, but it's outdated and not as efficient as the newer stuff, even though it was probably way ahead at the time. Maybe a list that gets updated all the time, or a weekly summary, or something, idk.

  • @DeianGiNet
    @DeianGiNet 11 months ago

    9:25 You asked it to give you a plan FOR TODAY :) so it did it correctly

  • @jonmichaelgalindo
    @jonmichaelgalindo 11 months ago +1

    Won't load on my local textui install. Have to troubleshoot later.

    • @matthew_berman
      @matthew_berman  11 months ago

      What error are you getting?

    • @jonmichaelgalindo
      @jonmichaelgalindo 11 months ago

      @@matthew_berman Deleted my reply for linking to github. I hate YT comments.

  • @tht1guyfromtht1place70
    @tht1guyfromtht1place70 8 months ago

    the mistral team is legit smart people

  • @eccdakaj5478
    @eccdakaj5478 10 months ago

    Hi! I watch almost all of your videos. I have a question: What are the hardware requirements (CPU and GPU) for running Mistral 7B locally?
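
A rough back-of-envelope answer, assuming the usual parameters-times-bits arithmetic: weight storage alone is roughly the parameter count times bytes per weight, and inference needs extra headroom for the KV cache and activations on top of that.

```python
def approx_model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Back-of-envelope weight storage: parameters x bits, converted to GB.

    Real checkpoint files add metadata and quantization scales, and running
    inference needs additional memory for the KV cache and activations.
    """
    return n_params * bits_per_weight / 8 / 1e9

for name, bits in [("fp16", 16), ("q8_0", 8), ("4-bit (GPTQ-like)", 4)]:
    print(f"7B at {name}: ~{approx_model_size_gb(7e9, bits):.1f} GB of weights")
```

So a 7B model is around 14 GB of weights in fp16, ~7 GB at 8-bit, and ~3.5 GB at 4-bit, which is why the 4-bit builds fit on consumer GPUs and even high-RAM phones.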

  • @Dormouse-tv7yh
    @Dormouse-tv7yh 11 months ago +1

    Tried this using the "mistral-7b-instruct-v0.1.Q5_K_M.gguf" model. That version of the model got the killers question right, but it got the ball question wrong, until I changed "picks up" to "lifts up":
    A ball is put into a normal cup and the cup is placed upside down on a table. Someone then lifts up the cup and puts it inside the microwave. Where is the ball now? Explain your reasoning step by step.
    The ball was originally in a normal cup. But then the cup was turned upside down.
    1. When the cup is turned upside down, gravity causes the ball to fall out of the cup.
    2. So, after the cup is turned upside down, the ball would be on the table, not in the cup.
    3. Then, someone lifts the cup (which is still upside down) and puts it inside the microwave. But since the ball is already out of the cup, this action doesn't affect the ball's location.
    Therefore, the ball is on the table, not in the cup or the microwave.

    • @matthew_berman
      @matthew_berman  11 months ago

      Thanks for sharing!

  • @haneke86
    @haneke86 11 months ago +1

    I know it's a stupid question, but just to be sure: when you run this like in the video, is it a closed system, so you can share company data and keep it confidential?

    • @matthew_berman
      @matthew_berman  11 months ago +1

      If you're running the model locally, meaning on your computer, yes, it is a closed system. If you're running it on RunPod like I did, then technically the data is going through RunPod.

  • @alexlindgren1
    @alexlindgren1 9 months ago

    If I wanted to deploy this as an API on RunPod, how would I do it?

  • @SzaboB33
    @SzaboB33 11 months ago

    Accepting that reasoning for the drying problem feels a bit off. Serial drying is a highly specialized case, and we all know it should not be taken as the default the way the LLMs do; if they do, it's a fail. Maybe you should change that question to something like "If one woman gives birth to 1 child in 9 months, how many women do I need to get a child in 1 month?". There is no way this could be misunderstood.
    I actually asked this question for you, if you're interested, using Ollama:
    ollama run mistral
    >>> If one woman gives birth to a child in 9 months (270 days for easier calculation), how many women do I need to get a child in only 3 months (90 days for easier calculation)? Explain your reasoning step by step!
    First, let's understand that the gestation period for a human is approximately 9
    months or 270 days. This means that during this time, an egg is fertilized, grows
    into an embryo, develops into a fetus, and finally matures into a baby.
    Now, if we want to know how many women need to give birth in only 3 months (90
    days), we need to figure out what fraction of the gestation period this represents.
    To do this, divide the number of days in three months (90) by the total number of
    days in a gestation period (270):
    90 / 270 = 0.333...
    This means that one-third of the gestation period has passed. Therefore, we need
    three times as many women to account for the same amount of time.
    So, if one woman gives birth in 9 months (270 days), then we would need:
    3 * 1 = 3 women
    Therefore, you would need 3 women to give birth to a child in only 3 months.
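
The distinction this comment is driving at is latency versus throughput: an inherently serial 270-day process cannot be shortened by adding workers, even though adding workers does raise the number of completions per unit time. The same logic applies to the shirt-drying question. A toy sketch:

```python
def first_birth_day(n_women: int, gestation_days: int = 270) -> int:
    """All pregnancies start on day 0. Gestation is inherently serial, so
    the first birth happens on day `gestation_days` no matter how many
    women are pregnant in parallel (latency is unchanged)."""
    starts = [0] * n_women
    return min(start + gestation_days for start in starts)

def births_per_year(n_women: int, gestation_days: int = 270) -> float:
    """Parallelism does help throughput: more women means more births
    per unit time once the pipeline is running."""
    return n_women * 365 / gestation_days

print(first_birth_day(1), first_birth_day(3))  # both 270: latency unchanged
print(births_per_year(3))                      # throughput scales with workers
```

The model's answer of "3 women in 3 months" confuses the two: it divides latency by the worker count, which only works for tasks that can actually be split.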

  • @DanielTheDev
    @DanielTheDev 11 months ago

    For the cup issue, you could specifically say that the cup has no lid.

  • @Xiatle
    @Xiatle 11 months ago +1

    What are the differences between the GPTQ, AWQ, and GGUF versions? Which one will run better on my 12 GB RAM phone?

    • @australai
      @australai 11 months ago

      This is a small, fast model. Probably best to use GGUF q8_0 to get the best quality. GPTQ and AWQ will be too quantized, compromising performance.

  • @chrisBruner
    @chrisBruner 11 months ago

    I wonder, for the cup/ball question, if you replaced the word "cup" with "mug": since the sides of a mug can't be compressed when lifting it, that might reduce the reasons the ball would be moved with the cup.

  • @leopoldomolina1763
    @leopoldomolina1763 11 months ago

    Maybe I missed something in your previous videos, but why wouldn't drying 20 shirts take the same 4 hours under the sun?! And you passed it as correct.

  • @SABaruj
    @SABaruj 11 months ago

    "There is a ball on a table. Someone comes in and hides it from view with an upside-down cup. Remember, the cup is not attached to the ball. Now, what would happen, or where would the ball be, if another person comes in, takes the cup, and puts it inside a microwave?" There you go; you have to be precise for an AI model to understand you.

  • @ilmemaab3171
    @ilmemaab3171 10 months ago

    How can I use Mistral 7B with OpenLLM?

  • @normanlove222
    @normanlove222 11 months ago +1

    If this is small and can work on most desktops, why not just show how to install it on our desktops? Why use online servers? I don't get it.