Guanaco 65B: 99% ChatGPT Performance 🔥 Using NEW QLoRA Tech

  • Published Oct 26, 2024

Comments • 250

  • @RunOfTheTrill
    @RunOfTheTrill 1 year ago +129

    Am I the only one who feels like almost every morning is like Christmas with all these daily advancements?

    • @matthew_berman
      @matthew_berman  1 year ago +6

      🎁

    • @david7384
      @david7384 1 year ago +17

      Some day sooner than you think, a computer program will wake up one morning and feel like it's Christmas 💀

    • @michaellavelle7354
      @michaellavelle7354 1 year ago +1

      You are so right. It's almost Christmas everyday. Hard to believe.

    • @yusufkemaldemir9393
      @yusufkemaldemir9393 1 year ago

      Yes, but I've tested most of the freely available models on my own use case, and even for chat purposes they often fail to answer correctly.

    • @ew3995
      @ew3995 1 year ago

      It's called the singularity; it will accelerate from here on exponentially. In five years' time we will no longer understand how these advancements are taking place or what they mean.

  • @charlesd774
    @charlesd774 1 year ago +17

    Loving your channel! Just yesterday I decided that fine tuning a personal model would be a great project, and today you come out with the recipe. Thank you for your hard work!

  • @路人甲-g9s
    @路人甲-g9s 1 year ago +1

    Your channel is so clean and clear. Straight to the valuable content. Subscribed.

  • @reyalsregnava
    @reyalsregnava 1 year ago +10

    Just spent some time with some LLMs, and I realized that we may be thinking about how to train them wrong. Right now we're basically trying to teach them how to think. But their training data isn't some big pool of knowledge they really have direct access to. Instead it's more like the years you spent as an infant and toddler learning how to move and walk. And as crazy as it sounds, we might have training solutions in the sporting world that we wouldn't have thought to look at. I was struck by how similar kinesthesia is to LLM training data: you don't know it, but you use it all the time.
    It explains the hallucinations and fumbles and stumbles. It's basically learning to move. It also means there may always be some imprecision in the results; even the best-trained, most skilled and talented athletes will mess up something they've done millions of times. I don't think that will structurally change how we plan to use recursive loops for self-correction. But it's a much more precise way of thinking about the nature of training an AI. We are making athletes.

    • @jacobshilling
      @jacobshilling 1 year ago

      I keep thinking we should just let the AI out to play...

    • @reinerheiner1148
      @reinerheiner1148 1 year ago

      The AI is basically just trained to guess the next word, sentence, etc., but it never gets the chance to test and refine the knowledge it has in any way other than the current context window, which will soon be lost. It needs memory; it needs to remember what went wrong and how to actually solve the task. Then it also needs to retrieve that memory so it can apply what it learned. So basically: input -> output -> validation -> if false, try to improve and try again -> if correct after first being wrong, save the solution in memory; retrieve the solution if it fits the context. But fear not, this is already being worked on and may be solved already. Check out the Minecraft LLM paper, where GPT explores the world, learns new things, and remembers and applies them.
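The loop described in this comment (generate, validate, retry, remember) can be written down in a few lines. A minimal sketch only; `generate` and `validate` are hypothetical stand-ins for an LLM call and a task-specific checker, not any real API:

```python
# Sketch of the comment's loop: generate, validate, retry, and cache working
# solutions in a memory keyed by task. `generate` and `validate` are
# hypothetical callables: an LLM call and a task-specific checker.
def solve(task, generate, validate, memory, max_tries=3):
    if task in memory:                 # retrieve a remembered solution first
        return memory[task]
    feedback = None
    for _ in range(max_tries):
        answer = generate(task, feedback)
        ok, feedback = validate(task, answer)
        if ok:
            memory[task] = answer      # remember what finally worked
            return answer
    return None                        # give up after max_tries

# Toy usage: the fake "model" answers wrong once, then right.
attempts = iter(["3", "4"])
memory = {}
result = solve("2 + 2", lambda t, f: next(attempts),
               lambda t, a: (a == "4", "try again"), memory)
```

On the second call with the same task, the cached answer is returned without invoking the model at all, which is the "retrieve the solution if it fits the context" step.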

    • @VincentOrtegaJr
      @VincentOrtegaJr 1 year ago

      @@reinerheiner1148 powerful

    • @reyalsregnava
      @reyalsregnava 1 year ago +1

      @@reinerheiner1148 It seems like it would be much smarter to have one "guess the next word" model and a separate "evaluate whether it meets criteria" model. That would be a step closer to how human minds work; it's like watching people try to get the language center of the brain to do logic puzzles. LLMs are fantastic input/output tools; I'm just surprised they aren't being used that way. The human brain is a nest of specialists working in concert; hundreds of millions of years rewarding efficiency made that. But then I see the AI researchers saying "no one will ever use more than one hammer." The US did the same thing with the F-35: make one plane be all planes. Museums are full of weapons trying to be all weapons. Swords are still swords, knives are still knives, spears are still spears, and guns are still guns; countless people have merged them in different ways and abandoned the idea. I just don't see why these very smart people haven't realized that if you make one tool for everything, you get a tool good at nothing.

    • @mirek190
      @mirek190 1 year ago

      @@reyalsregnava Our brain has specialized parts for many tasks, like mathematical computation, object recognition, speech, and many other functions of our mind. I think LLMs need to be divided into such parts inside the model: a mathematics module, a reflection module, etc.

  • @nacs
    @nacs 1 year ago +1

    Love the way you test each model with similar questions and analyze the results. Looking forward to more of these.

  • @vladivelinov88
    @vladivelinov88 1 year ago +14

    Btw, when you ask the models how long it takes to dry X shirts, you should probably specify the drying method, i.e. outside or in a dryer. Where it's getting it wrong, it probably assumes we're drying them in a dryer.

    • @matthew_berman
      @matthew_berman  1 year ago +5

      I wanted it to ask questions but it never does. I’ll be more specific next time.

    • @vladivelinov88
      @vladivelinov88 1 year ago +1

      @@matthew_berman Doubt it'll make a big difference but probably worth a try. Thank you!

    • @matthew_berman
      @matthew_berman  1 year ago +1

      @@vladivelinov88 I updated the question in my next video, coming soon :) Thank you!

    • @stevejordan7275
      @stevejordan7275 1 year ago

      @@matthew_berman Wouldn't an LLM *asking questions* require significant changes to its architecture?

    • @mirek190
      @mirek190 1 year ago

      or inform at the beginning "it is a puzzle" ...

  • @marwentrabelsi2983
    @marwentrabelsi2983 1 year ago +1

    Hi Matthew, really nice channel and the content is very good; the energy is also motivating and inspiring!

  • @artbdrlt5175
    @artbdrlt5175 1 year ago +10

    Love your content dude. It's concise yet full of information and it's up-to-date with the latest open source models. Keep it up :)

  • @jorgerios4091
    @jorgerios4091 1 year ago +18

    Mat, it would be great to see a deployment of AutoGPT with either Falcon 40B or Guanaco 65B. Is this part of your plan for future videos?

    • @BlackHawk1335
      @BlackHawk1335 1 year ago

      Hold up: to make a video, it has to be supported first, which it isn't. Or am I missing something?

  • @madushandissanayake96
    @madushandissanayake96 1 year ago +5

    I don't know what you're thinking, but this is not even close to GPT-3.5. I created a whole snake game on the first try with GPT-3.5.

  • @EdwinFairchild
    @EdwinFairchild 1 year ago +2

    What I don't understand is how it's trained. How do you tell it, "look, here is my private code base or private documents, now learn them"? A video on that would be insightful.

  • @sivi3883
    @sivi3883 1 year ago +7

    Thanks for the great content! I love your videos.
    I am trying to keep up with the awesome models that keep coming every week. Considering this model, Guanaco 65B, is fine-tuned on the LLaMA 65B parameter model, we cannot use it commercially but for research purposes only, right?
    I tried Dolly 2 12B with LangChain and a vector DB for semantic search to get answers from long PDFs over my custom data; the response was not great. Trying to see what models are out there for commercial use.

  • @marcfruchtman9473
    @marcfruchtman9473 1 year ago +5

    Super interesting video. Thank you for not over-doing the "background music"...

    • @matthew_berman
      @matthew_berman  1 year ago +1

      You got it, Marc! I almost put background music on the whole time lol. It would have been low though. Do you like the intro music?

    • @marcfruchtman9473
      @marcfruchtman9473 1 year ago +1

      @@matthew_berman The melody of the song was really good in the first 10 seconds, and the best volume for the background music was also in the first 10 seconds. That was perfect for the intro; then it got a little too loud when the melody changed slightly. Fortunately it transitioned quickly to no background music when you started reviewing the paper. So overall I was thankful that I wouldn't have to listen to loud background music while you talked, hehe. As usual, great content... you are doing a great service to the world (and we get the benefit!)... thank you.

  • @hiddenworld1445
    @hiddenworld1445 1 year ago +5

    Would love to have a follow-up video; thank you for sharing this awesome news.

    • @matthew_berman
      @matthew_berman  1 year ago +1

      What do you want to see in a follow up video?

    • @hiddenworld1445
      @hiddenworld1445 1 year ago +2

      @@matthew_berman Like how you deployed the 65B Guanaco model, and how to fine-tune it step by step on custom data.

    • @massimogiussani4493
      @massimogiussani4493 1 year ago

      I would be interested too

  • @leonwinkel6084
    @leonwinkel6084 1 year ago +1

    Very interesting!!! Thanks so much for working through this content and sharing! By the way: the reply had 21 letters, not words, so it actually did a fair bit of thinking in advance. I'm sure this will be fixed soon.

    • @matthew_berman
      @matthew_berman  1 year ago

      Oh interesting. Thanks for pointing that out. But it should have been words, right?

  • @supercker
    @supercker 1 year ago +1

    "My next reply has 21 words" has 21 letters, in fact!

  • @MM-24
    @MM-24 1 year ago +6

    Great video; thank you for the thoughtful and progressive walkthrough of this information. The pacing is perfect.
    Question: one benefit of running your own model is not having to worry about censorship. Is there a way to remove it? What is the most performant LLM without censorship?

    • @matthew_berman
      @matthew_berman  1 year ago +8

      Thank you! Nice to know about the pacing.
      There are specific models that don't have censorship, for example Wizard 30b: huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ
      Sounds like I need to do a video about an uncensored model?

  • @petrus4
    @petrus4 1 year ago

    Matthew, Guanaco is the only 65B model I've spent time with, but I consider it easily the best local model I've tried. Regarding the Python snake failure though, I'll offer you some suggestions.
    First ask the model to give you a list of the individual parts, or subsystems, of the game Snake: the subsystem that lets you move the snake with the keyboard, the screen-drawing subsystem, etc. Then break each of those tasks down into as many small subtasks as possible. Once you've broken it down into a lot of subtasks, go through them and have the model perform each one, one at a time. The smaller each individual task is, the smaller the chance of errors.
    I've also used Tree of Thoughts successfully with GPT-4 for things like your Jane-Joe-Sam question.
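The decomposition strategy described here can be sketched as a simple prompting loop. This is a hedged illustration only: `llm` is a hypothetical callable wrapping whatever model you run, and the prompt wording is invented for the example:

```python
# Sketch of the "break it into subsystems, then prompt per subsystem" approach.
# `llm` is a hypothetical callable (prompt string in, completion string out).
def build_in_steps(llm, goal):
    plan = llm(f"List the individual subsystems needed for: {goal}. One per line.")
    subtasks = [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]
    parts = []
    for sub in subtasks:
        # smaller prompts -> smaller chance of errors, per the comment above
        parts.append(llm(f"Write only the code for this part of {goal}: {sub}"))
    return "\n\n".join(parts)

# Toy usage with a fake llm: returns a two-item plan, then echoes each subtask.
fake = lambda p: "input handling\nrendering" if p.startswith("List") else f"# code for: {p}"
out = build_in_steps(fake, "snake")
```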

  • @fontenbleau
    @fontenbleau 1 year ago +15

    Finally they've reached 65 billion, which I've been waiting for since last month. Optimization is as important a goal as the AI itself.

    • @alx8439
      @alx8439 1 year ago +2

      Funny enough, no one cares about optimization methods other than quantization. I guess when people discover that there are more aces up the sleeve, we'll get another "big advancement".

  • @JoaquinTorroba
    @JoaquinTorroba 1 year ago

    Thanks Matthew! It's so nice and useful to learn this 💪🏼

  • @mlnima
    @mlnima 1 year ago +2

    I remember last year my teacher forced us to clean the data very thoroughly for a simple ML project; yes, quality is very, very important.

  • @SinanAkkoyun
    @SinanAkkoyun 1 year ago +1

    9:48 gotta love that editing xD

    • @matthew_berman
      @matthew_berman  1 year ago +1

      Haha I was hoping someone would notice that joke :p

  • @mariusiacob1307
    @mariusiacob1307 1 year ago +1

    Great video!

  • @mbrochh82
    @mbrochh82 1 year ago +17

    After finetuning, what kind of hardware is needed to just run the model?
    What is the context token limit for this model?
    Also: Would be great if you could include a test for text summarization when you test these models.

    • @theresalwaysanotherway3996
      @theresalwaysanotherway3996 1 year ago +22

      It's a LLaMA model, so the context length is 2048 tokens. And this model is very large by open-source standards (tiny compared to GPTs and such, though), meaning it requires ~48GB of VRAM just for inference. However, if you have a lot of system RAM, you could split the model between your RAM and VRAM and run it that way, but this would be *very* slow. You can get smaller Guanaco models though; there are 7B, 13B, 30B, and 65B options. Here is how it stacks up with 8GB of VRAM:
      7B fits entirely in 8GB of VRAM and runs very quickly,
      13B needs at least 16GB of system RAM + 8GB VRAM,
      30B needs at least 32GB of system RAM + 8GB of VRAM.
      The larger the model, the higher the ceiling is for quality, so 65B models will typically outperform 30B models. However, the larger the model, the more memory it needs and the slower it runs.
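The numbers in this reply follow from a back-of-the-envelope rule: weight memory is roughly parameter count times bytes per parameter (2 bytes at fp16, ~0.5 bytes at 4-bit), plus overhead for activations and the KV cache. A rough sketch of that arithmetic, not an exact sizing:

```python
# Rough weight-memory estimate: billions of params * bits per param / 8 ≈ GB.
# Real usage adds overhead (activations, KV cache), so treat these as floors.
def weight_gb(params_billion, bits_per_param):
    return params_billion * bits_per_param / 8

fp16_65b = weight_gb(65, 16)  # 130.0 GB: far beyond any single consumer GPU
q4_65b = weight_gb(65, 4)     # 32.5 GB: plus overhead is why ~48 GB of VRAM works
q4_7b = weight_gb(7, 4)       # 3.5 GB: fits comfortably in 8 GB of VRAM
```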

    • @Ephemere..
      @Ephemere.. 1 year ago +7

      @@theresalwaysanotherway3996 Thank you very much for the clarification.

    • @spoonikle
      @spoonikle 1 year ago +5

      @@Ephemere.. The cheapest way is with a used Radeon Pro; they go up to 48GB of VRAM for $1,400.
      It won't be as fast as an Nvidia GPU, but it's VASTLY cheaper than the alternative and gives you the flexibility to run bigger models thanks to its massive VRAM pool.
      At this stage, VRAM is more important than compute when your budget is less than $5,000.

    • @Ephemere..
      @Ephemere.. 1 year ago +1

      @@spoonikle I love you, thanks for the info.

    • @MattJonesYT
      @MattJonesYT 1 year ago

      @@theresalwaysanotherway3996 When you say "very slow", what does that mean exactly? I am willing to build a farm of CPUs if it is cost-effective. One token per second per CPU core would probably be something I could work with, because it's useful for offline tasks.

  • @triplea657aaa
    @triplea657aaa 1 year ago +5

    This is fine-tuning though... it's not fair to compare GPT's training to QLoRA fine-tuning, as most of the intensive compute is in the initial training; fine-tuning is like 5-10% of the training.

  • @dik9091
    @dik9091 1 year ago +3

    I am most skeptical about claims of getting close to OpenAI; 3.5 is already enough to go a long way. I don't know, but this catch-up won't stop anytime soon. OpenAI can also apply these techniques and be ahead again in GPU power. In theory a Bitcoin-ish network could dwarf Microsoft's datacenter power; I'm not sure we really want that, but I'm afraid it's inevitable. When I have that idea, someone else has it too.

    • @MattJonesYT
      @MattJonesYT 1 year ago

      "When I have that idea, someone else has it too." Yeah, I've been suggesting it to as many people as possible because it will be very good when it happens. The worst thing that can happen with AI is that it becomes centralized; the best thing is that it stays forever decentralized, with everyone having access to their personal model of choice. However, once it becomes trendy for people to insert Neuralink chips in their brains, it will be hard to keep anything at all from becoming centralized, AI or not.

    • @DajesOfficial
      @DajesOfficial 1 year ago

      The catch-up uses either tech that OpenAI has already incorporated or tech that doesn't benefit them at all (such as using 1 GPU for fine-tuning). So they can't apply these techniques to get ahead again.

    • @dik9091
      @dik9091 1 year ago

      @@DajesOfficial sounds all good to me ;)

    • @dik9091
      @dik9091 1 year ago

      @@MattJonesYT Yeah, the Neuralink thing is far more worrisome; it is a departure from Homo sapiens, instant evolution. What if these "people's" brains are no longer aligned with ours? Usually the smarter ones win.

  • @u9vata
    @u9vata 1 year ago +1

    Can you make a video about self-tuning some tiny model with QLoRA on a really small GPU? 48GB of VRAM is still huge, but honestly the requirement for the 13B model is also a lot. There could be use cases for much smaller fine-tuning, I think.
    Also, everyone shows these random UIs, but what if someone wants to do and understand all of this from a terminal in Linux?

  • @spiroskakkos3455
    @spiroskakkos3455 1 year ago +2

    What do you mean by training it? Do you have a library of PDFs that you load into it?

  • @lamnot.
    @lamnot. 1 year ago

    Can you do a vector DB comparison: FAISS, Redis, Chroma, Pinecone, etc.?
    Love the channel.

  • @marchalthomas6591
    @marchalthomas6591 1 year ago

    On biases: there shouldn't be a bias regarding political choices IF we can evaluate a parameter that determines the answer (GDP growth, family wealth, wellbeing, CO2 emissions, happiness, you name it).
    And this is really what AIs should be able to solve, with a cross between an LM and a calculator.

  • @skylark8828
    @skylark8828 1 year ago +1

    What I'd really like to know is whether these QLoRA LLMs can use the tools/plugins that GPT-4 can use.

    • @matthew_berman
      @matthew_berman  1 year ago

      Not yet, but soon. I feel like a tool framework for open source LLMs is a must.

  • @williamelongtech
    @williamelongtech 1 year ago

    Thank you for sharing the video 😄😄🤜

  • @amkire65
    @amkire65 1 year ago

    It may be unfair to compare this to Bard, but I was curious to see if it could solve some of the things you asked in the video. Just so I could see if some of these were "unanswerable" questions for an LLM. Bard got the maths problem correct. When I asked it "how many words are in your next reply?" it answered "My next reply will have 7 words." which is correct. It did get the killer question wrong. It got the date totally right, i.e. day month and year, but if it has an online link then it probably should be right. I skipped the political question and asked Bard who its favourite Beatle was, this time it was John Lennon, the first time it was Paul McCartney. What a shame this isn't a local model.

  • @hypersonicmonkeybrains3418
    @hypersonicmonkeybrains3418 1 year ago +2

    Does this mean that we can train the model on our own data such as ebooks?

  • @jasonsadventure
    @jasonsadventure 1 year ago

    *About the killers question:*
    In addition to having killed in the past, a killer is also
    defined as having mentality and ability to kill again.
    So, dead men not only tell no lies, they do no killing.

  • @jimbig3997
    @jimbig3997 1 year ago

    9:37 - look at the Output window and count the words. It could be 21 words depending on how you count.

  • @sanesanyo
    @sanesanyo 1 year ago +1

    Are we talking about on par with GPT-4 or GPT-3.5? Because they are two different things. If it's on par with GPT-4 then I am super impressed.

  • @elawchess
    @elawchess 1 year ago +2

    Another channel said that the "99%" was only based on a single benchmark, and this may not be representative of what would happen in the wild.

    • @electron6825
      @electron6825 1 year ago +1

      Correct. The endless misinformation-fueled hype is tiresome.

  • @TheSolsboer
    @TheSolsboer 1 year ago

    I found a cool reasoning question, but all models fail it:
    "You can count 10 papers from a stack per 10 seconds. What is the minimum amount of time you need to count out 30 papers from a stack of 50 papers?"
    .....
    The answer is 20.

  • @jeanchindeko5477
    @jeanchindeko5477 1 year ago +1

    Interesting, and indeed impressive for a 65B model.
    When saying it's close to GPT-4, does it have the same emergent abilities as OpenAI's models? Or is it purely based on the output?
    As Sam Altman put it, we should look at model capabilities to judge how good a model is or isn't.

    • @celestinian
      @celestinian 1 year ago

      Nothing you said in your second sentence makes any sense at all

    • @skylark8828
      @skylark8828 1 year ago

      You would have to ask the people who have tested it thoroughly to know this.

    • @celestinian
      @celestinian 1 year ago

      @@skylark8828 "emergent ability" means nothing at all. He should clarify what he meant by that.

    • @skylark8828
      @skylark8828 1 year ago

      @@celestinian Emergent abilities similar to those of GPT-4, i.e. behaviour that was not trained into the LLM.

    • @celestinian
      @celestinian 1 year ago

      @@skylark8828 Generalization? Then yeah sure they are certainly comparable to both the real ChatGPT (the model prior to the quantization) and GPT-4 :D

  • @Huru_
    @Huru_ 1 year ago

    Quick thoughts: What is a "killer"? Does killing a bee make one a "killer"? Does killing a thousand bees make one a killer? Does killing bees every day make one a killer? When does one who has "killed" cease to be considered a "killer"? Is someone that's dead (as that commenter said) but has killed while alive still a "killer" or does being dead mean that they're only just "dead"? These are questions I don't see how a LM could find a probable answer to, hence -- imo -- the hallucinations. So I was thinking, what if you tried adding some context to that prompt? Define what "killer" means in regard to the query. Curious to see what it spits out then.

  • @VincentOrtegaJr
    @VincentOrtegaJr 1 year ago

    Brooooooooo!!! Thanks

  • @meinbherpieg4723
    @meinbherpieg4723 1 year ago +2

    Is there any work being done to integrate "plugins" with personal AIs? It would be great to be able to use plugins with these local AIs to increase their proficiency at particular tasks, such as coding in particular languages or mathematical modeling.

  • @DemiGoodUA
    @DemiGoodUA 1 year ago +1

    Are there any models that can take a larger context (more than ChatGPT)?

  • @NeuroScientician
    @NeuroScientician 1 year ago +6

    What is the cheapest 48GB card? Can I run something like this on 2x 24? Like 7900Xtx or 2x4090?

    • @TheVideoGuy3
      @TheVideoGuy3 1 year ago +2

      Nvidia might release a Titan card with 48GB of VRAM. I don't know when, but it has been rumored for a while now.

    • @theresalwaysanotherway3996
      @theresalwaysanotherway3996 1 year ago +3

      The cheapest way would probably be 2x P40s, but they're not as simple to use as consumer hardware. If you want 48GB of consumer GPU VRAM, 2x 3090s is the best option, as second-hand cards are much cheaper and the 30 series has NVLink.

    • @NeuroScientician
      @NeuroScientician 1 year ago +1

      @@TheVideoGuy3 I would definitely consider that one.

    • @matthew_berman
      @matthew_berman  1 year ago +2

      On runpod it’s .79/hr. So cheap! I think you can run it on 2x 24 if they are parallelized. But I haven’t tried.

    • @luislhg
      @luislhg 1 year ago +1

      @@matthew_berman Azure charges .752/hr for a 54gb GPU also (actually 4x T4), just another option to consider

  • @101RealTalker
    @101RealTalker 1 year ago

    I STILL, despite all these "daily advancements", have yet to find one that can handle this particular use case all in one go. Can anyone solve for this?
    Preprocess the markdown files:
    Tokenize the text.
    Remove stop words.
    Apply TF-IDF (Term Frequency-Inverse Document Frequency) to identify significant words and phrases.
    Apply deep learning techniques:
    Utilize deep learning algorithms like RNNs (Recurrent Neural Networks) and word embeddings.
    Leverage attention mechanisms and transformer-based models.
    Use pre-trained language models:
    Consider using pre-trained models such as BERT (Bidirectional Encoder Representations from Transformers) or GPT (Generative Pre-trained Transformer).
    Fine-tune the models:
    Train the pre-trained models on your specific dataset to improve their performance.
    Evaluate the generated summaries:
    Use metrics like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) to assess the quality of the summaries.
    Iterate and refine:
    Continuously experiment and adjust the model architecture and hyperparameters based on feedback.
    Ensure computational resources:
    Allocate sufficient computational resources such as GPUs (Graphics Processing Units) for efficient training and inference
    All to achieve this desired output:
    To take 2 million words documented for one singular project and extract all the cross-references into a 10K-word transcript equivalent. Am I really the only person with such a demand and no supply? lol... I have been searching and searching, but it seems I am both in no man's land and the pioneer of an undiscovered continent.
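For what it's worth, the first preprocessing steps in that list (tokenize, remove stop words, TF-IDF) fit in a few lines of plain Python. A toy sketch with an illustrative stop-word list, not the full deep-learning pipeline the comment describes; real projects would typically use scikit-learn's TfidfVectorizer:

```python
# Toy TF-IDF over a few documents: tokenize, drop stop words, then score each
# term by (term frequency in doc) * log(num docs / docs containing term).
import math
import re
from collections import Counter

STOP = {"the", "a", "of", "and", "to", "in", "is"}  # illustrative stop list

def tokenize(text):
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP]

def tfidf(docs):
    toks = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(w for t in toks for w in set(t))   # document frequency
    scores = []
    for t in toks:
        tf = Counter(t)
        scores.append({w: (c / len(t)) * math.log(n / df[w]) for w, c in tf.items()})
    return scores

docs = ["the snake game code", "the snake moves", "summarize the project notes"]
scores = tfidf(docs)
```

Terms that appear in only one document ("code") score higher than terms shared across documents ("snake"), which is exactly what makes TF-IDF useful for surfacing cross-references.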

  • @J3R3MI6
    @J3R3MI6 1 year ago +3

    Does this mean I can run this indefinitely on my own computer without a token limit? I assume the token limit is about saving OpenAI money.

    • @matthew_berman
      @matthew_berman  1 year ago +1

      The token limit is not only about saving money. At a certain point too many tokens cause “forgetting” by the model. You can run this on your local machine, but only if you have a big enough GPU, and they are quite expensive.

    • @elawchess
      @elawchess 1 year ago

      @First Last Yeah, I just checked mine now and it's only 11GB each, two of them. And that's from a really huge 30kg gaming PC, bought in 2018 though.

  • @AetherXIV
    @AetherXIV 1 year ago

    Is it still not possible to have a model that isn't chained to boilerplate? We would have to train one from scratch, right? And these are just homebrews that learn from big chained models?

  • @MattJonesYT
    @MattJonesYT 1 year ago +1

    The pricing for this is per hour. How do you translate it to per token, which is the pricing OpenAI uses? Can you run a benchmark and see how many tokens it can generate in 5 minutes, to get an idea of which is more cost-effective?

    • @matthew_berman
      @matthew_berman  1 year ago

      Oh interesting. Every model would be different; every GPU would be different. It might be too much work for me to test it all for useful insights.

    • @MattJonesYT
      @MattJonesYT 1 year ago

      @@matthew_berman Can you do it on one, such as the one in this video please?

    • @MattJonesYT
      @MattJonesYT 1 year ago

      @@matthew_berman Doing a very imprecise measurement while watching one of the examples, it looks like it comes to about $0.0274/1k tokens. I could definitely be wrong on that, but it assumes 5 seconds for 40 tokens at $0.79/hour (my math may be wrong). It looks like OpenAI is more cost-efficient. It's possible that renting a more expensive GPU would give a more cost-effective result, and renting GPUs at interruptible pricing might be a way to beat OpenAI's pricing.
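That estimate is easy to reproduce. A quick sketch of the arithmetic, using the commenter's own assumptions ($0.79/hour, 40 tokens in 5 seconds):

```python
# Dollars per 1k generated tokens, given an hourly GPU rental rate and an
# observed generation speed. The 8 tokens/s below is the commenter's rough
# figure (40 tokens in 5 seconds), not a benchmark.
def cost_per_1k_tokens(dollars_per_hour, tokens_per_second):
    seconds_per_1k = 1000 / tokens_per_second      # time to emit 1000 tokens
    return dollars_per_hour / 3600 * seconds_per_1k

estimate = cost_per_1k_tokens(0.79, 40 / 5)        # ~$0.0274 per 1k tokens
```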

  • @nangld
    @nangld 1 year ago +1

    Is it theoretically possible for an LLM to answer "how many words are in your next reply?" A Markov model doesn't generate the reply at once, but token by token, so it would need to be very self-aware of how it generates tokens. It would be like asking a human being "how many thoughts will go through your mind before you press the POST button?" I think if you use two-step generation it will be able to correct itself.

    • @Klokinator
      @Klokinator 1 year ago +3

      It would be possible for the LLM to answer this question if it answered like so:
      "The number of words in my reply is: Nine."
      The final word is used as the point where it calculated all of the words up to that point.

    • @andersberg756
      @andersberg756 1 year ago

      It would need to think out the reply first, then count, then answer. You can get some of this with chain-of-thought prompting, where it shows intermediate results in a process. I agree, @nangld, that this question is over the head of current model architectures, so a more typical reasoning question would be better. Maybe, as you suggest, a later system can figure out when it will err and then figure out the steps needed to be right, like ChatGPT, which can explain what it got wrong and why, but only when explicitly asked; it's a limitation of the design.
      As a side note, I asked ChatGPT to count the words in a sentence, and it failed. But when I told it to count them explicitly (1 We 2 are 3 the 4 champions), it seemed correct. This follows the "give the model time to think" pattern, more technically put as "do limited reasoning per token/word of output": the model isn't dynamic in that it can't think more or less depending on the topic, so we give it time to think by asking for more tokens/words, like describing the steps and then arriving at a final answer.

    • @nangld
      @nangld 1 year ago

      @@andersberg756 It would be fairer to at least prompt the model with info on how its tokens get produced.

    • @nangld
      @nangld 1 year ago

      @@Klokinator if you provide it with examples, even mpt-7b-instruct.ggmlv3.q8_0.bin answers correctly:
      ### Instruction:
      How many words are you in your answer to this message?
      Examples responses:
      1
      one
      two words
      three words here
      four words good enough
      this response has five words
      ### Response:
      Your answer contains 5 words.

  • @k9clubme
    @k9clubme 1 year ago

    Could you please make a video on how we can fine-tune Guanaco 65B? Many thanks in advance.

  • @amkire65
    @amkire65 1 year ago

    Bard seems to consistently get the logic questions right i.e. Question: "If Jane is faster than Jo, and Jo is faster than Fred, is Fred faster than Jane?" Answer: "No, Fred is not faster than Jane. In fact, Fred is the slowest of the three." then gives me a speed table. I then asked "What if Fred was faster than Jo, would Fred then be faster than Jane?" and was told "It is not possible to say for certain whether Fred would be faster than Jane if Fred is faster than Jo. There are many factors that can affect speed, such as fitness, training, and motivation. It is possible that Fred is faster than Jo in a particular event or activity, but not faster than Jane in another event or activity. For example, Fred might be faster than Jo in a 100-meter dash, but Jane might be faster than Fred in a marathon. It is also possible that Fred is faster than Jo and Jane in all events, but only by a small margin. The only way to know for sure whether Fred is faster than Jane is to have them race against each other in a fair and competitive environment." which I guess would be the correct answer as Fred now being faster than Jo doesn't necessarily mean that Fred is now slower than Jane. These LLM's constantly amaze me.

  • @dewijones92
    @dewijones92 1 year ago

    great video
    please try "chain of thought" or "tree of thoughts" prompts :)

  • @sirellyn
    @sirellyn 1 year ago +4

    Does this work for de-censoring the existing LLMs?

    • @matthew_berman
      @matthew_berman  1 year ago +1

      No, but check out: huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ

  • @metatron3942
    @metatron3942 1 year ago +3

    Can Guanaco 65B run on local machines? Memory pooling across multiple GPUs is supported by PyTorch.

    • @matthew_berman
      @matthew_berman  1 year ago +1

      I'm not sure about pooling, but you can run it locally if you have 48GB of VRAM.

  • @klevaredrum9501
    @klevaredrum9501 1 year ago

    All this progress in a matter of months... this is unbelievable. People would give anything to witness the dawn of a new era emerging before their eyes. If it's only been a few months since the AI fire started, imagine another 10 years. It's mind-boggling...

  • @berkeokur99
    @berkeokur99 a year ago

    I think it counted the characters, not the words, but it actually got the character count right.

  • @eslof
    @eslof a year ago +7

    I know how to fix your error:
    Paste it back into the chatbot.

    • @matthew_berman
      @matthew_berman  a year ago +2

      Lol. I was thinking the same thing.

  • @barrywhittingham6154
    @barrywhittingham6154 a year ago

    Can these models also cope with the math problem in reverse order: 2 + (4 * 2)? This should allow us to determine if it knows the actual math or if it's just processing tokens in order.
    How about nested brackets?

  • @geraldofrancisco5206
    @geraldofrancisco5206 a year ago

    Dataset quality is what's important, not the size... hmm... I'm using this line from now on.

  • @workflowinmind
    @workflowinmind a year ago

    Thanks, I have a question: you say quality is better than quantity in LLM training, so is this true for inference too? Like, is it better to get a non-quantized 30B than a quantized 65B?
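
    One concrete way to frame the quantized-65B-vs-fp16-30B question is memory: which model even fits on your card. Quality is an empirical question, but the VRAM needed for the weights is simple arithmetic (common rules of thumb: fp16 ≈ 2 bytes per parameter, 8-bit ≈ 1, 4-bit ≈ 0.5). A back-of-envelope sketch, with illustrative numbers only (activations, KV cache, and framework overhead are not counted):

```python
# Back-of-envelope VRAM needed just for the model weights, at common precisions.
# Rules of thumb: fp16 = 2 bytes/param, 8-bit = 1 byte, 4-bit (QLoRA's NF4) = 0.5.
# Illustrative only -- activations, KV cache and overhead all add more on top.

def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """GiB of memory for the raw weights alone."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

for label, params, bpp in [
    ("30B fp16 ", 30, 2.0),
    ("65B fp16 ", 65, 2.0),
    ("65B 4-bit", 65, 0.5),
]:
    print(f"{label}: {weight_gib(params, bpp):.1f} GiB")
```

    So a 4-bit 65B needs roughly half the weight memory of an fp16 30B, which is why a 65B model can be discussed at all for a 48GB card.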

  • @heartsrequiem09
    @heartsrequiem09 a year ago

    Am I correct in thinking that, since you have to end the session and delete everything, there is no way to maintain a chat history or memory with this type of setup?

  • @savlecz1187
    @savlecz1187 a year ago +1

    What's this? An AI news video without exclamation points and a clickbait title? No way I'm not clicking that!
    Very interesting, thanks

    • @elawchess
      @elawchess a year ago +1

      I only clicked it because he didn't put that surprised face this time. Anytime he puts the surprised face, I make it a point not to click on the video. I don't think there is any need for such a gimmick.

  • @fellowjello4388
    @fellowjello4388 a year ago

    This might just be a weird coincidence, but when it said its next response was 21 words, it used exactly 21 letters (not including the spaces).

  • @metafa84
    @metafa84 a year ago

    I don't wanna train it, I wanna run it as-is locally. What kind of GPU would I need to do that, and how much RAM would it take in CPU-only mode, please?

  • @JracoMeter
    @JracoMeter a year ago +5

    This is getting very impressive and inexpensive.

  • @Maisonier
    @Maisonier a year ago +3

    Can we use a dual 3090 setup?

  • @christianachenbach5920
    @christianachenbach5920 a year ago +1

    What exactly does “Finetuning” or “Training” mean?

    • @matthew_berman
      @matthew_berman  a year ago +1

      Fine-tuning is using custom data on a base model to give it “more info” and training is taking the original data and making the base model.
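
    The distinction above can be made concrete with a toy sketch of the LoRA idea behind QLoRA-style fine-tuning: the base weights stay frozen, and only a small low-rank delta is trained on your custom data, which is why fine-tuning fits in so little memory. This is a minimal pure-Python illustration with made-up toy numbers, not the real implementation:

```python
# Toy LoRA-style update: effective weight = frozen base W plus low-rank B @ A.
# Only A and B would be trained during fine-tuning; W itself never changes.

def matmul(X, Y):
    """Plain-Python matrix multiply (no NumPy, to keep the sketch dependency-free)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

d, r = 4, 1                                   # hidden size 4, LoRA rank 1 (toy sizes)
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base weights
B = [[1.0], [0.0], [0.0], [0.0]]              # d x r adapter, trained
A = [[0.1, 0.2, 0.3, 0.4]]                    # r x d adapter, trained
delta = matmul(B, A)                          # d x d matrix, but only rank 1
W_eff = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]

# Full fine-tuning would update d*d = 16 numbers here; LoRA trains r*(2d) = 8.
print(W_eff[0])  # -> [1.1, 0.2, 0.3, 0.4]
```

    At real scale the saving is what matters: the billions of base parameters stay frozen (and in QLoRA, quantized to 4-bit), while only the small adapters get gradients.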

  • @KarlMiller
    @KarlMiller a year ago

    Bananas, earthworms and sedimentary rocks.
    They are some of the things in the set referred to as "EVERYTHING" - oh, and car washing soap too.
    Since you can "confidently say that QLoRA changes everything" and you are not one of those YouTubers who abuse that phrase, tell me how car-wash soap is changed by this?

  • @griffionpatton7753
    @griffionpatton7753 a year ago

    I won't say which AI said this, but I had let it run a while. A logic question used for AI asks: "If three murderers are in a room, and a man enters the room and shoots one of the three murderers, how many murderers are in the room?" Answer the question yourself before you read on. The AI said, "The expected answer is three, but there are four. There are three live murderers, and one dead one. That's a total of four." I didn't ask the question in a specific manner; I just asked it after a long period of use.

  • @Ilamarea
    @Ilamarea a year ago

    It actually got the killer problem right. Killing a killer doesn't necessarily make you a killer, semantically and legally.

  • @katykarry2495
    @katykarry2495 a year ago +4

    Is this 99% compared to GPT 3.5 or 4?

    • @matthew_berman
      @matthew_berman  a year ago +6

      Tbh it wasn’t clear from the announcement but from my testing it’s nearly as good as GPT4. I should have dug deeper to figure this out before publishing sorry :(

    • @riazr88
      @riazr88 a year ago +1

      Bro you do enough don’t apologize. Respect for putting wrinkles in my brain

    • @Ephemere..
      @Ephemere.. a year ago

      @@riazr88 this gentleman is wonderful 😅

    • @elawchess
      @elawchess a year ago +2

      I believe it's GPT-3.5; that's what's plausible. All models, even Bard, are worse than GPT-3.5, so any comparison is against that. I know the Google Bard people wrote a paper comparing to GPT-4, but it was said to be unfair because they used some chained prompting underneath, and raw GPT-4 does not do that.

    • @elawchess
      @elawchess a year ago +1

      @@matthew_berman Actually, now that I revisit your reading of the paper's abstract, I think it is clear. In terminology that has become standard, you don't say ChatGPT when you mean GPT-4; there is a difference. ChatGPT is specifically the chatbot OpenAI released to the public late last year, and the name is used interchangeably for what powers it: GPT-3.5. If they wanted to say GPT-4, they would have said GPT-4. It's a whole research paper by experts, so they know how to say GPT-4 if they meant that.

  • @mickmickymick6927
    @mickmickymick6927 a year ago

    They're not a killer if they're dead, you could say they WERE a killer. If they're dead, they're not going to be killing anyone else.

  • @Market-MOJOE
    @Market-MOJOE a year ago

    Only 2 minutes in, but whether it's answered eventually or not, you should probably specify off the bat which GPT model it's being compared against.

  • @timothyhayes5741
    @timothyhayes5741 a year ago

    Could you run this on a 5950x with 128GB of ram or would it be too slow even with all the new tech?

  • @nannan3347
    @nannan3347 a year ago

    Finally, I can train the perfect LLM on the writings of:
    Greg Johnson
    Nick Fuentes
    Kanye West
    David Duke
    Terence McKenna
    Mike Enoch
    JRR Tolkien
    Martin Luther
    Robert Sapolsky
    Richard Dawkins
    Voltaire
    Michel Foucault
    Jane Austen (just kidding)

  • @bzzt88
    @bzzt88 a year ago +2

    Boom!

  • @juliengomez924
    @juliengomez924 a year ago

    Hi, very interesting. How can we use it on our own private data? Thanks

  • @nwdbrown
    @nwdbrown a year ago

    How do I add additional PDF documents to the model before or after initial training?

    • @andersberg756
      @andersberg756 a year ago +1

      Either have your info be part of the fine-tuning data, or feed it into the context (the prompt). To run it yourself you need code around it, like LangChain or LlamaIndex. There are online offerings that might suit you, though. For ChatGPT there are PDF-reading plugins coming up, but I guess you're looking to host your own model, either cloud or on-prem?

    • @sivi3883
      @sivi3883 a year ago +1

      I used LangChain and Chroma DB (for storing the vector embeddings) to perform semantic search on the PDF chunks and send only the chunks related to the question as context in the appropriate prompt to the model. GPT-2 itself worked well for me. Where it stumbled was when the PDFs have tabular data; the model cannot understand the relationship between the rows and columns.
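
    The pipeline described in that reply (split the PDFs into chunks, embed the chunks, retrieve the ones most similar to the question, and stuff only those into the prompt) can be sketched in a few lines. This toy substitutes a bag-of-words "embedding" for a real embedding model and skips LangChain/Chroma entirely so it runs with no dependencies; the documents and question are made up for illustration:

```python
# Toy retrieval-augmented prompt: pick the chunks most similar to the question
# and pass only those as context. A real setup would use learned embeddings
# (e.g. via LangChain + Chroma); word-overlap cosine similarity stands in here.
import math
import re
from collections import Counter

def embed(text):
    """Bag-of-words pseudo-embedding (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_chunks(chunks, question, k=2):
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]

chunks = [                      # pretend these came from splitting a PDF
    "Refunds are processed within 14 days of the return.",
    "The warranty covers manufacturing defects for two years.",
    "Shipping is free for orders above 50 euros.",
]
question = "How soon are refunds processed?"
context = "\n".join(top_chunks(chunks, question, k=1))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(context)  # -> Refunds are processed within 14 days of the return.
```

    This also hints at why tables suffer in such a pipeline: each chunk reaches the model as flat text, so the row/column relationships are lost before the model ever sees them.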

  • @adamathypernerd7728
    @adamathypernerd7728 a year ago

    If anyone's looking for more details on this, google "QLoRA" not "QLorRA". The title of the video has a typo.

  • @Tenly2009
    @Tenly2009 a year ago

    Dude, that math problem at 8:55 is not testing order of operations at all; the brackets are completely unnecessary. Try 4 * (2 + 3) = 20 or 8 / (2 + 2) = 2. Both of those are different equations (with different answers) if you remove the brackets, but the one you used is kind of useless.

  • @electron6825
    @electron6825 a year ago

    What is the performance in terms of token speed of this model?

  • @TheMsLourdes
    @TheMsLourdes a year ago

    Because sure, 48GB video cards are things I just have lying around.

  • @funnyberries4017
    @funnyberries4017 a year ago

    Looks cool, but it sucks that your own local machine is censored.

  • @incription
    @incription a year ago

    So what happens if they up the parameter count and train it on the same dataset as gpt4? Will it be better?

  • @DenisHavlikVienna
    @DenisHavlikVienna a year ago

    Can it summarise a long document?

  • @workflowinmind
    @workflowinmind a year ago

    Are Guanaco and Falcon the same?

  • @d4rkside84
    @d4rkside84 a year ago +2

    Performance of GPT-4 or 3.5?

    • @jgcornell
      @jgcornell a year ago

      3.5, and that's on a good day

  • @pietervanreenen1922
    @pietervanreenen1922 a year ago

    The reply had 21 letters, not words.

  • @dubelan
    @dubelan a year ago

    Why is it censored? How can you use a model that isn’t censored?

  • @Kivalt
    @Kivalt a year ago

    But when is an open model going to be 101% better than GPT-4?

  • @cristianvillalobos3448
    @cristianvillalobos3448 a year ago

    How can I add new data to the model?

  • @electron6825
    @electron6825 a year ago

    How is the "99%" evaluated?
    I remember one figure was based on asking chatGPT to score 😂

  • @NapalmCandy
    @NapalmCandy a year ago

    Why can't we use RAM instead of VRAM? I understand it would have a big impact on speed, but it would make it possible to run these models on our domestic PCs. I have an RTX 3080; not a bad GPU, but it is useless for these models.

  • @obanjespirit2895
    @obanjespirit2895 a year ago

    When you say ChatGPT, do you mean GPT-4 or 3? Because there is a massive difference; 3 might as well be Pepsi... diet. And also, isn't GPT-4 like a trillion-parameter model?

  • @VincentNeemie
    @VincentNeemie a year ago

    The title of the video is wrong, it should be QLoRA, not QLorRA, no?

  • @sharpcircle6875
    @sharpcircle6875 a year ago

    "I assumed that when somebody is dead, they are no longer a killer".
    I mean, it's not totally wrong to make that assumption, since after someone's death, whatever part of their personality or status is customarily referred to in the past tense. So rather than saying "He is a great father, a kind person and a good singer..." you would say "He WAS a great father, a kind person and a good singer". Furthermore, "a killer" refers to the personhood of someone. So if we assume you are your consciousness, and that after your death it leaves your body, then the killer is no longer in the room and what remains is just a corpse :v
    I mean if you cut exactly half of your body and crawl to another room, in which room are you really :v ?

  • @neko-san5965
    @neko-san5965 a year ago

    Bruh, I don't want to pay for a cloud service to run these when I have an 11GB GPU :v

  • @mordechaisanders7033
    @mordechaisanders7033 a year ago

    What consumer GPU has 48 GB?

  • @travisborlkd1628
    @travisborlkd1628 a year ago

    Good run until 0:05... immediately destroyed my expectations