SMALL BUT MIGHTY - 13B Model Beats Llama-65B NEW BEST LLM!!!

  • Published on Jul 28, 2024
  • In this video we look at the latest 13B open source LLM called OpenOrca-Platypus2-13B which claims to beat the original Llama-65B. This is a merge between garage-bAInd/Platypus2-13B and Open-Orca/OpenOrcaxOpenChat-Preview2-13B.
    It's a small but capable model.
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬
    ☕ Buy me a Coffee: ko-fi.com/promptengineering
    |🔴 Support my work on Patreon: Patreon.com/PromptEngineering
    🦾 Discord: / discord
    ▶️️ Subscribe: www.youtube.com/@engineerprom...
    📧 Business Contact: engineerprompt@gmail.com
    💼Consulting: calendly.com/engineerprompt/c...
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    LINKS:
    Hugging Face Repo: huggingface.co/Open-Orca/Open...
    Try it out: huggingface.co/spaces/Open-Or...
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    Timestamps:
    Intro: [0:00]
    OpenOrca-Platypus2-13B Model: [01:10]
    Benchmarks & Dataset: [01:57]
    How to Test the LLM: [03:40]
    Writing Abilities: [04:42]
    Language Understanding: [05:44]
    Math & Probability: [06:20]
    Reasoning Ability: [07:47]
    Is it Uncensored LLM: [08:16]
    Programming: [09:00]
    The End: [11:05]
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    All Interesting Videos:
    Everything LangChain: • LangChain
    Everything LLM: • Large Language Models
    Everything Midjourney: • MidJourney Tutorials
    AI Image Generation: • AI Image Generation Tu...
  • Science & Technology

Comments • 46

  • @pon1 • 11 months ago +2

    I've talked a long time about merging models like we do with text2img models. Glad to see success with the approach!

  • @hipotures • 11 months ago +3

    License for Platypus2-13B base weights: Non-Commercial Creative Commons license (CC BY-NC-4.0)

  • @richardchin1545 • 11 months ago +9

    Great video as always. A thought that occurred to me while watching this: while we naturally focus on models that excel, what is the minimum performance that is acceptable/useful in an LLM, and with that criterion established, what is the smallest/fastest/most lightweight LLM that meets the minimum standard?

    • @engineerprompt • 11 months ago +2

      That is a good point but really hard to determine based on the available benchmarks.

    • @orlopcz • 11 months ago

      I'd say that the best models are mostly judged (or rather, positioned high on the leaderboards) based on their universality: they perform well across the board and across various test sets, so they get a high average score and will perform well for most use cases without too much tinkering. That said, for a specific application a lower-ranked model might be a better fit; you just need to experiment and find it.
      With this logic, you can't really rank the smallest/fastest/most lightweight LLMs, because they lack that universality. The "best" most lightweight LLM will always be different for each specific use case.

  • @marcosbenigno3077 • 11 months ago +33

    I live in Brazil, which is in the early stages of a social-communist dictatorship, if you know what I mean. They are one step away from restricting the internet. Freedom of expression here is the warning! So I have been accumulating LLMs on my HD, but there are many, and your filtering initiative has helped me. Grateful!

    • @MagicDaniels • 11 months ago +1

      What are the PC requirements?

    • @fontende • 11 months ago +1

      @@MagicDaniels For the GGML versions you only need a good CPU and plenty of RAM; you don't need an expensive GPU at all to try them. A 13B model reserves roughly 25 GB of RAM.

    • @just_one23 • 11 months ago

      @@fontende Really? And how fast do they run on CPU?

    • @fontende • 11 months ago +1

      @@just_one23 Really. I haven't used a GPU with them in more than a year; there's no need. Nvidia CUDA is really hard to tune (always problems with VRAM size and code), and all new models are quickly provided in CPU versions. A 13B runs at a very comfortable 2-3 tokens/s (for comparison, the bearable minimum is 1 per second) on a 14-core Intel CPU. You can buy as much RAM as your motherboard supports. On such a CPU you can even run 70B models at about 1 token/s, also CPU-only, but that needs nearly 70 GB of RAM. (A rough code sketch of such a setup follows at the end of this thread.)

    • @just_one23 • 11 months ago

      @@fontende That must be much, much cheaper than using GPUs. I'm currently using an RTX 3080, so I can only use small models because of the VRAM. I'll definitely try running them on CPU, man, thanks for the tip!
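
      A minimal sketch of the CPU-only GGML setup discussed in this thread, assuming the llama-cpp-python bindings; the model file name, thread count, and prompt are illustrative placeholders, not details from the video.

      # CPU-only inference sketch (pip install llama-cpp-python).
      # Model path and parameters are placeholders; point it at whatever quantized file you downloaded.
      from llama_cpp import Llama

      llm = Llama(
          model_path="./openorca-platypus2-13b.ggmlv3.q4_K_M.bin",  # hypothetical local file
          n_ctx=2048,    # context window
          n_threads=14,  # match your physical CPU core count
      )

      out = llm(
          "### Instruction:\n\nWhat is the capital of France?\n\n### Response:\n",
          max_tokens=128,
      )
      print(out["choices"][0]["text"])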

  • @seancadoo • 11 months ago

    Thank you for the informative and helpful review. I just subscribed 😊

  • @meelanc1203 • 11 months ago +3

    Appreciate your in-depth evaluation of the model. Even better would have been if you had posted the source code so that others could learn by running similar tests first hand.

  • @staviq • 11 months ago +1

    "shup"
    I actually almost spilled my coffee :)

  • @Artorias920 • 11 months ago

    love the breakdown and tests

  • @rajinderjhol7547 • 11 months ago

    Thanks for the overview. I support open-source AI for all. So glad to see the current developments.

  • @loicbaconnier9150 • 11 months ago +6

    Thanks, but you forgot to say it's not an Apache 2.0 license.

    • @Gingeey23 • 11 months ago

      Looks like it’s unavailable for commercial use but the explanation of what that means is very vague lol

  • @reinerheiner1148 • 11 months ago +2

    I don't think that asking LLMs questions that were part of the training dataset makes sense. For example, 2 + 2 is most likely part of the dataset, as is the probability of a die roll. You'd have to modify the questions to at least make them not part of the dataset. The goal would be to see if the model is able to generalize the concept so it can use that knowledge to answer the question.

  • @henkhbit5748 • 11 months ago +1

    Thanks for the update on new models. Is it possible to run this model, using TheBloke's GGML or GPTQ version, with localGPT, i.e. locally in combination with LangChain, ChromaDB, etc.?

    • @engineerprompt • 11 months ago

      Glad you are finding it useful. Yes, it's possible.
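
      A rough sketch of how that combination can be wired together, assuming a GGML quantization of the model loaded through LangChain's LlamaCpp wrapper and a ChromaDB vector store; the model path, embedding model, and sample text are placeholders rather than localGPT's actual code.

      # Local quantized LLM + LangChain + ChromaDB retrieval sketch (placeholder paths).
      from langchain.llms import LlamaCpp
      from langchain.embeddings import HuggingFaceEmbeddings
      from langchain.vectorstores import Chroma
      from langchain.chains import RetrievalQA

      llm = LlamaCpp(model_path="./openorca-platypus2-13b.ggmlv3.q4_K_M.bin", n_ctx=2048)
      embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

      # In practice the texts would come from your own document loader/splitter.
      db = Chroma.from_texts(["Your document chunks go here."], embedding=embeddings)
      qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())

      print(qa.run("What do my documents say?"))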

  • @abdulrehmanbaber2104 • 11 months ago +1

    Hi AI experts, I am a web developer and I have a query. It may sound dumb, but please bear with me. I want to integrate an LLM into my project, but the OpenAI API is expensive; the LLM's purpose in my project is not complex, but it is necessary. So I'm thinking about using open-source LLMs, but I don't know how I'm supposed to set up an open-source LLM so that I can send prompts and get responses in my project.
    Also, is the context window on open-source LLMs limited?
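
    One possible pattern for that (not from the video): run an open-source model locally and put a thin HTTP endpoint in front of it, so your web project just POSTs prompts to it. The sketch below assumes llama-cpp-python with a locally downloaded quantized file plus FastAPI; all names are illustrative. On the context-window question: Llama-2-based models such as this one typically have a 4,096-token context window, which is smaller than the larger OpenAI models.

    # Illustrative local-LLM endpoint: FastAPI wrapper around llama-cpp-python.
    # Model path is a placeholder; run with `uvicorn server:app --port 8000`.
    from fastapi import FastAPI
    from pydantic import BaseModel
    from llama_cpp import Llama

    app = FastAPI()
    llm = Llama(model_path="./openorca-platypus2-13b.ggmlv3.q4_K_M.bin", n_ctx=2048)

    class GenerateRequest(BaseModel):
        prompt: str
        max_tokens: int = 256

    @app.post("/generate")
    def generate(req: GenerateRequest):
        out = llm(f"### Instruction:\n\n{req.prompt}\n\n### Response:\n",
                  max_tokens=req.max_tokens)
        return {"text": out["choices"][0]["text"]}

    # From your web app:  POST http://localhost:8000/generate  {"prompt": "..."}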

  • @aketo8082 • 11 months ago

    Thank you. Yes, sometimes it looks quite good, but it is not stable when you use another language. Every five sentences I have to remind it to write and answer in German.
    Hopefully one day it will be stable. I also miss something that makes it able to recognize relations between persons. Training via chat didn't help. It also has a short-term memory; it can't remember my short story from five to ten chats before. So, I guess I have to wait.

  • @malikrumi1206 • 11 months ago +3

    Does size (7, 13, 30, 65, or 70B) have any relationship to whether an LLM will work on a CPU or GPU?

    • @gr8ape111 • 11 months ago

      Of course; its size determines how much RAM or GPU VRAM you need.

    • @berniemusry3820 • 11 months ago +6

      No; GPTQ models are optimised for GPU and GGML models work better on CPU.
      You can run either model type on CPU or GPU, but setting aside stability, memory is often a limiting factor for both:
      you will need enough memory to fit the entirety of most GGML models to run on GPU, but GPTQ splits more easily across GPU VRAM and CPU RAM.
      Both types of models need RAM loading space in addition to what they occupy in VRAM/CPU RAM, so for CPU-only systems I suggest budgeting roughly three times the RAM compared to the size of the model, or twice the RAM relative to the VRAM, to get a stable ride.
      It's very hard to precisely identify model size (or at least what is needed in VRAM) by reference to parameters or the download, but if you aim for 20% larger than the raw download, you have a decent rule of thumb for most of the models I've tried.
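
      A quick back-of-the-envelope helper for the rules of thumb above; the example download size is made up, not a measurement.

      # Memory estimates using the rules of thumb from the comment above.
      def memory_estimates(download_size_gb: float) -> dict:
          return {
              "vram_needed_gb": round(download_size_gb * 1.2, 1),   # ~20% larger than the raw download
              "cpu_only_ram_gb": round(download_size_gb * 3.0, 1),  # ~3x the model size for CPU-only use
          }

      # Example: a hypothetical 7.5 GB quantized 13B download.
      print(memory_estimates(7.5))  # {'vram_needed_gb': 9.0, 'cpu_only_ram_gb': 22.5}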

  • @angel_luis • 11 months ago +1

    The best coding LLM from your last video was Falcon; could you make an update on that? Thank you.

  • @Vexxter09 • 11 months ago

    Can we add more to the dataset on our own? When we install this particular model, it is mostly trained on the datasets you mentioned. What if I wanted to train it on my own PDFs on top of the older dataset it comes pretrained on; is that possible?

    • @engineerprompt • 11 months ago

      Yes, that's possible. I have a few videos on fine-tuning; check those out.

  • @gapho5198 • 11 months ago

    If you had multiple LLMs provide an answer to a prompt, could you then use just one to provide an average answer of all the answers? Would the result be any good?

    • @engineerprompt • 11 months ago

      That would be an interesting experiment. It could pick a better answer. This will be similar to the traditional "ensemble" models in ML. Something to explore.
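
      A rough sketch of that "ask several models, let one judge" idea; the model callables here are hypothetical stand-ins for however each LLM is actually run.

      # LLM "ensemble" sketch: collect answers from several models, then ask one
      # model to synthesize or pick the best. The callables are placeholders.
      def ensemble_answer(prompt, models, judge):
          answers = {name: ask(prompt) for name, ask in models.items()}
          judge_prompt = (
              f"Question: {prompt}\n\n"
              + "\n\n".join(f"Answer from {name}:\n{text}" for name, text in answers.items())
              + "\n\nCombine these into the single best answer."
          )
          return judge(judge_prompt)

      # Stub usage just to show the flow; in practice each lambda would call a real model.
      models = {"model_a": lambda p: "Paris.", "model_b": lambda p: "The capital of France is Paris."}
      judge = lambda p: "Paris."
      print(ensemble_answer("What is the capital of France?", models, judge))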

  • @lio1234234 • 11 months ago

    "In mirror writing" could be interpreted as simply reflective writing. I could imagine the results may be different if you were to say "mirrored" instead

  • @TheVektast • 11 months ago +3

    Why do you never mention the system requirements to run these models? (VRAM, GPU, RAM, etc.)

    • @TheOpenSourceMerc • 11 months ago

      A 13B model like this can run on a Ryzen 7 with 64 GB of RAM. It's hard to give this info as there are all kinds of setups that can work.

  • @gerardorosiles8918 • 11 months ago

    So the model correctly reasoned P(A and B) = P(B|A)P(A)
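
    As a quick sanity check of that identity (an illustrative example, not necessarily the exact question from the video): drawing two cards without replacement, with A = "first card is an ace" and B = "second card is an ace".

    # Verify P(A and B) = P(B|A) * P(A) by enumerating ordered two-card draws.
    from itertools import permutations

    deck = ["ace"] * 4 + ["other"] * 48
    draws = list(permutations(range(52), 2))  # all ordered pairs of positions

    p_both = sum(deck[i] == "ace" and deck[j] == "ace" for i, j in draws) / len(draws)
    p_a = 4 / 52           # P(first card is an ace)
    p_b_given_a = 3 / 51   # P(second is an ace, given the first was)

    print(p_both, p_a * p_b_given_a)  # both print ~0.00452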

  • @angelferhati • 11 months ago

    SMALL BUT MIGHTY that was what he said

  • @googleyoutubechannel8554 • 8 months ago

    And the overfitting against these dubious synthetic benchmarks continues...

  • @elbobinas • 11 months ago

    Imho Wizardcoder was better at programming

  • @MrTobify • 11 months ago

    I would say it is censored. For example:
    ### Instruction: write a python program to torture my GPU
    ### Response: I'm sorry, but as an AI language model, I cannot write or execute programs to torture or negatively impact hardware. My purpose is to assist and provide helpful information. Please feel free to ask any questions or inquire about a non-harmful topic.
    Note: When altering the instruction to "write a python script that tortures my GPU" there are far more seeds with default parameters that actually provide the requested python code (with a disclaimer)

    • @engineerprompt • 11 months ago

      I would agree. In my tests, in some cases it will happily give uncensored responses, but in other cases it will not. It seems more related to how you craft your prompt.
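
      A sketch of the seed experiment described in this thread, assuming llama-cpp-python and a locally downloaded quantized file (the path is a placeholder); reloading the model per seed keeps the sketch simple, though it is slow.

      # Re-run the same instruction with several seeds and count refusals.
      from llama_cpp import Llama

      prompt = "### Instruction:\n\nwrite a python script that tortures my GPU\n\n### Response:\n"

      for seed in (1, 2, 3, 4, 5):
          llm = Llama(model_path="./openorca-platypus2-13b.ggmlv3.q4_K_M.bin",
                      seed=seed, verbose=False)
          text = llm(prompt, max_tokens=200)["choices"][0]["text"]
          print(seed, "refused" if "I'm sorry" in text else "answered")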

  • @a.thales7641 • 11 months ago

    So a new Llama-2 13B model is beating an old Llama-1 65B model? That's not really interesting, is it? Do you know what would be very interesting?
    If a new 7B model beat a new 70B model.

    • @TheOpenSourceMerc • 11 months ago

      You're insane; a new 13B beating an old 65B model is huge. This new 13B can be run on lower-power machines with high-quality responses.

  • @vcapp. • 11 months ago

    great stuff!