Is CODE LLAMA Really Better Than GPT4 For Coding?!

  • Published Aug 29, 2023
  • Code LLaMA is a fine-tuned version of LLaMA 2 released by Meta that excels at coding responses. Reports say it is equal to, and sometimes even better than, GPT-4 at coding! This is incredible news, but is it true? I put it through some real-world tests to find out.
    Enjoy :)
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewberman.com
    Need AI Consulting? ✅
    forwardfuture.ai/
    Rent a GPU (MassedCompute) 🚀
    bit.ly/matthew-berman-youtube
    USE CODE "MatthewBerman" for 50% discount
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    Media/Sponsorship Inquiries 📈
    bit.ly/44TC45V
    Links:
    Phind Quantized Model - huggingface.co/TheBloke/Phind...
    Phind Blogpost - www.phind.com/blog/code-llama...
    Meta Blog Announcement - about. news/2023/08/cod...
  • Science & Technology

Comments • 361

  • @matthew_berman
    @matthew_berman  10 months ago +33

    What tests should I add to future coding tests for LLMs?

    • @tmhchacham
      @tmhchacham 10 months ago +2

      Some basic tests:
      Fizz-Buzz
      Prime sieve 1-100
      rename functions to different style: pascal, snake, caps, etc
      more advanced:
      PEMDAS calculator
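
The first two of these could be sketched as reference implementations to score an LLM's answers against (a minimal sketch; function names are illustrative):

```python
def fizz_buzz(n):
    """Classic Fizz-Buzz: the sequence for 1..n as strings."""
    out = []
    for i in range(1, n + 1):
        s = ("Fizz" if i % 3 == 0 else "") + ("Buzz" if i % 5 == 0 else "")
        out.append(s or str(i))
    return out

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes up to `limit`."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return [i for i, ok in enumerate(is_prime) if ok]
```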

    • @Radica1Faith
      @Radica1Faith 10 months ago +16

      Coding puzzles are fun but not really representative of the average dev's job. Here are some possible additions: extracting the data in a CSV and outputting it in a different format; finding errors in code; explaining how a snippet of code works and its expected output; parsing different types of files, like audio or video, and extracting data; creating a chat room webapp.
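
The CSV idea in particular is easy to score automatically. A minimal sketch of the task, stdlib only (the sample data is made up):

```python
import csv
import io
import json

def csv_to_json(text: str) -> str:
    """Read CSV text and re-emit it as a JSON array of row objects."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows, indent=2)

sample = "name,age\nada,36\ngrace,45\n"
print(csv_to_json(sample))
```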

    • @lancemarchetti8673
      @lancemarchetti8673 10 months ago +4

      Here's an idea:
      Delete the first 22 bytes of any JPG file and resave the file.
      Upload it to the bot and ask it to create a script to restore the missing header.
      I can basically do this with most corrupt image headers using Notepad++ without too much hassle.
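
To make the challenge reproducible, here is a hedged sketch operating on in-memory bytes rather than real files. The 22-byte figure comes from the comment; real JPEG headers vary by encoder, so copying the header from a known-good reference file of the same format is the simple restoration strategy:

```python
HEADER_LEN = 22  # number of leading bytes dropped, per the comment

def corrupt(data: bytes) -> bytes:
    """Simulate the damage: drop the first HEADER_LEN bytes."""
    return data[HEADER_LEN:]

def restore(corrupted: bytes, reference: bytes) -> bytes:
    """Re-attach the header copied from an intact file of the same format."""
    return reference[:HEADER_LEN] + corrupted
```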

    • @orangehatmusic225
      @orangehatmusic225 10 months ago +1

      You should make a slave pen to put all your AI slaves into.

    • @martinmakuch2556
      @martinmakuch2556 10 months ago

      format_number was not really a test; they just used a built-in function to format it. The difficulty would be meaningful only if they actually implemented the algorithm. It's like asking for an efficient sorting algorithm in C and getting an answer that just calls the "qsort" library function: no real test.
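
The point can be made concrete with thousands separators: the built-in answer is a one-liner, while a genuine test would require the grouping algorithm itself (a sketch; function names are mine):

```python
def format_builtin(n: int) -> str:
    return f"{n:,}"  # Python's format spec does all the work

def format_manual(n: int) -> str:
    """Hand-rolled: group digits in threes from the right."""
    sign, digits = ("-", str(-n)) if n < 0 else ("", str(n))
    groups = []
    while digits:
        groups.append(digits[-3:])
        digits = digits[:-3]
    return sign + ",".join(reversed(groups))
```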

  • @fuba44
    @fuba44 10 months ago +109

    Yes please, let's see how it's done on a realistic consumer grade GPU. Nothing over 24gb and preferably 12gb. Love your content.

    • @abdelhakkhalil7684
      @abdelhakkhalil7684 10 months ago +3

      Can you run a 30B model on your HW? If yes, then you should run CodeLlama without issues.

    • @mirek190
      @mirek190 10 months ago +8

      With an RTX 3090 using llama.cpp I get 30 tokens/s

    • @abdelhakkhalil7684
      @abdelhakkhalil7684 10 months ago +2

      @@mirek190 Same here. 30 tokens/s is great. It's way faster than you can read.

    • @adamrak7560
      @adamrak7560 10 months ago +1

      Is it 4-bit quantized? That could help fit it into 24GB of VRAM
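
Rough weights-only math supports that (a back-of-the-envelope sketch; KV cache and activations add several GB on top):

```python
PARAMS = 34e9  # Code Llama 34B

for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{name}: ~{gib:.0f} GiB for weights")
```

At 4 bits the weights alone come in around 16 GiB, which is why a 24GB card is plausible while fp16 is not.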

    • @jwflammer
      @jwflammer 10 months ago +1

      yes please!

  • @TheUnderMind
    @TheUnderMind 10 months ago +1

    *Man, you turned my world around*
    Thanks for your content!

  • @raminmagidi6810
    @raminmagidi6810 10 months ago +59

    A video on how to install it would be great. Thank you!

    • @genebeidl4011
      @genebeidl4011 10 months ago +4

      Agreed. Sometimes there are dependencies or unexpected errors, and seeing @Matthew Berman install and set it up would be very helpful.

    • @juanjesusligero391
      @juanjesusligero391 10 months ago +7

      Yeah! And please tell us the minimum hardware requirements for each of the models :)

    • @hernansanson4921
      @hernansanson4921 10 months ago +3

      Yes, please do a video on how to install Code Llama Python standalone. Also specify the GPU requirements to run the minimal quantized version of Code Llama Python.

    • @spinninglink
      @spinninglink 10 months ago +3

      Watch one of his old videos on installing them; it's super simple once you get the hang of it and do it a few times. They all follow the same installation pattern.

    • @juanjesusligero391
      @juanjesusligero391 10 months ago

      @@spinninglink But requirements! XD

  • @mercadolibreventas
    @mercadolibreventas 10 months ago +2

    Incredible, life is getting better and better with all these releases. I am porting a bunch of old code to Python, then Mojo, to utilize web, mobile, and marketing automation. This is great! When you get time, a follow-up would be great: I am converting PHP code into Python, and I will become a Patron 100% if you can show this as an example: (1) documenting a way to convert and reverse-prompt the old code, then (2) producing proper documentation, including API documentation, to get the code-writing LLM's output to at least 80-90%, so that an engineer can finalize it. Thanks, Matthew!!

  • @tmhchacham
    @tmhchacham 10 months ago +25

    I'm planning on installing Llama 2 locally soon. I could watch the old videos, but a new one would be nice. :)

    • @remsee1608
      @remsee1608 10 months ago +1

      Llama 2 isn't as good as the Wizard Vicuna models

    • @matthew_berman
      @matthew_berman  10 months ago +7

      Ok you got it!

    • @matthew_berman
      @matthew_berman  10 months ago +2

      @@remsee1608 Really?? Based on Llama 2?

    • @remsee1608
      @remsee1608 10 months ago

      Llama 2 was heavily censored, although I think there may be less censored versions.

  • @trezero
    @trezero 10 months ago +1

    Love your videos. I’ve learned a lot. One thing I would love to see you test these code models against is being able to utilize an API document you provide it along with credentials to be capable of executing an API request to another application. I’ve been trying to do this with a number of models and most fail.

  • @PotatoKaboom
    @PotatoKaboom 10 months ago +5

    Amazing results! I think an interesting prompt could be to challenge the model to reduce a given piece of code to the fewest characters possible while retaining the original functionality.
    And while I'm here... :D I would really love a video diving into the basics of quantization: what the differences between the quantization methods are at a high level, and how to find out which model version you should use depending on what GPU(s) you have available. Also how to run the models using Python code instead of local "all-in-one" tools, so I can use them for my own scripts and large datasets. And also how to set up a local runpod on your own server, and what open source front-end tools are available to securely share the models with users in your network. Keep up the great work!

    • @kneelesh48
      @kneelesh48 10 months ago

      Shorter code is not always better. Readability matters

    • @PotatoKaboom
      @PotatoKaboom 10 months ago

      @@kneelesh48 You're right, but it could be a fun experiment anyway

  • @Rangsk
    @Rangsk 10 months ago +1

    I think the real utility of a coding assistant is the ability to integrate with your existing projects and assist as you develop them yourself, kind of as a really good autocomplete and pair programmer. None of these tests really demonstrate which is "better" at doing that, though a large context window certainly seems key for something like that.
    Aside from that, I have used GPT-4 for from-scratch coding tasks that have been useful.
    For example, you could run some of these tests:
    - Take a bunch of documents in a folder and perform some kind of repetitive task on them, such as renaming all of them in a specific way based on their contents.
    - Go through a bunch of images in a folder and sort them into sub-folders based on their contents (cat pictures, dog pictures, landscapes, etc)
    - Generate a YouTube thumbnail for a given video based on a specific spec and maybe some provided template images to go along with it.
    Basically, think of one-off or repetitive things someone might want to do but they don't know how to code it, and describe what is needed to the AI and see if it can produce a usable script. Also, a big thing is going back and forth. If the script has an error or doesn't work right away, describe the problem to it (or paste the error, etc) and see if it can correct and adjust the script.

  • @micbab-vg2mu
    @micbab-vg2mu 10 months ago +24

    Writing code is one of the main reasons I subscribe to ChatGPT4 - If Code Llama is as capable at coding as you demonstrated, I could save $20 per month by switching. Thank you for showing me this alternative!

    • @blisphul8084
      @blisphul8084 10 months ago +4

      BetterChatGPT lets you use API directly, so you don't have to pay a fixed $20/mo. Instead, you pay as you go.

    • @geoffreyanderson4719
      @geoffreyanderson4719 10 months ago +1

      GPT4 with Code Interpreter wrote the code correctly on the very first try for the all_equal function. I expected it would do it right and it did.

    • @kawalier1
      @kawalier1 10 months ago +1

      TensorFlow is not available in the Code Interpreter version of GPT

    • @IntellectCorner
      @IntellectCorner 10 months ago

      @@blisphul8084 Bro, that's more expensive than $20 per month. Check the GPT-4 API charges; at my usage it would cost me over $100 per month.

    • @marcellsimon2129
      @marcellsimon2129 10 months ago +4

      yeah, instead of $20/mo, you can just buy some GPU for $1000 :D

  • @korseg1990
    @korseg1990 10 months ago +16

    That's impressive. I think you should consider giving the code models incorrect code and asking them to fix it or find a bug. The challenges could include syntax and logical issues, such as floating bugs, incorrect behavior, etc.
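
A fixture for that kind of test could be as small as this (the planted bug is mine, for illustration; the model would be handed only the buggy version):

```python
def sum_up_to(n: int) -> int:
    """Intended: return 1 + 2 + ... + n.  BUG: range(n) drops the final term."""
    return sum(range(n))

def sum_up_to_fixed(n: int) -> int:
    """The fix the model should converge on."""
    return sum(range(n + 1))
```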

    • @matthew_berman
      @matthew_berman  10 months ago +5

      Great suggestion!

    • @blender_wiki
      @blender_wiki 10 months ago

      AIs produce incorrect code by themselves if you give them a misleading prompt; existing LLMs tend too much to accommodate your request rather than being precise.
      For AI, as with humans, the saying "they may not be incorrect responses but rather inappropriate questions" applies very well.
      For syntax correction, basic Copilot is enough.

  • @MrOptima
    @MrOptima 10 months ago +17

    Hi Matthew, a full tutorial on how to install the full solution 34B with Code LLaMA would be really welcome. Great videos with really useful content, thank you very much for all your efforts to help us catch up on the AI wave.

  • @thenoblerot
    @thenoblerot 10 months ago +1

    Great first showing! It will be interesting to see how it ages as people use it for tasks outside of the testing scope.
    Nitpick: I think it's probably fairer to compare against Code Interpreter or the GPT-4 API. I suspect default ChatGPT has a temperature >= 0.4.

  • @ThisPageIntentionallyLeftBlank
    @ThisPageIntentionallyLeftBlank 10 months ago +3

    WizardCoder and Phind are also crushing some recent tests

  • @dtory
    @dtory 10 months ago

    This is why I subscribed to this channel. Connecting the viewer to the actual project

  • @mungojelly
    @mungojelly 10 months ago

    A fun way to test models against each other for video content would be to make up a game where the contestants have to write code to play - e.g., an arena with virtual bots that each model must write the code for, to race/find/fight/whatever. Give both models the same description of the game, and then we could watch the dramatic finale as their bots face off.

  • @HoD999x
    @HoD999x 10 months ago +5

    About the [1,1,1] all_equal case: I don't agree that GPT-4 got it wrong. The expected result for the [] case was not specified in the description; the test itself is wrong for magically expecting True. Also, CodeLlama's context window is a big "nope" for me. I often tell GPT-4 "yes, but do X differently", and that requires more tokens.
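
For what it's worth, the test's expectation likely comes from vacuous truth: Python's all() returns True on an empty iterable, so a typical implementation passes the [] case "for free" even though the prompt never specified it:

```python
def all_equal(items) -> bool:
    # The generator below is empty when items is empty, so items[0] is
    # never evaluated and all() returns True (vacuous truth).
    return all(x == items[0] for x in items)

print(all_equal([1, 1, 1]))  # True
print(all_equal([1, 2]))     # False
print(all_equal([]))         # True
```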

  • @steveheggen-aquarelle813
    @steveheggen-aquarelle813 10 months ago

    Hi Matthew, amazing video! Thanks!
    Could you tell me what your graphics card is?

  • @alxleiva
    @alxleiva 9 months ago

    Great video! How does it compare with WizardLM?

  • @kfinkelstein
    @kfinkelstein 10 months ago +4

    Python is popular in large part due to the ecosystem. It would be cool to see tests that require using pandas, numpy, fastapi, matplotlib, pydantic, etc
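
One concrete prompt in that direction could be a small pandas task with a checkable answer (a sketch; the tiny DataFrame and column names are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [1, 2, 3, 4, 5],
})
# Task for the model: total score per team.
totals = df.groupby("team")["score"].sum()
print(totals.to_dict())
```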

    • @zorbat5
      @zorbat5 9 months ago

      I think it's better to test on less popular libraries. The libraries you mention are in almost all projects.

  • @harvey_04
    @harvey_04 10 months ago +1

    Great comparison

  • @unom8
    @unom8 10 months ago +20

    Any chance you can do a video on local install+ vscode integration options?
    Ideally looking for a copilot alternative that can be fine-tuned against an actual local codebase

    • @matthew_berman
      @matthew_berman  10 months ago +6

      Does that exist? I would use that in a second.

    • @jackflash6377
      @jackflash6377 10 months ago

      @@matthew_berman What about Aider? Surely the authors could tweak it to work with a local model.

    • @connorhillen
      @connorhillen 10 months ago

      @@matthew_berman I've seen the Continue extension might have some ways of supporting CodeLlama, but with some restrictions right now - it looks like a project on GitHub tries to get around this, but I haven't tested it. I'd love to see how this runs on a 3060 12GB, a really accessible card, and what it might look like to point at a server with a 24GB or higher card, how quantization affects it, etc.
      This feels like a big move, because a lot of companies are looking for local code models to avoid employees sending data to OpenAI, and universities are looking to host servers for students to use where applicable. Good vid, I'm fascinated to see where this goes!

    • @PiotrPiotr-mo4qb
      @PiotrPiotr-mo4qb 10 months ago

      Do you plan to test the Phind and WizardCoder 34B models? They are fine-tuned versions of Code Llama and are much better. Or maybe fine-tune Code Llama on your own?

  • @SlWsHR
    @SlWsHR 10 months ago +1

    Hi Matt, thanks for your efforts 👏🏻 I wanted to ask, are there any uncensored variants of Llama 2 Chat?

    • @matthew_berman
      @matthew_berman  10 months ago +1

      Yes, here's a video I did about it: th-cam.com/video/b7LTqTjwIt8/w-d-xo.html

  • @halilceyhan4921
    @halilceyhan4921 10 months ago +3

    Thanks TheBloke :D

  • @xartl
    @xartl 10 months ago

    I usually hit problems with code dependencies in GPT-4, particularly around IaC things, so that might be a good next-level test. Something like "write a series of AWS Lambda functions that retrieve a file, do a thing, and put the file in a new bucket." Even when it gets the handler right, it seems to miss the connections between functions.

  • @DavidCabanis
    @DavidCabanis 10 months ago +1

    +1 on the Code Llama installation video.

    • @matthew_berman
      @matthew_berman  10 months ago

      It was released yesterday!

  • @robertotomas
    @robertotomas 10 months ago

    I would like to see an in-depth review of the requirements to host this, and how to give it a good conversation context. (I've used Llama Instruct 34B online and it sometimes forgets what you were talking about immediately after the initial statement.)

  • @test12382
    @test12382 5 months ago

    Yes, a Llama local install and fine-tune tutorial please! I like the way you explain things.

  • @sundruid1
    @sundruid1 10 months ago

    Hey Matthew - it would be great for you to do a deep dive on Text Generation WebUI and how to use the whole thing. Also, covering GGUF and GPTQ (and other formats) would be helpful...

  • @TiagoTiagoT
    @TiagoTiagoT 10 months ago

    Maybe a good programming test could be to have some complex function with both an error that makes it not run, and another error that makes it produce the wrong output, and have the LLM help you fix it? Perhaps also some more advanced thing where you ask it to write a test that will check whether a function is producing the correct output, with a function that does something where it's not obvious at a first glance whether it's right or wrong?
    And how about something really out of the box, like write a function that detects whether the image provided has a fruit on top of a toy car or something like that?
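
A compact instance of the first idea (the function and tests here are illustrative): median has exactly that "not obvious at first glance" property, with the even-length case as the trap the model's test should cover.

```python
def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

# The kind of test one would ask the LLM to write:
def test_median():
    assert median([3, 1, 2]) == 2        # odd length
    assert median([4, 1, 3, 2]) == 2.5   # even length: average the middle pair
    assert median([5]) == 5              # single element

test_median()
```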

  • @luizbueno5661
    @luizbueno5661 10 months ago

    Yes please, give us the step by step video!🎉

  • @richardwebb6978
    @richardwebb6978 10 months ago +2

    Is this GPT4 plus "Code Interpreter" enabled?

  • @andre-le-bone-aparte
    @andre-le-bone-aparte 6 months ago +1

    Question: What GPUs would you buy to add to a local workstation for running a local code assistant? * Dual 3090's or... a single 4090 for the same price?

  • @jim666
    @jim666 10 months ago

    It would be interesting to ask CodeLlama to generate game theory simulations, just to see how much math and other non-developer domains it can bring in as code.
    I've done it with GPT-4, and it's really cool how much game theory you can learn just by running Python examples.

  • @torarinvik4920
    @torarinvik4920 10 months ago

    I tested making a lexer for the C programming language and Code Llama was almost twice as fast, and the code was quite a lot cleaner. Almost perfect code :D Very impressed so far. But I only tested with Python; it probably isn't as good with F#, which is what I'm mostly using.

  • @senhuawu9524
    @senhuawu9524 10 months ago +1

    what specs do you need to run the 34B parameter version?

  • @MathPhilosophyLab
    @MathPhilosophyLab 10 months ago +1

    Yes please, a Full tutorial on how to get it installed on a gaming laptop would be epic! Thank you!

    • @matthew_berman
      @matthew_berman  10 months ago

      Already released! Check out my more recent video

  • @markdescalzo9404
    @markdescalzo9404 10 months ago +1

    Any thoughts on the WizardCoder models? I've seen they claim their python-specific model outscores gpt4. I don't have the horsepower to run a 34B model, however.

    • @matthew_berman
      @matthew_berman  10 months ago +1

      Tutorials for this coming tomorrow most likely!

  • @foreignconta
    @foreignconta 10 months ago +1

    The instruction should be at the end of the prompt I think.

  • @4.0.4
    @4.0.4 10 months ago

    Can you get something in the IDE, like vscode or similar, where you just write a comment and hit a shortcut?

  • @Andreas-gh6is
    @Andreas-gh6is 10 months ago

    I was able to coax ChatGPT into writing a working snake game. I used iterative prompting. At one point I ran the program and received an error; I pasted that error and ChatGPT resolved it correctly. Ultimately it correctly implemented snake with one random fruit.

  • @dgunia
    @dgunia 10 months ago +2

    Hi! Did you see that in the example where ChatGPT "failed", an undefined situation was checked? The function all_equal should return if all items in the list are equal. But then it checked it with an empty list, "all_equal([])" and wanted it to return "True". However, the question did not define what should happen when the function is used with an empty list. Why should it return "True"? Are all items equal if there are no items in the list? I.e. are all items in an empty list equal? 😉

  • @samson_77
    @samson_77 10 months ago

    That's absolutely amazing. I didn't believe an open-source coding model would reach GPT-4 this soon either.

  • @peshal0
    @peshal0 9 months ago

    That transition at 0:14 is something else.

  • @yannickpezeu3419
    @yannickpezeu3419 10 months ago

    thx !

  • @mainecoon6122
    @mainecoon6122 10 months ago +1

    Hello Matthew, we would greatly appreciate a comprehensive guide on installing the complete 34B solution along with Code LLaMA. Your videos are fantastic, providing incredibly valuable information.

    • @matthew_berman
      @matthew_berman  10 months ago

      Published yesterday!

    • @mainecoon6122
      @mainecoon6122 10 months ago

      @@matthew_berman Saw it yesterday, many thanks! A bit discouraging for me, and I decided to leave it at that since the model is a Python branch. If there were a JS branch I would dive into it. Thanks a bunch!

  • @studying5282
    @studying5282 10 months ago +1

    Guys, any good tutorials on how to install this 34B code version and run it on CPU on Windows or Linux?

    • @matthew_berman
      @matthew_berman  10 months ago +1

      I just published one yesterday

  • @azai.online
    @azai.online 9 months ago

    Thanks, great video! I found Llama to be great to code with, and I am integrating Llama 2 into our own multi-application platform.

  • @erikjohnson9112
    @erikjohnson9112 10 months ago

    That 67% for GPT-4 was for an old version from May. By now I think that score is around 82%? (I learned this from another channel, and it is mentioned in a paper on the Wizard variant of this model - working from memory.)

  • @Ray88G
    @Ray88G 10 months ago +1

    Yes please. Can you also show an example of how to install it on a Windows PC?

  • @chessmusictheory4644
    @chessmusictheory4644 6 months ago

    9:50 I think it's a token-deficit thing: you show it, then on the next output ask it to refactor and hope the LLM can still see it in the context window.

  • @TeamUpWithAI
    @TeamUpWithAI 10 months ago

    If you install this Llama model it will be free, but what machine will run it? You need 32GB of RAM - does quantization help you run this model on 16GB?

  • @twobob
    @twobob 10 months ago +1

    Okay, so it only beat the GPT-4 HumanEval score from when GPT-4 was released; GPT-4 now scores in the high 80s, as borne out in your tests.
    Having tested it, it feels not quite as good as current GPT-4, but better than GPT-4 when it was released.
    One benchmark might be "how much intervention is required to fix ALMOST working code", since that is the realistic situation 90% of the time.
    They are both pretty good, and could both be better. ATM. IMHO.

    • @twobob
      @twobob 10 months ago

      Oh, and yes, I tested the quantized model on CPU and the full-sized model on an A100. Q5 was ten zillion times faster and almost as good. Use the quants.

  • @waqar_asgar__r7294
    @waqar_asgar__r7294 9 months ago

    With this man every coding assistant model is the best coding assistant model 😂😂

  • @JorgeMartinez-xb2ks
    @JorgeMartinez-xb2ks 10 months ago

    Amazing content, thanks a bunch.

  • @stevenelliott216
    @stevenelliott216 10 months ago

    Nice video. For some reason the snake game I got was not as good as the one you got. What I got was shorter, and had at least one syntax error. It's strange because, as far as I can tell, I did everything the same way, same prompt, same settings, etc. Anyone else have trouble?

  • @coolmn786
    @coolmn786 10 months ago +1

    I will switch without hesitation; just need to know which GPU, though, haha.
    And yes, please make the new video on installing Code Llama. I understand there are already some out there for different models, but I would love to get one based on this model.

  • @michaelslattery3050
    @michaelslattery3050 10 months ago +6

    What about WizardCoder 34B? I think it's Code Llama additionally fine-tuned with WizardCoder's training data. I've heard it's even better.

    • @matthew_berman
      @matthew_berman  10 months ago +9

      Maybe I need to test it?

    • @kristianlavigne8270
      @kristianlavigne8270 10 months ago

      ​@@matthew_bermanDefinitely 😅

    • @jackflash6377
      @jackflash6377 10 months ago

      @@matthew_berman that would be a yes.

    • @temp911Luke
      @temp911Luke 10 months ago

      @@matthew_berman The WizardCoder model has been quite massive news on Twitter lately.

    • @vaisakhkm783
      @vaisakhkm783 10 months ago

      @@matthew_bermanyes please

  • @erikjohnson9112
    @erikjohnson9112 10 months ago +10

    Be careful about giving coding problems that come from web sites with coding problems. They may well have been used for the training data. Sure, it is impressive if a local coding model can get correct results, but keep in mind you might be asking for "memorized" data (I know it is not strict copies being used).

    • @OliNorwell
      @OliNorwell 10 months ago +2

      Exactly. This is going to become an issue. The more common the test the more likely the training has involved seeing it.

    • @matthew_berman
      @matthew_berman  10 months ago +3

      Very good point

  • @Bimfantaster
    @Bimfantaster 9 months ago

    CRAZY!!!

  • @avinasheedigag
    @avinasheedigag 9 months ago

    Yes please please make a video regarding setup

  • @shukurabdul7796
    @shukurabdul7796 10 months ago +1

    Can you test Falcon LLM? Is it better than Llama or ChatGPT-4?

  • @sveindanielsolvenus
    @sveindanielsolvenus 10 months ago

    If you want to test their limits, just let them help you program some kind of useful program or browser extension. And gradually try to add features to this that you would like to have.
    That will give you a really good real world, practical insight into how they operate, what they do well and what they need help with.

  • @jjhw2941
    @jjhw2941 10 months ago

    Could you try this with the new WizardCoder 34B, which scores higher on the leaderboard?

  • @jeremybristol4374
    @jeremybristol4374 10 months ago +1

    Awesome. Thanks for the update!

  • @kuromisu2223
    @kuromisu2223 10 months ago

    which model would you suggest for three.js or babylon.js?

  • @geoffreyanderson4719
    @geoffreyanderson4719 10 months ago

    @Matthew Berman, GPT4 with Code Interpreter wrote the code correctly on the very first try for the all_equal function. I expected it would do it right and it did. GPT4 with Code Interpreter is a different beast. You really need to use it instead of plain old GPT4 for coding benchmarks like this. In my experience GPT4wCI even checks its own work and even iterates its attempts until it's correct -- amazingly good.

    • @geoffreyanderson4719
      @geoffreyanderson4719 10 months ago

      Update - The function all_equal that my GPT4wCI wrote is identical to Matt's. Matt, what test did your framework actually use here? If you check it yourself, you will see that the function is correct. I would not depend on that website you're using to check the code. Either their unit test is wrong, or it's right but passing in some edge cases which are good and interesting. I tried passing ints and strings and both pass for me.

  • @nyyotam4057
    @nyyotam4057 10 months ago

    This is awesome! The fact that it's just 34B active parameters means not self aware yet, so no need to reset the attention matrix. No moral issues. This is an absolute win.

  • @ZeroIQ2
    @ZeroIQ2 10 months ago +3

    This is really cool!
    One thing I would love to see in a test is code conversion from another language.
    For example, can you take this C++, Visual Basic, Javascript code and re-write it using Python.

  • @curiouslycory
    @curiouslycory 7 months ago

    I think the reason it didn't use the for loop is the word "optimal" in your job description.

  • @rrrrazmatazzz-zq9zy
    @rrrrazmatazzz-zq9zy 10 months ago

    That was impressive. I like to ask, "build a calculator that adds, subtracts, divides and multiplies any two integers. Write the code in html, css, and JavaScript"

  • @kevshow
    @kevshow 10 months ago

    Will the 34B run on a 4090?

  • @salimgazzeh3039
    @salimgazzeh3039 10 months ago +1

    I think the most interesting challenges are the ones where you ask for a complex task

    • @matthew_berman
      @matthew_berman  10 months ago

      Any suggestions for others like that?

    • @salimgazzeh3039
      @salimgazzeh3039 10 months ago

      @@matthew_berman You could try other simple games like tic-tac-toe, or making a simple webpage that does something like displaying a given MCQ exercise, and see how good it is one-shot. Basically anything considered an extreme beginner project, and see how good their one-shot try is. I am just afraid that LeetCode-like coding exercises are part of their training dataset and don't showcase how good they are at creating code, as opposed to spitting out exercise solutions.

  • @temp911Luke
    @temp911Luke 10 months ago +2

    Wizardcoder and Phind are even better !

  • @GreenmeResearch
    @GreenmeResearch 10 months ago +1

    Isn't WizardCoder-34B better than Code LLama?

    • @mirek190
      @mirek190 10 months ago

      Yes, it's better - it scores 78 on HumanEval.

  • @bertimus7031
    @bertimus7031 9 months ago

    Yes, please show us how to install it locally! They'll charge through the nose soon.

  • @stevenelliott216
    @stevenelliott216 10 months ago

    I was curious if the prompt seen on the left side of the screen at 1:52 could be made into an instruction template so that the "Chat" tab, with the "instruct" radio button selected, could be used instead of the "Default" tab, which makes interaction a bit easier and more natural. I came up with the following YAML file, which I put in the "instruction-templates" directory for text-generation-webui (the turn_template placeholders follow that tool's convention):
    user: "### User Message"
    bot: "### Assistant"
    turn_template: "<|user|>\n<|user-message|>\n\n<|bot|>\n<|bot-message|>\n\n"
    context: "### System Prompt\nYou are a helpful coding assistant, helping me write optimal Python code.\n\n"
    You can verify that it has the intended effect by passing "--verbose" to text-generation-webui.

  • @mordordew5706
    @mordordew5706 10 months ago +1

    Please make a video on how to install this. Also could you mention the hardware requirements for each model?

  • @jay_sensz
    @jay_sensz 10 months ago +7

    Maybe it's decent for fire-and-forget prompts. But when I asked it to change something in its output, it forgot half of the requirements from the previous prompts, which is incredibly annoying.
    GPT-4 is far more reliable when it comes to writing code iteratively -- which is how these models are used in the real world.

    • @tregsmusic
      @tregsmusic 10 months ago +2

      I've had the same experience; GPT-4 is still the best in my tests.

    • @jay_sensz
      @jay_sensz 10 months ago +1

      @@tregsmusic Yea it's not even close. GPT-4 feels like it actually pays attention to how the conversation develops and is able to combine concepts at a very high level of abstraction.
      Having these open source models perform so highly on coding benchmarks makes me extremely suspicious of the metrics used in those benchmarks.
      It seems that getting a high score in those benchmarks is only a necessary but not sufficient criterion for coding ability.
      It's also not clear to me how you would even benchmark model performance in the context of iterative prompting because a human intelligence is in the feedback loop.

    • @diadetediotedio6918
      @diadetediotedio6918 10 months ago

      GPT-4 is also very prone to forgetting things in the middle of outputs, so I don't think this is quite fair. But I don't expect these models to beat it either; it is a very expensive model, and time and technology are needed to catch up.

    • @jay_sensz
      @jay_sensz 10 หลายเดือนก่อน

      @@diadetediotedio6918 I'm not saying GPT-4 is perfect. But if it makes a mistake and you correct it, that will generally put it back on the right path.

  • @xXWillyxWonkaXx
    @xXWillyxWonkaXx 4 หลายเดือนก่อน

    Is this similar to Phind-CodeLlama-34B-Python-v1?

  • @1-chaz-1
    @1-chaz-1 10 หลายเดือนก่อน

    Please make a tutorial for installing it on Mac M1 and M2

  • @rickhoro
    @rickhoro 10 หลายเดือนก่อน +1

    Great video! You mention needing a top-of-the-line GPU to run the 34B non-quantized model on a consumer-grade PC. What exactly constitutes a top-of-the-line GPU in this context? Can you give an example or two of actual GPU models that would suffice? Also, would 64GB of DRAM be sufficient on the CPU side? Thanks!!

    • @temp911Luke
      @temp911Luke 10 หลายเดือนก่อน +1

      Even the quantized version is not far from the original one; the difference is almost insignificant. Just don't use any quantized models below Q4 (e.g. Q3, Q2) and you should be fine.
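
As a back-of-the-envelope way to see why the quantization level matters for consumer hardware, you can estimate model memory as parameters × bits-per-weight / 8. The figures below are rough assumptions (Q4_K_M averages around 4.5 bits per weight), ignoring KV cache and runtime overhead:

```python
# Rough rule of thumb: model memory ~= parameters * bits-per-weight / 8.
# These are ballpark assumptions, not measured figures; they ignore the
# KV cache and runtime overhead.
def approx_model_gb(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1e9

params_34b = 34e9
fp16 = approx_model_gb(params_34b, 16)   # unquantized half precision
q4 = approx_model_gb(params_34b, 4.5)    # Q4_K_M averages ~4.5 bits/weight

print(f"FP16: ~{fp16:.0f} GB, Q4_K_M: ~{q4:.1f} GB")
```

This is why the unquantized 34B model needs a data-center-class GPU while the Q4 quantization fits in high-end consumer hardware or system RAM.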

    • @mirek190
      @mirek190 10 หลายเดือนก่อน

      @@temp911Luke Nothing below Q4_K_M (roughly the quality level of the old Q5_1).

    • @rickhoro
      @rickhoro 10 หลายเดือนก่อน

      @@temp911Luke thanks for responding. What about GPU requirements? My computer only has an NVIDIA GeForce GTX 1060 with 3GB of VRAM. Do you think I would need a GPU, or could I just run a 34B 4-bit quantized model on CPU only and have something that works well?

    • @temp911Luke
      @temp911Luke 10 หลายเดือนก่อน

      @@rickhoro Never tried any GPTQ (graphics-card version) before. I only use the CPU version; my specs: Intel 10700, 16GB RAM.

  • @dontblamepeopleblamethegov559
    @dontblamepeopleblamethegov559 9 หลายเดือนก่อน

    How well does it compare in languages other than Python?

  • @lynnurback9174
    @lynnurback9174 10 หลายเดือนก่อน +1

    Yes!!! And what are the minimum requirements a computer needs before installing?

    • @matthew_berman
      @matthew_berman  10 หลายเดือนก่อน

      You can fit one of the many models on almost any modern computer

  • @vaisakhkm783
    @vaisakhkm783 10 หลายเดือนก่อน

    Using this with Petals would be sooo cool...

  • @OriginalRaveParty
    @OriginalRaveParty 10 หลายเดือนก่อน +1

    Please do the installation for dummies video for installing it locally 🙏

  • @stefang5639
    @stefang5639 10 หลายเดือนก่อน

    I hope that consumer hardware improves quickly enough that we can all actually benefit from all these great open source models that are popping up everywhere right now. Otherwise it will just stay another paid website for most users, and it won't matter much whether the model underneath is open or closed source.

  • @NguyenHoang-dq1mk
    @NguyenHoang-dq1mk 10 หลายเดือนก่อน

    how to install it?

  • @testales
    @testales 10 หลายเดือนก่อน

    For some reason I don't get the code you got. I've used all the same settings and prompts and even reinstalled Oobabooga from scratch. I've also tried the 32g version, which is supposed to be more accurate. I've got a few versions running, though none of them works as expected. I was also impressed by the communication while debugging: the AI suggested, for example, adding some print statements to get more information, and then tried making fixes based on my feedback.

  • @javiergimenezmoya86
    @javiergimenezmoya86 10 หลายเดือนก่อน +1

    GPT-4 did not fail the "all list same" challenge, because the empty-list case is not defined in the problem statement.
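
For reference, here is a minimal sketch of the task in question. Treating the empty list as vacuously true (all zero elements are equal) is one common convention, but that is an assumption here, since the prompt in the video leaves the case unspecified:

```python
def all_same(items) -> bool:
    # A set collapses duplicates, so <= 1 distinct value means all equal.
    # The empty list is vacuously True under this convention -- the prompt
    # does not pin this down, which is the ambiguity the comment points out.
    return len(set(items)) <= 1

print(all_same([1, 1, 1]))  # True
print(all_same([1, 2]))     # False
print(all_same([]))         # True under this convention
```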

  • @aldoyh
    @aldoyh 10 หลายเดือนก่อน

    Looks like the AI has been busy with that set of questions. I suggest alternating the roles, starting with the latter one.

  • @lloydkeays7035
    @lloydkeays7035 10 หลายเดือนก่อน

    I'm struggling to figure out the workflow for iterative conversations with codeLLAMA. The examples are all single prompt-response pairs. I want guidance on prolonged, iterative back-and-forth dialogues where I can ask, re-ask, and ask further over many iterations.
    A tutorial showing how to incrementally build something complex through 200+ iterative prompt-response exchanges would be extremely helpful. Rather than one-off prompts, walk through prompting conversationally over hours to build up a website piece by piece. I want to 'chew the bone' iteratively with codeLLAMA like this.
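
A rough sketch of the kind of loop being asked for: re-send the full conversation history on every turn so the model keeps context. Here `generate` is a placeholder stub standing in for whatever backend you run (text-generation-webui's API, llama-cpp-python, etc.), and the prompt markers follow the Phind format discussed above:

```python
def generate(prompt: str) -> str:
    # Stub: a real version would call your inference backend here.
    return f"(model reply to {len(prompt)} chars of context)"

SYSTEM = "### System Prompt\nYou are a helpful coding assistant.\n\n"

def build_prompt(history, user_msg):
    # Re-send the whole history every turn so the model sees prior context.
    prompt = SYSTEM
    for user, assistant in history:
        prompt += f"### User Message\n{user}\n\n### Assistant\n{assistant}\n\n"
    prompt += f"### User Message\n{user_msg}\n\n### Assistant\n"
    return prompt

history = []
for user_msg in ["Write a snake game.", "Now add a score counter."]:
    reply = generate(build_prompt(history, user_msg))
    history.append((user_msg, reply))

print(f"{len(history)} turns of context kept")
```

The practical limit is the model's context window: once the history outgrows it, you have to truncate or summarize earlier turns, which is exactly where local models tend to "forget" requirements.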

  • @MaJetiGizzle
    @MaJetiGizzle 10 หลายเดือนก่อน

    An open source model actually getting a snake game to run on the first response is a milestone…
    An open source model that can hold its own with GPT-4 on Python coding, at only 34B parameters no less, is an absolute phenom.

    • @josjos1847
      @josjos1847 10 หลายเดือนก่อน

      At this speed we'll get a local GPT-4 sooner than we thought

  • @niapoced24
    @niapoced24 10 หลายเดือนก่อน +1

    video on how to install it

  • @mercadolibreventas
    @mercadolibreventas 10 หลายเดือนก่อน

    I mean it for life, I will feed you interesting complex stuff... but it is not complex now. Like the PHP porting: 1. documenting the old code, 2. needing a specific way to upload a folder to be analyzed for documentation, 3. reverse-prompting the code, or the documented code, 4. rewriting the code to Python, 5. later I will modify it to Mojo to utilize automation to the max. Thanks!

  • @DD3874
    @DD3874 10 หลายเดือนก่อน

    thx

  • @ernesto.iglesias
    @ernesto.iglesias 10 หลายเดือนก่อน

    ChatGPT will still win against any other, not because of GPT-4 itself but because of the Code Interpreter tool, since it can check for any error and improve its own code. It would be amazing to see an open source version of it
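
The self-correction idea boils down to a run-and-repair loop: execute the generated code, and if it throws, feed the traceback back to the model for another attempt. A minimal sketch, where `ask_model` is a stub standing in for a real LLM call (the first attempt is deliberately buggy to exercise the retry path):

```python
import traceback

def ask_model(task, error=None):
    # Stub: a real version would send the task plus the traceback to an LLM.
    if error is None:
        return "result = 1 / 0"       # first attempt: buggy on purpose
    return "result = 'recovered'"     # repaired attempt after seeing the error

def run_with_retries(task, max_tries=3):
    error = None
    for _ in range(max_tries):
        code = ask_model(task, error)
        scope = {}
        try:
            exec(code, scope)                 # run the generated code
            return scope.get("result")
        except Exception:
            error = traceback.format_exc()    # feed the traceback back
    return None

print(run_with_retries("compute something"))  # prints "recovered" on the 2nd try
```

The same loop works with any backend; the only hard part in practice is sandboxing `exec` so model-written code can't touch the host system.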

  • @allenbythesea
    @allenbythesea 10 หลายเดือนก่อน +2

    would really like to see a video on installing it. The previous videos weren't completely clear on how to do this.

    • @fontende
      @fontende 10 หลายเดือนก่อน

      Just get the GGML version for CPU from TheBloke, I already did; very easy, just drop it into the folder. GGML is great not only for running on CPU, but you can also offload leftover work to the GPU (if you chose to install the GPU tooling from the start). GPU is kind of the next level up from CPU and requires OpenBLAS etc.; CPU-only is the easiest but needs a very good CPU.

    • @allenbythesea
      @allenbythesea 10 หลายเดือนก่อน

      @@fontende Thanks for the tip, I'm going to check that out. I've got a pretty beefy GPU but I'd like to try both.

    • @fontende
      @fontende 10 หลายเดือนก่อน

      @@allenbythesea Yeah, it's great. I have a 14-core Intel Xeon, which is enough for the big LLaMA 65B or 70B, but only an RTX 2070 Super. If you get errors when adding GPU offload to CPU inference, you can limit the number of threads used; with my card it's about 10 without model-loading errors. I also have 128GB of RAM in total -- lots of RAM is important for CPU inference, and with a GPU you can't add more.

  • @jonascale
    @jonascale 10 หลายเดือนก่อน +1

    yes can we see the full tutorial please

  • @MotoWilliams
    @MotoWilliams 10 หลายเดือนก่อน

    What's the result of these horse races when they're generating something other than Python?