Qwen-2.5 Coder (32B) + Cline & Aider + Free API : This NEW AI Coding Model BEATS Claude 3.5 Sonnet!?

  • Published on Dec 17, 2024

Comments • 129

  • @ShamusMac
    @ShamusMac months ago +43

    If this video is a true reflection of its capabilities, benchmarks aren't just bad, they are broken.

    • @ctwolf
      @ctwolf months ago

      this 100%

  • @TheSuperColonel
    @TheSuperColonel months ago +6

    I like your channel; it's straight to the point.
    There is likely a lot of competition among the AI models, with tons of hype.
    We will see how many of them survive the next 5 years.

  • @bamit1979
    @bamit1979 months ago +39

    Thank you for saving our time! :)

    • @TaughtByTech
      @TaughtByTech months ago +2

      i know right. really the AI king

    • @ashgtd
      @ashgtd months ago +1

      yup saved me a big fat download today

    • @notme2136
      @notme2136 months ago

      yup, saved me a chunk of my time this week.

    • @Quitcool
      @Quitcool months ago

      Wrong, that's a great model according to other youtubers and the open source community.

    • @ashgtd
      @ashgtd months ago

      @@Quitcool are they just saying that for clicks though? if I see a video with this model not sucking ass then I'll try it

  • @Andres-m2u
    @Andres-m2u months ago +4

    The maximum achievable with Qwen2.5-Coder32b (131k context window) was around 100k tokens. Then it slowed down to a timeout. But impressive...

    • @RaffaelloTamagnini
      @RaffaelloTamagnini months ago

      True, just tested, and with 24GB of GPU offload too, on a machine with 192GB of RAM. A 131k context wants too much memory.
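
    For context on why a 131k window hits memory limits: a rough back-of-the-envelope sketch of the KV-cache cost in Python, assuming the published Qwen2.5-32B architecture values (64 layers, 8 KV heads via GQA, head dim 128) and an FP16 cache. These numbers are assumptions for illustration, not figures from the video.

      # Rough KV-cache size estimate for long context windows (FP16 cache).
      # Layer/head counts below are assumed from the published Qwen2.5-32B config.

      def kv_cache_bytes(tokens, layers=64, kv_heads=8, head_dim=128, bytes_per_value=2):
          """Bytes needed to store keys and values for `tokens` positions."""
          per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K and V
          return tokens * per_token

      for ctx in (8_192, 32_768, 100_000, 131_072):
          print(f"{ctx:>7} tokens -> ~{kv_cache_bytes(ctx) / 1024**3:.1f} GiB of KV cache")

      # 131,072 tokens works out to roughly 32 GiB for the cache alone, on top of
      # the model weights, which is consistent with the slowdown reported above.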

  • @goldenglowitsolutions
    @goldenglowitsolutions months ago +5

    Thanks for sharing this with us, your content is gold!
    I tried Qwen 2.5 Coder yesterday on my Intel Core i7, 16GB DDR4 RAM, RTX 3050 (4GB VRAM), and it struggled with Bolt.
    So I guess I should only use open-source local AI models for generating text, for now...

    • @aleksanderspiridonov7251
      @aleksanderspiridonov7251 months ago

      YOU NEED AT LEAST A MAC WITH 32GB RAM (M3/M4) I THINK, BUT BETTER 2-3x 3090 MINIMUM FOR +/- GOOD WORK. OPENROUTER IS ALSO CHEAP, THOUGH.

    • @johnnyarcade
      @johnnyarcade months ago

      @@aleksanderspiridonov7251 WOULD THE NEW MACBOOK PRO WITH 40 GPU CORES AND 48GB RAM WORK WELL ENOUGH OR SHOULD I OPT FOR MORE RAM?

    • @handfuloflight
      @handfuloflight months ago +1

      @@aleksanderspiridonov7251 y u screamin son

  • @alexjensen990
    @alexjensen990 months ago +4

    BTW, you had me laughing so hard at the whole "why the hell am I using it then!" comment. Truly priceless.

  • @maddoxthorne2297
    @maddoxthorne2297 months ago +11

    Others: It answers the benchmark questions well so no need to run it.
    AICodeKing: Hold my beer.👑

    • @ctwolf
      @ctwolf months ago

      AICodeKing is actually a deity

  • @CPM94
    @CPM94 months ago +27

    Those dancing Pokémon clearly stole the spotlight of the video.

  • @sammcj2000
    @sammcj2000 months ago +11

    Looking at your output, it almost seems as if you, or the model provider you're using, are using the wrong chat template plus inference parameters that aren't configured for coding tasks.
    What about the temperature? It should be set to 0 for coding, and you should use a top_p of no higher than about 0.85. (A minimal sketch of these settings follows at the end of this thread.)
    Did you set the context size to something reasonable?
    I've found the 32b model to be really impressive, certainly the best open-weight model out there by far.
    In my experience, Cline especially is not very good with any models other than Claude, which it was originally written for.

    • @AICodeKing
      @AICodeKing months ago +2

      I did try it with Fireworks and got the same results. It might be that Cline doesn't play well with the model. But even if you consider the Aider results, it's too buggy and not good at all if you're working on a bigger application with multiple files in context.

    • @sammcj2000
      @sammcj2000 months ago +5

      @@AICodeKing Thanks for the extra info. I might try a couple of your common prompts running the model directly, without Aider or Cline in the mix, to see if it's a templating issue. It could be something like them using the default ChatML template and not the proper updated Qwen 2.5 tool-calling template, or something along those lines.

    • @MM-24
      @MM-24 months ago

      @@sammcj2000 would love to see what analysis you come up with - thanks for double checking. super helpful

    • @bodyguardik
      @bodyguardik months ago

      He didn't even download it, it seems. This video is about some crap online service.
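
    A minimal sketch of the settings suggested above (temperature 0, top_p no higher than 0.85, a capped output length), sent through an OpenAI-compatible endpoint with the openai Python client. The base URL, API key, and prompt are placeholders, not values confirmed in the video; the model id is the Hugging Face-style name and may differ per provider.

      # Hedged sketch: coding-oriented sampling parameters on an
      # OpenAI-compatible endpoint. base_url/api_key are placeholders.
      from openai import OpenAI

      client = OpenAI(
          base_url="https://your-provider.example/v1",  # placeholder endpoint
          api_key="YOUR_API_KEY",                       # placeholder key
      )

      response = client.chat.completions.create(
          model="Qwen/Qwen2.5-Coder-32B-Instruct",  # check your provider's exact model id
          messages=[
              {"role": "system", "content": "You are a careful coding assistant."},
              {"role": "user", "content": "Write a Python function that reverses a linked list."},
          ],
          temperature=0,    # deterministic decoding for code
          top_p=0.85,       # tight nucleus sampling, per the suggestion above
          max_tokens=2048,  # cap the reply so it cannot run away
      )

      print(response.choices[0].message.content)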

  • @gmag11
    @gmag11 months ago

    I love your style. Go on like this. AI coding is a great use case for LLMs. I'm learning a lot from your videos.

  • @darkreader01
    @darkreader01 months ago +2

    I also did a test before seeing your video and my conclusion was "trash", at least for my use case. After seeing your video, I see that I am not the only one! It's not worth the hype.

  • @fezkhanna6900
    @fezkhanna6900 months ago +18

    hahahah, "man if i have to implement it myself, why the hell am I using this". This made me laugh (9:50)

    • @peacekeepermoe
      @peacekeepermoe months ago +1

      same 😂😂 man I'd be mad too if AI is asking me to do something I asked it to do for me in the first place. It's like who is the master and who is the slave here goddamnit?

  • @Piotr_Sikora
    @Piotr_Sikora months ago +1

    Does the model on Hyperbolic use the 128k context window?

  • @Dyson_Lu
    @Dyson_Lu months ago

    Strange, Cole Medin got great results and so did Simon Willison. Both were extremely impressed.

  • @maertscisum
    @maertscisum months ago +1

    I am guessing that the benchmarking uses carefully engineered prompting to beat other models. I have always questioned the validity of each model's benchmark claims. There should be a formal body with standard test sets to run the benchmarks.

  • @konstantinoskonstantinos8524
    @konstantinoskonstantinos8524 months ago +1

    Is Hyperbolic using the Instruct model or the Base one?

    • @AICodeKing
      @AICodeKing months ago +1

      Instruct and unquantized as well.

  • @davidcarey37
    @davidcarey37 months ago +1

    Thank you very much. Well explained and informative as always, and in this case it has definitely “separated the wheat from the chaff” … Qwen 2.5 Coder seems very disappointing.

  • @phoenyfeifei
    @phoenyfeifei months ago +2

    I find Cline just doesn't work very well with Ollama local models. Their developer appears to blame these Ollama models being heavily quantized, which I do agree with, but I run Q8 and FP16 models and still get the same shitty result.
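
    If you want to confirm which quantization your local Ollama server is actually serving, a small sketch that asks a running Ollama instance (default port 11434) to describe a model. The /api/show endpoint and its response fields are assumptions based on recent Ollama versions, and the model tag is a placeholder.

      # Hedged sketch: query a local Ollama server for a model's quantization.
      # Endpoint and field names are assumptions from recent Ollama versions.
      import requests

      MODEL = "qwen2.5-coder:32b"  # placeholder tag; use whatever `ollama list` shows

      resp = requests.post("http://localhost:11434/api/show", json={"name": MODEL}, timeout=30)
      resp.raise_for_status()
      details = resp.json().get("details", {})

      print("parameter size :", details.get("parameter_size"))
      print("quantization   :", details.get("quantization_level"))  # e.g. Q4_K_M, Q8_0, F16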

  • @AaronBlox-h2t
    @AaronBlox-h2t 25 days ago

    Interesting....thanks for the video.

  • @raj4624
    @raj4624 months ago

    Thanks for this Hyperbolic website.. it helped me.

  • @jaynucca
    @jaynucca months ago

    Thank you for being honest! I wanted to love Qwen 2.5 Coder as well, but it just can't actually do anything useful beyond VERY simple applications.

  • @lcarv20
    @lcarv20 months ago

    Hi there, in your first prompt Qwen was trying to generate the build files and node_modules; maybe if you had the project set up it wouldn't try to generate that much code? Can you try?

    • @lcarv20
      @lcarv20 months ago

      Ok after seeing the whole video I understand that it wouldn’t matter.

    • @AICodeKing
      @AICodeKing months ago +1

      I had created the NextJS app beforehand.

  • @jamesbuesnel5054
    @jamesbuesnel5054 months ago +6

    Ahahah your hate for cursor is hilarious 😂

  • @tomwawer5714
    @tomwawer5714 months ago +1

    I run 32b on 6GB VRAM; it's slow, about a token/s, but it works.

    • @amit4rou
      @amit4rou 7 days ago

      But even though it's a token/sec, the quality of the output is not the same as on adequate hardware, is it?

    • @tomwawer5714
      @tomwawer5714 7 days ago

      @@amit4rou Good question. I didn't test it much as it's too slow.

    • @amit4rou
      @amit4rou 7 days ago

      @@tomwawer5714 I have 8GB vram, I am not sure if it can run 14b or 7b...

    • @tomwawer5714
      @tomwawer5714 7 days ago

      I run 23b without any issues quite fast with q4 quantisation on 6Gb. You can even run flux medium for image gen.

    • @amit4rou
      @amit4rou 7 days ago

      @@tomwawer5714 I want to run a good coding model like Qwen 2.5 Coder.. maybe I can run the 14b variant or 7b; I don't wanna run a heavily quantized version...
      Also, Google Colab gives 12+ GB of VRAM, so maybe I can somehow run it on Colab; there are a few videos showing how to do that...
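
    As a rough guide to what fits in 8GB (or Colab's ~12-15GB), the weight footprint alone can be estimated from parameter count times bits per weight; the KV cache and runtime overhead come on top. A minimal sketch, with rounded parameter counts and approximate bits-per-weight taken as assumptions.

      # Rough weight-memory estimate per quantization level (weights only;
      # KV cache and runtime overhead are extra).

      def weight_gib(params_billion, bits_per_weight):
          return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

      for params in (7, 14, 32):                                       # rounded parameter counts
          for name, bits in (("q4", 4.5), ("q8", 8.5), ("fp16", 16)):  # ~bits incl. quantization metadata
              print(f"{params:>2}B {name:>4}: ~{weight_gib(params, bits):5.1f} GiB")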

  • @ghosert
    @ghosert months ago

    What is the smaller local LLM that you think is better than Qwen 2.5 Coder 32b? Thanks. You didn't mention which video I should take a look at.

    • @Fenixtremo
      @Fenixtremo months ago

      1) Qwen 2.5 coder 32b
      2) Deepseek v2.5 205b
      3) Nope

  • @tecnopadre
    @tecnopadre months ago

    Truth testing = reality
    Great job, as usual.
    Congratulations 🎉

  • @alainmona268
    @alainmona268 months ago

    Hey, is it possible you can add Qwen2.5 32B to OpenHands? I tried a million different ways with the help of Claude and Copilot and ChatGPT but couldn't get it running.

    • @A-Jaradat-Z
      @A-Jaradat-Z 28 days ago

      openRouter?

  • @PhuPhillipTrinh
    @PhuPhillipTrinh months ago +1

    lmao good testing king! will you change pokemon one day?

  • @wolverin0
    @wolverin0 months ago

    Could you make a guide on using Cline with the local Qwen?

  • @jjdorig9712
    @jjdorig9712 months ago

    I have it running on a single 3090; how do I check how big a context window it has?

    • @AICodeKing
      @AICodeKing months ago

      I think it should be mentioned on Hugging Face.
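
    One way to check it programmatically rather than reading the model card: pull the model's config from Hugging Face and look at max_position_embeddings. A minimal sketch using transformers' AutoConfig; the repo id is the public Qwen2.5-Coder-32B-Instruct one and is assumed to be the model in question.

      # Hedged sketch: read the advertised context length from the Hugging Face config.
      from transformers import AutoConfig

      cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
      print("max_position_embeddings:", cfg.max_position_embeddings)
      # Note: the context your local runtime actually allocates (e.g. Ollama's
      # num_ctx setting) can be much smaller than this advertised maximum.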

  • @onmetrics
    @onmetrics months ago +1

    here for the low frequency roasts

  • @yoannthomann
    @yoannthomann months ago +3

    The point is made: we need better benchmarks 😢😅

  • @mrpocock
    @mrpocock months ago

    So why does it score well in benchmarks if it can't function in these IDE or agentic contexts?

    • @AICodeKing
      @AICodeKing months ago +1

      You can basically just train models on specific benchmark questions and make them score well in benchmarks but in real life this approach fails.

    • @mrpocock
      @mrpocock months ago

      @@AICodeKing That really sucks. Benchmark chasing should be an immediate disqualification. I wonder if there are ways to structure benchmarks so that they produce a randomised but equivalent task (a toy sketch of that idea follows at the end of this thread). Or alternatively, flood the market with so many benchmarks that it is not practical to over-fit to them all.

    • @AICodeKing
      @AICodeKing months ago

      There are actually many benchmarks, but you just need to select 5 or 10 and compare the results across those.

    • @mrpocock
      @mrpocock months ago +1

      @@AICodeKing It would be better if model publishers were expected to submit their models to 3rd party benchmarking rather than doing it in-house. We used to have this problem with protein 3d reconstructions. People would publish papers on cooked benchmarks. That's why the CASP protein structure prediction competition was set up.
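
    A tiny illustration of the "randomised but equivalent task" idea from this thread: generate parameterised variants of the same problem, each with its own hidden test, so a model cannot simply memorise a fixed benchmark answer. This is entirely a sketch of the concept, not an existing benchmark.

      # Sketch: parameterised benchmark tasks that are equivalent but not memorisable.
      import random

      def make_task(seed):
          """Return (prompt, check, reference) for one randomised variant."""
          rng = random.Random(seed)
          p = rng.choice([2, 3])
          xs = [rng.randint(-9, 9) for _ in range(rng.randint(5, 12))]
          prompt = f"Write a Python function f(xs) returning the sum of x**{p} for every x in xs."
          expected = sum(x ** p for x in xs)   # hidden, per-variant answer

          def check(candidate):
              return candidate(list(xs)) == expected

          def reference(xs_):
              return sum(x ** p for x in xs_)

          return prompt, check, reference

      prompt, check, reference = make_task(seed=42)
      print(prompt)
      print("reference solution passes:", check(reference))
      print("memorised constant fails :", not check(lambda xs_: 12345))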

  • @chadpogs7973
    @chadpogs7973 months ago +2

    Wow!! This is it!!

  • @HikaruAkitsuki
    @HikaruAkitsuki months ago +1

    Dude. Can you review Blackbox AI? It has Gemini Pro, GPT-4o, Claude 3.5 Sonnet, and its own Blackbox model. It's mostly a chat AI app like anything else, but there are also VS Code and JetBrains extensions.

  • @cgimoonai
    @cgimoonai months ago

    Thank you man!

  • @wasimdorboz
    @wasimdorboz months ago

    Please answer: how did you get to know the base URL? Hyperbolic?

    • @AICodeKing
      @AICodeKing months ago +1

      You can see it by going to Hyperbolic API Script thing

    • @wasimdorboz
      @wasimdorboz months ago

      @@AICodeKing alright thanks bro , you good developer

    • @wasimdorboz
      @wasimdorboz months ago

      @@AICodeKing Bro, there is Qwen 2.5 72b too, and I looked all over AI and Google and didn't get the base URL or how to use it. Then I tried exactly Qwen/...instruct, wow, instruct, and boom, it works. You're a good developer.
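
    For anyone else hunting for the base URL and the exact model id: most of these providers expose an OpenAI-compatible API, so you can point the openai client at the provider's /v1 endpoint and list the model ids it serves. The base URL below is what Hyperbolic documented at the time of writing, but treat it as an assumption and verify it against their API snippet; the key is a placeholder.

      # Hedged sketch: point the OpenAI client at the provider's /v1 endpoint
      # and list the model ids it serves, to find the exact name to use.
      from openai import OpenAI

      client = OpenAI(
          base_url="https://api.hyperbolic.xyz/v1",  # verify on Hyperbolic's API page
          api_key="YOUR_HYPERBOLIC_API_KEY",         # placeholder
      )

      for model in client.models.list():
          print(model.id)  # look for ids like "Qwen/Qwen2.5-Coder-32B-Instruct"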

  • @toCatchAnAI
    @toCatchAnAI months ago

    is this available on Open Bolt?

  • @ctwolf
    @ctwolf months ago

    me @3:50 hell yeah, dancing Pokémon

  • @justtiredthings
    @justtiredthings months ago

    what a bummer. I had high hopes for this model

  • @DAZEOFFICIAL
    @DAZEOFFICIAL months ago

    Strange, though I think this is a milestone: a local model being able to even create something using Aider. From my testing, it properly used Aider; Cline did not work in my testing. I have a 3090 and it did run at workable speeds.

    • @AICodeKing
      @AICodeKing months ago

      Yes, but claiming unbelievable things is never good

  • @CyrilSz
    @CyrilSz months ago

    VS Code combo with Aider + Qwen Coder; Cline + Claude; Continue + OpenCoder?

  • @meassess
    @meassess months ago

    I did everything right but I get this error:
    # VSCode Visible Files
    (No visible files)
    # VSCode Open Tabs
    (No open tabs)
    # Current Working Directory (d:/Mert - Workspace/test-ai-project) Files
    No files found.

  • @xelerator2398
    @xelerator2398 months ago

    Thank you!

  • @JeffreyWang-hh4ss
    @JeffreyWang-hh4ss months ago

    I like your objectivity, these small model hypes + marketing are pretty annoying.

  • @MacS7n
    @MacS7n months ago

    You made me hate cursor 😅 and to be honest you're right about cline being better 😅

  • @isheriff82
    @isheriff82 months ago

    so true bro, i hate it when ppl do that! also aider and cline is way better at everything!

  • @diplobla
    @diplobla months ago

    thanks for this 👍

  • @christerjohanzzon
    @christerjohanzzon months ago +1

    Great video! Real tests in real apps. I would like to see a full workflow test, from figma design to tested product. Done with NextJS, TS, TailwindCSS and assisted coding with AI all the way from setup to testing, reviewing and deployment.

  • @mz8755
    @mz8755 months ago

    It's such a small model, and the hype of trying to compare it with Sonnet is where all of this starts to fail. It should do what a small model should do in some specialized cases, not run a general coding agent. It is also specialized for code generation, while powering something like Aider is much more demanding on versatile intelligence.

  • @wasimdorboz
    @wasimdorboz months ago

    Bro, please do a tutorial on Electron or Tauri or any open-source one.

  • @fmatake
    @fmatake months ago

    Benchmarks always come out 'pretty,' but in real life, I've found that it's far behind even claude-3-5-haiku and gpt-4o-mini.

  • @WaveOfDestiny
    @WaveOfDestiny months ago

    Benchmarks with smaller models are usually complete BS. They probably distill the bigger models into them, making them memorize benchmark-like questions without actually making them smarter.

  • @enloder
    @enloder months ago +1

    So I should not use Qwen2.5 Coder 7B anymore?

    • @AICodeKing
      @AICodeKing months ago +1

      Depends on your choice.. I see no use of that model for me as of now.. I just use SmolLM2 which is better and can actually be used locally at great speeds on my machine. There's no one size fits all or anything like that.

  • @brandon1902
    @brandon1902 months ago

    To make matters worse, outside of coding Qwen2.5 is far worse than Qwen2. Most notably, it hallucinates far more across all domains of knowledge. I really do think you're right that Qwen is optimizing their LLMs for tests at the expense of overall performance. Qwen2 72b used to be almost as good as Llama 3.1 70b, but now Qwen2.5 72b is far worse despite climbing higher on benchmarks.

  • @2005sty
    @2005sty months ago

    Alibaba has the Qwen Max model (not open source), which is far better than the open-source version. But.. strangely, they don't show it off. I suspect ...

  • @QorQar
    @QorQar months ago

    Hyperbolic: free or no?

    • @AICodeKing
      @AICodeKing months ago

      Free $10 credits

  • @ZzzKekeke
    @ZzzKekeke months ago +2

    can you make the dragons twerk?

  • @antoniofuller2331
    @antoniofuller2331 months ago +1

    16 million tokens uploaded just to generate 3 files??!!! 6:30

    • @AICodeKing
      @AICodeKing months ago

      I think that it's a bug in cline and that's why it displays that.

    • @antoniofuller2331
      @antoniofuller2331 months ago

      @AICodeKing hmm

  • @a1_Cr3at0r
    @a1_Cr3at0r months ago

    Dude, make a video about the g4f (gpt4free) API + Cline.

  • @ThrivingMotivation28
    @ThrivingMotivation28 months ago

    Every LLM model except Sonnet disappoints

  • @emilianosteinvorth7017
    @emilianosteinvorth7017 months ago +1

    I have had some issues with Cline using models that are not Claude/GPT, since I think Cline requires a model with proper agentic features. That could be a reason why the performance was so poor with it. I think testing Qwen using a chat interface could change the results.
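
    One rough way to probe the "proper agentic features" point is to send a request with a tool definition through an OpenAI-compatible endpoint and see whether the model returns a structured tool call at all. A minimal sketch; the endpoint, key, model id, and tool are placeholders, and note that Cline itself uses its own prompt-based tool format, so this is only a loose proxy.

      # Hedged sketch: check whether a model emits structured tool calls.
      # base_url/api_key/model are placeholders; the tool is made up for the test.
      from openai import OpenAI

      client = OpenAI(base_url="https://your-provider.example/v1", api_key="YOUR_API_KEY")

      tools = [{
          "type": "function",
          "function": {
              "name": "read_file",
              "description": "Read a file from the workspace.",
              "parameters": {
                  "type": "object",
                  "properties": {"path": {"type": "string"}},
                  "required": ["path"],
              },
          },
      }]

      resp = client.chat.completions.create(
          model="Qwen/Qwen2.5-Coder-32B-Instruct",  # placeholder id
          messages=[{"role": "user", "content": "Open src/app.ts and summarise it."}],
          tools=tools,
      )

      msg = resp.choices[0].message
      if msg.tool_calls:
          print("tool call:", msg.tool_calls[0].function.name, msg.tool_calls[0].function.arguments)
      else:
          print("no tool call; plain-text answer:", (msg.content or "")[:200])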

  • @vaioslaschos
    @vaioslaschos months ago

    I tried Qwen 2.5 for math because I am taking part in the AIMO Kaggle competition. I can't say it with certainty, but I feel they train their models on the benchmarks. In one weird case it did a function call but also provided me the result (without actually performing the function call).

    • @jose-lael
      @jose-lael months ago

      That’s common for LLMs, try using a wider variety of them and you’ll have an intuition for how LLMs behave.

  • @vladimir12op11
    @vladimir12op11 a day ago

    Thank you! Is it fully free??

  • @BeastModeDR614
    @BeastModeDR614 months ago

    It doesn't follow instructions.

  • @hipotures
    @hipotures months ago

    The same with Python: garbage produced without end. Maybe it's a problem with Ollama?

  • @sinapxiagency
    @sinapxiagency months ago

    Time saving !!!

  • @alexjensen990
    @alexjensen990 months ago +1

    I'm pretty sure that Qwen is Chinese, right? That may explain the questionable benchmarking.

  • @mnageh-bo1mm
    @mnageh-bo1mm months ago

    test it with cursor

  • @다루루
    @다루루 months ago

    Very powerful model!!!

  • @TawnyE
    @TawnyE months ago

    EE

  • @HansKonrad-ln1cg
    @HansKonrad-ln1cg months ago

    very bad at instruction following. it has something in common with my wife there.

  • @paulyflynn
    @paulyflynn months ago +1

    Thanks!

    • @AICodeKing
      @AICodeKing months ago

      Thanks a lot for the support!