Ranking: Which LLMs are the BEST FOR 2025? (Ranking Every LLM Released in 2024!)

  • Published on 25 Dec 2024

Comments • 54

  • @michaeltse321
    @michaeltse321 20 hours ago +16

    wow openai wont exist in 5 years

    • @legendarystuff6971
      @legendarystuff6971 18 hours ago

      There's AI as a product and AI as a scientific field, and OpenAI still has a couple of years' advantage in AI research. Their product releases have been more conservative, but they are ahead. I think you have a point though, many companies won't be around in 5 years. As they always do, they will probably consolidate power between 2 or 3 giants. Might even be a monopoly lol

  • @whitewalter1213
    @whitewalter1213 16 hours ago +6

    DeepSeek-V3 has been updated in their internal API. Gonna test it.

  • @HedleyPugh
    @HedleyPugh 12 hours ago +4

    code
    Aider LLM Leaderboards
    1. 62% - o1
    2. 45% - Sonnet
    3. 28% - Haiku
    4. 18% - DeepSeek
    5. 15% - GPT-4o
    6. 8% - Qwen

  • @luizgustavs
    @luizgustavs 19 hours ago +6

    Qwen 2.5 Coder might not be the highest-performing coding model, but I find it extremely valuable in my daily workflow. It holds up well against leading closed-source models, and its size allows it to run on a 3090 24GB GPU with decent quantization. Perhaps ranking it as B-tier is overly critical for a model that could arguably be the best open-source coding solution (under 70B) available.

    • @luizgustavs
      @luizgustavs 18 hours ago +1

      Also, Gemma 2, even being a little old by now, excels in multilingual conversation, arguably ranking among the best for natural, human-like interactions in non-English languages and translation tasks. It is the only model I have tried so far that naturally uses emojis and puts some emotion into its writing.

    • @AICodeKing
      @AICodeKing 17 hours ago

      I prefer Phi 4 14B over Coder 32B because it performs the same as Coder.

    • @Archonsx
      @Archonsx 17 hours ago

      It's not overly critical, it's honest! There are unlimited options that do exactly the same thing this Qwen 2.5 “Coder” assistant does, because it's not a coder! It's an assistant AI that assists with code, let's be honest here.

    • @ernestuz
      @ernestuz 15 hours ago

      The 32B model worked very well for me (most of my scripts are tuned for Codestral). Phi 4 is excellent, but the 16K token context kills it in coding tasks.

  • @Archonsx
    @Archonsx 19 hours ago +5

    best open source model is definitely the 70b Nemotron from nvidia

  • @bibekroy478
    @bibekroy478 19 hours ago +3

    Can you pls make a video on which models are best at coding and which APIs are free of cost or free with some rate limits?

    • @louisradoc6868
      @louisradoc6868 19 hours ago +2

      Use Cursor, you get unlimited messages and all the models.

  • @DCinzi
    @DCinzi 7 hours ago

    Mistral 7B is the workhorse of the local model community for function calling

  • @ahmedd.masoud6809
    @ahmedd.masoud6809 17 hours ago +1

    That's a good video.
    Would love to see another video comparing the S-tier models when it comes to general use, coding, and other features.
    Thank you

  • @RuiAndrada
    @RuiAndrada 13 hours ago +1

    When Claude Opus 3.5? I'm looking forward to it!

  • @Bla_ck_LA_Goon
    @Bla_ck_LA_Goon 20 hours ago +4

    I wish you and everyone in this niche good health.🥰🥰🥰🤗

  • @ewm5487
    @ewm5487 20 hours ago +5

    I onboarded your channel just a month ago and became a daily viewer! You create magnificent content, a true inspiration!
    As for the models I can tell you're a 'little' biased towards gpt-4o 😂 but that's ok! I worked with it a lot in 2024 and it's good for non-coding tasks. Of course, openAI is expensive and I avoid them where I can. Merry Christmas ⛄ 🎄

    • @savasava9923
      @savasava9923 19 hours ago +2

      yeah, most of this channel is about coding tasks.

  • @Ahwu_AIClass
    @Ahwu_AIClass 15 hours ago +2

    When I translate English to Chinese, I find gemma2 is a pretty good model

  • @Alen_115
    @Alen_115 20 hours ago +8

    I have loved 3.5 Sonnet for the last 6 months, and after that Gemini

    • @aculz
      @aculz 20 hours ago +2

      yea, Gemini's big comeback can destroy any model, and besides, it's the Flash version, not even the Pro model. Amazing.

  • @Smartbuis
    @Smartbuis 13 hours ago +1

    A tier list video for text-to-image AI?

  • @maddoxthorne2297
    @maddoxthorne2297 16 hours ago

    Happy Holidays you guys

  • @climateireland7546
    @climateireland7546 12 hours ago

    Great wrap up.
    Would be great to have an idea about which of the models you mention are ideal for local set ups

  • @flutterflowexpert
    @flutterflowexpert 20 hours ago +1

    Which one is best for coding?

    • @today8472
      @today8472 20 hours ago +2

      I use o1-preview through the API and it's really smart and can do many coding challenges. On the other hand, Claude 3.5 Sonnet is a really good model for coding.

    • @aculz
      @aculz 20 hours ago

      Sonnet is the best for now, but the pricing is a bit high. For a free, amazing model right now there is Gemini. For open source, if you want to run it locally, you can use Llama 70B or Phi-4 14B. Other than that, if you don't want to think and just want to use an API directly at an extremely cheap price, DeepSeek is the only answer. That's what's best right now.

    • @savasava9923
      @savasava9923 19 hours ago

      @today8472 it's just that you need to wait a lot, which is bad when you're in a flow state. It's different with Claude 3.5 Sonnet, it's just instant.

    • @gabrieleferrari2266
      @gabrieleferrari2266 18 hours ago +1

      Sonnet 3.5 and o1-mini

  • @sinapxiagency
    @sinapxiagency 13 hours ago

    Happy holidays king, we now have DeepSeek V3

  • @greendsnow
    @greendsnow 15 hours ago

    OMG, that infamous table...

  • @Archonsx
    @Archonsx 19 hours ago +1

    I have tested the Qwen model at FP16 and I agree, it's bad

  • @cw.only.channel
    @cw.only.channel 12 hours ago

    Thank you ❤

  • @idrmn
    @idrmn 15 hours ago

    check OLMo 2 from allenai

  • @jimlynch9390
    @jimlynch9390 17 hours ago

    Thanks for the evaluation. I can't argue with your choices.

  • @TitoVelani
    @TitoVelani 19 hours ago

    I really enjoy your content, and I love that there is a YouTuber who likes free stuff as much as I do. My minor disagreement is that R1 deserves to be in S-tier. Keep up the great videos.

  • @stefanosantini9039
    @stefanosantini9039 20 hours ago

    Thanks for the Xmas gift :) very useful video. Happy holidays

  • @legendarystuff6971
    @legendarystuff6971 18 hours ago

    I understand why you put GPT-4o in C-tier, but the mini is definitely A-tier if not S-tier. It shits on Gemini Flash for the price. If you ever tried to use them to get structured output, you would know what I mean. Yes, Gemini Flash is cheaper, but what help is that if it fails more often than not and wastes my tokens? Price isn't everything. Even with the 2.0 Flash exp, you have to recognise it's not going to be free forever. It also fails a lot more often than GPT-4o mini at structured outputs. Anyway, I respect your opinion and I like your channel. Thanks for the videos, and Merry Christmas!

    • @AICodeKing
      @AICodeKing 17 hours ago

      But 2.0 flash is going to be cheaper than GPT 4O Mini still.. So, there's that.

  • @kevinmolina6692
    @kevinmolina6692 16 hours ago

    I appreciate the video, thanks for validating my findings. I'm sorry for reaching, but it would be interesting for you to have your own little open-source, repo-like website; it would be really fun to mess around with your tables. I think your ability to not pay for coding is keeping you honest and an outspoken, factual source against mainstream tendencies. Thanks for all your effort.

  • @SinghShisht
    @SinghShisht 16 hours ago

    Sonnet is the best, but it's just not free

  • @perschistence2651
    @perschistence2651 17 hours ago

    I heavily disagree with the Open AI Models in your ranking. This is not their tier, this is your preference. They are A Tier in my eyes, if we talk about performance. The only B-Tier model is maybe 4O Mini.

    • @AICodeKing
      @AICodeKing 17 hours ago

      Yes, it's my ranking and my preference. It differs from person to person. The ranking is on the economic side. Every OpenAI model has a lower-cost or better-performing counterpart.

  • @Abhiishek-G
    @Abhiishek-G 19 hours ago +1

    *Been eagerly waiting for this video!📹Thank you so much and sending lots of love from India!🇮🇳💌🙏Thank you 🇮🇳*

  • @manjula_1
    @manjula_1 19 hours ago

    Feeling sad for Gemma

  • @dragonbing
    @dragonbing 8 hours ago

    It's crazy how OpenAI keeps making GPT-4 worse over time

  • @UsmanAli-ve6tq
    @UsmanAli-ve6tq 18 hours ago

    I can't believe my favorite models, 4o and o1-mini, were B and C tier models. There appears to be some bias against OpenAI

    • @gabrieleferrari2266
      @gabrieleferrari2266 18 hours ago +1

      4o is trash, o1-mini is good for coding

    • @AICodeKing
      @AICodeKing 17 hours ago +1

      No bias, just truth. For the price of O1 mini, you can get Sonnet, which is insanely better. It's not economical to use O1 mini (as it also produces more tokens)

    • @mrinalraj7166
      @mrinalraj7166 14 hours ago

      Yes, he is biased, we can all see it. Just use the models and see for yourself. Just go and try some prompts on Llama 3.2 and GPT-4o. How can they both sit in the same category? It's just nonsense. Just make 10 new accounts on OpenAI and use GPT-4o for unlimited hours.

  • @etomproductions3190
    @etomproductions3190 21 minutes ago

    You are announcing yourself as the Universal Benchmark?
    This is your own personal opinion, and in my opinion it has zero value

  • @woufwolf
    @woufwolf 15 hours ago

    LOL