Wow, OpenAI won't exist in 5 years.
There's AI as a product and AI as a scientific field, and OpenAI still has a couple of years' advantage in AI research. Their product releases have been more conservative, but they are ahead. I think you have a point though: many companies won't be around in 5 years. As always happens, power will probably consolidate between 2 or 3 giants. Might even be a monopoly lol
DeepSeek-V3 has been updated in their internal API. Gonna test it.
Aider LLM Leaderboards
1. 62% - o1
2. 45% - Sonnet
3. 28% - Haiku
4. 18% - DeepSeek
5. 15% - GPT-4o
6. 8% - Qwen
Qwen 2.5 Coder might not be the highest-performing coding model, but I find it extremely valuable in my daily workflow. It holds up well against leading closed-source models, and its size allows it to run on a 3090 24GB GPU with decent quantization. Perhaps ranking it as B-tier is overly critical for a model that could arguably be the best open-source coding solution (under 70B) available.
Also, Gemma 2, even being a little old by now, excels in multilingual conversation, arguably ranking among the best for natural, human-like interactions in non-English languages and translation tasks. It is the only model I have tried so far that naturally uses emojis and puts some emotion into its writing.
I prefer Phi 4 14B over Coder 32B because it performs the same as Coder.
It's not overly critical, it's honest! There are unlimited options that do exactly the same thing this Qwen 2.5 "Coder" assistant does, because it's not a coder! It's an assistant AI that assists with code, let's be honest here.
The 32B model worked very well for me (most of my scripts are tuned for Codestral). Phi 4 is excellent, but the 16K token context kills it in coding tasks.
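A rough back-of-the-envelope check on the 24GB claim above, for anyone curious. This is only a sketch: it counts weight memory alone, and the 4 GiB overhead for KV cache and activations is an assumed figure, not a measurement. The function names are made up for illustration.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights only, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float = 24.0, overhead_gib: float = 4.0) -> bool:
    """True if weights plus an ASSUMED fixed overhead fit in the given VRAM."""
    needed = weight_memory_gib(params_billion, bits_per_weight) + overhead_gib
    return needed <= vram_gib


if __name__ == "__main__":
    # Compare a 32B model at 4-bit, 8-bit, and FP16 precision.
    for bits in (4, 8, 16):
        size = weight_memory_gib(32, bits)
        print(f"32B at {bits}-bit: {size:.1f} GiB weights, fits in 24 GiB: {fits(32, bits)}")
```

By this estimate a 32B model at 4 bits needs roughly 15 GiB for weights, so it fits on a 24GB card, while the same model at FP16 does not. Longer contexts grow the KV cache, so the real headroom is smaller than this suggests.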
The best open-source model is definitely the 70B Nemotron from NVIDIA.
Can you please make a video on which models are best at coding and which APIs are free or free with rate limits?
Use Cursor: you get unlimited messages and all the models.
Mistral 7B is the workhorse of the local model community for function calling
That's a good video. I would love to see another video comparing the S-tier models on general use, coding, and other features. Thank you!
When is Claude Opus 3.5 coming? I'm looking forward to it!
I wish you and everyone in this niche good health.🥰🥰🥰🤗
I onboarded your channel just a month ago and became a daily viewer! You create magnificent content, a true inspiration!
As for the models, I can tell you're a 'little' biased towards GPT-4o 😂 but that's ok! I worked with it a lot in 2024 and it's good for non-coding tasks. Of course, OpenAI is expensive and I avoid them where I can. Merry Christmas ⛄ 🎄
Yeah, most of this channel is about coding tasks.
When I translate English to Chinese, I find Gemma 2 is a pretty good model.
I have loved 3.5 Sonnet for the last 6 months; after that, Gemini.
Yeah, Gemini made a big comeback and can beat any model. Besides, that's the Flash version, not even the Pro model. Amazing.
A tier list video for text-to-image AI?
Happy Holidays you guys
Great wrap up.
It would be great to have an idea of which of the models you mention are ideal for local setups.
Which one is best for coding?
I use o1-preview through the API and it's really smart and can solve many coding challenges. On the other hand, Claude 3.5 Sonnet is a really good model for coding.
Sonnet is the best for now, but the pricing is a bit high. The amazing free model right now is Gemini. For open source, if you want to run it locally, you can use Llama 70B or Phi-4 14B. Other than that, if you don't want to think and just want to use an API directly at an extremely cheap price, DeepSeek is the only answer. That's what's best right now.
@today8472 It's just that you need to wait a long time, which is bad when you are in a flow state. It's different with Claude 3.5 Sonnet: it's just instant.
Sonnet 3.5 and o1-mini
Happy holidays, king. We now have DeepSeek V3.
OMG, that infamous table...
I have tested the Qwen model at FP16 and I agree, it's bad.
Thank you ❤
check OLMo 2 from allenai
Thanks for the evaluation. I can't argue with your choices.
I really enjoy your content, and I love that there is a YouTuber who likes free stuff as much as I do. My minor disagreement is that R1 deserves to be in S-tier. Keep up the great videos.
Thanks for the Xmas gift :) very useful video. Happy holidays
I understand why you put GPT-4o in C tier, but the mini is definitely A if not S tier. It shits on Gemini Flash for the price. If you have ever tried to use them to get structured output, you would know what I mean. Yes, Gemini Flash is cheaper, but what good is that if it fails more often than not and wastes my tokens? Price isn't everything. Even with the 2.0 Flash exp, you have to recognise it's not going to be free forever. It also fails a lot more often than GPT-4o mini at structured outputs. Anyway, I respect your opinion and I like your channel. Thanks for the videos, and Merry Christmas!
But 2.0 Flash is still going to be cheaper than GPT-4o mini. So, there's that.
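The "price isn't everything" point above can be made concrete with a small sketch. Assuming every failed structured output is a wasted call, the number that matters is cost per valid response rather than cost per call. The function names, required keys, prices, and sample responses here are all hypothetical, not measurements of any real model.

```python
import json


def is_valid(raw: str, required_keys: set[str]) -> bool:
    """True if raw parses as a JSON object containing every required key."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data)


def cost_per_valid_output(price_per_call: float, responses: list[str],
                          required_keys: set[str]) -> float:
    """Total spend divided by the number of usable responses."""
    valid = sum(is_valid(r, required_keys) for r in responses)
    if valid == 0:
        return float("inf")  # nothing usable came back
    return price_per_call * len(responses) / valid


if __name__ == "__main__":
    keys = {"name", "score"}
    # Hypothetical outputs: a cheaper model that often breaks the schema...
    cheap = ['{"name": "a", "score": 1}', "not json", '{"name": "b"}', "oops"]
    # ...and a pricier model that always returns the expected object.
    pricey = ['{"name": "a", "score": 1}'] * 4
    print(cost_per_valid_output(1.0, cheap, keys))
    print(cost_per_valid_output(2.0, pricey, keys))
```

With these made-up numbers the model that costs half as much per call ends up twice as expensive per usable result, which is the trade-off both comments are arguing about.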
I appreciate the video, thanks for validating my findings. I'm sorry for reaching, but it would be interesting for you to have your own little open-source, repo-like website; it would be really fun to mess around with your tables. I think your ability to not pay for coding is keeping you honest and an outspoken, factual source against mainstream tendencies. Thanks for all your effort.
Sonnet is best, but it's just not free.
I heavily disagree with where the OpenAI models sit in your ranking. This is not their tier, this is your preference. They are A-tier in my eyes, if we are talking about performance. The only B-tier model is maybe 4o mini.
Yes, it's my ranking and my preference. It differs from person to person. The ranking leans on the economic side: every OpenAI model has a lower-cost or better-performing counterpart.
*Been eagerly waiting for this video!📹Thank you so much and sending lots of love from India!🇮🇳💌🙏धन्यवाद 🇮🇳*
Feeling sad for Gemma
It's crazy how OpenAI keeps making GPT-4 worse over time.
I can't believe my favorite models, 4o and o1-mini, were B and C tier models. There appears to be some bias against OpenAI.
4o is trash; o1-mini is good for coding.
No bias, just truth. For the price of o1-mini, you can get Sonnet, which is insanely better. It's not economical to use o1-mini (as it also produces more tokens).
Yes, he is biased, we can all see it. Just use the models and see for yourself. Go and try some prompts on Llama 3.2 and GPT-4o. How can they both sit in the same category? It's just nonsense. Just make 10 new accounts on OpenAI and use GPT-4o for unlimited hours.
Are you declaring yourself the universal benchmark?
This is your own personal opinion, and in my opinion it has zero value.
LOL