Forget Deepseek, Here's another MAX Release from China!
- Published on Feb 5, 2025
- Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model
Qwen2.5-Max is a large-scale MoE model that has been pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) methodologies.
Qwen Chat here - chat.qwenlm.ai/
❤️ If you want to support the channel ❤️
Support here:
Patreon - / 1littlecoder
Ko-Fi - ko-fi.com/1lit...
🧭 Follow me on 🧭
Twitter - / 1littlecoder
The model has been open sourced - huggingface.co/collections/Qwen/qwen25-1m-679325716327ec07860530ba
❤
I've been telling folks for years that Chinese AI was no joke. People were mostly ignoring all the advances for the longest time. I've been running mostly Chinese local models since forever.
Why forget bro? Deepseek is still good :D 😂
yeah maybe don't forget :D
@@1littlecoder😂
It’s a figure of speech. He doesn’t mean it literally. 🤫
Not the DeepSeek 7B model, I happily deleted that useless model.
@@DailySpark_365 I understand bruh xD
Dude...I follow your channel...your testing of deepseek was very interesting. It really shows that you are passionate about these things. Love and Wishes from Norway.
Not sure if you follow chess, but always curious about Norway because of Magnus Carlsen. Great to hear from you (from Norway)!
I fully agree about his excellence and passion.
Qwen2.5-Max is the most powerful language model in the Qwen series. It achieves excellent performance in complex reasoning, instruction following, mathematics, coding, role-playing, creative writing, etc.
Maximum context length: 32,768 tokens
Maximum generation length: 8,192 tokens
Modality: text
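The two limits above interact: the prompt plus the generated tokens must fit inside the context window, and the generation itself is capped separately. Here is a minimal sketch of that check (a hypothetical helper written for illustration, not part of any Qwen SDK; the constants are the figures quoted above):

```python
# Illustrative check against Qwen2.5-Max's published limits.
# The constants come from the model card quoted above; the function
# itself is a hypothetical utility, not an official Qwen API.

CONTEXT_LIMIT = 32_768   # maximum context length in tokens
GEN_LIMIT = 8_192        # maximum generation length in tokens

def request_fits(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if a request stays within both limits."""
    if max_new_tokens > GEN_LIMIT:
        return False
    # Prompt plus generated tokens must fit in the context window.
    return prompt_tokens + max_new_tokens <= CONTEXT_LIMIT

print(request_fits(30_000, 2_000))  # True: 32,000 <= 32,768
print(request_fits(30_000, 4_000))  # False: exceeds the context window
```

Note that a 30k-token prompt leaves under 3k tokens of generation headroom, which is why the 32,768-token limit feels tight for long-document work.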
Today: forget deepseek
Tomorrow: Qween is dead
Past tomorrow: GPT R05 killer
Past past tomorrow: stop using llama15
We are tired of the hype BS clickbait
GPT Ro5, I see what you did there 😊
I'm here before 100k subscribers, as usual top content! I really like how you reference "like, do you remember when..." — it helps connect everything and keeps our brains focused on the progress of these models.
The one thing I noticed is these benchmarks, including the DeepSeek 17b model, say they are better than Claude, but IMO it doesn't really communicate the same way, especially when coding.
Data storage 😂 At this point the game is over: the models are all trained, and there isn't some transformation of the models because of our inputs. We are just uneducated in some areas. This is just my opinion: the models are smarter than us, and we just feel like the cook when the tool is helping us cook.
I watched a lot of channels covering this new wave of AI bots, but you are the most reasonable — not too hyped or too negative, but very, very practical. Thank you sir 🙏 For me and a lot of people, the ability of AI bots to output long, error-free code on the first try is really important to avoid the debugging headache.
I appreciate that kind feedback! Hope I can do more of these!
Let the inference battles begin! Thank you for your video; more insight is always helpful. Is it open source and open weights?
Who cares where the data is being stored as long as it works
Following you since your matplotlib video — at least since then I've been following you in my feed. Glad that as the AI revolution arrives, you are finally going to have your moment to shine. Keep up the good work.
How about Kimi k1.5, the multimodal counterpart to R1/o1? Can you also test it?
I don't know if Stack Overflow is okay for copying UI stuff, but when it comes to systems software — OS, k8s, packet forwarding — Claude gives around a 10x productivity benefit over trying to find a proper solution on Stack Overflow.
And Claude 3.5 Sonnet is mostly more correct than GPT-4o, except their infra is not solid.
Tested it and it got all the tough questions right, including the 10 sentences that end in "apple".
As far as I know, Qwen has had Max and Pro variants for a while, but both are API-only, not open weights.
It's not available on OpenRouter yet. What do we know about function calling for agents on this one?
I am really confused. I tried DeepSeek R1 7B and also the 1.3B locally and gave it a simple deep-merge task in JavaScript. Once it starts thinking, even after 10 minutes it's just thinking and thinking, but CodeQwen 7B solved it instantly. I really don't understand why it's so hyped, or maybe my copy is broken. Really confused.
Waiting for "pro" version of model. Then "pro max". Then "ultra". Oh, sht, no, sorry, wrong room
Please switch to dark mode in vs code
Your IDE is in bright mode, are you trying to hurt eyes? 😅
@@BracerJack Sorry, I recently gave a presentation with a projector, and with dark mode no one could see anything. Will change it back!
@1littlecoder 👍
Just saw your channel, super cool 😎
@1littlecoder awe, big hugz to you 🤗
Put your blue screen on bud...I hate this woke...please change everything cos I'm still breastfeeding from mama at 30 bs. Hate it. No real men left, just trans-firmers.😂😂😂
Guess what the largest position in Michael Burry's (featured in The Big Short movie) portfolio is. Yes, it's Alibaba.
He's shorting or shorted Alibaba?
Doesn't Deepseek R1 perform better than Qwen?
R1 is a reasoning model; here the comparison is with V3, a non-reasoning model!
TRULY IMPRESSIVE
China is not stopping now
The way NVIDIA's stock has dropped (DeepSeek), my god, there's too much drama in the AI universe 😅
I have read about space race between US and Russia and now it's going to be an AI Race between US and China
You mean a race between Chinese engineers in China and Chinese engineers hired by American companies?
Up 8% after the 17% drop, so the market kinda thinks it's not a serious long-term threat. My opinion is, the market has no idea... as usual 😮
Is Qwen open source?
yes
Qwen 2.5 Plus & Max aren't, though; the rest are.
hey, man I really love your work
Wait till you see the one coming in about 6 or 7 weeks. Start shorting the for-profit AI stocks in the USA.
A 32,768-token context limit — can't do much with that, really.
Swear, deep seek v3 was pretty fire
As much as I like China pumping out new models and putting pressure on companies like OpenAI, this model is not impressive at all judging from the quick test I did online. It's at best at the level of an average 70b at q4. It not only failed to answer my questions, it even stuck to its incorrect answer despite me telling it that it was incorrect and to try again while giving it huge hints. So it actually failed even harder. If I ask Claude something it gets wrong, and tell it so and give it hints, it usually picks that up and tries to improve, instead of ignoring me and running in circles. So just failing is one thing, but not being able to collaborate in a productive way is another issue.
Forget everything, tomorrow another God made AI will come 😊😊😊
thanks a lot
Is it for free?
Yes
China releases a new model every day.
Its joever for openai now
This could have been the title
Don't understand why India isn't no. 1 in AI. 😮 What went wrong?
Bad place to start an enterprise duhhh
Too busy in shady call centers looking for gift cards
We are busy digging places of worship
@@vivekkarumudi Maybe, but they always go offshore to Seychelles (corporately), so the "place" is irrelevant; I kind of meant the skill set of this generation of IT people. And why didn't/doesn't the government (if it's a bad place) make it a good place, instead of talent going elsewhere?
No money for GPUs.
You have invested heavily in this channel. Using nonsense PR clickbait like "forget this, forget that" just ruins the trust gained by your followers. Stop it.
@@paulmuriithi9195 Genuinely asking: what's wrong with "forget X" as a prefix? It's a common way to headline things, isn't it?
@@1littlecoder yeah, but for pr?
@1littlecoder Stop it. Be factual like Matthew Berman. There are only so many people interested in AI YouTube channels, and clickbait kills trust. By June 2025, we will use MoE and agents plus avatars to get all the AI news we need.
@@1littlecoder it gives the impression that Qwen is way better than deepseek. In this era of all clickbait AI channels, you stand out, please do not lose that
@ So "forget DeepSeek" is not factual, but someone talking about a DeepSeek conspiracy is factual? Woah 👏🏽👏🏽👏🏽 You have a great sense of what's factual!
Chinese models are powerful. Also dangerous, IMHO. Ask Qwen or DeepSeek about Tiananmen Square, for example, and it breaks them or they won't answer. With that kind of jaded filtering, do you trust these models with your sensitive data, or their inherent biases? I definitely do not, regardless of how fantastic the reasoning on DeepSeek is. Further, there is not a chance DeepSeek was NOT trained using more advanced chips than the H800s, based on a number of known factors. My 2 cents. That said, your videos are fantastic as always.
My latest video was about it th-cam.com/video/WVo01K6hVVs/w-d-xo.html
Qwen didn’t compare against Gemini 1206
Deepseek is the best. All closed model are liars
China needs to chill a little 😂
I am very worried when my clients' data is stored in the USA, not when it is stored in China.
informative
Interesting, another Chinese model — a headache for the US.
If you care that much about data 😂😂 stop using the internet 🤪🤪🤪
Great, but what about Codeforces?