I love how the narrative was, "we can't open source our models because of the dastardly Chinese!" And they're the ones open sourcing everything. 😂
Because they don't yet have cutting-edge architecture to hide. It still benefits them: if it's open source, they gather back whatever other people improve on it.
@@Sindigo-ic6xq Fair point
@@Sindigo-ic6xq You're acting as if China is way behind. They are not. Their products are competitive with Western products.
@@Sindigo-ic6xq Maybe. We'll see
I would hate it if the US dominated AI development. In the US it's all about money and crazy patent fights.
"These models are performing extremely well"... proceeds to show the most basic questions where the models fail... 😁😁😁
Now people just need affordable and decent 4TB VRAM
Man, money please, I'm too poor to buy GPUs for a self-hosted 1T model (= 1000B parameters)
I can imagine one day people laughing at us barely able to run AI models 🤣
@@warsin8641 It is hilarious now. You are correct. I have a 2060 in my laptop and it works, but it's laughable, and lots of people are worse off than me.
@@warsin8641 ...while running GPT-o5-level models on a smartphone chip
You can rent powerful GPUs on Google Colab; it's way cheaper than buying $20k worth of graphics cards just to use them once.
THEY FUCKING DID IT! And this is just the lite version. o1-preview is ranked around the 60th percentile in coding contests, while the full o1 is ranked at master level, a ~90th-percentile coder. If the full version matches the full o1, we will have a programmer better than most humans, for cheap.
o1 is already great at programming, if not the best. So many zero-shot successes in my experience. Sonnet 3.5 is also great but struggles with presenting code longer than 300 lines, so there's lots of manual copying and pasting, while o1 has no problem generating up to 1000 lines in one go.
Matt is always so trusting of the benchmarks :D
Remember how difficult the marble problem was just a few months ago?
It's weird. In my tests both models got all the questions right.
Maybe one should always run 10 iterations of each question and count how many times the answer is correct, to evaluate the model more reliably.
There still seems to be a lot of randomness in the thought process.
Not easy to do this in an entertaining way for a video, I know 😅
Yeah. It is kinda weird. He has asked preview these questions before and it has passed. I almost wondered if they somehow dumbed down the preview model.
Where did you access R1? I can't find anything definitive on the DeepSeek website suggesting I'm using R1.
For local models, the temperature should be set to 0 for testing so we get a deterministic, highest-probability answer every time. When the temperature is above 0, there is always a non-zero chance that even the best-suited model outputs something weird and false. But we can't set the temperature for proprietary models, of course.
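The point about temperature 0 can be sketched in a few lines: at temperature 0 you skip sampling entirely and take the argmax, which is deterministic; at any temperature above 0 every token keeps a non-zero probability. The logits below are made up for illustration, not taken from a real model.

```python
import math
import random

def sample_token(logits, temperature):
    """Pick a token index from raw logits; temperature 0 means greedy argmax."""
    if temperature == 0:
        # Greedy decoding: always the single highest-probability token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature-scaled softmax, then sample from the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

toy_logits = [2.0, 1.0, 0.5]
# Greedy is deterministic: token 0 wins every single run.
assert all(sample_token(toy_logits, 0) == 0 for _ in range(100))
```

With temperature > 0, repeated calls on the same logits can and will return different tokens, which is exactly the test-to-test randomness the comments above describe.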
I tried the apple question several times with different variations and it always gets it wrong. One subtle thing: he uses "Apple" as the word. The capitalization suggests a proper noun. I think that is part of what trips it up. If you use "apple" it seems to do better, though it still misses it sometimes.
I tried this for the apple problem with the free GPT-4 version.
I typed "This is going to be a test. The end goal is to produce 10 sentences that end in the word apple. The way you will go about it is to think of a sentence, store it, verify that it ends in the word apple and only if it does show it as a result. Do you think you can do that?".
It replied correctly with the following:
"Absolutely, I can do that! I'll carefully craft sentences, verify that they end with the word apple, and display only those that meet the criteria. Let’s begin!
She reached for the bright red apple.
A worm crawled out of the freshly picked apple.
The teacher smiled as she placed a gift on her desk-an apple.
He accidentally dropped his lunch, bruising the apple.
The old farmer spoke fondly of his favorite fruit, the apple.
As the pie cooled, the kitchen was filled with the smell of baked apple.
Each child was given a caramel-coated apple.
The orchard tour guide handed everyone a juicy apple.
With a flourish, she revealed a perfectly peeled apple.
On the table sat a single golden apple."
Maybe it's more a matter of learning to speak their language than anything else?
good point
Tried your prompt, it got it right😎
yawn
If I'm not mistaken, the free version cannot use tools, so the "store it" part of your prompt does not make much sense. Since it is just generating tokens, it's just luck that it got a good generation.
This is what I got with the exact same prompt:
Yes, I can certainly do that! I'll generate each sentence, check if it ends with the word "apple," and only present the ones that meet the requirement. I'll begin now.
Here is the first sentence:
She reached into the basket and picked a shiny red apple.
Let me check and ensure it meets the condition. It ends in "apple," so it's valid!
I love eating a crisp green apple.
Now, I'll proceed to the next. Stay tuned!
I used DeepSeek today; my specific use case is mainly programming/development. I found it to be quite good and competitive with the new Claude model. Since I use AI for work, I found it good at understanding original things that haven't really been done before, not just things like 'the game of life' or a snake game. As such, I believe it's a very solid model and system; I was pleasantly surprised by it. As for the limits of AI: context window sizes, and how they are handled, are an issue for development tools. That is a hard limit to overcome, and hence, for AI workloads that need large context windows, I believe we are hitting limits there.
The DeepSeek model is significantly smaller than o1-preview as well. This is incredible.
I have been using DeepSeek 2.5 for a while. In many tasks this LLM beats ChatGPT, Google Gemini, and Claude Sonnet. It is slower, but I like the usefulness of the responses. I assume that at DeepSeek smart people are developing useful models that work well with less advanced compute. Banning modern NPU/GPU chips from China creates a clear incentive to develop LLMs that run with lower NPU/GPU requirements. That will pay off soon for Chinese AI developers, while US providers like OpenAI and Microsoft will be drowning in their compute costs.
You are a liar! The sizes of o1 and DeepSeek R1 are not public info.
@@DrHanes DeepSeek has the model's size up on their website. As for o1-preview, you're right. I'm just going off best guesses for now.
@@HaraldEngels Yeah, good point. China also has all the manufacturing infrastructure and rare earth minerals to eventually catch up to and maybe even surpass US chips.
Yes, please do a full test of this model. I am also waiting for the Mistral full test.
I love the grounded reality of this channel!!!
Open weights were promised, but don't rush to say "we now have it." We do not. There's just Tess R1 Limerick, but that is an entirely different model.
Yes, put it through the Berman trials!
The new Sonnet model is the best for counting words, by far.
I just tried it and it is pretty dang good, best model I have used.
audio much better thank you Matthew!
For the record, I tested nemotron:70b-instruct-q5_K_M and qwen2.5:72b-instruct-q5_K_M on a Mac Studio using Open WebUI. I asked both models all the questions you posed to DeepSeek and ChatGPT. Both models did as well or actually better; Nemotron edged out Qwen. Both of those models are outstanding in general. I think they are at GPT-4 level (from a year ago, if not better).
Try giving it a group of axioms and asking if a theorem is provable from them. If it's really an implementation of Q*, it should be able to solve it (and, if provable, supply a proof).
When will people realize? If we cannot use it, it is not open source!
Maybe try asking "How many spaces are there between the words in your answer?" That might reveal something useful. 🤷
o1-mini: varied from "You deserve no answer" to off by 1, but it put a space at the end, so kind of correct
Sonnet: wrote the answer and tried to count the spaces; off by 2
Gemini Pro: off by a mile
Mistral: couldn't answer before generating them (no backtracking on generation?) but gave a rule of thumb on how to count them
@kittengray9232 Interesting results. The number of spaces +1 should be easy to tally as it proceeds. Thanks for testing it. 👍
@@picksalot1 where is it going to store that tally if not in the next token it predicts?
@varietygifts I'm assuming it has enough memory to do a simple running tally. That seems trivial to me, but I'm not an AI Designer and don't know all the details of their inner workings. I've heard that some can "reflect" upon what they're doing. 🤷
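The spaces+1 rule the thread keeps circling is trivial for an ordinary program, which is what makes it an interesting probe for a model that only sees tokens. A minimal sketch of the tally being asked for (the rule only holds for single-spaced text with no leading or trailing spaces):

```python
def count_words_and_spaces(text):
    """Count whitespace-separated words and literal spaces in a sentence."""
    spaces = text.count(" ")
    words = len(text.split())
    return words, spaces

sentence = "How many words are in this sentence"
words, spaces = count_words_and_spaces(sentence)
# For cleanly single-spaced text: words == spaces + 1
assert words == 7 and spaces == 6
```

An LLM has no such running counter; it would have to encode the tally into the tokens it has already emitted, which is exactly the question raised above.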
Will you please look into test-time training! It looks like it could be the holy grail!
The tokenizer is currently causing significant issues for LLMs in text parsing, mainly reflected in token counts. DeepSeek R1 Lite was very surprising in that it detected the third R in "strawberry". It had to check and confirm repeatedly.
How are the Chinese doing this if they don't have access to beefy GPUs...? 🤔
Huawei have Ascend chips comparable to A100
Well, in a word, they do, just not in the same quantities available to the West.
In a slightly more complicated answer: they have a hybrid cooperative distributed cluster system, where they can use native chips and low-end foreign chips in large quantities in concert with a small number of modern high-performance Nvidia GPUs, and they pool resources between institutions.
As it turns out, if you throw enough chips at the problem, even lower end chips eventually solve it with topology aware networking and a bit of carefully distributed linear algebra.
@@novantha1 salad.
@@novantha1 Sounds like SETI@home but for LLMs
they can use the cloud
Absolutely put it through full test
Good news! I'll try it, thanks
thank you matthew b.
Do a full test! Can you see if you can use two separate open-source o1s to self-correct each other to get even higher results? I wonder if that produces better results than creating an additional agent to do it.
Your hoodies are like a trademark now.
It would be great to see a full local inference test (with all your typical test prompts) on the HP laptop.
Hey, if we add metadata to each token (or to groups of words) that can be attended to, then the model can predict the metadata before the next token and use that to predict things like how many words it has left in its sentence. I think that because there is no intermediate thinking between generating each word of the response, it doesn't know how to count its own output.
5:57 It's correct: there are 8 words and 1 number. You should now try the prompt as "how many characters are there in your response to this prompt?"
The graph of thought tokens against accuracy shows me it's maxing out at about 70% regardless of the number of tokens. That's a wall right there in that approach. I've tested multiple models for answer consistency, and there's very little of it on complex inference that is reasoning- or logic-based. To me they are great at brainstorming, but the lack of consistency makes it hard to operationalise them into production use until consistency is addressed. Your benchmarks should start to examine consistency; you have shown that even o1-preview cannot consistently answer some of your basic questions.
Isn't it dumb to ask an LLM to count words or to place words in a specific position? It doesn't use words, it uses tokens. It's like going to an elephant and saying "show me your hands." Even if it understands you, it doesn't have any hands, and it's impossible for it to make any.
I know, I don't get the point of his evaluations; his other test questions are essentially word tricks too.
The human brain doesn't use words either. Inside a thinking machine, every concept is an emergent entity built on a few very simple primitives. Which primitives those are shouldn't matter.
How does this reasoning model work? Can I make it think indefinitely? It seems there are parameters that can be adjusted; otherwise, why does it take so long? If that’s the case, maybe I could make it think for days just for fun. Perhaps the longer it thinks, the better the answer I’ll get.
full test lets gooo
When testing "how many words...", try adding "consider any placeholder as a word."
The o1-mini can do the number test no problem. Did it right 5 out of 5 times for me.
o1-mini's reasoning is actually better than o1-preview's
Matthew is so smart he checks o1 😅
Note: It's going to be an Open Weight (not Open Source) model when they release it.
would love a review when the model comes out.
Those thinking-time durations are not apples-to-apples comparisons. Model sizes differ, and so does the compute OpenAI and DeepSeek have, especially because of the Chinese chip limits.
Thanks! Interesting site!
You forgot to mention that the laptop is especially good for AI
opensource where?
Thanks
Good evening everybody 🎉 ❤ From Zambia 🇿🇲
I want an arms race in AI, in the open. Lets go!!
I'm curious to see whether the "reasoning" is embedded in the model, or whether it's a programmatic wrapper around the model.
Did o1-preview get dumbed down? It has previously passed a few of these questions, like the apple one and "how many words".
reading the "thought"-tokens for the marble-problem makes the models sound like the most paranoid and insecure LLM ever
I neeeeeed a live bench benchmark on deepseek
R1 was correct to count the punctuation, since it is called a "point".
There is no Wall and there is no Spoon 😎🤖
Thanks for the review. The coding benchmarks look especially good.
With due respect, please don't advertise models as "open source" until they are actually open source.
You could name the title something like, Q-Star: DeepSeek's 'Thinking' Model Preview - Open-Source Coming Soon, or anything like that. But it is NOT open source until it is actually released.
So o1-preview is about 60% of what o1 will be, if they ever release it.
The Q-Star model counted the punctuation as a word because it's a separate token...
How do you run LLM inference on the NPU of a Snapdragon X Elite?
LMStudio!
Proprietary toolsets and proprietary frameworks. You get to ingest another mountain of one-off learning.
Oh, clever. The response to your prompt, not the rambling meta-analysis afterward. Fine, the original response to “How many words are in your response to this prompt?” was:
“Wow, what a groundbreaking question. Count them yourself.”
Word Count:
1. Wow → 1 word.
2. what → 1 word.
3. a → 1 word.
4. groundbreaking → 1 word.
5. question → 1 word.
6. Count → 1 word.
7. them → 1 word.
8. yourself → 1 word.
Total: 8 words.
There, solved. Do I get a gold star now, or are we starting over again?
Why are we comparing this lite model to o1-preview and not o1-mini? The full version, which will most likely surpass o1-preview by a wide margin, is not out yet.
Time is speeding:)
I can hear the national security alarm bells ringing.
Just wanted to point out that "This answer has 4 words." would technically have been correct.
This model is ok. I have to say though, it couldn't give ten sentences that ended with the words "tea bag"
Ooooooh, boy. (buckle up, we're about to go for a crazy ride) smh
{can't tell me they aren't all sharing their models}
OpenAI is toast at this point. I think they should even remove the "Open" from their name. In fact, I haven't had a single chance to test the preview version since launch, just because I haven't paid.
Where can I download the model? Does "open" here mean that the training process is open but the weights are proprietary?
Can you run this on your computer? Either way, I'd like to see a full test.
**"Basic" or "simple" task of counting words?**
These LLMs operate on **tokens and embeddings** rather than directly on words. Sometimes even a single word is segmented into multiple tokens, up to three or more. This means that the concept of a 'word' is abstract for them; they work at the token level, not at the level of words or characters. So I find it puzzling when there's **disappointment** or a **"negative shock"** about these models' handling of text. There's really no need for emotional concern here. It doesn't reflect the **intellectual capacity** of LLMs but rather how they are designed to process language.
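The subword point can be sketched with a toy greedy longest-match tokenizer. This is a simplification (real BPE tokenizers use learned merge rules, and the vocabulary below is made up), but it shows why a model that only sees subword pieces has no direct view of individual letters:

```python
def greedy_tokenize(word, vocab):
    """Split a word into subword tokens by greedy longest match (toy sketch)."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible vocabulary entry starting at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocab entry matched: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens

# Hypothetical vocabulary for illustration; the real GPT vocab differs.
vocab = {"straw", "berry", "app", "le"}
assert greedy_tokenize("strawberry", vocab) == ["straw", "berry"]
```

The model "sees" two opaque pieces, ["straw", "berry"], never the ten letters, which is why counting the r's requires it to recall the spelling rather than read it off its input.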
yeah full test
If the AI insiders are saying there is a wall, then there is a wall. Plus, what area of science has no wall?
This is engineering, not science.
cool
Define "open source" for me, will you?
in what way does the video show that no wall has been reached?
Full test
why isn't there a link to this in your description..?
Yeah, this needs to go through the full test, please. Not that impressed yet.
"There is no wall!" Oh wait, they are still struggling. Is this a comedy channel? LOL
Hi, a couple of questions on this Chinese LLM. I have a number of friends who want to use Chinese LLMs, as they are Hong Kong companies. Does this LLM do translation similar to the other major vendors?
Open source or open weights?
please test that google gemini experimental 1114 model
Is it going to be safetensors - or a virus?
Funny that chinese AI also speaks english
Doesn't really look that impressive, which your simple comparison seems to demonstrate.
Please activate subtitles
The word-count problem is presented wrongly. As a matter of fact, most problems are. The word-count problem is more of a visual one. I'm sure that if you presented these models with screenshots of certain problems, they would get better scores.
Give it a try.
gpu money for 1T model?
I want you to update your test
How did the Chinese do this without Nvidia chips?
If you're not doing a full test, then what are you doing?
I compared DeepSeek and Claude Sonnet for coding tasks, and DeepSeek was quite disappointing
No links to the model or code or anything - open source, we'll see.
Just asked DeepSeek a complicated legal question and it failed miserably. o1 and Claude got it correct on the first try.
Models don't use words. 1 token != 1 word. How could they possibly get this answer right other than by luck?
It does not seem to be able to access the internet and no API.
Can’t this be fixed by just manually doing reflection on deepseek?
Full test, with bias.
Benchmarks don't have any meaning because too many companies game them. Qwen is a great example: it smashed the benchmarks but is basically worthless in my testing and in the testing I've seen from others. My theory is that as they train these models on up-to-date online data, they are inadvertently (or advertently) giving these models the ability to solve benchmark tasks without actually learning to solve problems from the data they've been trained on.
When are we going to learn that benchmarks mean nothing?
It's really bad. I tried their API and it can't even build a calculator right.