This open model is so good, hard to believe that this is MIT license.
Well, with TikTok getting regulated, there needs to be a new hole.
When you read the paper, DeepSeek themselves say there is a lot more meat left on the bone. Expect a follow-up model pretty quickly.
This is the greatest gift for the upcoming Chinese New Year holiday.
That’s why there is a discount for API. I am going to use it during the holiday.
You mean Lunar New Year
@JH-bb8in I'm specifically referring to the Chinese start date; not every lunar calendar is the same. The Indian one starts on March 22, for example.
I always like your assessments. No hype
Thanks this is exactly what I am going for
The most useful video about DS R1 on YouTube. I enjoy the concise and approachable technical details in your videos. Please never stop posting.
Nice deep dive. These models are great, and are actually doing something I wasn't sure was possible. Now that I see it, I'm not sure why I thought this would be difficult. 🤷
You make a really good point: when you actually see what they're doing, it's not as complicated as a lot of people would think.
Thank you and greets from Germany! love your videos
Curious about the multilingual capability here, will definitely play around soon! Also, for testing reasoning I would suggest a large, complex task, treating it like a one-shot solver, not a chat model. At least that seems to be the trick and strength of the OpenAI o models right now. Best!
Reasoning combined with test-time training would be killer for local OSS models. We need models with these techniques combined somehow. I believe at that point we'd be beyond AGI, probably even at ASI.
Always concise explanation and right to the point. Thank you Sam :D Great video!
Thanks, much appreciated
Do you know if they released the distillation procedure?
So that we can, for instance, distill it onto qwen2.5-coder
AFAIK they haven't released the data, but I talked about the distillation in the video. They basically just do a fine-tune on 800k examples sampled from R1, plus DeepSeek-V3 outputs for non-reasoning tasks.
@samwitteveenai oh yeah I could reproduce that in a hot minute! I'll get on it
I expect they may end up doing this. In the paper they said they did not do RL on reasoning for engineering/coding tasks, thus R1 doesn't have a huge improvement over V3 for coding. Once they do the RL for coding, I suspect they may release something like this.
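Roughly, the distillation recipe described above could be sketched like this. This is a hypothetical illustration, not DeepSeek's actual pipeline: `sample_from_teacher` is a stand-in for real API calls to R1, and the answer check is a toy version of rejection sampling on verifiable answers; the real recipe fine-tunes a smaller model (e.g. a Qwen2.5 base) on the ~800k kept samples.

```python
# Sketch of distillation-by-SFT: sample reasoning traces from a teacher
# model, keep only traces whose final answer verifies (rejection sampling),
# and collect them as fine-tuning records for a smaller student model.

def sample_from_teacher(prompt, n=4):
    """Placeholder: in practice this would call the R1 API for n candidates."""
    return [f"<think>...reasoning about {prompt}...</think> 4" for _ in range(n)]

def is_correct(completion, reference_answer):
    # Toy verifiable check: compare the text after the </think> tag.
    answer = completion.split("</think>")[-1].strip()
    return answer == reference_answer

def build_sft_dataset(tasks):
    """tasks: list of (prompt, reference_answer) pairs -> SFT records."""
    records = []
    for prompt, ref in tasks:
        for completion in sample_from_teacher(prompt):
            if is_correct(completion, ref):
                records.append({"prompt": prompt, "completion": completion})
                break  # keep one verified trace per prompt
    return records

dataset = build_sft_dataset([("What is 2 + 2?", "4")])
```

The resulting `dataset` would then be fed to a standard supervised fine-tuning loop; no RL is involved in the distilled models, which is why the paper calls it pure SFT.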
Dude, we already passed the point where benchmarks mean nothing!
I wouldn't say they mean "nothing". A model that performs middling or badly on benchmarks is usually not good. Actually, most of the time it's not good.
However, I agree that when we're comparing SOTA models, they become less useful.
We need some empirical metrics, like benchmarks, but we also have to know they don't tell the whole story.
The benchmarks that are really interesting here are DeepSeek-R1 compared to DeepSeek-V3, as they share the exact same base model, so the difference shows the strength of the new post-training compared to a more standard post-training regime.
Most of my tests of the 70B model resulted in a chain of vomited text. It's easy to say it's the wrong model to prompt with "Please write an overview of the German tense Plusquamperfekt." There is a lot to think about there, and yes, the output is far from correct. But there shouldn't be a wrong question, or a wrong model for a certain question.
R for Remarkable
Thank you.
I’ve read on LinkedIn that DeepSeek’s terms & conditions say they hold copyright on applications developed using their models. Is that true? Then it’s not really an MIT license, is it?
Conspiracy theory crap; other labs are panicking and spreading BS all over the net.
If the context length were 2 million+, it would destroy the competition.
And it'll cost a small fortune to run (at that scale)...
Conspiracy theory time! Put on your foil hats!
I don't actually know anything, but I gave DS3 and Claude 3.5 a prompt asking for a paragraph of corporate jargon that uses cliché, catchy business phrases without actually saying anything useful. There were slight variations in the words, but the paragraph structure and phrases were beat-for-beat the same. Same phrases, same order. Wouldn't it be hilarious if DS3 were a slightly modified wrapper around Claude?
A single data point is all you need for a conspiracy theory, right?
OK, but if it were and they sold it this cheap, they'd be losing a ton of money.