DeepSeekR1 - Full Breakdown

  • Published Jan 21, 2025

Comments • 35

  • @nufh
    @nufh 13 hours ago +22

    This open model is so good, it's hard to believe that it's MIT licensed.

    • @lemniscif
      @lemniscif 3 hours ago

      Well, with TikTok getting regulated, there needs to be a new hole.

    • @MS-wz9jm
      @MS-wz9jm 1 hour ago

      When you read the paper, DeepSeek themselves say there is a lot more meat left on the bone. Expect a follow-up model pretty quickly.

  • @cariyaputta
    @cariyaputta 9 hours ago +9

    This is the greatest gift for the upcoming Chinese New Year holiday.

    • @lipinglin1994
      @lipinglin1994 5 hours ago +1

      That’s why there is a discount for API. I am going to use it during the holiday.

    • @JH-bb8in
      @JH-bb8in 1 hour ago

      you mean lunar new year

    • @cariyaputta
      @cariyaputta 58 minutes ago

      @JH-bb8in I'm specifically referring to the Chinese start date; not every lunar calendar is the same. The Indian one starts on March 22, for example.

  • @mrchongnoi
    @mrchongnoi 4 hours ago +3

    I always like your assessments. No hype

    • @samwitteveenai
      @samwitteveenai 1 hour ago +1

      Thanks, this is exactly what I'm going for.

  • @hiawoood
    @hiawoood 11 hours ago +2

    The most useful video about DeepSeek R1 on YouTube. I enjoy the concise and approachable technical details in your videos. Please never stop posting.

  • @briancase6180
    @briancase6180 2 hours ago +1

    Nice deep dive. These models are great, and are actually doing something I wasn't sure was possible. Now that I see it, I'm not sure why I thought this would be difficult. 🤷

    • @samwitteveenai
      @samwitteveenai 1 hour ago

      You make a really good point; when you actually see what they're doing, it's not as complicated as a lot of people would think.

  • @balegua33
    @balegua33 12 hours ago

    Thank you and greetings from Germany! Love your videos.

  • @alchemication
    @alchemication 6 hours ago

    Curious about the multilingual capability here, will definitely play around soon! Also, for testing reasoning I would suggest a large, complex task, treating it like a one-shot solver, not a chat model. At least that seems to be the trick and strength of the OpenAI o-series models right now. Best!

  • @14supersonic
    @14supersonic 4 hours ago

    Reasoning combined with test-time training would be killer for local OSS models. We need models with these techniques combined somehow. I believe at that point we'd be beyond AGI and probably at ASI.

  • @alexslee5356
    @alexslee5356 12 hours ago +1

    Always concise explanation and right to the point. Thank you Sam :D Great video!

    • @samwitteveenai
      @samwitteveenai 12 hours ago

      Thanks, much appreciated.

  • @CognitiveComputations
    @CognitiveComputations 11 hours ago +3

    Do you know if they released the distillation procedure?
    So that we can, for instance, distill it onto qwen2.5-coder

    • @samwitteveenai
      @samwitteveenai 5 hours ago +2

      AFAIK they haven't released the data, but I talked about the distillation in the video. They basically just do a FT on 800k examples sampled from R1, plus DeepSeek-V3 outputs for non-reasoning tasks (a rough sketch of what that looks like is below this thread).

    • @CognitiveComputations
      @CognitiveComputations 5 hours ago

      @samwitteveenai oh yeah I could reproduce that in a hot minute! I'll get on it

    • @MS-wz9jm
      @MS-wz9jm 1 hour ago

      I expect they may end up doing this, as in the paper they said they did not do RL on reasoning for engineering/coding tasks, thus R1 doesn't have a huge improvement over V3 for coding. Once they do the RL for coding, I suspect they may release something like this.
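
A minimal sketch of the fine-tuning step described in this thread, assuming it amounts to plain supervised fine-tuning of a smaller student model on text that concatenates the prompt, a sampled R1 reasoning trace, and the final answer. The student checkpoint, data file, and hyperparameters are illustrative assumptions, not details from the DeepSeek paper.

```python
# Hedged sketch: distillation as straight SFT on sampled R1 outputs.
# The checkpoint, file name, and hyperparameters below are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "Qwen/Qwen2.5-7B"  # hypothetical student model
tok = AutoTokenizer.from_pretrained(base)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical JSONL file: each row has a "text" field containing the
# prompt, the sampled R1 reasoning trace, and the final answer.
ds = load_dataset("json", data_files="r1_samples.jsonl", split="train")
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=4096),
            batched=True, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="r1-distill", num_train_epochs=2,
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, bf16=True),
    train_dataset=ds,
    # Causal-LM collator: pads batches and copies input_ids into labels.
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```

Distilling onto something like qwen2.5-coder, as asked above, would be the same recipe with a different `base` checkpoint.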

  • @hqcart1
    @hqcart1 11 hours ago +8

    dude, we're already past the point where benchmarks mean anything!

    • @TheGuyWhoGamesAlot1
      @TheGuyWhoGamesAlot1 8 hours ago

      I wouldn't say they mean "nothing": a model that performs middling or badly on benchmarks is usually not good. Actually, most of the time not good.
      However, I agree that when we are using SOTA models, they become less useful.
      We need some empirical metrics, like benchmarks, but we also have to know they don't tell the whole story.

    • @samwitteveenai
      @samwitteveenai 5 hours ago

      The benchmarks that are really interesting here are DeepSeek-R1 compared to DeepSeek-V3: they share the exact same base model, so the difference shows the strength of their new post-training compared to a more standard post-training regime.

  • @MeinDeutschkurs
    @MeinDeutschkurs 7 hours ago

    Most of my tests of the 70B model resulted in a chain of vomited text. It's easy to say it's the wrong model to prompt with "Please write an overview of the German tense Plusquamperfekt," but there is a lot to think about there, and yes, the output is far from anything correct. There is no wrong question, or wrong model for a certain question.

  • @maxziebell4013
    @maxziebell4013 11 hours ago +2

    R for Remarkable

  • @khangvutien2538
    @khangvutien2538 8 hours ago

    Thank you.
    I've read on LinkedIn that DeepSeek's terms & conditions say they hold copyright on applications developed using their models. Is that true? Then it's not really an MIT license, is it?

    • @jittertn
      @jittertn 5 hours ago

      Conspiracy theory crap; other labs are panicking and spreading BS all over the net.

  • @michaeltse321
    @michaeltse321 11 hours ago +1

    If the context length were 2 million+ then it would destroy the competition.

    • @eddiehaug
      @eddiehaug 9 hours ago

      And it'll cost a small fortune to run (at that scale)...

  • @HermanTheKid
    @HermanTheKid 9 hours ago

    Conspiracy theory time! Put on your foil hats!
    I don't actually know anything, but I gave DS3 and Claude 3.5 a prompt asking for a paragraph of corporate jargon that uses cliché catchy business phrases, without actually saying anything useful. There were slight variations in the words, but the paragraph structure and phrases were beat-for-beat the same. Same phrases, same order. Wouldn't it be hilarious if DS3 was a slightly modified wrapper around Claude?
    A single data point is all you need for a conspiracy theory, right?

    • @JeremyJanzen
      @JeremyJanzen 6 hours ago +1

      OK, but if it was, and they sold it this cheap, they'd be losing a ton of money.