Make Local DeepSeek THINK LONGER! 💥 Local Test-Time Scaling 💥

  • Published Feb 9, 2025
  • Paper Abstract
    Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI's o1 model showed this capability but did not publicly share its methodology, leading to many replication efforts. We seek the simplest approach to achieve test-time scaling and strong reasoning performance. First, we curate a small dataset s1K of 1,000 questions paired with reasoning traces, relying on three criteria we validate through ablations: difficulty, diversity, and quality. Second, we develop budget forcing to control test-time compute by forcefully terminating the model's thinking process or lengthening it by appending "Wait" multiple times to the model's generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps.
    In this video, we turn DeepSeek-R1-Distill-Qwen-1.5B into a deep-thinking model, enabling test-time scaling; a minimal sketch of the loop follows below.
    Note: this works with all models that generate thinking tokens!
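    Here is a minimal sketch of the budget-forcing loop with mlx-lm, in the spirit of Awni Hannun's gist linked below. The model repo name, token budget, and number of forced "Wait" rounds are assumptions to adapt to your setup, and the generate() signature can differ across mlx-lm versions.

    from mlx_lm import load, generate

    # Assumed mlx-community conversion of the model used in the video.
    model, tokenizer = load("mlx-community/DeepSeek-R1-Distill-Qwen-1.5B")

    question = "How many r's are in the word strawberry?"
    # Format the prompt with the model's chat template.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}],
        add_generation_prompt=True,
        tokenize=False,
    )

    NUM_WAITS = 2  # extra thinking rounds to force (assumption)
    for _ in range(NUM_WAITS):
        completion = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
        if "</think>" not in completion:
            break  # the model never tried to stop thinking; nothing to force
        # Budget forcing: drop the answer after the end-of-thinking tag
        # and append "Wait" so the model re-examines its reasoning.
        prompt += completion.split("</think>")[0] + "\nWait"

    # Final pass: let the model finish thinking and produce its answer.
    print(generate(model, tokenizer, prompt=prompt, max_tokens=2048))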
    🔗 Links 🔗
    s1: Simple test-time scaling
    arxiv.org/pdf/...
    MLX LM - pypi.org/proje...
    Code by Awni Hannun - gist.github.co...
    ❤️ If you want to support the channel ❤️
    Support here:
    Patreon - / 1littlecoder
    Ko-Fi - ko-fi.com/1lit...
    🧭 Follow me on 🧭
    Twitter - / 1littlecoder

Comments • 38

  • @marcfruchtman9473 • 1 day ago +10

    This is a groundbreaking paper. Well done, and thank you for presenting the video on it. Great stuff.

    • @1littlecoder • 1 day ago +2

      Glad you enjoyed it! I might do another video on the paper, explained more clearly!

  • @alx8439 • 19 hours ago +2

    Make that scientific - run a benchmark!

  • @TheMemenist • 1 day ago +4

    Just watched one of your old videos, and I must say, the quality of your in-video talking and content presentation has improved drastically. I also love how the videos are now short and to the point.

  • @KevinKreger • 22 hours ago +3

    Excellent! I had read about people automating a 'continue' response to get better responses. Love this ❤!

    • @1littlecoder • 17 hours ago +2

      That's great to know!

    • @doitjames6041 • 8 hours ago

      Can you point me to this?

  • @TheGalacticIndian • 14 hours ago +1

    DeepSeek's hidden powers! 😯

  • @GetzAI • 1 day ago +2

    Did I miss where you put the wait state?

  • @KumR • 15 hours ago +1

    Hi bro, one question. When DeepSeek or another similar CoT model thinks, it displays its thoughts on screen. Do these thoughts consume tokens?

    • @ugwuanyicollins6136 • 1 hour ago

      Yes

    • @rajbiswas776 • 13 minutes ago

      Lol, yeah! OpenAI's o1 didn't show the thinking tokens, but they still charged you for them.

  • @Incredible_428 • 23 hours ago +1

    The other one was crashing because you weren't using the special tags needed to format the text prompt, like the user or system tags. This is the format the model needs as input. You can use the chat template that transformers uses to format the input from a list of dicts (the OpenAI chat-history format), as sketched below.
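    For example, a minimal sketch of that formatting step, assuming the mlx-community conversion of the model and that its tokenizer ships a chat template:

    from mlx_lm import load

    model, tokenizer = load("mlx-community/DeepSeek-R1-Distill-Qwen-1.5B")

    # OpenAI-style chat history: a list of role/content dicts.
    messages = [{"role": "user", "content": "How many r's are in strawberry?"}]

    # Renders the special role tags the model was trained on; a system
    # message could be prepended the same way if the template supports it.
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=False
    )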

    • @1littlecoder • 18 hours ago

      I think you're absolutely right. I guess I got carried away with the working solution and forgot about this.

  • @GaleiqGoogle • 1 day ago

    Liked this video a lot. Ignore the comment that said to scrap it. This video is actually actionable, which is what we need. There's no point to papers if viewers can't use them at home. Now, could you make a video showing how to use it, e.g. in Roo Cline / Claude Desktop / Aider / ChatGPT? Basically, other chatbots?

  • @wwkk4964 • 21 hours ago

    Keep up the great work!

  • @zyxwvutsrqponmlkh • 1 day ago +3

    I sent you an email detailing how I reproduced the work in the paper with no special tools, using the same seed (because YouTube's comment-moderation bots hated it in comment form). I'm not saying your coverage of the paper was bad, but your approach to reproducing the effects was not good. Not to nitpick, but this whole paper is what I would consider obvious; I've been doing very similar things to these reasoning models from the moment I got them. There's no easier way to jailbreak a model than to edit what it thinks it wrote, so I have been doing worse things to models for even longer. But yeah, apparently more people need to see it. And a "reason harder" button to automate this would probably be good in some cases and would be trivial to implement.

    • @jackmartin1146 • 16 hours ago +1

      Hi, I'm very interested in hearing more about your work. Could you share the example through GitHub, or upload it to Drive and share it?

    • @zyxwvutsrqponmlkh • 10 hours ago

      @jackmartin1146 The method is as simple as editing the output to delete the answer and the think tag, then appending "wait". While it is trivial to make a script to do this at the click of a button, it is also trivial to do manually in LM Studio, as in the snippet below.
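      For instance, the whole edit is one string operation (a sketch; the exact think-tag string is model-specific):

      # Drop the closing think tag and the answer, then nudge the model on.
      raw = "<think> ...reasoning... </think> The answer is 3."
      forced = raw.split("</think>")[0] + "\nWait"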

  • @Michael-Humphrey • 1 day ago

    Even more important is the dataset.

  • @Cingku • 1 day ago

    Yes, please do one for llama.cpp too. I am waiting. :)

  • @emonsahariar9292 • 1 day ago

    Bro, I just ran the 14B one to solve that kind of problem. But the issue with your solution is that these kinds of problems are really numerous and hard to hand-roll every time.

    • @1littlecoder • 1 day ago

      What do you mean by hand-rolled here?

    • @emonsahariar9292 • 1 day ago

      @1littlecoder I meant "the numerous times" you have to give it reinforcement to get the expected output. Local LLMs are therefore hopeless (on home-use PCs). I have deleted mine.

  • @d.d.z. • 1 day ago +1

    Smashed

    • @1littlecoder • 1 day ago

      @d.d.z. Thank you

  • @GetzAI • 1 day ago +1

    DEMOS NEVER WORK WHILE RECORDING 🤣 That is a known law of the universe, my friend!! Or when you are on Zoom.

  • @Ernest_Viger-Beaulieu • 15 hours ago

    wow

  • @timmygilbert4102 • 1 day ago +2

    I didn't understand a single actionable thing from this video 😅

    • @1littlecoder • 1 day ago +2

      😭

    • @timmygilbert4102 • 1 day ago

      @1littlecoder sowy 😳

    • @1littlecoder • 1 day ago

      @timmygilbert4102 I'll try another one. Thanks for the feedback.

    • @flyingcheesecake3725 • 6 minutes ago

      Same. The paper says to add "wait", but his instruction is to install mlx-lm and download that specific Qwen model; there is nowhere in the code to add "wait" or anything. So I assume the "wait" stuff is fine-tuned into the particular Qwen model he mentioned?

  • @zyxwvutsrqponmlkh • 1 day ago

    :(

    • @1littlecoder • 1 day ago

      Not good?

    • @zyxwvutsrqponmlkh • 1 day ago

      @1littlecoder No, and also you can do this in LM Studio; you don't need any Mac thing. But also, your example failed. This is a clinker; I would have scrapped the video and not posted it. Deleting the end-of-thinking token and appending "wait" is trivial to implement. But the thinking model you used was failing before you got to the point where this could be helpful; maybe you had the temperature off or something.

    • @1littlecoder • 1 day ago

      @zyxwvutsrqponmlkh Out of 3, 2 worked. And why would you think it failed?

    • @zyxwvutsrqponmlkh • 1 day ago +1

      @1littlecoder "How many c's are in...", "how many d's are in..." You delete the finished-thinking token and append "wait" (which it practically did itself), but it still gets it wrong. Making it do more wait-and-rethink cycles does not address the fact that your LLM setup is fundamentally flawed. You need something more stable before applying this technique. And you can keep the seed the same so you can measure, apples to apples, how the output changes after the extra thinking (see the snippet below).
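      For what it's worth, pinning the seed in MLX is one line, assuming mlx-lm samples through the global mx.random state:

      import mlx.core as mx

      mx.random.seed(0)  # same seed -> comparable generations across runs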