Tulu 3: This NEW Model BEATS DEEPSEEK-V3 & R1?! (Fully Tested)

  • Published on 2 Feb 2025

Comments • 43

  • @bungrudi
    @bungrudi 19 hours ago +55

    Bro, don't you think it's time to bump the test up a bit? Maybe make it more relevant to real life: have the model understand existing code and solve or build something related to that code.

  • @honkytonk4465
    @honkytonk4465 18 hours ago +19

    If it doesn't beat r1 then I'm not interested

  • @phyllyf763
    @phyllyf763 8 hours ago +2

    When an open source llm beats another, it is progress and good. When a closed source beats open source, I'm on my toes to see us take the lead again. TEAM OPEN SOURCE!!!!! Thanks King.

  • @hedgehog6300
    @hedgehog6300 23 hours ago +11

    Since when is 38 the right answer for number 5? Plus, I would consider it cheating when the model uses code execution to get the answer, since the model doesn't calculate the answer itself and relies on the code execution. Even on a simple calculation like number 5, it STILL gets the wrong answer.

    • @AICodeKing
      @AICodeKing 20 hours ago +1

      38-40 is a correct number based on how you calculate it. It doesn't do code execution. It does reasoning in code format and predicts output instead of running it (kind of a dry run).

    • @thanatosjamie7663
      @thanatosjamie7663 19 hours ago +1

      38 is incorrect. Getting that as the answer shows a misunderstanding of the question.
      The language is "overstated by 20%". Meaning the 20% is in reference to the original number.
      Also 0.8 x 48 would not give exactly 38, that gives a non-integer. That should indicate that the answer is incorrect given that the question is about apples.
      This approach is often used to test if someone truly understands a question.

    • @hedgehog6300
      @hedgehog6300 19 hours ago +2

      I don't know where you went to school, AICodeKing, but it is not between 38 and 40. It is 40 since 48/1.2=40 as it's put as an overstatement of 20% in the question and you want to therefore reduce the number (48) by 20%. It cannot be 38 or 39. It's basic math a 405b model should be able to do.

    • @gregkendall3559
      @gregkendall3559 14 hours ago

      It is time to add some new "reasoning" questions. Maybe some logic "misguided attention" type questions to see if the models can get past their training and think things through better.

    • @weevil601
      @weevil601 12 hours ago +1

      @hedgehog6300 Reducing 48 by 20% does not give you 40. It gives you 38.4. But increasing 40 by 20% does give you 48. The correct answer is 40 because the question said 48 was overstating the correct answer by 20%, so the problem asks you to find the number that, when increased by 20%, gives you 48. You do NOT want to reduce 48 by 20%. That gives you the wrong answer and is not what the question calls for.
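
For anyone following the question 5 thread, the arithmetic argued over above can be checked with a small sketch. It assumes, as the replies describe, that the question gives a reported count of 48 and says that figure overstates the true count by 20%. The helper name `true_count` is made up for illustration, and exact fractions are used only to avoid floating-point noise.

```python
from fractions import Fraction


def true_count(reported, overstatement):
    """Return x such that x * (1 + overstatement) == reported."""
    return Fraction(reported) / (1 + Fraction(overstatement))


reported = 48
rate = Fraction(20, 100)  # "overstated by 20%"

# The 20% refers to the true value, so divide by 1.2.
correct = true_count(reported, rate)

# The common mistake: taking 20% off the reported value instead.
mistaken = reported * (1 - rate)

print(correct)          # 40
print(float(mistaken))  # 38.4
```

Increasing 40 by 20% gives back 48, while 38.4 is not even a whole number of apples, which matches the point made in the replies above.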

  • @Adventure1844
    @Adventure1844 4 hours ago

    Thanks for your Great Videos

  • @mrinalraj4801
    @mrinalraj4801 22 hours ago +1

    Great content king. Please start sharing the website you use in every video for the explored topic.

  • @IndianTinker
    @IndianTinker 23 hours ago +4

    I just spend time watching your videos and don't build anything. Slow down, please!

    • @NITHZAEL92
      @NITHZAEL92 22 hours ago

      😂😂😅

    • @Radioactive-Cactus
      @Radioactive-Cactus 15 hours ago +1

      Seriously. The speed of AI updates across YouTube has got brain.exe using up too much memory!

  • @dhruvsharma4426
    @dhruvsharma4426 a day ago +9

    Make a video on Codename Goose. It's a game changer.

    • @enkor349
      @enkor349 23 hours ago +1

      Why?

    • @someonemight
      @someonemight 23 hours ago

      I used it on a small personal project and it worked pretty well aside from its crazy consumption of tokens (ran into rate limits despite having a tier 3 account), but then I tried it on a couple of files in a corporate codebase and it ended up deleting half the code.

    • @kolsongar
      @kolsongar 20 hours ago

      Goose is crazy. I've done some insane robotic processes and automation with it.

    • @someonemight
      @someonemight 20 hours ago

      @kolsongar curious, can you elaborate?

  • @hideperson7310
    @hideperson7310 a day ago +5

    Provide a link to it as well.

    • @hideperson7310
      @hideperson7310 19 hours ago +1

      Please provide links as well 🙏

  • @TheReferrer72
    @TheReferrer72 a day ago +5

    Oh, that butterfly!

  • @Adventure1844
    @Adventure1844 4 hours ago

    I think that if a model were extensively trained on questions like 3 or 4, it would likely perform worse. This is because such training has no practical use in real life and unnecessarily distorts the biases within the LLM.

  • @v1nigra3
    @v1nigra3 19 hours ago +2

    I know people are slow and may not realize this, but the reason all the scores are just about the same is that they are all copies of one original tech. Until that tech advances, they won't either. However, there is a fair chance the copy companies may crack something new while at it that would change the game.

  • @izibaneinspiration
    @izibaneinspiration a day ago +2

    Good Video King

  • @alexsun5247
    @alexsun5247 22 hours ago +1

    This butterfly is really good!

  • @다루루
    @다루루 a day ago +5

    Claude는 소식이 읎네 🥲
    There's no news from Claude 🥲

    • @phoneywheeze
      @phoneywheeze a day ago

      Feb 5 or Feb 13

    • @다루루
      @다루루 a day ago +2

      @phoneywheeze Thank you!

  • @33butterzucker33
    @33butterzucker33 19 hours ago +1

    Tulu 3 8B + Ollama is the best local model for me.

  • @abdalla_abdalla
    @abdalla_abdalla a day ago +2

    Could you make a video about Mistral Small 3 24B?

  • @arpo71
    @arpo71 23 hours ago

    I would like to know how you can run a model locally with cline 😊

    • @anatalelectronics4096
      @anatalelectronics4096 20 hours ago

      Get yourself 250k to buy GPUs that can run R1; that will work. Smaller models don't work with Cline because Cline has a system prompt that makes it impossible to work with small models. Cline has a good workflow, but the prompts are really bad for safe development. Cline is no meta tool: it cannot work in real time on itself, as Aider does.

    • @arpo71
      @arpo71 20 hours ago

      @ 250k what?

    • @anatalelectronics4096
      @anatalelectronics4096 16 hours ago

      @arpo71 $

  • @MrMoonsilver
    @MrMoonsilver 23 hours ago

    where small 3 tho

  • @chadpogs7973
    @chadpogs7973 a day ago +1

    Wow!!!! 🎉🎉

  • @Rizzler524
    @Rizzler524 23 hours ago

    yo bro can i get Lovable ai pro for free man its my dream and I love yo videos

  • @vibhorgautam9928
    @vibhorgautam9928 a day ago +3

    First comment