OpenAI o3 mini FAILED My Test BIG TIME? (FREE)

  • Published on Feb 7, 2025

Comments • 36

  • @toddschavey6736
    @toddschavey6736 7 days ago +8

    The DeepSeek R1 + V3 combo with Cline is amazing.
    I'd want to try Cerebras R1 70B in place of V3 next. If you haven't heard of Cerebras, they make wafer-scale AI chips, and the output token rate is nearly instant. Until recently, they only had crap Llama 70B.

    • @micbab-vg2mu
      @micbab-vg2mu 6 days ago +1

      I will try this combo. So far I have been using Cursor and Windsurf :)

    • @pepefrogic3034
      @pepefrogic3034 6 days ago

      Similar problems occur with R1. CoT is useless if requests are misunderstood, which happens a lot. 4o performs better than o3 mini.

  • @marvinfiori2541
    @marvinfiori2541 7 days ago +5

    Mervin, idea for the next video: model distillation!

  • @ngana8755
    @ngana8755 7 days ago +2

    Did you use o3-mini low, medium, or high? High is supposed to be better at answering the most difficult logical reasoning questions.

  • @EnglishImage
    @EnglishImage 4 days ago

    Gave this prompt to Deepseek, which reasoned through the logic correctly. Surprised and disappointed o3 mini took such a blind turn in its "thinking".

  • @xhridhar
    @xhridhar 7 days ago +10

    That sucks. There was so much hype over this.

    • @xXWillyxWonkaXx
      @xXWillyxWonkaXx 7 days ago +8

      I don't care honestly, let's just hope DeepSeek copies the upcoming o3 and releases it lol

    • @micbab-vg2mu
      @micbab-vg2mu 6 days ago

      Agree

  • @InAMinute-ws3yv
    @InAMinute-ws3yv 6 days ago

    It solved my automation problem. But there is no image upload, so not useful for me.

  • @Swooshii-u4e
    @Swooshii-u4e 6 days ago

    You should’ve also given the question to V3 and R1.

  • @Swooshii-u4e
    @Swooshii-u4e 6 days ago

    How do you use v3 and r1 in combo?

  • @RadiantNij
    @RadiantNij 6 days ago

    My experience with it in coding has been great so far

  • @vivekkarumudi
    @vivekkarumudi 6 days ago

    Thanks. The very first question gave me a very clear idea that o3 is a failed model.

  • @monstrositylabs
    @monstrositylabs 7 days ago +3

    IMO it's way better than o1 for coding. o1-preview, for some weird reason, is still the best.

  • @micbab-vg2mu
    @micbab-vg2mu 6 days ago +5

    o3-mini is a faster, cheaper, but dumber version of o1. I do not see why I should use it now that I have R1.

  • @MS-wz9jm
    @MS-wz9jm 7 days ago +3

    Try Kimi 1.5 reasoning

  • @RadiantNij
    @RadiantNij 6 days ago +2

    That clickbait title thooo

  • @everydaybob
    @everydaybob 6 days ago +1

    Well, unfortunately o3-mini-high is a joke. o1-pro is so much better at reasoning. What they are claiming looks totally false.

  • @roodood
    @roodood 4 days ago

    o3-mini was useless for my agent, as it consistently failed to call any functions. Even when I hardcoded the tool choice, it just responded with the function arguments as a JSON string.
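
    One possible workaround for the failure described in this comment is to check whether the plain-text reply is itself a JSON object and, if so, treat it as the arguments of the forced tool. This is only a sketch under stated assumptions: `ToolCall`, `recoverToolCall`, and the `get_weather` tool are hypothetical names made up for illustration, not part of any real SDK or of the commenter's agent.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall is a hypothetical, simplified shape; real client libraries differ.
type ToolCall struct {
	Name string
	Args map[string]any
}

// recoverToolCall covers the case where the model was told to call forcedTool
// but wrote the arguments into its text reply instead of a tool call.
// It returns ok=false when the text is not a JSON object.
func recoverToolCall(forcedTool, content string) (ToolCall, bool) {
	var args map[string]any
	if err := json.Unmarshal([]byte(content), &args); err != nil || args == nil {
		return ToolCall{}, false
	}
	return ToolCall{Name: forcedTool, Args: args}, true
}

func main() {
	// Assumed example reply: the arguments arrive as text, not as a tool call.
	call, ok := recoverToolCall("get_weather", `{"city": "Paris"}`)
	fmt.Println(ok, call.Name, call.Args["city"])
}
```

    This only makes sense when the tool choice is forced, since otherwise there is no way to know which function the JSON was meant for.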

  • @gjadams74
    @gjadams74 6 days ago

    "Imagine five dead people": tonight, hearing your tests, I heard this differently for the first time.
    What if the prompt said "five deceased people" instead?

    • @gjadams74
      @gjadams74 6 days ago +1

      So often in English the word "dead" is used as less of a literal adjective,
      as in "already dead", like the mice in a cage with a cat inside.

    • @faustprivate
      @faustprivate 6 days ago +1

      100% this. "Five dead people" could imply they will die, since they are on the track and the train is coming towards them. It just assumes your English is so-so 😅

  • @幼女
    @幼女 7 days ago +1

    Your coding test has different problems for each video, which makes it hard to follow.

  • @JavedAlam24
    @JavedAlam24 6 days ago

    Your perspective on the trolley problem is not the only valid interpretation

    • @celtiberian
      @celtiberian 4 days ago

      You're wrong. Read the prompt again.

  • @islambaraka6552
    @islambaraka6552 7 days ago +1

    Bad model for my testing as well 🤔

  • @md.zunaidtausif-vj9cy
    @md.zunaidtausif-vj9cy 6 days ago

    10x better than deep shiit

    • @mattki-y9y
      @mattki-y9y 6 days ago +1

      Lool, cope more

  • @sephirothcloud3953
    @sephirothcloud3953 7 days ago

    This model was trained with STEM in mind; it sucks at everything else.

    • @gjadams74
      @gjadams74 6 days ago

      @sephirothcloud3953 Perhaps another example of how it should be STEAM, with Arts included. Da Vinci would likely agree. We are one step closer, however.

  • @fabianmunoz7365
    @fabianmunoz7365 7 days ago

    WOW this model is a beast.

  • @Swooshii-u4e
    @Swooshii-u4e 6 days ago

    This video sucks tho because it didn’t compare to deep

  • @avi7278
    @avi7278 6 days ago

    DeepSeek is dog water in comparison. It's literally terrible at Go. You can tell it's a distilled Sonnet model that mostly focused on prompt-kiddy languages and frameworks. It tried to make an optional parameter in Go by adding an int parameter without a pointer and then saying that if x == 0, set a default value. I can't even begin to explain how many things are wrong with that. o3 refactored an entire feature in my Go app in one shot, where even o1 pro missed things after five minutes of thinking. But o3 one-shots the whole thing in 10 seconds, and then even roasted me for the original code. o3 gives the same vibes as the jump we saw from Sonnet 3.5, but even more so.
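
    For readers unfamiliar with the Go point raised in this comment: a plain int parameter cannot distinguish "not provided" from an explicit 0, which is one reason a pointer is a common way to model an optional value. The sketch below contrasts the two approaches; `defaultRetries`, `retriesSentinel`, and `retriesOptional` are hypothetical names chosen for illustration and are not taken from the commenter's app.

```go
package main

import "fmt"

// defaultRetries is a hypothetical default used only for illustration.
const defaultRetries = 3

// retriesSentinel treats 0 as "not provided", the pattern criticized above.
// A caller who explicitly wants 0 retries silently gets the default instead.
func retriesSentinel(n int) int {
	if n == 0 {
		return defaultRetries
	}
	return n
}

// retriesOptional takes a pointer: nil means "not provided",
// while a pointer to 0 is respected as an explicit value.
func retriesOptional(n *int) int {
	if n == nil {
		return defaultRetries
	}
	return *n
}

func main() {
	zero := 0
	fmt.Println(retriesSentinel(0))     // explicit 0 is replaced by the default
	fmt.Println(retriesOptional(&zero)) // explicit 0 is kept
	fmt.Println(retriesOptional(nil))   // default applies
}
```

    Functional options or a config struct are other common ways to express optional settings in Go; the pointer version is shown here only because it is the one the comment alludes to.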

  • @mrpro7737
    @mrpro7737 7 days ago +1

    I will stick with DeepSeek until they release o3.