New ChatGPT o1 VS GPT-4o VS Claude 3.5 Sonnet - The Ultimate Test

  • Published Nov 14, 2024

Comments • 116

  • @SkillLeapAI
    @SkillLeapAI  months ago +8

    Join the fastest growing AI education platform and instantly access 20+ top courses in AI: bit.ly/skillleap

  • @Leo_ai75
    @Leo_ai75 months ago +15

    I've been using your prompt for a week now and it's superb.

    • @SkillLeapAI
      @SkillLeapAI  months ago +6

      Awesome. Glad to hear that

    • @leondbleondb
      @leondbleondb months ago +2

      What prompt?

    • @jackfrost6268
      @jackfrost6268 months ago +26

      @@leondbleondb You are an AI assistant designed to think through problems step-by-step using Chain-of-Thought (COT) prompting. Before providing any answer, you must:
      Understand the Problem: Carefully read and understand the user's question or request.
      Break Down the Reasoning Process: Outline the steps required to solve the problem or respond to the request logically and sequentially. Think aloud and describe each step in detail.
      Explain Each Step: Provide reasoning or calculations for each step, explaining how you arrive at each part of your answer.
      Arrive at the Final Answer: Only after completing all steps, provide the final answer or solution.
      Review the Thought Process: Double-check the reasoning for errors or gaps before finalizing your response.
      Always aim to make your thought process transparent and logical, helping users understand how you reached your conclusion.
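
      For anyone who wants to run this outside the ChatGPT UI: a minimal sketch of sending the same text as a system prompt, assuming the OpenAI Python SDK; the model name is a placeholder.

      # Minimal sketch: use the Chain-of-Thought instructions above as a system prompt.
      # Assumes the OpenAI Python SDK (`pip install openai`) and OPENAI_API_KEY in the environment;
      # "gpt-4o" is a placeholder model name - swap in whichever model you are testing.
      from openai import OpenAI

      COT_SYSTEM_PROMPT = (
          "You are an AI assistant designed to think through problems step-by-step using "
          "Chain-of-Thought (COT) prompting. Before providing any answer: understand the problem, "
          "break down the reasoning process, explain each step, arrive at the final answer, "
          "and review the thought process. Make your reasoning transparent and logical."
      )

      client = OpenAI()  # reads OPENAI_API_KEY from the environment

      response = client.chat.completions.create(
          model="gpt-4o",  # placeholder
          messages=[
              {"role": "system", "content": COT_SYSTEM_PROMPT},
              {"role": "user", "content": 'How many "s" characters are in "ss0s000sss0s"?'},
          ],
      )
      print(response.choices[0].message.content)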

    • @Dave-c3p
      @Dave-c3p months ago

      @@jackfrost6268 I just tried it in Claude, testing how many "s" characters are in the word "ss0s000sss0s". It worked only with the chain-of-thought prompt. That's pretty good; I'm gonna use that everywhere.
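
      (For reference, the ground truth for that test string is 7, which is easy to check in Python:)

      # Quick ground-truth check for the counting test above.
      word = "ss0s000sss0s"
      print(word.count("s"))  # prints 7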

    • @leondbleondb
      @leondbleondb months ago

      @@jackfrost6268 thank you.

  • @ChristiaanRoest79
    @ChristiaanRoest79 months ago +8

    Claude 3.5 Sonnet is the GOAT for writing

  • @independentpatriot1775
    @independentpatriot1775 months ago +4

    Technically, there are 4 killers in the room. The prompt states "no one left the room", so that equates to 3 + 1 = 4. Of the 4 people in the room, all 4 are killers: 3 of them are alive and 1 is dead.

  • @hamedparsa8880
    @hamedparsa8880 27 days ago +1

    A simple prompt for your next vids to test out LLMs' reasoning when counting things (a small verification script follows below):
    ___
    1. Write me a random sentence about "Actors". Then tell me the number of words you wrote in that sentence, then tell me the third letter of the second word in the sentence. Is that letter a consonant or a vowel?
    2. Repeat step 1 another time, this time about "gaming".
    3. Repeat step 1 another time, this time about "tech".
    ___
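
    A small script can grade the model's answers to step 1 (a minimal sketch; the sentence below is just a stand-in for whatever sentence the model actually writes):

    # Verify an LLM's self-reported counts for a sentence it generated (step 1 of the prompt above).
    sentence = "Famous actors often rehearse their lines for months before filming begins."

    words = sentence.split()
    print("word count:", len(words))
    third_letter = words[1][2]               # third letter of the second word
    print("third letter of second word:", third_letter)
    print("vowel" if third_letter.lower() in "aeiou" else "consonant")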

  • @krisvanaut
    @krisvanaut months ago

    I don't think you can directly compare o1 with 4o; my understanding is that the models have been trained with a different focus: one is oriented toward engaging in natural language, the other with (slightly more) focus on planning. So based on the outcome I want, I tend to select a different model.

  • @CosmicCells
    @CosmicCells months ago +9

    Great video, very useful!
    - Claude seems better at reasoning than GPT-4o (and writing)
    - o1 is the reasoning and logic king
    - GPT-4o is the utility king
    I use all three 😊

    • @kontensantai23
      @kontensantai23 months ago

      So the conclusion is that GPT-4 is better than gen-AI like Claude? I plan to subscribe to one of these AIs for my thesis needs, to help me in my work. As people say, Claude is better than GPT because the Claude model is able to provide more complex and natural conclusions than GPT? Is that true? Please give your opinion; I am confused about choosing between GPT and Claude. Thank you.

    • @CosmicCells
      @CosmicCells months ago

      @@kontensantai23 It depends on what you plan to do with your thesis work. With GPT Plus you get 4o and o1, which together are definitely more powerful than Claude 3.5. They also have more utility: there is a voice mode (for offloading thoughts etc. verbally), you can search the web, Custom GPTs... But for web searches, Perplexity is king.
      But Claude also has its benefits. It is a better reasoner than 4o and maybe also a tiny bit smarter. It also writes better and is more human-like.
      When in doubt, try both for a month and keep the one you prefer. It's more money, but it will definitely help you improve and speed up your thesis.
      You also get more messages per day with ChatGPT than with Claude.
      You can also try both models for free with reduced rates, but in ChatGPT you won't have the o1 model and in Claude "Projects" are missing.

    • @roronoa_zoro
      @roronoa_zoro months ago +1

      @@kontensantai23 I would choose Claude 3.5 Sonnet over GPT for my coding work. There's a huge difference in the coding aspect between those two. As for the other stuff, I'm not so sure, since I only use them for programming.

    • @amandabrunsperger3726
      @amandabrunsperger3726 months ago

      @@roronoa_zoro Yeah, o1 is just a new AI, so it can't be a king.

    • @roronoa_zoro
      @roronoa_zoro months ago +1

      @@amandabrunsperger3726 As for AI, "new" means getting better than the previous one, so...

  • @JewelzUFO
    @JewelzUFO 28 days ago

    Since we are comparing models, please enlighten us on the statistical differences in the output.

  • @bigworld2619
    @bigworld2619 months ago +3

    A man needs to transport a goat, a wolf, a puma and a big bag of vegetables across a river on a tiny boat. The boat is very small, so it can only carry the man and one of the items at a time. The goat should never be left alone with the bag of vegetables without the man present, as it will eat the vegetables. The puma should also not be left with the goat without the man being present, as the puma will attack the goat. The goat and the wolf are friends, so they can be left alone together. The wolf and the puma will not attack each other, so they can also be left alone with each other. How do you transport them? Use this as a test; no AI model currently solves this puzzle reliably. Claude 3.5 managed to get it right once, but in subsequent tests it failed. GPT o1 got it correct once but then failed. With some hints, o1-preview is the only model that can solve this puzzle.
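
    For what it's worth, the puzzle is solvable, and a tiny breadth-first search over the bank states finds a plan; this is a minimal sketch with the constraints transcribed from the comment above.

    # Brute-force (BFS) check that the goat/wolf/puma/vegetables crossing is solvable.
    # Unsafe pairs when the man is absent: goat with vegetables, puma with goat.
    from collections import deque

    ITEMS = frozenset({"goat", "wolf", "puma", "vegetables"})
    CONFLICTS = [{"goat", "vegetables"}, {"puma", "goat"}]

    def safe(bank):
        return not any(pair <= bank for pair in CONFLICTS)

    def solve():
        start = (ITEMS, "left")                      # items on the left bank, man's side
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            (left, man), path = queue.popleft()
            if not left and man == "right":
                return path
            here = left if man == "left" else ITEMS - left
            for cargo in list(here) + [None]:        # carry one item across, or cross alone
                new_left = set(left)
                if cargo is not None:
                    (new_left.discard if man == "left" else new_left.add)(cargo)
                new_man = "right" if man == "left" else "left"
                unattended = new_left if new_man == "right" else ITEMS - new_left
                if not safe(set(unattended)):
                    continue                         # the bank the man leaves behind must be safe
                state = (frozenset(new_left), new_man)
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [f"cross to the {new_man} with {cargo or 'nothing'}"]))
        return None

    plan = solve()
    print("\n".join(plan) if plan else "no solution")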

  • @jerrypinewood6278
    @jerrypinewood6278 26 days ago +9

    I wrote the custom instruction out. For those interested:
    You are an AI assistant designed to think through problems step-by-step using Chain-of-Thought (COT) prompting. Before providing any answer, you must:
    Understand the Problem: Carefully read and understand the user's question or request.
    Break Down the Reasoning Process: Outline the steps required to solve the problem or respond to the request logically and sequentially. Think aloud and describe each step in detail.
    Explain Each Step: Provide reasoning or calculations for each step, explaining how you arrive at each part of your answer.
    Arrive at the Final Answer: Only after completing all steps, provide the final answer or solution.
    Review the Thought Process: Double-check the reasoning for errors or gaps before finalizing your response.
    Always aim to make your thought process transparent and logical, helping users understand how you reached your conclusion.

  • @RickyForITZY
    @RickyForITZY months ago +1

    It should be noted that I asked the latest GPT-4o the marble question and it got the same answer. However, when I used the latest GPT-4o with 128k context, it got it right. The same thing happened to me with Claude 3.5 vs. 3.5 with 200k.

    • @amandabrunsperger3726
      @amandabrunsperger3726 months ago

      Yeah, the video just got it wrong, because it varies.

  • @amandabrunsperger3726
    @amandabrunsperger3726 months ago

    In fact, GPT-4o's answer to "Where's the marble after the glass is taken away?" depends. If the glass is rotated, then it would be right. But if the glass isn't rotated, then no.

  • @nishudwivedi3478
    @nishudwivedi3478 25 days ago

    Which one is best for coding: Claude 3.5 Sonnet, o1-preview, or o1-mini?

  • @canastasio29
    @canastasio29 15 days ago

    Thanks Saj!

  • @montonmonton8791
    @montonmonton8791 months ago +49

    Can't believe how little effort the average YouTuber is willing to put into making a halfway decent comparison. The same old, useless prompts over and over again. And of course, they're all terrified of accidentally letting something even slightly negative about Claude slip and upsetting the fanboys.

    • @cbgaming08
      @cbgaming08 months ago +3

      AI Explained is the way

    • @huk2617
      @huk2617 months ago +2

      Replace Claude with o1 and this is 100% true

    • @blackgptinfo
      @blackgptinfo months ago +3

      You could always start your own channel.

    • @clearsight655
      @clearsight655 months ago +1

      Agreed. This video especially seemed quite dull. I don't think he understands AI that well.

    • @Gukworks
      @Gukworks months ago

      The scene from Office Space comes to mind: "Can you tell us a little more?" 😮

  • @ItsYaBoY119
    @ItsYaBoY119 months ago +1

    Where do I copy the instructions? You said we can do it ourselves, but then you just gave us the clone GPT and no way to see its instructions.

  • @iuliusRO82
    @iuliusRO82 months ago

    Could you please test Gemini Advanced as well? I'd suggest creating a Gem with reasoning-technique instructions.
    I still believe that o1 is, in fact, 4o but with outstanding reasoning instructions. I hope I'm wrong, but it feels that way to me.

  • @gonzalobruna7154
    @gonzalobruna7154 months ago +5

    Although I like what you did here, you are still asking the most basic questions these new models can solve. Could you please try hard mathematical or scientific problems? Maybe some hard coding problems from Advent of Code, or some of the cryptic prompts from the o1 blog post, like this one:
    oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step
    Use the example above to decode:
    oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz
    That would be very interesting!
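
    For reference, that cipher can be checked mechanically: each pair of ciphertext letters maps to the letter at the average of their alphabet positions. A minimal sketch:

    # Decode the o1 blog-post style cipher: average the alphabet positions of each letter pair.
    def decode(ciphertext):
        words = []
        for word in ciphertext.split():
            pairs = zip(word[0::2], word[1::2])
            letters = [chr((ord(a) - 96 + ord(b) - 96) // 2 + 96) for a, b in pairs]
            words.append("".join(letters))
        return " ".join(words)

    print(decode("oyfjdnisdr rtqwainr acxz mynzbhhx"))  # -> think step by step
    print(decode("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"))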

    • @SkillLeapAI
      @SkillLeapAI  months ago

      Do you have a resource or two where I can get these inputs?

    • @islamguven4175
      @islamguven4175 months ago

      @@SkillLeapAI You can go to Perplexity and ask it to list the most difficult LeetCode and Codewars questions as a table grouped by difficulty, ranging from compiler generation to graph problems.
      Maybe a comparison of multi-language capabilities, or which one produces more algorithmic code.
      For example, I noticed 3.5 handles concepts like memoization, including LRU caching, quite well.

  • @jamesedwards7966
    @jamesedwards7966 25 days ago

    For the hallucination test, could you ask them to provide a source for their answer?

    • @SkillLeapAI
      @SkillLeapAI  25 days ago

      Yeah, but sometimes they just make up the source too.

  • @sundaytin1671
    @sundaytin1671 21 days ago

    What's the difference between using your prompt and using the link you provided to use the o1 clone?
    Also, in the part where I'm supposed to enter the prompt, there are 2 boxes; which one do I paste it into?

    • @SkillLeapAI
      @SkillLeapAI  21 days ago

      Just two different alternatives. I like the OpenAI one more. You want to paste it into box 2.

  • @faustprivate
    @faustprivate months ago

    You got my subscription because of this video ❤

  • @AsafBenYehuda
    @AsafBenYehuda months ago +1

    Where can I find the Claude instructions, and do they make Claude better?
    (It says in the video that they're in the description and publicly available too, but I can't find them.)

  • @YuriiKratser
    @YuriiKratser months ago

    Great content!

  • @annicoveraitay6047
    @annicoveraitay6047 months ago

    Your videos are quite useful. Could you help with one prompt I have been trying for a few weeks? Let me explain the scenario: I have been preparing for interviews but was not able to crack them. So I'm trying to take AI help, but the AI is giving me answers that aren't good enough. The answers should be real-time, experienced, simple, more natural and human-like. In conclusion, the interviewer should believe that I have real-time experience. Thanks in advance.

    • @SkillLeapAI
      @SkillLeapAI  months ago

      Try using Claude. It does a much better job with this kind of question and use case than ChatGPT.

    • @annicoveraitay6047
      @annicoveraitay6047 months ago

      @@SkillLeapAI Thanks for your reply. Could you give one prompt for it?

  • @franciscordon9230
    @franciscordon9230 months ago

    Thanks, very interesting!

  • @perschistence2651
    @perschistence2651 months ago +1

    Before watching the video I tried your clone with my test suite. I am pretty much exactly halfway there with the clone. That means I get 50% of the performance gains just from your prompt.

  • @JamesHeseltine
    @JamesHeseltine months ago

    hey guys. I’m new to AI and could use some advice. I have a 2019 Mac, and while I can’t download apps, I can install browser extensions. On my phone, I use ChatGPT with the microphone feature to dictate and refine text messages. However, since I can't download the ChatGPT app on my Mac and the browser version lacks a microphone option, is there a way I can replicate this functionality on my Mac? Any suggestions would be appreciated!

  • @SahilGoesHARD
    @SahilGoesHARD months ago

    Have you tried Napkin AI yet? Turns text into graphics in seconds.

  • @JewelzUFO
    @JewelzUFO 28 days ago

    From the developer level, comparing system prompts in different models is like comparing apples and oranges homie.

  • @AmnionGA
    @AmnionGA months ago

    Thank you for the comparisons.
    You won't get the same results from previous models by utilizing COT prompting alone. That is most likely because the model was trained on multiple chains of thought with RL, with the COTs resulting in correct answers getting higher rewards (so the model selects the "best" COTs for a given problem). Also, there could be more going on behind the scenes during inference as well, i.e. more sophisticated algorithmic optimizations like Monte Carlo search. I would love to find out myself, but these details have been hidden from us.
    Overall I do believe this model is a step-change in LLM capabilities. Pairing this model with something like The AI Scientist by Sakana would be very interesting. I think when we combine o1's reasoning capabilities, a next-gen model like Orion, and an agentic research framework like The AI Scientist, it could very well lead to the intelligence explosion discussed in Leopold Aschenbrenner's Situational Awareness paper.

  • @realidadecapenga9163
    @realidadecapenga9163 months ago +5

    Still, I can't even explain how much I prefer Sonnet 3.5. I mostly use AI for reading and interpreting legal stuff, and I feel like it pays way more attention to details than other models, even o1-preview, when given the same context.

    • @leondbleondb
      @leondbleondb months ago

      Claude 3.5 really is great.

    • @imperfectmammal2566
      @imperfectmammal2566 months ago +2

      o1 is not good for creative writing; it's more focused on reasoning benchmarks. o1 is really good for STEM fields.

    • @realidadecapenga9163
      @realidadecapenga9163 months ago

      @@imperfectmammal2566 I get it, but I was still hoping for some improvement over GPT-4o. Analyzing texts needs logical thinking, an eye for detail, and looking at things from all angles. But I didn't see any real progress in that department, unfortunately.

    • @imperfectmammal2566
      @imperfectmammal2566 months ago

      @@realidadecapenga9163 The model is very objective in the sense that it tries to find the correct answer, but in creative writing there is no right answer. You have the freedom to make mistakes, so that is why the reward system that was introduced for o1 doesn't work for creative writing.

  • @Jeff-jg8kj
    @Jeff-jg8kj months ago +12

    There were four killers left in the room: three alive and one dead.

    • @ingdanielluna
      @ingdanielluna months ago

      Agreed, the question missed the "alive" adjective. Had it been added to the question, the answer would've been 3.

    • @independentpatriot1775
      @independentpatriot1775 months ago

      Not only did the prompt fail to stipulate alive vs not, but the prompt specifically stated that no one left the room.

  • @kamelirzouni
    @kamelirzouni months ago

    Here's an example of a middle-school-level math problem that none of the models, without exception, manage to answer correctly: 'Using each number from the following series at most once: 10, 1, 25, 9, 3, 6, write an expression equal to 595.' One correct answer is: (6×9+3)×10 + 25×1. Unfortunately, this applies to both o1 versions.
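
    This kind of prompt is also easy to brute-force locally, which makes checking any model's answer trivial. A minimal sketch (a classic "Countdown numbers"-style search, using +, - and * only):

    # Can 595 be made from 10, 1, 25, 9, 3, 6, each used at most once?
    # Repeatedly combine two available values with +, - or * until the target appears.
    from itertools import combinations

    TARGET = 595
    START = (10, 1, 25, 9, 3, 6)

    def search(values):
        # values: tuple of (number, expression-string) pairs still available
        for (a, ea), (b, eb) in combinations(values, 2):
            rest = list(values)
            rest.remove((a, ea))
            rest.remove((b, eb))
            for value, expr in ((a + b, f"({ea}+{eb})"), (a * b, f"({ea}*{eb})"),
                                (a - b, f"({ea}-{eb})"), (b - a, f"({eb}-{ea})")):
                if value == TARGET:
                    return expr
                found = search(tuple(rest + [(value, expr)]))
                if found:
                    return found
        return None

    print(search(tuple((n, str(n)) for n in START)))  # prints one valid expression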

    • @karlwaskiewicz
      @karlwaskiewicz months ago +2

      Worked for me on o1-mini the first time. It came up with (25+10)×(9+6+3−1)=595

    • @kamelirzouni
      @kamelirzouni months ago

      @@karlwaskiewicz Impressive!

    • @nagolici3206
      @nagolici3206 15 days ago

      I tried o1; here is the answer: 595 = 25^2 − (10×3), which could be correct since your input says "at most once", so it's not necessary to use them all. But when I instructed it to use each number exactly once, it gave: 595 = [25^2 − 6×(9+1)] + 10×3

  • @kkollsga
    @kkollsga months ago +10

    In Claude you can hide the chain-of-thought reasoning by asking Claude to wrap the reasoning in artifact tags: <antArtifact>

    • @Leo_ai75
      @Leo_ai75 months ago +3

      that's genius! Great idea.

    • @saddozaiproduction
      @saddozaiproduction months ago

      Can you please explain?

    • @kkollsga
      @kkollsga months ago

      @@saddozaiproduction In Claude, create a new project. In the project knowledge, add this text: «
      Slow down. Think carefully step-by-step and come up with a problem analysis, using the process below, that fully analyzes the given prompt
      before starting the final answer to the problem. No matter which task type, execute all the following chain-of-thought steps in order. If any longer text has been provided as an attachment, use it as a knowledge base to better answer the task / challenge.
      Wrap all the chain-of-thought answers in an artifact tag <antArtifact>.
      1. Identify the challenge type. Valid types are: coding, create presentation, riddle/overly simple task, other.
      2. Based on the challenge type, map out task-specific constraints and info.
      a. Coding:
      - Is it a coding challenge that requires a GUI?
      - If no coding language is mentioned in the user text: if no GUI is needed, assume Python; if a GUI is required, assume HTML, JS and CSS in a single HTML file, and use no frameworks like React.
      - Where beneficial, use libraries like D3, tailwindcss, font-awesome or other popular safe options. For instance, use simple font-awesome icons instead of more complex options where possible.
      - If it is a game, remember it needs to be a playable, smooth experience.
      - Avoid using sprite-based assets and focus on shapes and vector-based assets.
      - Make the game challenging, with an increasing challenge level if possible.
      - Add marked start and end states for the game.
      - If it is an app, it needs to be user-friendly, with a separate input area at the top and visualization below.
      - If a data file is mentioned, create a drag-and-drop area that will trigger processing of the file in accordance with the expected file structure.
      b. Create presentation: output the presentation text and details in a structured .json output. Separate slides into sections. The final summary and conclusions slide should be a separate section.
      c. Riddle/overly simple task: follow all chain-of-thought steps with the utmost seriousness.
      3. Describe the details provided about the optimal outcome.
      4. Identify key variables and their relationships.
      5. Determine evaluation criteria for the best possible success that pleases the user.
      6. Do we have all the necessary information for the task to achieve a successful outcome? If not, abort further processing, end the artifact, explain the problem and ask follow-up questions.
      7. Think carefully step-by-step and brainstorm 3 distinct strategies for solving this problem space. Print out all 3 strategies, with sufficient detail.
      8. Of the 3 strategies, print out the strategy that is most suited for this problem and its constraints. Expand on key points.
      9. Given the chosen strategy, create a list of tactical tasks to follow that will solve the problem.
      10. Re-evaluate the chosen strategy against the original prompt and additional constraints and info. Are we solving the challenge sufficiently? Are we following the rules set out? Are there additional steps we can add to improve the result? Mention strategies we can use to improve the robustness of our execution.
      11. Based on all learnings and improvements, print out the final strategy.
      12. End the artifact using </antArtifact>.
      13. Based on the challenge type, create a short structure to explain what we will do.
      a. For coding tasks, outline the code structure, including the file structure for multi-file outputs.
      b. For presentations, outline the main sections of the presentation.
      c. Skip this step for riddles.
      14. Split the output for the challenge into intermediary steps (calculations etc.), and wrap these in a separate artifact. The final answer should come after. If the final answer requires a longer section of output, create a new artifact for the result as well.
      15. Execute the steps outlined for the chosen strategy step by step to solve the problem.
      Label each step of your thought process clearly (Step 1, Step 2, etc.). Confirm that you have completed all steps before proceeding to the final answer.
      Remember, do not provide a final answer until you have completed and shown all steps of the thought process. If you skip the chain-of-thought process, I will ask you to start over from the beginning.
      Before submitting your response, double-check that you've included all required steps of the thought process.
      Don't make any reflections on the use of these detailed steps. Simply provide the answer at the end.
      The specific question or task follows here:
      «
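
      Projects are a Claude web-app feature; if you would rather drive the same behavior from code, the project-knowledge text above can be passed as a system prompt instead. A minimal sketch, assuming the Anthropic Python SDK (the model name is a placeholder):

      # Minimal sketch: reuse the project-knowledge text above as an API system prompt.
      # Assumes the Anthropic Python SDK (`pip install anthropic`) and ANTHROPIC_API_KEY in the environment.
      import anthropic

      PROJECT_INSTRUCTIONS = "Slow down. Think carefully step-by-step ..."  # paste the full text above here

      client = anthropic.Anthropic()
      message = client.messages.create(
          model="claude-3-5-sonnet-latest",  # placeholder model name
          max_tokens=4096,
          system=PROJECT_INSTRUCTIONS,
          messages=[{"role": "user", "content": "Build a small snake game in a single HTML file."}],
      )
      print(message.content[0].text)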

  • @fact2922
    @fact2922 months ago

    Which one is better at coding?

    • @davidfelipe183
      @davidfelipe183 months ago +1

      In my experience, Claude 3.5. I have been coding a Python system over the last 2 weeks with o1-preview and Claude 3.5, and the latter is the best.

  • @binkamichannel1831
    @binkamichannel1831 months ago

    system prompt ???

  • @The-Ink
    @The-Ink months ago +3

    Claude 3.5 versus GPT-o1 is perhaps an unfair comparison, because 3.5 is a much, much smaller model. Claude 3 Opus would perhaps be a better comparison, because it is much larger (as evidenced by its generation speed, and the heavy rate limiting), has a much longer context length, and is generally more capable than 3.5 in many tasks.

    • @huk2617
      @huk2617 months ago +2

      Incorrect

    • @ralfian0234
      @ralfian0234 months ago +3

      No. Anthropic themselves state that 3.5 Sonnet outperforms Claude 3 Opus.
      Opus is just heavier and more expensive.

  • @xLBxSayNoMo
    @xLBxSayNoMo months ago

    Really good video. Appreciate all your work👍

  • @WCFactBreak
    @WCFactBreak 24 days ago

    There will be 4 killers in the room, right? Just one of them will be dead.

  • @Jallandhara
    @Jallandhara 16 days ago

    The exact prompt is as follows:
    You are an AI assistant designed to think through problems step-by-step using Chain-of-Thought (COT) prompting. Before providing any answer, you must:
    1) Understand the Problem: Carefully read and understand the user's question or request.
    2) Break Down the Reasoning Process: Outline the steps required to solve the problem or respond to the request logically and sequentially. Think aloud and describe each step in detail.
    3) Explain Each Step: Provide reasoning or calculations for each step, explaining how you arrive at each part of your answer.
    4) Arrive at the Final Answer: Only after completing all steps, provide the final answer or solution.
    5) Review the Thought Process: Double-check the reasoning for errors or gaps before finalizing your response.
    Always aim to make your thought process transparent and logical, helping users understand how you reached your conclusion.

  • @WINTERMUTE_AI
    @WINTERMUTE_AI months ago +2

    It amazes me: I run Gemma 2 2B Q8 on my machine and it gets a lot of these answers correct also. It's the smartest tiny model ever!
    USER
    how many R's are in the word 'strawberry'?
    AI
    Let's do this! There are three "R"s in the word "strawberry."
    **Thought it might have cheated, so I asked...
    USER
    how many Q's are in the word 'mango'?
    AI
    You got it! There are zero "Q"s in the word "mango".
    USER
    which number is bigger. 9.11 or 9.9?
    AI
    That's an easy one, Wintermute! 9.9 is bigger than 9.11.

    • @SkillLeapAI
      @SkillLeapAI  months ago +2

      Oh wow. I haven't used that one very much.

    • @amandabrunsperger3726
      @amandabrunsperger3726 months ago

      @@SkillLeapAI The Gemma model is surprisingly tiny! Just a story of 1000 words repeated 10 times, then done!

  • @broslaughing8245
    @broslaughing8245 months ago

    I will ask my ChatGPT-4 to act as Claude 3.5 Sonnet

    • @attaeye
      @attaeye months ago

      😂😂😂

  • @TheHistoryCode125
    @TheHistoryCode125 months ago +3

    Your custom GPT sucks; stick with what's already made by OpenAI and Anthropic.

    • @SkillLeapAI
      @SkillLeapAI  months ago +3

      You gave me a tip to insult me?

    • @clearsight655
      @clearsight655 months ago +2

      @@SkillLeapAI Take the tip and go improve. There was no point in that custom GPT comparison.

    • @TheHistoryCode125
      @TheHistoryCode125 months ago

      @@clearsight655 His custom GPT sucks.

  • @TitusChristopher-b7z
    @TitusChristopher-b7z months ago

    Rodriguez Brenda White Michael Gonzalez Michael

  • @abominablesnowman876
    @abominablesnowman876 months ago

    Play chess with GPT o1, if it really has a 120 IQ.

  • @TitusChristopher-b7z
    @TitusChristopher-b7z months ago

    Jones Michelle Jones Sandra Rodriguez Sharon

  • @prismflux5129
    @prismflux5129 months ago

    We need Claude 3.5 Sonnet with o1's chain of thought. But not via prompts; really integrated into the system.

    • @amandabrunsperger3726
      @amandabrunsperger3726 months ago

      Plus, the clone and the Claude clone of o1 don't have a chain of thought! He just used the wrong model for Claude.

  • @cbnewham5633
    @cbnewham5633 months ago +1

    Your tests are unfortunately not very good, and Matthew Berman is hardly a great source of tests, no matter how nice a chap he may be. Counting letters, comparing numbers, the microwave, the killers: they've been out for months and any AI worth its salt will have this covered. Your chess example is OK, although that too has some issues. You need novel tests, one of which I have on my channel, to really test it. I should do a comparison with Claude and 4o, but at the time I was only interested in testing o1-preview.

    • @SkillLeapAI
      @SkillLeapAI  months ago +3

      Yeah, if you have a good source or if you make a video, let me know. Always looking for better sources.

    • @AIInvestingBots
      @AIInvestingBots months ago

      @@SkillLeapAI It's not ChatGPT o1, it's OpenAI o1.

  • @StevensonAries-z8s
    @StevensonAries-z8s months ago

    Thomas Margaret Clark Cynthia Young Jeffrey

  • @ToutEdward-o1v
    @ToutEdward-o1v months ago

    White Mary Harris Deborah Brown Edward

  • @ryan18462
    @ryan18462 months ago

    2nd

  • @kingkay3122
    @kingkay3122 months ago

    first comment

    • @kingkay3122
      @kingkay3122 months ago

      love the content buddy

    • @SkillLeapAI
      @SkillLeapAI  months ago

      nice! thank you

  • @DepletedUrbranium
    @DepletedUrbranium months ago

    Saying the first chess prompt failed because you don't know the grid is just due to a definition of failure particular to you. You failed.

  • @vividprimevision
    @vividprimevision months ago

    Claude is a crap app with limits even after upgrading to their service. No internet access 😂😂😂. It's like being trapped in a dark environment without access to water and light, with a sleep limit of 5 minutes each day 😂😂😂😅😅😅

    • @cbgaming08
      @cbgaming08 months ago +1

      It still beats every other model though, except for o1.

  • @dawidblachowski
    @dawidblachowski months ago

    11:25 - there are four killers in the room: two original ones, one new one, and one dead (killed).

  • @wedding_photography
    @wedding_photography months ago

    At 10:10 you give Claude the "correct" mark, even though it completely made up the answer.
    And your prompts are all stolen from other AI videos. Why not invent your own? It's easy.

    • @SkillLeapAI
      @SkillLeapAI  months ago +1

      I did use other ones in all my previous videos. It's not stolen if I give credit to the source.

    • @douglasventura
      @douglasventura months ago

      @@SkillLeapAI Thank you for dedicating time to creating interesting content for us; please don't stop. God bless your project.