Claude 3.5 Sonnet vs GPT-4o: Side-by-Side Tests

  • Published on 29 Sep 2024

Comments • 295

  • @AnthonyGoubard
    @AnthonyGoubard 3 months ago +257

    I think giving no points when both are correct may bias the final result. Say you've done 20 tests: 15 have the same result, GPT-4o is better in 1, and Claude Sonnet is better in 4. The score is then 4-1 for Claude Sonnet, but really it's more like 19-16.

    • @luigideff
      @luigideff 3 months ago +32

      Yea exactly. It totally makes a perception difference.

    • @beautifulandtoolate
      @beautifulandtoolate 3 months ago +10

      A draw is typically 0.5 points each

    • @FloatingWeeds2
      @FloatingWeeds2 3 months ago +5

      Do 0.01 points each for a draw so you can have two different categories of data in one number 😘

    • @PatrickStorm_
      @PatrickStorm_ 3 months ago +47

      Good call, it looks a lot different when you include ties. I recalculated the final scores adding in a point for ties, and the final tally is GPT-4o: 17 and Claude 3.5 Sonnet: 19.
      That shows a clearer picture of how close these models are 👍

    • @PatrickStorm_
      @PatrickStorm_ 3 months ago +16

      Nice. That is my kind of useless optimization 😎 This would end up with GPT-4o at 6.11 and Claude 3.5 Sonnet at 8.11
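
The tie-scoring variants discussed in this thread can be compared with a short sketch. The counts below (6 GPT-4o wins, 8 Claude wins, 11 ties) are not stated directly; they are inferred from the quoted 6.11 vs 8.11 and 17 vs 19 totals, and the `tally` helper is a hypothetical name used only for illustration.

```python
def tally(wins_a, wins_b, ties, tie_weight):
    """Return (score_a, score_b) when every tie adds tie_weight to both sides."""
    bonus = ties * tie_weight
    # Round to 2 decimals so small weights such as 0.01 print cleanly.
    return round(wins_a + bonus, 2), round(wins_b + bonus, 2)


# Assumed counts, inferred from the totals quoted in the thread.
GPT4O_WINS, CLAUDE_WINS, TIES = 6, 8, 11

schemes = [
    ("ties ignored", 0),
    ("0.01 per tie", 0.01),
    ("0.5 per tie", 0.5),
    ("1 per tie", 1),
]

for label, weight in schemes:
    gpt, claude = tally(GPT4O_WINS, CLAUDE_WINS, TIES, weight)
    print(f"{label:>13}: GPT-4o {gpt} vs Claude 3.5 Sonnet {claude}")
```

The gap between the two models is the same under every scheme; only the impression of how close they are changes, which is the point raised at the top of the thread.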

  • @MartinJefferies-j1d
    @MartinJefferies-j1d 3 months ago +217

    Summary: 1. both are great. 2. don't use either for fact finding. 3. Since they are both free, use both simultaneously.

    • @PatrickStorm_
      @PatrickStorm_ 3 months ago +14

      A bit reductive :) But yeah, that's the gist! Both are really good and have different strengths.

    • @theplaylistpsycho
      @theplaylistpsycho 2 months ago +5

      Calling both free isn't entirely accurate; both are free but with limited access.
      When you hit the free user limit, ChatGPT forces you from 4o to the 3.5 model, and Claude just bars users from using it, since they currently don't have an unlimited-use model for free users.

    • @MeQt
      @MeQt 2 months ago +3

      They aren't free though?

    • @oberpenneraffe
      @oberpenneraffe 2 months ago

      @MeQt Some requests are free, but there is a strict limit after a few requests. Then you have to wait a few hours before you can use it again for free.

    • @JamesR624
      @JamesR624 2 months ago +1

      "don't use either for fact finding"
      If neither of these can even do basic fact-checking, then what's the point?
      So basically they're nifty gimmicky chatbots but with a practical usability that's outclassed by Assistant and Siri from a decade ago since those actually search the web and give you sources.

  • @costicanu7
    @costicanu7 2 months ago +4

    For writing code, GPT-4o is way better than Sonnet 3.5.
    I tried them both multiple times; Sonnet 3.5 sometimes does not understand when it is a harder task.
    Sonnet 3.5 surprised me when I asked for a solution; its answer was suitable for my task.
    Very good to have them both!

    • @barafwal253
      @barafwal253 2 months ago

      Hi, I need to subscribe to one of the paid versions of these AI chatbots (Claude 3.5 Sonnet, ChatGPT 4o, etc.), mainly for coding. I frequently need to upload files and images, and sometimes refer to web links. I have very long code and other files to analyze.
      Will it be exactly the same if I subscribe to ChatGPT 4o directly from OpenAI, or subscribe to Perplexity and select the ChatGPT 4o model in the settings? And similarly for other AI models?
      With Perplexity, I would get multiple AI models in just one plan; is that really true and practical?

    • @_fuji_studio_
      @_fuji_studio_ a month ago +1

      bruh it's the opposite, Claude Sonnet 3.5 is way better at coding. I switched to it for my coding assistance the moment it surprised me by giving a very good answer where GPT couldn't and kept repeating the same wrong answer

    • @coursehub2407
      @coursehub2407 25 days ago

      @_fuji_studio_ same, I also switched to Sonnet 3.5 for coding

  • @SSS-100M
    @SSS-100M 3 months ago

    Your video is great! I can understand the difference between Claude 3.5 Sonnet and GPT-4o. Also, I canceled GPT-4o because Claude 3.5 Sonnet is better than GPT4o. When I want to create an image, I use Gemini1.5 Pro.

  • @LikeAPro.1995
    @LikeAPro.1995 19 days ago

    Also, at 19:29, you gave a point to GPT but you didn't increase its score on the screen. Thank you for your contribution, though. It was interesting

  • @MudroZvon
    @MudroZvon 3 months ago +3

    Instant subscribe!

  • @229Mike
    @229Mike 3 months ago

    I don’t know if I can agree with this fellow. I tried using Claude 3.5 Sonnet and it got a picture breakdown incorrect, given the timeline that was present. No problems with ChatGPT.
    And at that point, I just didn’t wanna trust Claude.

  • @drlordbasil
    @drlordbasil 3 months ago +81

    Claude Sonnet is wayyy better for complex tasks and assistance in debugging.

    • @KroeSufos102
      @KroeSufos102 3 months ago +1

      Perfect for my work!

    • @GeeGnebAb
      @GeeGnebAb 2 months ago +1

      yeaaa crazy how good Sonnet is, it's like talking to a professional who can really solve and explain the problem

  • @Ivan7Kovnovic
    @Ivan7Kovnovic 3 months ago +42

    The GDP 2018 question was actually answered correctly. According to every source I found on the internet, Germany was 4th and the UK was 5th.

    • @manuelcardoso5830
      @manuelcardoso5830 3 months ago +4

      Agreed, I saw the same thing. Both AIs were right.

    • @gege151500
      @gege151500 3 months ago

      I actually found India on the IMF website?

    • @PatrickStorm_
      @PatrickStorm_ 2 months ago +11

      Yup, I messed up here 😬 I saw the GDP based on PPP rankings, and mistook it for nominal GDP. The AIs got it right! Source: en.wikipedia.org/wiki/List_of_countries_by_GDP_(PPP)

    • @DGaryGrady
      @DGaryGrady 2 months ago

      @PatrickStorm_ There are different ways of computing GDP depending on the purpose of the comparison, so there is no single right answer. The most usual comparison is based on currency conversion, but PPP (purchasing power parity) is more appropriate in some contexts, especially if you're ultimately looking at GDP per capita and standard of living.
      (Incidentally, depending on when you ask the question, the comparison changes if you pretend California is an independent country. OpenAI and Anthropic are both based in San Francisco, but that's just a coincidence. For now.)

  • @briankgarland
    @briankgarland 3 months ago +26

    I pay for both, primarily for coding, and haven't used 4o since Sonnet came out.

    • @baldeeptiwana
      @baldeeptiwana 2 months ago

      I also want to buy the paid version of one of them for my python-gis project. Do you think Sonnet is better for coding?

    • @mikemin5
      @mikemin5 2 months ago

      @baldeeptiwana Now I’m not a coder and don’t even know what python-gis means, but I think that if you’re trying to make a rendered program, Claude’s split screen is nice, but it shouldn’t be a dealbreaker. I use ChatGPT all the time for coding and, with a few prompts, can get a perfect piece of code that works exactly how I want it, no matter how complex my needs are. I’ve made tons of sites and little programs in HTML, and both AIs are definitely gonna be good. A 3% difference in some random test shouldn’t show you where to put all your money.

    • @zHqqrdz
      @zHqqrdz 2 months ago

      @baldeeptiwana It objectively is

  • @РодионСадыков-е2г
    @РодионСадыков-е2г 3 months ago +10

    GPT’s recognising Obama’s prank is astonishing

  • @blueicicle1973
    @blueicicle1973 3 months ago +54

    it sounded a little biased towards Claude

    • @PatrickStorm_
      @PatrickStorm_ 2 months ago +13

      Yeah, I have to figure out how to do this as a blind test next time. Thanks for the feedback

    • @adammccoy1
      @adammccoy1 2 months ago +2

      @PatrickStorm_ veryyyy biased, esp. the marking system, disliked

    • @shreyashsingh8682
      @shreyashsingh8682 2 months ago

      @adammccoy1 Have you ever tried Claude? It's better than ChatGPT 4o for complex tasks

    • @paulustangkeallo7840
      @paulustangkeallo7840 2 months ago

      @PatrickStorm_ One alternative is to review the outputs without knowing which LLM produced each one.

    • @_fuji_studio_
      @_fuji_studio_ a month ago

      bruh Claude Sonnet 3.5 is way better, I use it for coding and it's amazing
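
The blind-review idea suggested in this thread could look roughly like the sketch below. It is a minimal illustration, not the video's method: the function names `make_blind_pairs` and `unblind`, the labels "A" and "B", and the vote strings are all hypothetical. Each pair of outputs is shown in random order, the reviewer votes without seeing which model wrote which, and the hidden key is used afterwards to credit the right model.

```python
import random


def make_blind_pairs(outputs_a, outputs_b, seed=None):
    """Pair up outputs per prompt and hide which model produced which one."""
    rng = random.Random(seed)
    pairs = []
    for out_a, out_b in zip(outputs_a, outputs_b):
        order = [("A", out_a), ("B", out_b)]
        rng.shuffle(order)  # random presentation order for this prompt
        pairs.append({
            "first": order[0][1],
            "second": order[1][1],
            # Hidden key: which model is in which slot. Not shown to the reviewer.
            "key": (order[0][0], order[1][0]),
        })
    return pairs


def unblind(pairs, votes):
    """Turn reviewer votes ('first', 'second' or 'tie') into per-model scores."""
    scores = {"A": 0, "B": 0, "tie": 0}
    for pair, vote in zip(pairs, votes):
        if vote == "tie":
            scores["tie"] += 1
        elif vote == "first":
            scores[pair["key"][0]] += 1
        elif vote == "second":
            scores[pair["key"][1]] += 1
        else:
            raise ValueError(f"unknown vote: {vote!r}")
    return scores
```

Counting ties as their own category here also keeps the data needed for the tie-weighting discussion in the top comment thread.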

  • @rachelsnijders817
    @rachelsnijders817 2 months ago +1

    Claude was better at writing the short scene with the bunny, but still made a lot of mistakes. For example: the smell of revolution. What? (also, a smell does nothing to cover up theft)
    You can use AI when brainstorming ideas in creative writing, but do the actual writing yourself 😉

  • @aidajam3294
    @aidajam3294 2 months ago +1

    It is garbage. I have really long experience using different GPTs (in programming). Claude (at least Sonnet) makes up stuff a lot. You should pay more attention to how precise it is. In the end you will spend more time just checking documentation, APIs, etc. By contrast, ChatGPT (4o) is verbose (even with your custom instructions) but more accurate. IMHO, it is better to get annoyingly verbose results than to constantly keep an eye on the results' precision. I will switch back to ChatGPT 4o; I think the hype is over

  • @zejdzglebiej
    @zejdzglebiej 3 months ago +5

    The question is, what do you mean by writing a better text? I'm afraid that you evaluate texts too positively, where there is understatement, and a lack of logical structure with opening and closing. You perceive it as an aura of mystery. That's why Clodie cheated on you, because what he couldn't do, you interpreted as good writing.

  • @ktms1188
    @ktms1188 3 months ago +6

    Claude 3.5 and GPT-4o both have their strengths, and it’s fascinating to see how they differ. Claude feels more human, like it’s really trying to understand what I’m asking, but I’ve noticed that with GPT’s memory function the model seems to know a lot more about what I’m trying to ask, so it now gives much better answers, like Claude 3.5. My issue is that it sometimes hits those frustrating blocks and says it’s unable to answer my question, which drives me nuts even when it’s nothing controversial and it clearly would know the answer. I noticed in one of their talking points that this is one of the big things they are working on: it is overly restrictive, they know it, and they want to improve that. GPT-4o, on the other hand, is super analytical but occasionally needs me to rephrase my questions to get the best answers.
    I’ve been using both for a while now, and here’s what I’ve found: Claude’s artifact mode is mind-blowing, and it’s nice if you’re on an iPhone or iPad, since there is no Android app. GPT’s memory function is a game-changer, making it more accurate over time as it learns from our interactions.
    Wouldn’t it be amazing if they combined the best of both worlds? I’d love to see a deep dive comparison between custom GPTs like “Scholar” and the standard GPT-4o, especially for fact-based questions. Does the customization really boost accuracy?

  • @FrancescoDellaValle
    @FrancescoDellaValle 3 months ago +15

    I appreciate your work, but I found this video biased and inconsistent in judging the two models' responses. In two or three tests, you verbally preferred ChatGPT, yet you didn't award any points and declared a draw. This doesn't seem unbiased to me.

  • @trainspotting02
    @trainspotting02 2 months ago +2

    Sorry these are such basic tasks. A proper test is a custom environment, action and reward used in reinforcement learning.

  • @yaroslavzdanovskiy5704
    @yaroslavzdanovskiy5704 a month ago +1

    Patrick, you seriously think that 4o is smarter than 4? Have you even used both to compare? I think if you had, you would know that comparing Claude to 4o makes no sense whatsoever.

  • @sergeyromanov2751
    @sergeyromanov2751 3 months ago +26

    Your list of questions was not balanced. You focused too much on the language problems (where Claude 3.5 Sonnet is clearly ahead) and completely ignored the logic, reasoning and math problems (where GPT-4o would crush its opponent).

    • @PatrickStorm_
      @PatrickStorm_ 3 months ago +14

      Totally valid critique. I'll add those sections in to future model comparison videos.

    • @-Meric-
      @-Meric- 2 months ago +3

      GPT is pretty bad at logic and math based on benchmarks. Claude would probably win there as well

  • @NithinJune
    @NithinJune 3 months ago +5

    for the coding tests you should do a plagiarism check to check if it is straight ripping someone’s code

    • @incription
      @incription 2 months ago +2

      sometimes it does, but programmers are also guilty of that a lot... if it works, no need to rewrite it

  • @djayjp
    @djayjp 3 months ago +6

    19:32 You forgot to give the point to GPT-4o in the tally.

    • @AJBarea
      @AJBarea 3 months ago +2

      he gives gpt-4o two points and claude 0 points for that whole category less than a minute later around 20:13

  • @mrchrizztech1050
    @mrchrizztech1050 2 months ago +2

    These coding questions are really useless imo....
    I would ask these questions:
    1. Given our custom database (.db) file, we want to find out what the average income is for each item. We want a webpage that shows us this total and where we can search for items to find the totals. (Tests coding, HTML, CSS, JS and also SQL and DB knowledge)
    2. Create a very small application that runs as a Windows service and creates a window we need to close to start a counter. After x seconds we want that window to reappear and to close. (Tests knowledge of OS and code)
    3. Given a foreign API, we have the endpoint x. (Insert custom endpoint here.) We want to transform the response in a way where we can use it to show it in, for example, our Windows service each time it pops up
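
As a rough idea of what the database part of question 1 above is probing, here is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its `item` and `income` columns, the sample rows, and the `average_income_per_item` helper are all assumptions made up for illustration; the comment does not describe the actual schema, and the webpage part is left out.

```python
import sqlite3


def average_income_per_item(conn, search=None):
    """Return (item, average income) rows, optionally filtered by a search term."""
    query = "SELECT item, AVG(income) FROM sales"
    params = ()
    if search:
        # Parameterised LIKE filter so user input is never pasted into the SQL.
        query += " WHERE item LIKE ?"
        params = (f"%{search}%",)
    query += " GROUP BY item ORDER BY item"
    return conn.execute(query, params).fetchall()


# Hypothetical in-memory data standing in for the custom .db file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, income REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("apple", 10.0), ("apple", 20.0), ("pear", 5.0)],
)

print(average_income_per_item(conn))         # all items
print(average_income_per_item(conn, "pea"))  # search for one item
```

A fuller answer to the question would also need a small web front end for the search box, which is where the HTML, CSS and JS knowledge mentioned in the comment comes in.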

  • @timothyhernandez5141
    @timothyhernandez5141 3 months ago +2

    How about Claude 3.5 vs GPT-4? Which is better 😅 Thanks

    • @amirhossein1108
      @amirhossein1108 2 months ago

      What is the difference between GPT-4 and GPT-4o??

  • @JosefTorkelsen
    @JosefTorkelsen 3 months ago +3

    Great job on the video dude! I also agree with your results for yourself at the end that discusses how you plan to use them. I like Claude but without those extra things, Chat GPT is my daily driver.

  • @journees4300
    @journees4300 3 months ago +2

    This is not a well-thought-out test. There are actually standardized methods to compare AI models. Look for GLUE, COCO, MS MARCO, SQuAD, etc. It depends on which aspects you want to compare.

  • @codingzen869
    @codingzen869 2 months ago +1

    I just like the fact that Gemini doesn't even count anymore. They are way off the race. Google deserves every bit of it.

    • @PatrickStorm_
      @PatrickStorm_ 2 months ago

      I wouldn't count Google out just yet, Gemini has been slowly getting better and better. Who knows which model will be leading in 6 months or a year.

  • @SurfCatten
    @SurfCatten 3 months ago +2

    Pretty impressive for a fairly small YouTube channel! If you can get the visibility, I expect your channel will do very well. Unfortunately, for that to happen you need to go pretty far down the clickbait YouTube algorithm; there's just no other way to build views and subscriptions

  • @EternalRecurrence88
    @EternalRecurrence88 4 days ago

    You should do one with music writing. I’m about to test out Claude now. ChatGPT has been mostly disappointing. It excels at defining genres and instrumentation. It’s really vague in a lot of sound design areas. The worst part is the musical notation itself. Either its suggestions are flat-out boring (cookie-cutter to the extreme), or it just doesn’t have a great way to display the timing of what it’s trying to describe. In GPT-3 I was able to get it to write me some MIDI, which was flat-out bonkers. In 4 it sent me a link to download a MIDI file from Drive. The link was broken. I asked it to send me a new link, to which it responded that it couldn’t use Drive... When it comes to UE5 blueprints, I was able to get it to do things there are literally no tutorials for on YT or in Google results. It took me months of fine-tuning to get a decent enough data table CSV. To its credit, though, the data table was over 8000 rows and about 6 columns. But the trial and error had a lot of headaches attached.

  • @AbhisheksinghbhadauriyaG
    @AbhisheksinghbhadauriyaG 3 months ago +6

    00:03 Claude 3.5 Sonnet outperforms GPT-4o in benchmarks
    02:27 Claude 3.5 Sonnet outperforms GPT-4o in speed and live code demonstrations
    04:46 Claude 3.5 Sonnet outperforms GPT-4o in creative writing tests
    07:13 Comparison of performance between Claude 3.5 Sonnet and GPT-4o models
    09:52 Comparison of Claude 3.5 Sonnet and GPT-4o
    12:27 Difference in code review of Claude 3.5 Sonnet and GPT-4o
    14:57 Comparing Claude 3.5 Sonnet and GPT-4o
    17:28 Comparison of GPT-4o and Claude 3.5 Sonnet performance on trivia questions
    19:49 GPT-4o performed better in factual accuracy
    22:04 Claude 3.5 Sonnet outperformed GPT-4o in understanding and summarizing human emotions
    24:18 Claude 3.5 Sonnet offers better performance and cost savings

  • @u4icdissonance180
    @u4icdissonance180 3 months ago +2

    Excellent video, I think you did a good job of being objective. Just a note, if you're looking for conversation and support, tell GPT you're looking more for an emotionally supportive answer than a solution based answer. Then what you get out of it is very similar to Claude. Claude still deserves the point because your average user likely won't think to try that, but the option is there for people that want more conversational GPT.

  • @hanfo420
    @hanfo420 2 months ago +1

    Claude didn’t even get that it was Obama.

    • @PatrickStorm_
      @PatrickStorm_ 2 months ago

      Lol I missed that. That should have been a negative point even!

  • @eburgwedel
    @eburgwedel 3 months ago +2

    Could have been a good comparison, but wasn’t - image gen and facts make absolutely no sense; reasoning was missing entirely. It did point out a few important things, though, so thank you.

    • @PatrickStorm_
      @PatrickStorm_ 3 months ago

      I appreciate the feedback! I hear you about the reasoning not being in there. That was a miss, future comparison videos will definitely have that section… and probably won’t have the facts section. Glad you got some stuff out of the video though!

  • @joserubio3036
    @joserubio3036 15 days ago

    I have used Gemini 1.5 Pro, ChatGPT 4o and the Claude Sonnet 3.5 base model, and OMG, Anthropic did great work. The results were MUCH MUCH better in every single aspect I used the models for:
    - Coding
    - Data analysis
    - Summaries of huge amounts of data
    - Getting insights and core info from research papers
    - Law content
    - Studying purposes: making Anki-type questions, creating Feynman technique summaries, etc.
    By far, Claude gave me the best outputs of all, and it was just the free version... definitely gonna give this one a try for a while and analyse even deeper. If you ask me, definitely pick the Claude Pro version, which is even cheaper than the rest of the LLMs

  • @djayjp
    @djayjp 3 months ago +3

    I'm pretty sure limes float if they have been squeezed and sink if they haven't.

    • @jerkface38
      @jerkface38 3 months ago

      The esther question is also dumb. Let's be honest here, the info on the internet is ambiguous. Lots of info saying she was married in 1981 or 1982. Only the wiki says m. 1985 but with no real context. Had he asked both models why they said what they said, he would have gotten a refined response. Also, he only gave 1 point for image generation? I'm sorry but that's at least a 3 pointer even if some of the images suck.

  • @LikeAPro.1995
    @LikeAPro.1995 19 days ago

    7:28 >> Sorry, but GPT identified it as a brick while Claude identified it as a stone, which is not correct.
    8:02 >> Sorry, but GPT recognized that it was Obama, yet Claude only said it was a man in a suit, which is not correct.

  • @azhuransmx126
    @azhuransmx126 2 months ago +1

    So biased😮

  • @baltakatei
    @baltakatei 16 days ago

    08:40 As someone who has a `# journal --follow | less +F` terminal for my home server in lieu of a fish tank, I must warn you all AI companies are furiously and continuously downloading and re-downloading everything that won't block them on the Internet. anthropic, bytedance, openai, etc. I also know they've brought the Internet Archive down at least once with their aggressive spidering. Better tests would be completely novel images never uploaded.

  • @Le_Lys_Eclectique
    @Le_Lys_Eclectique a month ago

    Great video, thanks!
    It would be really awesome if you got them both talking with each other about, I don’t know, maybe how to solve some of the greatest problems of humanity. And maybe have them choose, together, what the most important one would be!
    Do you think what I’ve suggested is even possible???

  • @512Bytes
    @512Bytes 3 months ago

    I have a video on my channel integrating all AI models into Linux: ChatGPT, Gemini, Claude and more.

  • @Repz98
    @Repz98 3 months ago +8

    This video was really well made, and I enjoyed it through the entire video! I thought I was watching someone with 200k plus subs, based on the quality of this content. Keep it up, I’m subscribing now!

    • @PatrickStorm_
      @PatrickStorm_ 3 months ago +1

      I'm really happy you liked it :) and you made my day saying it was comparable to a channel with 200k subs!

  • @maximiliandegarnerinvonmon6457
    @maximiliandegarnerinvonmon6457 3 months ago +2

    They both actually got it correct about the 5th highest GDP in 2018.

    • @maximiliandegarnerinvonmon6457
      @maximiliandegarnerinvonmon6457 3 months ago

      Gives a scary example of how we presume that we are superior 😂😂

    • @maximiliandegarnerinvonmon6457
      @maximiliandegarnerinvonmon6457 3 months ago

      We are usually 4th, but 2018 was the time we dropped for that year, and again just recently last year. It's a sticking point here due to elections, and that's how we know 😂😂😂

  • @phillipbones7522
    @phillipbones7522 3 months ago +1

    Videos like these are pointless for the simple fact that it's nothing more than comparing two software apps, with one being a half step either behind or in front. Unless Claude can take that full step forward, which it failed to do this time around, ChatGPT will just sprint forward again with another great leap. I can only afford one, and for now I'm sticking with ChatGPT as it has more potential for what I need it for

  • @vuhoang5903
    @vuhoang5903 3 months ago +5

    My experience is GPT is still better than Claude in logic, reasoning and problem solving (for coding, math, data analysis,...)

    • @SharvindRao
      @SharvindRao 3 months ago +2

      Blah blah blah

  • @ianimewhatif862
    @ianimewhatif862 2 months ago +2

    You sound AI generated

    • @PatrickStorm_
      @PatrickStorm_ 2 months ago +1

      Lol, I promise I'm human. I did run my voice through an AI enhancer, I think I'm going to stop doing that going forward - I got a few of these comments 😬

    • @ianimewhatif862
      @ianimewhatif862 2 months ago

      @PatrickStorm_ Yeah, AI enhancers aren’t rlly great at keeping voices human. I rlly think people would appreciate it more if they heard a definitely human voice than to hear something in between both. Thx

  • @BICYCL3
    @BICYCL3 2 months ago

    for educational purposes please make them do language arts homeworks and algebra 2....for educational purposes

  • @RoseAlternative
    @RoseAlternative 2 months ago +3

    Thanks for putting in the time and effort to make this video! I was wondering if I should renew my GPT-4o, or try Claude for the first time. Now I'm set on trying Claude.
    The video quality is amazing, keep up the good work! :)

    • @litpapi1849
      @litpapi1849 2 months ago

      same here! been a long time chatgpt user we'll see how this goes

    • @PatrickStorm_
      @PatrickStorm_ 2 months ago +1

      Thanks for the kind words! I currently have both ChatGPT and Claude subscriptions, but if I had to choose just one, I would probably go with Claude. It's sort of a toss up at the moment.

    • @RoseAlternative
      @RoseAlternative 2 months ago

      @PatrickStorm_ I’m absolutely loving Claude right now. The ONLY downside I’m experiencing compared to GPT-4o is that it feels like I’m being limited too much in the number of messages I can send, and the maximum of 5 images per conversation can also be a pain. Overall though, I’m using it for code assistance and it has phenomenal coding uses.

    • @PatrickStorm_
      @PatrickStorm_ 2 months ago

      @RoseAlternative Yeah, the message limits really have been a pain. That's usually when I switch to ChatGPT :)

  • @brezza6892
    @brezza6892 3 months ago

    Germany was 4th not 5th. They were both correct and you were wrong. According to the WEF that is.

  • @Kutsushita_yukino
    @Kutsushita_yukino 3 months ago +1

    Claude 3.5 Sonnet lost the EQ that Opus had, though... so it’s not reliable as a conversational model. Opus is still worth it if you get tired of Sonnet 3.5’s robotic, bland responses. Try comparing them if you don’t believe me. This is not subjective; it’s a fact that Sonnet 3.5 sacrificed emotional intelligence for more specs.

    • @117ao
      @117ao 3 months ago

      exactly!

  • @MusicStudioNYC
    @MusicStudioNYC 16 days ago

    Well done!! Can you do it again with the new GPT-o1 model?

  • @nicholasfabris130
    @nicholasfabris130 3 months ago +1

    In round 5 UK was the correct answer

  • @terryterry1655
    @terryterry1655 2 months ago

    Ask this: predict the teams playing in the UEFA 2024 final and predict the score

  • @BrokenNat
    @BrokenNat 2 months ago

    The less input we give it, the less stupid shit we unknowingly feed it. We will never reach true conscious AI with real intelligence that surpasses ours until we start training it on the true rules of the universe.

  • @MassiveDerek
    @MassiveDerek 2 months ago +2

    3:23 i thought someone was inside my house whistling

    • @PatrickStorm_
      @PatrickStorm_ 2 months ago

      Lol sorry about that. I was going for a western theme for that part.

  • @andrewslabbert4316
    @andrewslabbert4316 2 months ago +1

    I've watched a lot of AI videos out there, this one was truly helpful. You've gained my subscribe & my full attention Patrick! Thank you!

    • @PatrickStorm_
      @PatrickStorm_ 2 months ago

      I really appreciate it. Glad you're getting something from my videos!

  • @YoKKJoni
    @YoKKJoni 3 months ago

    been using Claude for 6 months now..
    never seen a reason to use anything else..

  • @cesarsfalcao
    @cesarsfalcao 2 months ago +1

    I'm using GPT 4o for free, it's a win.

  • @mitakshara158
    @mitakshara158 a month ago

    so, gpt is better at emotions and closer to the much feared AGI

  • @JamesTeeter-t9c
    @JamesTeeter-t9c 24 days ago

    Johnson Jeffrey Walker Timothy Jones Christopher

  • @hidd3n_
    @hidd3n_ 3 months ago +1

    imagine if we had their child, Claude Pete 4.5

  • @singhbhai
    @singhbhai 3 months ago +1

    Claude is better at Coding really really fast.

  • @robwin0072
    @robwin0072 a month ago

    Hello,
    Good video. I liked and subscribed.
    First, I think you stiffed GPT4o on the R:8 summary question. Yes, it was more than 300 words-however, since it hit all the aspects of the dense article, GPT4o should have received a point.
    Also, the prompt scrolled fast-I was unable to read it-so I don’t know if you asked to limit it to 300 words.
    Second, I have to write a production program for a small operation insurance company. I will use GnuCOBOL; which of the two would you use to assist in that project?

    • @PatrickStorm_
      @PatrickStorm_ a month ago +1

      I would use Claude. In every benchmark I’ve seen, Claude is the leader for coding.

  • @caresvlbdjz
    @caresvlbdjz 3 months ago

    for coding claude 3.5 sonnet is much better in my tests

  • @shreyasnaik5600
    @shreyasnaik5600 a month ago

    I think it would be better to do a blind comparison.

  • @AndresIbanezVasquez
    @AndresIbanezVasquez a month ago

    Thanks for the comparison! I do agree that in the end it comes down to personal preference, and people should really try both (and perhaps also Gemini) to see which one suits them better. I used to write poems in my youth, I was quite fond of them, and I actually preferred the poem GPT gave you; it felt more elegant, with more "fancy" but also soothing words, while Claude in my opinion gave a not very memorable rendition with somewhat generic and common words. But again, it's personal preference.

    • @AndresIbanezVasquez
      @AndresIbanezVasquez a month ago

      After trying this prompt myself, I also added "pretend you are a world-class poet" and Claude's version was almost on par with GPT in my opinion, so I guess providing detailed prompts is also very useful!

  • @Improstor
    @Improstor 2 months ago

    I recommend double-checking your answers if you are pitting two massive LLMs against each other on fact finding (Germany GDP). Interestingly though, even for the astronaut question, ChatGPT can answer correctly if asked for a list of astronauts, so presumably there is some phrasing or question-understanding issue that both LLMs share. When I challenged ChatGPT on why it answered wrongly, it corrected itself. LLMs are just kids with an endless amount of information in their heads, but sometimes too stupid to understand questions.

    • @PatrickStorm_
      @PatrickStorm_ 2 months ago

      You are right, the structure of the prompt can massively improve (or degrade) the performance on different tasks. My goal was to limit bias, so I wanted to go with the simplest prompt, but I'm sure I could have gotten better answers using a handful of prompting techniques

  • @rupertllavore1731
    @rupertllavore1731 2 months ago

    Claude 3.5 Sonnet is already OP right now. Maybe you can even the odds by allowing ChatGPT 4o to use its GPT Store variants! Like "Data Analyst", then "Write for me" and "ScholarGPT" Hahaha

    • @rupertllavore1731
      @rupertllavore1731 2 months ago

      HAHA CHATGPT main Claude3.5 on the side API usage :))

  • @kaicex
    @kaicex 2 months ago

    I understand that in the free version of Claude, you get 5 free queries with Claude Sonnet. How many free queries will I get with Claude Sonnet if I buy the Pro plan?

    • @PatrickStorm_
      @PatrickStorm_ 2 months ago

      I chat with it all day and run out most days by about 3pm. It's way more than 5, but it depends on how long the chat is. I would say that I easily get 50 messages in before it tells me to wait a couple of hours. But then I just switch back to ChatGPT!

  • @KeiferStreet
    @KeiferStreet 2 หลายเดือนก่อน

    This isn’t a fair comparison. ChatGPT has major advantages in being able to scour the web for more recent information and Claude does not. In the one category where ChatGPT dominated, you added a qualifier that this is the worst way to use LLMs, so it felt like a concession to Claude. This entire video feels very biased toward Claude.

    • @PatrickStorm_
      @PatrickStorm_  2 หลายเดือนก่อน

      Fair points. At the end, I did say that I’m going to stick to ChatGPT for the majority of my work in part because of the web search feature, but I think I could have done a better job being unbiased. Thanks for the feedback

  • @SlyMaelstrom
    @SlyMaelstrom 2 หลายเดือนก่อน

    It's hard to tell if ChatGPT 4o is doing a better job with the Obama picture, as it's a well-known picture that has been heavily reported on. Given that 4o was trained multimodally, it wouldn't be surprising if this very picture was in its data set along with a description of it. You can't really say it understands the humor if it is potentially responding based on consumed data. I think you would have to recreate an image with the same humor, albeit with different people, at a different angle and setting, etc., and see if it still understands what it's looking at.

    • @PatrickStorm_
      @PatrickStorm_  2 หลายเดือนก่อน

      Yeah, that's a valid point. Because it is such a widely cited image, I'm actually more surprised that Claude didn't get it right.

  • @NithinJune
    @NithinJune 3 หลายเดือนก่อน +1

    image generation should be minus points lmao

  • @julianvillaquira4127
    @julianvillaquira4127 2 หลายเดือนก่อน

    Where are you taking your answers from? (for example, the GDP one I think Germany came fourth, not fifth)

    • @PatrickStorm_
      @PatrickStorm_  2 หลายเดือนก่อน +1

      I tried to get questions with clear answers that I found from multiple sources online. That specific question was actually wrong though, or at least not entirely correct. This got called out in another comment, but I was using a different calculation of GDP than the common one that both LLMs answered with.

  • @fernandoz6329
    @fernandoz6329 3 หลายเดือนก่อน +1

    In this type of showdown to find which is best, I think it would be useful for the evaluator not to know which model produced each answer, to avoid being biased by personal preferences.
    There were a few answers where I disagreed, so maybe I'm biased too.
    For coding, DeepSeek 2 is outstanding.

    • @PatrickStorm_
      @PatrickStorm_  3 หลายเดือนก่อน

      That is a really good point. I am very sure I would be able to distinguish between them even if it was a blind test, so I wonder how I could do that 🤔 I’ll keep this in mind for future comparison videos though. Thanks for the feedback.

    • @PatrickStorm_
      @PatrickStorm_  3 หลายเดือนก่อน

      And yes, DeepSeek 2 looks really good, I haven’t tried it out yet, but it’s on my list!

  • @MrAmad3us
    @MrAmad3us 3 หลายเดือนก่อน +8

    Claude's premium plan gives fewer messages per dollar. It’s significantly more consistent in long and complex convos, but you reach the 5h message limit quickly

    • @gideons6126
      @gideons6126 3 หลายเดือนก่อน +1

      Cheaper per token if you use the API which is how I go for it but I agree with you about premium value

    • @fool-on-the-hill
      @fool-on-the-hill 3 หลายเดือนก่อน +3

      Yeah, I ran into this quickly, not knowing there was a limit. The only limit I ran into with ChatGPT is the amount of images I could create, but it took me a while to get there even. That stinks. Oh, and Claude can’t do images…
      Currently, I’m subscribed to both to see if I can tell which one’s better, but I’m not sure I can.

    • @barafwal253
      @barafwal253 2 หลายเดือนก่อน

      @@fool-on-the-hill Hi, I need to subscribe to one of the paid versions of these AI chatbots (Claude 3.5 Sonnet, ChatGPT 4o, etc.), mainly for coding. I need to frequently upload files and images, and sometimes refer to web links. I have very long code and other files to analyze.
      Will it be exactly the same if I subscribe to ChatGPT 4o directly from OpenAI, or subscribe to Perplexity and select the GPT-4o model in the settings? And is it similar for the other AI models too?
      With Perplexity, I would get multiple AI models in just one plan. Is that really true and practical?

  • @suleymanbolek7296
    @suleymanbolek7296 3 หลายเดือนก่อน +6

    This was the best comparison video on YouTube. Great job man, subscribed.

  • @truecuckoo
    @truecuckoo 3 หลายเดือนก่อน +2

    Isn’t the biggest improvement with GPT-4o the conversational voice chat? Not converting audio to text prompts, but actually understanding the tone of the voice itself, etc. It comes across as pretty nuanced in the demos I’ve seen.

    • @PatrickStorm_
      @PatrickStorm_  3 หลายเดือนก่อน

      Absolutely! But that isn't available to anyone but insiders yet. But even without the audio, when GPT-4o came out, it was top of the leaderboards for pretty much everything.

  • @vm_jayfus9332
    @vm_jayfus9332 3 หลายเดือนก่อน +3

    Your channel deserves sooooo Much more attention😮

  • @artificiyal
    @artificiyal 2 หลายเดือนก่อน +1

    The only thing separating them now is the training data

    • @PatrickStorm_
      @PatrickStorm_  2 หลายเดือนก่อน +1

      Yeah, you are right. And I think that's clear from Llama 3.1 being about as good as these models at a smaller size. It's all about the data.

  • @augustoliver2779
    @augustoliver2779 2 หลายเดือนก่อน

    They are both not great for reasoning.

  • @djezio258
    @djezio258 2 หลายเดือนก่อน

    Conclusion: GPT-4o is for research and Summarization, while Claude 3.5 Sonnet is great for poetry, conversation, storywriting and dialoguing. I prefer to spend $20/month for GPT-4o

  • @mitakshara158
    @mitakshara158 หลายเดือนก่อน

    Please try GPT-4o for a continued conversation: you ask something, it answers, and then you interrupt and ask a follow-up question, as if it's a real person sitting beside you

  • @eastfremantle8989
    @eastfremantle8989 2 หลายเดือนก่อน

    @8:00, both missed the point of the foot-on-the-scale situation. The irony is that Obama, being the President, literally has the ability to “tip the scales” in his own favor, undermining the other person's perception without that person being aware that the outcome was manipulated. This is a lot deeper than the fun and games the AIs reported. A deeper and more literal meaning should have been fleshed out by the AI engines, and once an AI notices this I suspect the Turing test would be realized.

    • @PatrickStorm_
      @PatrickStorm_  2 หลายเดือนก่อน

      That is a really good point. I think they both did somewhat poorly on that one. Admittedly, it is a pretty tough concept.

    • @eastfremantle8989
      @eastfremantle8989 2 หลายเดือนก่อน

      Furthermore, I just did the same test with the Obama photo with foot on the scales with chat gpt and asked it to explain the associated irony and again it failed to recognize the specific irony. Chat GPT asked me to state and explain and I refused as I think this test would be good to not teach the LLM for obvious reasons.

  • @MrZooper
    @MrZooper 2 หลายเดือนก่อน

    Claude seems to be better at reasoning. Maybe try a reasoning test that doesn't involve images, as GPT is ahead in image contexts.
    GPT is trained heavily on public data, so it would perform better at random facts, although it's still a bit useless

  • @arthurfernandes6934
    @arthurfernandes6934 2 หลายเดือนก่อน

    GPT 4 is massively better in context, writing and everything that needs interpretation:
    GPT 4 bunny history:
    Under the pulsing neon of 2147, Detective Eli Ramos adjusted his Temporal Badge. The rain-flickered holograms cast shadows as he stepped into the alley where time had fractured.
    The chrono-meter beeped-1932, Chicago. Eli's target was Alvin Holloway, a time fugitive masquerading as a bootlegger. Using a purloined time device, Holloway had been jumping decades, altering events for personal gain.
    Eli slipped through the temporal gate, his trench coat merging seamlessly into the milieu of mobsters and flappers. The smoky speakeasy was loud with jazz, the air thick with conspiracy. He spotted Holloway immediately, charm and danger rolled into one.
    "You can't keep running, Alvin," Eli said, cornering him by the bar.
    "And you can't stop me," Holloway smirked, reaching for his concealed device.
    Quick as a blink, Eli drew his Temporal Disruptor. A beam of blue light, and Holloway froze-the device dropping, useless.
    "You're out of time," Eli quipped, cuffing Holloway.
    Back in 2147, the chief commended him. But Eli knew better. Time was a loop; his next case was just a ripple away. Somewhere, sometime, another fracture awaited. But for now, under the eternal city lights, Eli Ramos took a well-deserved breath.
    GPT 4o:
    In the bustling heart of Neo-Victorian London, Detective Arlo Sinclair adjusted his bowler hat and checked his pocket watch. The gears inside whirred, signaling it was time. With a flick of his wrist, the watch’s face shifted, revealing a hidden dial. He twisted it, and the world around him shimmered.
    Moments later, Arlo stood in the shadowy alley of a different era-New York City, 1942. The scent of fresh rain mingled with smoke from the distant war effort. He scanned the street and spotted his target: a man in a pinstripe suit carrying a briefcase too important to be left in this time.
    Arlo followed, his every step silent as he closed in on the man. When the moment was right, he sprang, snatching the briefcase. The man barely had time to react before Arlo twisted the dial again. The city melted away, replaced by the futuristic skyline of 2147.
    In his hidden office, Arlo placed the briefcase on a sleek table. Opening it, he smiled at the sight of the stolen schematics, safe from those who would misuse them. Time could be twisted, but justice always straightened its course.
    Arlo leaned back, ready for the next case. "Ai que tudooo," he murmured.

  • @SomeOne-p6f
    @SomeOne-p6f 3 หลายเดือนก่อน +1

    What should they have done with the 1 second interval question to keep to 1 second exactly?

    • @PatrickStorm_
      @PatrickStorm_  3 หลายเดือนก่อน +1

      It can go pretty deep, but they should create a variable with the start time, then use that to calculate the time until the next tick. This Stack Overflow question I just found has the problem and solution well laid out: stackoverflow.com/questions/29971898/how-to-create-an-accurate-timer-in-javascript
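      A minimal sketch of that idea in TypeScript, for illustration only. It assumes a plain `setTimeout` loop; the names `delayUntilTick` and `startAccurateTimer` are hypothetical and do not come from the video or the linked answer. Each delay is computed from the fixed start time rather than from the previous callback, so lateness in one tick does not accumulate into the next.

```typescript
// Delay (in ms) from `nowMs` until tick number `tick`, measured against a
// fixed start time. Clamped at 0 so an overdue tick fires immediately.
function delayUntilTick(
  startMs: number,
  tick: number,
  intervalMs: number,
  nowMs: number
): number {
  const targetMs = startMs + tick * intervalMs;
  return Math.max(0, targetMs - nowMs);
}

// Calls `onTick` once per interval and returns a function that stops the timer.
// Hypothetical helper: one way to apply the "remember the start time" approach.
function startAccurateTimer(
  onTick: (tick: number) => void,
  intervalMs: number = 1000
): () => void {
  const startMs = Date.now();
  let handle: ReturnType<typeof setTimeout> | undefined;

  const schedule = (tick: number): void => {
    handle = setTimeout(() => {
      onTick(tick);
      // Re-anchor on the original start time, not on "now + interval".
      schedule(tick + 1);
    }, delayUntilTick(startMs, tick, intervalMs, Date.now()));
  };

  schedule(1);
  return () => {
    if (handle !== undefined) clearTimeout(handle);
  };
}
```

      Usage would look something like `const stop = startAccurateTimer((n) => console.log(n))`, then `stop()` when done. Individual ticks can still fire a few milliseconds late, but the error does not add up over time the way it does with a bare `setInterval` counter.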

    • @SomeOne-p6f
      @SomeOne-p6f 3 หลายเดือนก่อน

      @@PatrickStorm_ That's a great link, thanks.

  • @NithinJune
    @NithinJune 3 หลายเดือนก่อน +1

    Claude genuinely seems so exciting

  • @RolandGustafsson
    @RolandGustafsson 3 หลายเดือนก่อน

    ChatGPT is way better at haikus than Claude sonnet. Using Claude for haikus reminded me of the ChatGPT 3.5 days. I've had the opposite experience with the actual stuff I do, 4o is better than Claude for my uses and I've done extensive tests. My prompts tend to be much more intensive than the ones you're asking, more detailed.

  • @Zealotux
    @Zealotux 2 หลายเดือนก่อน

    I've tried both for non-trivial coding and Claude is MUCH better at complex tasks, GPT-4o didn't stand a chance.

  • @hooooman.
    @hooooman. 3 หลายเดือนก่อน

    It's usually true that competition between companies is good for us. But in this case, competition between these AI companies is not good for us (as a normal human being, as well as from a developer POV)💀

  • @MentalModels_
    @MentalModels_ หลายเดือนก่อน

    This is a really good video

  • @terryterry1655
    @terryterry1655 2 หลายเดือนก่อน

    link for android apk #sonnet pls

  • @onesimplecuban
    @onesimplecuban 2 หลายเดือนก่อน

    I pay for Claude. So far I’ve made amazing web applications with it, and used it for general programming too. I love it

  • @_HMCB_
    @_HMCB_ 2 หลายเดือนก่อน

    First-time visitor. Awesome stuff. Clearly presented. And you speak at a good pace. So many YouTubers need to learn how to slow down and enunciate. Everything feels so hurried. You’ve earned a new sub. Thank you.

  • @peanutbutterjellybeans1336
    @peanutbutterjellybeans1336 3 หลายเดือนก่อน

    ChatGPT-4o is my quick search engine. Claude 3.5 Sonnet is my main workhorse. The artifact feature in Claude 3.5 Sonnet is so good.

  • @iftekharhossen7221
    @iftekharhossen7221 3 หลายเดือนก่อน

    First of all GPT-4o is free

  • @nahadmaniyot6790
    @nahadmaniyot6790 3 หลายเดือนก่อน

    Sure, here is a summary of the video in points:
    * This video compares two large language models, Claude 3.5 Sonnet and GPT-4o, by giving them a series of head-to-head tests.
    * The winner is determined by the video creator, Patrick Storm, based on his subjective criteria.
    * Claude 3.5 Sonnet wins in creative writing, dialogue generation, sentiment analysis (partially), conversational skills, and summarization (when length is considered).
    * GPT-4o wins in factual question answering and image generation (because Claude 3.5 Sonnet doesn't have an image generation model).
    * They tie in summarizing a research paper and coding (when the prompt is simple).
    * Overall, Claude 3.5 Sonnet performs better based on the video creator's evaluation.

  • @johnreimer4361
    @johnreimer4361 3 หลายเดือนก่อน

    Ask Claude 3.5 Sonnet: "List the first 12 people in order that landed on the moon." The result is correct, and it shows that the 11th person was, in fact, Eugene Cernan. It pays to ask questions that force a step-by-step process of answering.

  • @gvi341984
    @gvi341984 2 หลายเดือนก่อน

    Claude still struggles with math and explaining it. Has a hard time seeing images from PDFs