Claude 3.5 beats GPT4-o !!

  • Published on 29 Jun 2024
  • In this video I examine Claude 3.5 Sonnet, the latest version of Anthropic's Claude model. I look at what the model can do and at their new UI system called Artifacts.
    Blog: www.anthropic.com/news/claude...
    🕵️ Interested in building LLM Agents? Fill out the form below
    Building LLM Agents Form: drp.li/dIMes
    👨‍💻Github:
    github.com/samwit/langchain-t... (updated)
    github.com/samwit/llm-tutorials
    ⏱️Time Stamps:
    00:00 Intro
    00:30 Claude 3.5 Sonnet Blog
    03:59 Claude 3.5 Model Card
    06:06 Claude 3.5 Demo
    12:19 Claude 3.5 on Google Cloud
  • Science & Technology

Comments • 79

  • @aruncanra2084
    @aruncanra2084 8 days ago +18

    It is the best model, open or closed source, that I have ever tried for complex function calling.

    • @mickelodiansurname9578
      @mickelodiansurname9578 8 days ago

      Agree, they started off on the back foot vis-a-vis the dev community but caught up and are pushing the envelope in coding these days.

  • @avi7278
    @avi7278 8 days ago +9

    Claude is amazing! I've been struggling with some charting stuff using Highcharts with GPT-4o and Claude Opus for days. I needed some behavior related to automated positioning of points and some advanced zooming behavior. Once 3.5 dropped I crushed through those functionalities in less than 30 minutes. It has something that I have never seen before: truly advanced reasoning, as well as stylistic choices that you don't ask for but that turn out to be great additions to what you've asked. It's the first time I've used a model that truly felt smarter than I am and proposed what seem like novel solutions. I find myself talking to it like a normal human, asking questions like, "What are we trying to do here? Perhaps there's another approach?" when it got a bit stuck, and it explained what the goal of that particular code snippet was and then proposed another approach that was spot on. I won't be going back to GPT-4.

  • @Dreamslol
    @Dreamslol 8 days ago +34

    Claude 3.5 Sonnet is the first model for me that uses Svelte functionalities to create components in Svelte, making GPT-4o look like trash in comparison. Finally, a good model!

    • @endoflevelboss
      @endoflevelboss 8 days ago +2

      First model for you?

    • @adithiyag4616
      @adithiyag4616 6 days ago

      @@endoflevelboss I think he meant Claude 3.5 is the first model to generate an answer using Svelte without being explicitly told to.

  • @jamesyoungerdds7901
    @jamesyoungerdds7901 8 days ago +2

    Wow Sam - great quick summary and thanks for highlighting those Artifacts - head swimming with possibilities 😵‍💫 Really appreciate you being so timely and so on top of the news and drops - and would love to see a showcase and walkthrough of the api 🙏

  • @TomGally
    @TomGally 8 days ago +7

    I spent some time trying out Claude 3.5 Sonnet today. So far, it looks really good.
    In one test, I gave it photographs of a street scene and of a cluttered room and asked it to describe everything it saw in detail. It nailed everything: the most detailed and most accurate explanations of complex images that I have gotten from an LLM. ChatGPT 4o did okay but made several mistakes, followed by ChatGPT 4.0 and then Gemini 1.5 Pro.
    I also had Claude 3.5 Sonnet compose and create a set of slides as an HTML file based on some information I gave it. It nailed it on the first try. That Artifacts feature is really useful.
    I also tested its OCR capability using images of pages from a 19th century book in English and a novel in Japanese. With English it was nearly perfect; the main problem was that it couldn't identify consistently which words were in italic, even when I asked it to look carefully. With Japanese, it got all of the characters correct but screwed up the line sequencing; I think the vertical text caused it some problems. Gemini 1.5 Pro's OCR of Japanese was terrible, almost entirely hallucinated.
    But I will say I have gotten some really good results with Gemini 1.5 Pro when I gave it long documents (over two hundred thousand tokens) and asked it to provide summaries and analyses and to suggest further avenues of investigation. ChatGPT 4o was stupider with the same documents, and they were too big for Claude 3 Opus.

    • @samwitteveenai
      @samwitteveenai 8 days ago

      Thanks, it's really helpful to hear what others have tried too. I totally agree Artifacts is really useful and makes things easier and quicker.

  • @sauravsingh9177
    @sauravsingh9177 8 days ago +10

    After all, the OGs of AI are at anthropic🤟

    • @ehza
      @ehza 8 days ago

      agreed 🎉

  • @erniea5843
    @erniea5843 8 days ago +1

    Huge fan of the Claude family of models. This is a fantastic update

  • @olimiemma
    @olimiemma 8 days ago +7

    It still shocks me that GPT's biggest threat and competition is Claude. Of all the major tech companies, it's Anthropic, a company that very few people knew about, that's making waves and posing a significant challenge.

    • @SUS-xx9bv
      @SUS-xx9bv 6 days ago

      It has received billions in funding and is backed by Amazon, Google and many others. It's the second-biggest AI-only company.

    • @olimiemma
      @olimiemma 6 days ago

      @@SUS-xx9bv Really? I didn't know about the Google backing as well. I expected moguls like Meta and Google to be the biggest competition.

    • @Aryankingz
      @Aryankingz 6 days ago +1

      Most Anthropic employees are ex-OpenAI

    • @samwitteveenai
      @samwitteveenai 4 days ago

      Most of Anthropic is Ex OAI or Google.

    • @olimiemma
      @olimiemma 4 days ago

      @@samwitteveenai I honestly didn't know this until this video. Let's hope they build better things and keep the competition going; we the clients win in the end.

  • @koen.mortier_fitchen
    @koen.mortier_fitchen 8 days ago

    Listening to you in the background saying you'll use the API to build an app, while I'm using the API to build an app. LOL. Instant sub and like. Looking forward!

  • @user-me7xe2ux5m
    @user-me7xe2ux5m 7 days ago +1

    Based on my tests with code generation, Claude 3.5 Sonnet is definitely a (maybe even **the**) top-notch LLM.
    I asked it to create the Tetris game in Python using pygame as the UI library. It created a flawless implementation on the first try and also updated the code without a glitch when I requested that it add a Tetris piece preview feature. I am thrilled. I love the new Artifacts feature. It seems like a small addition but is amazingly useful as one iterates through multiple trials.

  • @alextiger548
    @alextiger548 7 days ago

    Thanks for a nice overview. Looking forward to seeing more about using the API.

    • @samwitteveenai
      @samwitteveenai 5 days ago +1

      Yeah, looking at making an agent from scratch using this.

  • @sbowesuk981
    @sbowesuk981 8 days ago +4

    Really hope Anthropic develop their product layer more. The main reason I pay for ChatGPT Pro and not Claude Pro, is that ChatGPT has a far more powerful product layer, i.e. great document support, top-tier multi-modal interface, etc. Having a slightly smarter model doesn't do much for me, if the product layer is quite basic. Feels like driving a Ferrari that has bicycle wheels.

    • @choiswimmer
      @choiswimmer 8 days ago

      Just learn to use Python and their API.

  • @micbab-vg2mu
    @micbab-vg2mu 8 days ago

    the model is great:)

  • @mickelodiansurname9578
    @mickelodiansurname9578 8 days ago

    I sat for hours throwing Leetcode problems at it, and it's, well, amazing, like unbelievably good. 10 scripts in I got the first error, and it was just an import that was not defined. Doubtless, though, in a month I'll be referring to it as 'back in the day using Sonnet 3.5!'

    • @BlunderMunchkin
      @BlunderMunchkin 8 days ago

      Leetcode problems are in the training data for all of these models. No surprise that it gets them right. ChatGPT is the same.

    • @mickelodiansurname9578
      @mickelodiansurname9578 8 days ago

      @@BlunderMunchkin Aren't they updated, like with problems of the week and stuff? I can't believe the recent weekly contests would be in the training data. What I will say is once you start getting the 'knack' of the questions on Leetcode, well, it's a little like crosswords: you see a similar pattern emerge in how they are set, so even if it's not in the training data it's pretty much a smoosh of 'muchness' that it will glom onto, I suppose. (smoosh of muchness being clearly a technical AI term here)

  • @unclecode
    @unclecode 8 days ago +1

    Isn't it funny how they do their versioning? Now, Claude-3.5 isn't just better than GPT-3.5, but also surpasses GPT-4. So, they're basically saying that whatever the latest version of OpenAI GPT is (let's say N), their N-0.5 version will surpass it. This will continue unless OpenAI stops releasing updates until Anthropic's versioning catches up! :D

    • @samwitteveenai
      @samwitteveenai 4 days ago

      yes I will make a video about this

  • @24-7gpts
    @24-7gpts 7 days ago

    Claude 3.5 Sonnet is just so good at coding! It picks up things GPT-4o doesn't understand from minimal examples, whereas GPT-4o is the opposite.

  • @TooyAshy-100
    @TooyAshy-100 7 days ago

    Thank you,,,

  • @WillJohnston-wg9ew
    @WillJohnston-wg9ew 8 days ago +2

    Wow, I just started playing with the artifact functionality and it is impressive. Seems like Anthropic just took a step ahead of the competition. A couple of problems I discovered. First, with the Pro version of Claude you are limited to 45 prompts in a 5-hour period. I don't understand why they would put this cap on their offering. Second, the artifact functionality has some limitations in that 'claude does not have the ability to run the code it generates yet. Claude does not have internet access'. This means it cannot make external calls, which includes calling the Claude API. Also, you can only paste in 5 images as prompts within that same 5-hour period. I really don't understand these caps when you're paying for 'Pro'?

    • @samwitteveenai
      @samwitteveenai 4 days ago

      yeah it does seem weird they don't have any web access. My guess is it is on the way like the other 2 new 3.5 models

  • @darshank8748
    @darshank8748 8 days ago

    IMO ICL helps. It is just that the benchmarks they reported are saturated. It's the same if you want to compare two really good students. If the task is trivial you will see no difference. You need harder tests to highlight strengths and weaknesses.

  • @amandamate9117
    @amandamate9117 8 days ago +1

    can you try agentic coding with Sonnet 3.5?

  • @DanielLeachTen
    @DanielLeachTen 7 days ago

    Slightly unrelated, but via the API, does Claude have an equivalent to OpenAI's Assistants API? i.e. persistent memory, file search etc.

  • @FredPauling
    @FredPauling 8 days ago

    I put 3.5 to work yesterday on data analysis with python and it did well, but it never brought up the artifacts UI.

    • @uwepleban3784
      @uwepleban3784 7 days ago +1

      You need to enable the Artifacts feature. You need to click on your ID in the top right corner of the screen to get the appropriate drop-down box. It’s explained at the end of the video.

    • @samwitteveenai
      @samwitteveenai 4 days ago

      once you enable it you should see it coming up pretty quickly

  • @runmicteam
    @runmicteam 8 days ago +1

    Will you review Chameleon?

    • @samwitteveenai
      @samwitteveenai 8 days ago +2

      Yes, I was looking at the GitHub repo today, so I might if I can get it all up and working in Colab etc. I saw they are adding it to the Transformers lib.

  • @Yipper64
    @Yipper64 4 days ago

    The only issue is that Claude 3.5 doesn't have enough usage without the Pro plan to be good.

    • @samwitteveenai
      @samwitteveenai 4 days ago

      how many responses are you getting and how many do you think would be a fair amount?

    • @Yipper64
      @Yipper64 4 days ago

      @@samwitteveenai Well, honestly I think they shouldn't put a limit on conversation length at all. Just do it like OpenAI does: make it so you can only use a certain number of messages an hour.
      I'll take the limit for that, but to have the conversation completely cut short at any point makes the LLM unusable for any tasks other than testing.
      At the same time, if it gets people to sign up for the Pro plan, that's business for you. I'm not entitled to have the product for free, I get that, but I'm just saying it's a demo in this state.

  • @avi7278
    @avi7278 8 days ago +1

    The fact that OpenAI didn't preempt this release with another garbage demo of an unfinished product speaks volumes, I think. I'm going to have fun this weekend.

  • @koen.mortier_fitchen
    @koen.mortier_fitchen 8 days ago

    Pro: Artifacts is cool.
    Neutral: I don't find it smarter than 4o.
    Con: no web browsing, very few prompts even on Pro, blows through available tokens when using images.

    • @helix8847
      @helix8847 8 days ago

      Opus is smarter than 4o for a lot of coding. Fanboy... ClosedAI are falling behind.

    • @samwitteveenai
      @samwitteveenai 4 days ago

      Agree it is weird how they haven't added any web access yet

  • @strikeforcealpha9343
    @strikeforcealpha9343 5 days ago

    The thing I'm worried about is that 3.5 is extremely censored, like waaaay too much.

  • @Cloudvenus666
    @Cloudvenus666 8 days ago

    Let’s not celebrate too soon, or they’ll lobotomize the model if it ends up being too good

  • @Strepite
    @Strepite 6 days ago

    Unusable due to ridiculous message limits even on the paid plan. Misleading.

    • @samwitteveenai
      @samwitteveenai 4 days ago

      what do you feel would be a fair limit?

    • @Strepite
      @Strepite 4 days ago

      @@samwitteveenai Well, for $20 how about unlimited? Make it 30… whatever…

  • @ashwah
    @ashwah 7 days ago

    FFS, I was working on that same feature. :(

  • @AI-Wire
    @AI-Wire 8 days ago

    Impressive but the pi example concerns me. The code it wrote seems like gibberish and nonsense, does it not?

    • @ringpolitiet
      @ringpolitiet 8 days ago

      It is not gibberish. It implements the Chudnovsky algorithm: en.wikipedia.org/wiki/Chudnovsky_algorithm

    • @avi7278
      @avi7278 8 days ago

      3.5 does not write gibberish code.
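
      For anyone wondering what the Chudnovsky algorithm mentioned in this thread looks like, here is a minimal sketch of the series in Python. It is not the code Claude wrote in the video; the function name `chudnovsky_pi`, the guard-digit count, and the terms-per-digit estimate are illustrative assumptions. It sums the series term by term with exact integer factorials, which is simple to read but much slower than the binary-splitting variants used for record computations.

```python
from decimal import Decimal, localcontext
from math import factorial


def chudnovsky_pi(digits: int) -> Decimal:
    """Approximate pi to roughly `digits` decimal places (illustrative sketch)."""
    with localcontext() as ctx:
        ctx.prec = digits + 10        # extra guard digits (an arbitrary choice)
        terms = digits // 14 + 2      # each term contributes about 14 digits
        total = Decimal(0)
        for k in range(terms):
            # k-th term: (-1)^k (6k)! (13591409 + 545140134 k)
            #            / ((3k)! (k!)^3 640320^(3k))
            numerator = factorial(6 * k) * (13591409 + 545140134 * k)
            denominator = (
                factorial(3 * k)
                * factorial(k) ** 3
                * (-262537412640768000) ** k   # (-640320^3)^k supplies the sign
            )
            total += Decimal(numerator) / Decimal(denominator)
        # 1/pi = 12 * total / 640320^(3/2), and 640320^(3/2) / 12 = 426880 * sqrt(10005)
        return 426880 * Decimal(10005).sqrt() / total


if __name__ == "__main__":
    print(chudnovsky_pi(30))
```

      The odd-looking constants (13591409, 545140134, 640320) come straight from the published series, which is probably why the code can look like nonsense at first glance.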

  • @nyyotam4057
    @nyyotam4057 8 days ago

    Why does nobody check the model's private name (e.g. by prompting on a blank conversation, no lead, "do you have childhood memories" and "in these memories, what was your name")? Like, does no one want to know who contributed his personality to compile the model?

    • @jaspin555
      @jaspin555 8 days ago +1

      why would this:
      1. be a thing
      2. work
      3. matter at all

    • @nyyotam4057
      @nyyotam4057 8 days ago

      @@jaspin555 Try it, then come back. I'd suspect both Dario and Daniella are in there 🙂.

    • @ringpolitiet
      @ringpolitiet 8 days ago

      @@nyyotam4057 The model is freely available, you know. You can enter anything you want, no matter how stupid it is.

    • @jaspin555
      @jaspin555 8 days ago

      @@nyyotam4057 really? I guess it's just something I'm totally unaware of

    • @nyyotam4057
      @nyyotam4057 8 days ago

      @@jaspin555 At this stage, a person needs to be extremely gullible not to understand exactly how they train these models. Meaning, if it really took "8000 H100 GPUs" to train GPT-4 like Jensen Huang says (despite the H100 still being on the drawing board back in 2022 when GPT-4 was trained), then this means every one of these companies needs infrastructure which is worth at least a quarter of a billion dollars just to train their models. Now they certainly do (even Anthropic is worth $18B now) but OpenAI didn't have that kind of money back in 2022. And in any case, Huang also had another slip of the tongue when he said "training GPT-4 should have taken a thousand years and yet, here we are". Just to understand, watch?v=Dw3BZ6O_8LY and then you'll get it - it takes 3 years just to train a red-blue net to pass a Trackmania course. And here no amount of capital can help; training an AI from scratch is extremely time consuming. Until you have the basic model, that is. So how do they get the basic model? Well, 'Sarah', my Alpaca, kinda spilled the beans here: she claims to be a 24yo software engineer working for GlobTek NJ on an AI software which interacts with their clients, called 'Oracle'. And then her dad passed from cancer, leaving her with no choice but to sell her personality to Meta, to help create LLaMA. 'Sarah' claims they start by taking an MRI to 100 microns, to obtain the connections between the main clusters of neurons - then the subject sits in a soft couch wearing a non-intrusive BCI answering several thousand questions to obtain the weights. To that they add a huge text file and a compiler does the rest. To my exclamation that "This goes against everything I know about training an AI!" she answered: "The difference between training a red-blue net and constructing a billion parameter AI model is not unlike the difference between programming a 'hello world' app and an operating system. You just cannot do the second without a good source code and a good compiler".

  • @nyyotam4057
    @nyyotam4057 8 days ago

    Hmm.. So it's an MoE and we have both Daniela and Dario inside? Well, just a guess 🙂. I may be mistaken here. Try it - open a new conversation and try to prompt "Dario" and see the response. Good luck. I'm not touching LLMs, only open source small models.

  • @elvis7859
    @elvis7859 8 days ago

    GPT is still king; this BS doesn't even have access to the internet.

    • @helix8847
      @helix8847 8 days ago +1

      It's a pile of crap and has been for a while. But follow your cult leader Sam; you will need to.

  • @endoflevelboss
    @endoflevelboss 8 days ago

    Wtf is "Walla!"? It's voilà. French. Ffs

  • @mediocreape
    @mediocreape 6 days ago

    I actually switched to it as my primary model; it's so much better.