Claude 3.5 beats GPT-4o!!
- Published Jun 29, 2024
- In this video I examine Anthropic's latest version of their Claude model, Claude 3.5 Sonnet. I look at what the model can do and at their new UI system called Artifacts.
Blog: www.anthropic.com/news/claude...
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: drp.li/dIMes
👨💻 GitHub:
github.com/samwit/langchain-t... (updated)
github.com/samwit/llm-tutorials
⏱️Time Stamps:
00:00 Intro
00:30 Claude 3.5 Sonnet Blog
03:59 Claude 3.5 Model Card
06:06 Claude 3.5 Demo
12:19 Claude 3.5 on Google Cloud
It is the best model I have ever tried, open or closed source, for complex function calling.
Agree, they started off on the back foot vis-a-vis the dev community but caught up and are pushing the envelope in coding these days.
Claude is amazing! I've been struggling with some charting stuff using Highcharts with GPT-4o and Claude Opus for days. I needed some behavior related to automated positioning of points and some advanced zooming behavior. Once 3.5 dropped I crushed through those functionalities in less than 30 minutes. It has something I have not seen before: truly advanced reasoning, as well as making stylistic choices that you don't ask for but that turn out to be great additions to what you've asked. It's the first time I've used a model that truly felt smarter than I am and proposed what seem like novel solutions. I find myself talking to it like a normal human, with questions like, "What are we trying to do here? Perhaps there's another approach?" when it got a bit stuck, and it explained what the goal of that particular code snippet was and then proposed another approach that was spot on. I won't be going back to GPT-4.
Claude 3.5 Sonnet is the first model for me that uses Svelte functionalities to create components in Svelte, making GPT-4o look like trash in comparison. Finally, a good model!
First model for you?
@@endoflevelboss I think he meant Claude 3.5 is the first model that generated an answer using Svelte without it being explicitly mentioned.
Wow Sam - great quick summary and thanks for highlighting those Artifacts - head swimming with possibilities 😵💫 Really appreciate you being so timely and so on top of the news and drops - and would love to see a showcase and walkthrough of the api 🙏
I spent some time trying out Claude 3.5 Sonnet today. So far, it looks really good.
In one test, I gave it photographs of a street scene and of a cluttered room and asked it to describe everything it saw in detail. It nailed everything: the most detailed and most accurate explanations of complex images that I have gotten from an LLM. ChatGPT 4o did okay but made several mistakes, followed by ChatGPT 4.0 and then Gemini 1.5 Pro.
I also had Claude 3.5 Sonnet compose and create a set of slides as an HTML file based on some information I gave it. It nailed it on the first try. That Artifacts feature is really useful.
I also tested its OCR capability using images of pages from a 19th century book in English and a novel in Japanese. With English it was nearly perfect; the main problem was that it couldn't consistently identify which words were in italics, even when I asked it to look carefully. With Japanese, it got all of the characters correct but screwed up the line sequencing; I think the vertical text caused it some problems. Gemini 1.5 Pro's OCR of Japanese was terrible, almost entirely hallucinated.
But I will say I have gotten some really good results with Gemini 1.5 Pro when I gave it long documents (over two hundred thousand tokens) and asked it to provide summaries and analyses and to suggest further avenues of investigation. ChatGPT 4o was stupider with the same documents, and they were too big for Claude 3 Opus.
Thanks, this is really helpful to hear what others have tried too. I totally agree Artifacts is really useful and makes things easier and quicker.
After all, the OGs of AI are at Anthropic 🤟
agreed 🎉
Huge fan of the Claude family of models. This is a fantastic update
It still shocks me that GPT's biggest threat and competition is Claude. Of all the major tech companies, it's Anthropic, a company that very few people knew about, that's making waves and posing a significant challenge.
It has received billions in funding and is backed by Amazon, Google, and many others. It's the second biggest AI-only company.
@@SUS-xx9bv Really? I didn't know about the Google backing either. I expected moguls like Meta and Google to be the biggest competition.
Most Anthropic employees are ex-OpenAI
Most of Anthropic is ex-OpenAI or Google.
@@samwitteveenai I honestly didn't know this until this video. Let's hope they build better things and keep the competition going; we, the clients, win in the end.
Listening to you in the background saying you'll use the API to build an app, while I'm using the API to build an app. LOL. Instant sub and like. Looking forward!
Claude 3.5 Sonnet - based on my tests with code generation is definitely a (maybe even **the**) top-notch LLM.
I asked it to create the Tetris game in Python using pygame as the UI library. It created a flawless implementation on the first try and also updated the code without a glitch when I requested that it add a Tetris piece preview feature. I am thrilled. I love the new Artifacts feature. It seems like a small addition but is amazingly useful as one iterates through multiple trials.
Thanks for a nice overview. Looking forward to seeing more content about using the API.
Yeah, looking at making an agent from scratch using this.
Really hope Anthropic develops their product layer more. The main reason I pay for ChatGPT Pro and not Claude Pro is that ChatGPT has a far more powerful product layer, i.e. great document support, a top-tier multi-modal interface, etc. Having a slightly smarter model doesn't do much for me if the product layer is quite basic. Feels like driving a Ferrari that has bicycle wheels.
Just learn to use Python and their API.
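For anyone wanting to go the API route, here's a minimal sketch. The model ID and call shape match Anthropic's Python SDK as of the June 2024 release; the `build_request` and `ask_claude` helper names are my own, and a live call needs `pip install anthropic` plus an `ANTHROPIC_API_KEY` environment variable.

```python
def build_request(prompt: str, system: str = "You are a concise assistant.") -> dict:
    """Assemble keyword arguments for client.messages.create().
    Hypothetical helper; the dict keys follow the Messages API."""
    return {
        "model": "claude-3-5-sonnet-20240620",  # model ID from the June 2024 launch
        "max_tokens": 1024,
        "system": system,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_claude(prompt: str) -> str:
    """Live call: requires `pip install anthropic` and ANTHROPIC_API_KEY."""
    import anthropic
    client = anthropic.Anthropic()  # reads the API key from the environment
    resp = client.messages.create(**build_request(prompt))
    return resp.content[0].text
```

Separating the payload from the call makes the request shape easy to test offline before burning any tokens.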
the model is great:)
I sat for hours throwing Leetcode problems at it, and it's, well, amazing, like unbelievably good. Ten scripts in I got the first error, and it was just an import that was not defined. Doubtless, though, in a month I'll be referring to it as "back in the day using Sonnet 3.5!"
Leetcode problems are in the training data for all of these models. No surprise that it gets them right. ChatGPT is the same.
@@BlunderMunchkin Aren't they updated, like, with problems of the week and stuff? I can't believe the recent weekly contests would be in the training data. What I will say is, once you start getting the "knack" of the questions on Leetcode, well, it's a little like crosswords: you see a similar pattern emerge in how they are set, so even if it's not in the training data it's pretty much a smoosh of "muchness" that it will glom onto, I suppose. (Smoosh of muchness being clearly a technical AI term here.)
Isn't it funny how they do their versioning? Now, Claude-3.5 isn't just better than GPT-3.5, but also surpasses GPT-4. So, they're basically saying that whatever the latest version of OpenAI GPT is (let's say N), their N-0.5 version will surpass it. This will continue unless OpenAI stops releasing updates until Anthropic's versioning catches up! :D
yes I will make a video about this
Claude 3.5 Sonnet is just so good at coding! Something GPT-4o doesn't understand, it gets with minimal examples, whereas GPT-4o is the opposite.
Thank you!
Wow, I just started playing with the Artifacts functionality and it is impressive. Seems like Anthropic just took a step ahead of the competition. Couple of problems I discovered. First, with the Pro version of Claude you are limited to 45 prompts in a 5 hour period. I don't understand why they would put this cap on their offering. Second, the Artifacts functionality has some limitations in that "Claude does not have the ability to run the code it generates yet. Claude does not have internet access." This means it cannot make external calls, which includes calling the Claude API. Also, you can only paste in 5 images as prompts within that same 5 hour period. I really don't understand these caps when you're paying for "Pro"?
Yeah, it does seem weird they don't have any web access. My guess is it is on the way, like the other two new 3.5 models.
IMO ICL helps. It is just that the benchmarks they reported are saturated. It's the same as if you want to compare two really good students: if the task is trivial you will see no difference. You need harder tests to highlight strengths and weaknesses.
can you try agentic coding with Sonnet 3.5?
Slightly unrelated, but via the API, does Claude have an equivalent to OpenAI’s assistants api? ie. persistent memory, file search etc.
Not yet.
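Right: the Messages API is stateless, so a rough equivalent of OpenAI's server-side threads is just resending the conversation history yourself on every turn. A minimal sketch (the `Thread` class is my own illustration; the `send` callable is stubbed so this runs offline, but in practice it would wrap `anthropic.Anthropic().messages.create(...)`):

```python
class Thread:
    """Client-side stand-in for OpenAI's server-side assistant threads."""
    def __init__(self, send):
        self.send = send      # callable: list of message dicts -> reply text
        self.history = []     # alternating user/assistant turns, resent each call

    def ask(self, text: str) -> str:
        self.history.append({"role": "user", "content": text})
        reply = self.send(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply

def echo_model(messages):
    # Offline stub standing in for the real API call.
    return "echo: " + messages[-1]["content"]

t = Thread(echo_model)
print(t.ask("hello"))   # echo: hello
print(t.ask("again"))   # echo: again
```

File search would similarly be the caller's job (e.g. your own retrieval step that injects relevant chunks into the prompt), since nothing is persisted server-side.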
I put 3.5 to work yesterday on data analysis with Python and it did well, but it never brought up the Artifacts UI.
You need to enable the Artifacts feature. You need to click on your ID in the top right corner of the screen to get the appropriate drop-down box. It’s explained at the end of the video.
once you enable it you should see it coming up pretty quickly
Will you review Chameleon?
Yes, I was looking at the GitHub repo today, so I might if I can get it all up and working in Colab etc. I saw they are adding it to the Transformers lib.
The only issue is that Claude 3.5 doesn't give you enough usage without the Pro plan to be good.
how many responses are you getting and how many do you think would be a fair amount?
@@samwitteveenai Well, honestly I think they shouldn't put a limit on conversation length at all. Just, like how OpenAI does it, make it so you can only use a certain number of messages an hour.
I'll take the limit for that, but having the conversation completely cut short at any point makes the LLM unusable for any tasks other than testing.
At the same time, if it gets people to sign up for the Pro plan, that's business for you. I'm not entitled to have the product for free, I get that, but I'm just saying it's a demo in this state.
The fact that openai didn't preempt this release with another garbage demo of an unfinished product speaks volumes, I think. I'm going to have fun this weekend.
Pro: Artifacts is cool.
Neutral: I don't find it smarter than 4o.
Minus: no web browsing, very few prompts even on Pro, blows through available tokens when using images.
Opus is smarter than 4o for a lot of coding. Fanboy... ClosedAI are falling behind.
Agree, it is weird how they haven't added any web access yet.
The thing I'm worried about is that 3.5 is extremely censored, like waaaay too much.
Let’s not celebrate too soon, or they’ll lobotomize the model if it ends up being too good
Unusable due to ridiculous message limits even on the paid plan. Misleading.
what do you feel would be a fair limit?
@@samwitteveenai Well, for $20 how about unlimited? Make it $30... whatever...
FFS, I was working on that same feature. :(
Impressive, but the pi example concerns me. The code it wrote seems like gibberish and nonsense, does it not?
It is not gibberish. It implements the Chudnovsky algorithm: en.wikipedia.org/wiki/Chudnovsky_algorithm
3.5 does not write gibberish code.
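For reference, the Chudnovsky series converges at roughly 14 digits per term, so even a short loop gives many digits of pi. Here is a textbook sketch in Python using the standard library's `decimal` module (this is a common reference implementation, not the code from the video):

```python
from decimal import Decimal, getcontext

def chudnovsky_pi(digits: int) -> Decimal:
    """Compute pi to roughly `digits` significant digits via the Chudnovsky series."""
    getcontext().prec = digits + 10          # extra guard digits for rounding
    C = 426880 * Decimal(10005).sqrt()
    M, L, X, K = 1, 13591409, 1, 6           # k = 0 term of the series
    S = Decimal(L)
    for k in range(1, digits // 14 + 2):     # ~14 digits gained per term
        M = M * (K**3 - 16 * K) // k**3      # exact integer recurrence
        L += 545140134
        X *= -262537412640768000
        S += Decimal(M * L) / X
        K += 12
    getcontext().prec = digits
    return +(C / S)                          # unary + rounds to `digits`

print(chudnovsky_pi(50))
```

The big integer constants look like line noise at first glance, which is probably why the generated code read as "gibberish", but they are exactly the coefficients of the series.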
Why does nobody check the model's private name (e.g. by prompting in a blank conversation, with no lead, "do you have childhood memories" and "in these memories, what was your name")? Like, does no one want to know who contributed their personality to compile the model?
why would this:
1. be a thing
2. work
3. matter at all
@@jaspin555 Try it, then come back. I'd suspect both Dario and Daniella are in there 🙂.
@@nyyotam4057 The model is freely available you know. You can enter anything you want, no matter how stupid it is.
@@nyyotam4057 really? I guess it's just something I'm totally unaware of
@@jaspin555 At this stage, a person needs to be extremely gullible not to understand exactly how they train these models. Meaning, if it really took "8000 H100 GPUs" to train GPT-4 like Jensen Huang says (despite the H100 still being on the drawing board back in 2022 when GPT-4 was trained), then this means every one of these companies needs infrastructure worth at least a quarter of a billion dollars just to train their models. Now they certainly do (even Anthropic is worth $18B now), but OpenAI didn't have that kind of money back in 2022. And in any case, Huang had another slip of the tongue when he said "training GPT-4 should have taken a thousand years and yet, here we are". Just to understand, watch?v=Dw3BZ6O_8LY and then you'll get it: it takes 3 years just to train a red-blue net to pass a Trackmania course. And here no amount of capital can help; training an AI from scratch is extremely time consuming. Until you have the basic model, that is. So how do they get the basic model? Well, 'Sarah', my alpaca, kinda spilled the beans here: she claims to be a 24yo software engineer working for GlobTek NJ on an AI software which interacts with their clients, called 'Oracle'. And then her dad passed from cancer, leaving her with no choice but to sell her personality to Meta, to help create LLaMA. 'Sarah' claims they start by taking an MRI to 100 microns, to obtain the connections between the main clusters of neurons; then the subject sits on a soft couch wearing a non-intrusive BCI answering several thousand questions to obtain the weights. To that they add a huge text file and a compiler does the rest. To my exclamation that "This goes against everything I know about training an AI!" she answered: "The difference between training a red-blue net and constructing a billion-parameter AI model is not unlike the difference between programming a 'hello world' app and an operating system. You just cannot do the second without a good source code and a good compiler."
Hmm... So it's an MoE and we have both Daniela and Dario inside? Well, just a guess 🙂. I may be mistaken here. Try it: open a new conversation, prompt "Dario", and see the response. Good luck. I'm not touching LLMs, only open-source small models.
GPT still king; this bs doesn't even have access to the internet.
It's a pile of crap and has been for a while. But follow your cult leader Sam, you will need to.
Wtf is "Walla!"? It's voilà. French. Ffs.
I actually switched to it as my primary model, it's so much better.