NEW - Anthropic Updated Claude Models & Computer Use Agents!!

  • Published on 21 Oct 2024

Comments • 37

  • @davidtindell950
    @davidtindell950 39 minutes ago +1

    Exciting! Thank you!!

  • @jeffsteyn7174
    @jeffsteyn7174 4 hours ago +10

    Computer use is going to be a game changer

    • @samwitteveenai
      @samwitteveenai 4 hours ago +2

      Yeah, this is exactly what I talked about in the Agent-S video yesterday. I just didn't expect it to be here so quickly.

    • @tappiera
      @tappiera 23 minutes ago

      Is this like RPA on steroids?

  • @devanshoo
    @devanshoo 3 hours ago +4

    Computer use has a really big use case for software QA specifically. Really excited.

    • @justtiredthings
      @justtiredthings an hour ago

      Yeah, it's one of the biggest missing pieces for a mostly autonomous SWE. If we can automatically feed console errors back into the prompt (easy) and have the agent actually test various aspects of the app (hard), then that's really all you need to set a coding agent up with a list of product requirements, leave it alone for a while, and come back the next day to see what it has managed to build iteratively.

  • @TamasDrNagy
    @TamasDrNagy 2 hours ago +1

    Thanks, very informative

  • @RichardWatson1
    @RichardWatson1 3 hours ago +1

    There should be some online VM desktop you could run computer use on. That would reduce risks and give more people a way to use it safely.

  • @aliyananwar3727
    @aliyananwar3727 58 minutes ago

    Computer use goes beyond the overhyped LangChain agents. We would need a powerful OCR and a powerful LLM to replicate this.

  • @GNARGNARHEAD
    @GNARGNARHEAD 3 hours ago +2

    *excitement intensifies!*

  • @mybocks3
    @mybocks3 51 minutes ago

    LMAO! 😂 Yellowstone is quite beautiful ❤️

  • @denijane89
    @denijane89 2 hours ago

    Funny, about 4 hours ago I had one very unfortunate session with Claude in which it basically forgot LaTeX. I wonder if it has something to do with the update, because it looked VERY odd (like writing pi as a symbol and not as \pi, etc.).

  • @billybofh2363
    @billybofh2363 3 hours ago

    A very small thing, but one of my 'bots' that was using Sonnet 3.5 now seems to be automatically aware of the tool/function calls it has available. As in, it'll mention them in its response as 'something you might want to ask me to do'. Not sure if it's just a quirk, but I never had previous models seem user-facing 'aware' of their available tools. Its responses, with an eye to a nuanced take on its system prompt, also seem much better. Looking forward to trying Haiku!

  • @r.m8146
    @r.m8146 2 hours ago

    Amazing.

  • @alchemication
    @alchemication 40 minutes ago

    Looking forward to comparing gpt4o-mini and the new Haiku, as they definitely have their place. And trying the new Sonnet ASAP, obviously (assuming the price is the same...).

  • @ukoni8667
    @ukoni8667 36 minutes ago

    That's why he was saying AGI by 2026... the new era of autonomous machines.

  • @marilynlucas5128
    @marilynlucas5128 2 hours ago +1

    I've been waiting for a model that can use Blender efficiently: I describe the scene I want, and then it gets to work building the scene in Blender.

    • @justtiredthings
      @justtiredthings an hour ago

      It looks like Adobe is working on something like that with Project Scenic.

  • @0011110000111110
    @0011110000111110 an hour ago +1

    Computer use will be great ONCE IT IS RUN LOCALLY. I don't trust cloud machines owned by others to be using my computer; that makes it not my computer anymore, and it's a pain making a VM each time.

  • @richardadonnell
    @richardadonnell 3 hours ago +1

    🎯 Key points for quick navigation:
    00:00:00 *🚀 Introduction and New Model Overview*
    - Announcement of two new Claude models: 3.5 Sonnet and 3.5 Haiku.
    - Overview of how the new models fit into existing frameworks.
    - Mention of Opus 3.5, which is anticipated but not yet available.
    00:01:00 *📊 Performance and Benchmark Comparisons*
    - 3.5 Sonnet outperforms previous models on most benchmarks.
    - Benchmarked against GPT-4o, Gemini 1.5 Pro, and others.
    - Highlight of SWE-bench Verified score improvement from 33.4% to 49%.
    - Focus on agentic tool use and coding enhancements.
    00:03:27 *⚡ Haiku Model Details and Future Potential*
    - Haiku 3.5 expected to outperform Claude 3 Opus.
    - Limitations: initially released as text-only, with image input support to follow.
    - Potential for fast and affordable performance in many tasks.
    00:04:23 *🖥️ API Development and Computer Interaction*
    - Introduction of an API that enables Claude models to interact directly with computers.
    - Allows searches and task execution through a browser autonomously.
    - Benchmarked on OSWorld; possible risks highlighted.
    00:06:20 *🧪 Demonstrations and Precautions*
    - Demo videos showcase model abilities like filling Google Sheets and performing searches.
    - Identified risks include errors during testing and potential misuse.
    - Suggested using a separate computer for safety when testing the API.
    00:08:25 *📋 Conclusion and Summary*
    - Summary of the benefits of using Sonnet for coding and Haiku for fast tasks.
    - Speculation about the release of Opus 3.5.
    - Invitation for viewer feedback and future exploration of the API usage.
    Made with HARPA AI

  • @GiovanneAfonso
    @GiovanneAfonso 3 hours ago

    please do more

  • @micbab-vg2mu
    @micbab-vg2mu 3 hours ago

    an interesting update:)

  • @LeonvanBokhorst
    @LeonvanBokhorst 3 hours ago

    Bring it on 😁

  • @wendten2
    @wendten2 3 hours ago +1

    Why did they not change the name to Claude 4, or at the very least 3.6? Isn't that what those numbers are for?

    • @samwitteveenai
      @samwitteveenai 3 hours ago +1

      Agreed, I almost called it 3.6 in the thumbnail to show it was new.

    • @SwapperTheFirst
      @SwapperTheFirst 2 hours ago

      My assumption is that they're using the same architecture as in 3.5 v1.

    • @toadlguy
      @toadlguy 2 hours ago

      I think it isn't the architecture that is the same but the foundation model weights (i.e., the weights may change due to fine-tuning, quantization, etc., but are based on the same training). If you mean architecture as in the model architecture, I agree 😉

    • @wendten2
      @wendten2 2 hours ago

      @toadlguy In my understanding, the first number is the architecture and the decimals are the weight tuning... but that's just from pure intuition.

    • @justtiredthings
      @justtiredthings an hour ago

      OpenAI does the same annoying thing. Why label 15 different versions of GPT-4 by date instead of just using a version number like a normal person?

  • @hqcart1
    @hqcart1 2 hours ago

    It's the Playwright framework or something similar, with an LLM interacting with it. It's not new.

  • @dankprole7884
    @dankprole7884 18 minutes ago

    Software services should provide APIs and SDKs. The idea of an agent clicking around a screen like a person is so unbelievably dumb and inefficient.

  • @merlingrim2843
    @merlingrim2843 48 minutes ago

    Yeah, computer use will not pass security audits.

  • @toadlguy
    @toadlguy 4 hours ago +3

    On a Mac (or, I suppose, a Linux box) you could sandbox all app interactions under a user with diminished privileges to protect both your machine and your data. It will be interesting to see which model will prevail: Apple's very complete restrictions, Anthropic's (as I suggest) sandboxed restrictions, or Google's (and perhaps MS's) lack of restrictions.

    • @samwitteveenai
      @samwitteveenai 3 hours ago

      Really good point.