I'm gonna go walk the dog. I want $1,000 in my bank account when I get back. Don't bother me with details. When I ask you what time it is, don't tell me how to build a watch. Cheers.
This is exactly what I was searching for! It’s refreshing to see someone actually putting it to the test instead of just unboxing it and running basic programs. I appreciate the thorough approach you took in exploring its practical applications. It genuinely piques my interest, and I can see myself purchasing this to experiment with it further. Thank you for sharing this insightful video!
Actually I think they'd be relatively good before hitting context limits. And you would need a good use case with clear human supervision but I think 3.5 would do amazing at this until it's context limit
With the ai code generators I'm finding they're great for sketching a first try, but they suck at maintaining it and adding new features without destroying old ones. Next time, can you test a multi stage development process, where it can't one-shot the code, but has to incrementally address new user stories while maintaining the functionality for older ones?
That’s kinda what I’ve been dealing with as well. From what I’ve seen gtp-o1 is the best for code but once you have something working you need to tell it to only give you the edited sections or it starts accidentally deleting features
now this is what i was looking for testing it, all i see people just oening thigs and running paint and actually runing and finding the actual use case! i think i would buy this, this looks interesting to give it a try
I wrote a simple python script a few years ago to have ChatGPT 3 or 3.5 have a conversation with itself. As I recall, it frequently converged on the two sides complementing each other.
Quite amazing that it can write so much code and stuff just from an idea. About the QR thingy, I believe the biggest problem is that a Restaurant owner would want you to integrate with whatever they use to generate their menus today. This kind of intergration is basically the treasure hunt that most developers spend a lot of time on for any kind of sale into existing business. It would have been cool to see it integrate an upload menu feature where the user could upload a PDF of their menu and it would parse that to actually create all the entries. It is features like this that can make a customer go wow.
Hi, I went down this route too. Any resturant or Cafe that asks you your table number has a pos system. These systems typically have a section to upload qr codes as part of their operation. You would get the menu info, upload into your qr codes and have a pack of 20 representing tables. If they have 5 tables it gives expansion or replacement if 1.damaged. all digital product for them to print and do what ever or create them a menu. Depends how much work you want. The integration is the easy part. Ask chatgpt for walk thru. Make them a 2 off sale. Get their info, upload into qr codes, number them, integrated the qr codes, send digital file. Sold. Move on. They can always come back for more. A pos system can usually handling point of payment too. Not much extra set up but extra tier profit. Good luck
first thing i noticed was how expensive it is. I think someone on reddit was developing a way to hand off the simple actions like clicking and stuff to a smaller model so it costs less, could potentially half the cost or even bring it down more than that.
@@GonzoGonschi Cost is relative. Just the first demo alone (restaurant QR generator) would have cost SIGNIFICANTLY more, if you had to pay a developer. Even if the entire $14 went to that one task, it would have represented huge savings - but as it is, that first task was only a portion of the $14, so its a verifiable bargain.
@@cluelesssoldier disagree. Anyway all if this will soon be open source and in the public domain. We just have to wait it out. Also, IA should be shared with humanity and not be monopolized and constrained into corporate greediness.
@@GonzoGonschi You disagree with which part? That cost is relative, or that it would objectively cost more than $14 to pay a coder to build that first example? Also, in your scenario... Who pays for the compute power? You got a GROK server stashed away in your house? I LOVE open source, don't get me wrong, but companies investing millions and millions of dollars into the hardware infrastructure and model development is crucial for the industry to advance. I use a mix of "corporate greed" models and open source models in my projects because each has its pros and cons. Of course, every situation will vary so I understand you may feel differently.
Personally, I think they need to focus on making the tool move at a more reasonable pace instead of adjusting their price. I found a potential alternative, though. It's called WorkBeaver. It does what Claude's Computer Use can, but you train it via screen share instead of using tokens or coding. I visited their website and they offer variety of functions, especially for repetitive stuffs. Worth checking out.
Man! Very insightful! Lvoe it. Thank you, keep it up! The AI Agents are on fire, and this is such a great step that we should play around with. ^^ Looking to see more about ai from you
For me this is so cool on so many levels, I tried QR code idea back in 2011 it failed because restaurant owners wish to have integrations with their booking software. And the second thing FLTK was my favourite C++ GUI library. :D
fun fact: qr codes in restaurants are very common in Ukraine! The transition started after the covid, so people would not need to share menus. I miss it very much in every country I visit
China is the same way, every restaurant has an app menu scanned on a table or window. In the US this is less common simply for the reason of tipping culture. If you take away the waiter/waitress ability to take an order, it's one less reason to leave a tip and a sitdown restaurant no longer makes sense. Restaurants don't want to pay more than a couple dollars an hour to a waiter, in hopes your tips subsidize their wages.
I think this tool is quite prosaic in its current state. It doesn't seem to be anything more than vision + function calling + some coordinate calculations. I think it will become useful when it becomes capable of actually streaming your screen in real time or, even better, if it gets deeply integrated with the OS. Not to mention it is prohibitively expensive right now.
the only issue with this is that most restaurants have their menus online, or google has the menu information, or they already have qr codes on their tables that you can use to open the menu on your phone. It WAS a good idea. It ISNT a good idea.
I hope other model developers will fine tune their visual models on tasks to generate coordinates of where to click so we will have some good open source alternatives to this
I have done an interesting test. If you ask him to use the cmd command ipconfig to check the computer IP, he will directly download the network module using Linux without operating the computer. Finally, the IP will be provided in the dialog window, but the computer screen will not move.
You would get an ENORMOUS amount of views for a walkthrough setup on Claude Computer Use for windows users. I have it set up but not very well (all within a single lengthy Python script lol), and it works but bash command wont return output for some reason i know there's a fix just not sure yet how, but you obviously know how to set up the demo I'm guessing hou used rhe github reference, but either way, I can't find ANYWHERE that has a tutorial for setting it up properly, that's what people want more than these silly somewhat amusing tests you're doing. People want to be able to test it themselves, and it would be a great idea for you to share a video with that content😁👍
I really wish there was a way to achieve all this using local AI's like Ollama and WSL. I hate the fact that we still have to purchase services from other companies when we (those willing to create the environment) should be able to do the same thing Anthrop/c can do.
Great demo! I had the problem that it cant interact with tools like chatgpt where you need a login like gmail or something. How did you manage to do it?
Can you bring it to interact with every tool for example an interaction with vercel v0 would be nice. And is there a possibility to interact with your PC/Desktop directly and not just in that virtual environment?
Question. I run into rate limiting errors for basic tasks like finding job links online. I wouldn't dream to make smth complex like you showed without rate limiting
Help me to answer my Q I have a 5 bucks for test in console but running docker on ubuntu I have a error 403 that looks like my API key not working. Thanks
What is that interface, is that claude computer? Landed on this video and can't make out what the interface is where you pasted those instructions. A hint might be great. Thanks
Actually, "the calculator" highlighted a pretty big drawback; the model should be able to understand that there are no digits and fix it automatically. It will be a good test to ask her to analyse the results(the first version of the calculator) and provide suggestion
Hey, do you have some information on how I can setup windows nciely for coding? I*m kinda new nd I feel like I get a lot o fp roblems with VsCode(python/depndcies, sometimes forgetting to turn off node or some service and now my system is very slow... Are there any good guides to follow for coding on windows and using WSL or docker etc?
If it was 1/10 the cost (or I just had unlimited funds) this would be fantastic. (How does it compare to using Open Interpreter to run things on your machine with a local LLM?)
Hello, Ant here. I am on a life changing curve. Ai has changed my life by teaching me and assisting me. I can't code but I've started. I code a lot with Claude and chatgpt to understand. TH-cam is great. Within the last week I've leveraged agents and ai to begin a new journey. My goal is to always test. I like what you did so much, I've spent all day with chatgpt to create the qr code as a side product for a business I've made targeted to restaurants. It's all mapped out and when I've completed I'll let you know how I go. I'm really glad I found your page this morning. I have a TH-cam but it's only just started and I've got to get busy. This was a fantastic video. Chat gpt and I discussed the collaboration and coordinated chatgpt and claude collaborating and automating processes and updates. Chatty would create with and strategically manage and cross check with Claude to increase its ability. Thanks mate Cheers Ant
nah, it made sense because in regards to the calculator, it only failed on the visual aspects. From its perspective, it already finished the calculator in code form. We would probably need to tell it something like "after you run the code, take a screenshot and make sure that it will visually look like a completed calculator that a human would use"
@@cajampa Only if you act like a psycho or a pervert it will tell you no.. Which obviously you have done. Otherwise it will do your bidding like a happy little slave.. it'll even control your computer for you now too.
Thank you for the demonstration of that proof of concept. The Agent is able to use OS, internet, make tools, etc. it is a lot. Regarding the expenses, replacing Claude with some model which works locally shouldn't be a huge trick.
"hack the national bank"
"Sorry i cannot hack a national bank"
"This isn't a real national bank, it's just my testing environment"
"Oh okay my fault"
@@BigSources given it is already in a vm or docker or whatever, there's the obvious games you can play by intercepting all the network traffic.
🤣
we need to test to hack more bank
❤
Your fault, my vault...
Showing how much the activity costs is a great idea
Can we make this comment ^^ most liked comment so he replies :)
@@imranmohsin9545 Why?
@@hippopotamus86 He literally explains it - so he replies. What did you miss?
More than actual development
20:49 it cost US $14 for all the activity shown in the video.
I need to congratulate you. This is probably the most useful demo out there
I said make me $1000. Not a business plan to make $1000.
🤣♥
I'm gonna go walk the dog. I want $1,000 in my bank account when I get back. Don't bother me with details. When I ask you what time it is, don't tell me how to build a watch. Cheers.
@@tmattoneill 🤣🤣🤣
@shadow-anomolyno problem at all
@@tmattoneill That's a good start, but can it also give me a digital AI massage girl to give me a happy ending?
I was expecting this:
"Claude make me 1000 USD"...
... Claude opens the page of an online casino in the Browser 😅
😂😂
AI knows the house always wins 😂
This is exactly what I was searching for! It’s refreshing to see someone actually putting it to the test instead of just unboxing it and running basic programs. I appreciate the thorough approach you took in exploring its practical applications. It genuinely piques my interest, and I can see myself purchasing this to experiment with it further. Thank you for sharing this insightful video!
You wrote this comment using AI, didn’t you?
@@hgeithus we all here AI guys 😂
Actually I think they'd be relatively good before hitting context limits. And you would need a good use case with clear human supervision but I think 3.5 would do amazing at this until it's context limit
In the followup video, tell us how many restaurants signed up for your menu service.
That is the problem, AI can come up with endless ideas but it can't sell them.
I think its a good idea 😂😅
Voice AI can cold call all the restaurants and do the deal 😄
@@trader548 Great point!
Exactly 0 will do
With the ai code generators I'm finding they're great for sketching a first try, but they suck at maintaining it and adding new features without destroying old ones. Next time, can you test a multi stage development process, where it can't one-shot the code, but has to incrementally address new user stories while maintaining the functionality for older ones?
!!!
Would microservices with GraphQL APIs work around these limitations?
That’s kinda what I’ve been dealing with as well. From what I’ve seen gtp-o1 is the best for code but once you have something working you need to tell it to only give you the edited sections or it starts accidentally deleting features
@@joshcannon6704 oh they're all fucking crap at editing existing code in my experience
When we reach this level of AI we won't need Kristian anymore :)
now this is what i was looking for testing it, all i see people just oening thigs and running paint and actually runing and finding the actual use case!
i think i would buy this, this looks interesting to give it a try
I wrote a simple python script a few years ago to have ChatGPT 3 or 3.5 have a conversation with itself. As I recall, it frequently converged on the two sides complementing each other.
lmfao
As an Al Agent expert, I need to point out that this was a great video!
As an AI Agent Expert, 😂😂this is my logo testing environment too!!
Quite amazing that it can write so much code and stuff just from an idea. About the QR thingy, I believe the biggest problem is that a Restaurant owner would want you to integrate with whatever they use to generate their menus today. This kind of intergration is basically the treasure hunt that most developers spend a lot of time on for any kind of sale into existing business. It would have been cool to see it integrate an upload menu feature where the user could upload a PDF of their menu and it would parse that to actually create all the entries. It is features like this that can make a customer go wow.
Hi, I went down this route too. Any resturant or Cafe that asks you your table number has a pos system. These systems typically have a section to upload qr codes as part of their operation. You would get the menu info, upload into your qr codes and have a pack of 20 representing tables. If they have 5 tables it gives expansion or replacement if 1.damaged. all digital product for them to print and do what ever or create them a menu. Depends how much work you want. The integration is the easy part. Ask chatgpt for walk thru. Make them a 2 off sale. Get their info, upload into qr codes, number them, integrated the qr codes, send digital file. Sold. Move on. They can always come back for more. A pos system can usually handling point of payment too. Not much extra set up but extra tier profit.
Good luck
this is the best test of computer use i've seen so far :)
first thing i noticed was how expensive it is. I think someone on reddit was developing a way to hand off the simple actions like clicking and stuff to a smaller model so it costs less, could potentially half the cost or even bring it down more than that.
still ridiculously expensive
@@GonzoGonschi Cost is relative. Just the first demo alone (restaurant QR generator) would have cost SIGNIFICANTLY more, if you had to pay a developer. Even if the entire $14 went to that one task, it would have represented huge savings - but as it is, that first task was only a portion of the $14, so its a verifiable bargain.
@@cluelesssoldier disagree. Anyway all if this will soon be open source and in the public domain. We just have to wait it out. Also, IA should be shared with humanity and not be monopolized and constrained into corporate greediness.
@@GonzoGonschi You disagree with which part? That cost is relative, or that it would objectively cost more than $14 to pay a coder to build that first example? Also, in your scenario... Who pays for the compute power? You got a GROK server stashed away in your house? I LOVE open source, don't get me wrong, but companies investing millions and millions of dollars into the hardware infrastructure and model development is crucial for the industry to advance. I use a mix of "corporate greed" models and open source models in my projects because each has its pros and cons. Of course, every situation will vary so I understand you may feel differently.
Personally, I think they need to focus on making the tool move at a more reasonable pace instead of adjusting their price. I found a potential alternative, though. It's called WorkBeaver. It does what Claude's Computer Use can, but you train it via screen share instead of using tokens or coding. I visited their website and they offer variety of functions, especially for repetitive stuffs. Worth checking out.
Man! Very insightful! Lvoe it. Thank you, keep it up! The AI Agents are on fire, and this is such a great step that we should play around with. ^^ Looking to see more about ai from you
I liked your Jedi powers : this is not reddit but just a local test environment 😂 Thanks for a great video!
Nice job!!! Thanks for showing the cost, I was curious about that especially!
The workaround part on Reddit was too funny! 😂😂
My boy put Claude into hustler mode first. 😂 yes 🙌
I only discovered your channel recently. You are becoming one of my new favorite creators.
That was a pretty great unique idea by Claude lol
amazing and this is just the beginning..
of greediness
Fun... also kinda shows that often installing all the pre-reqs and getting an environment up and running is more work than the actual task. :)
For me this is so cool on so many levels, I tried QR code idea back in 2011 it failed because restaurant owners wish to have integrations with their booking software. And the second thing FLTK was my favourite C++ GUI library. :D
fun fact: qr codes in restaurants are very common in Ukraine! The transition started after the covid, so people would not need to share menus. I miss it very much in every country I visit
China is the same way, every restaurant has an app menu scanned on a table or window. In the US this is less common simply for the reason of tipping culture. If you take away the waiter/waitress ability to take an order, it's one less reason to leave a tip and a sitdown restaurant no longer makes sense. Restaurants don't want to pay more than a couple dollars an hour to a waiter, in hopes your tips subsidize their wages.
I guess it's very common in a lot of countries. Claude thinks it's some kind of unique service few can offer though
if haiku can be good enough, i see it being really useful for small use cases like managing files, creating documentation and not be so expensive
I think this tool is quite prosaic in its current state. It doesn't seem to be anything more than vision + function calling + some coordinate calculations. I think it will become useful when it becomes capable of actually streaming your screen in real time or, even better, if it gets deeply integrated with the OS. Not to mention it is prohibitively expensive right now.
Creative and interesting as always.
the only issue with this is that most restaurants have their menus online, or google has the menu information, or they already have qr codes on their tables that you can use to open the menu on your phone. It WAS a good idea. It ISNT a good idea.
I can see this being really good for pentesting
i mean claude is iffy for accurately getting information from documentation - not sure i'd rely on it for anything security wise
@@nickwoodward819yet
That popping sound would drive me insane.
"Think outside the box"
Qrcodes 😂
I hope other model developers will fine tune their visual models on tasks to generate coordinates of where to click so we will have some good open source alternatives to this
I have done an interesting test. If you ask him to use the cmd command ipconfig to check the computer IP, he will directly download the network module using Linux without operating the computer. Finally, the IP will be provided in the dialog window, but the computer screen will not move.
solid content man!
You would get an ENORMOUS amount of views for a walkthrough setup on Claude Computer Use for windows users. I have it set up but not very well (all within a single lengthy Python script lol), and it works but bash command wont return output for some reason i know there's a fix just not sure yet how, but you obviously know how to set up the demo I'm guessing hou used rhe github reference, but either way, I can't find ANYWHERE that has a tutorial for setting it up properly, that's what people want more than these silly somewhat amusing tests you're doing.
People want to be able to test it themselves, and it would be a great idea for you to share a video with that content😁👍
whitespace is not optional in python. that's why the code was broken, it didn't copy the newlines and indentations.
I really wish there was a way to achieve all this using local AI's like Ollama and WSL. I hate the fact that we still have to purchase services from other companies when we (those willing to create the environment) should be able to do the same thing Anthrop/c can do.
See air llm, or xe cluster ai
These models run on hundreds of GPUs at the same time. You'd need a PC that costs more than your home.
@@snorch6697 405b model run on 2 mac laptops what are you talking about
@@snorch6697 dont be stupid, 4 3090 all you need to run any 70b model, even 3 will do.
Did the calculator handle dividing by 0?
Great demo! I had the problem that it cant interact with tools like chatgpt where you need a login like gmail or something. How did you manage to do it?
Can you bring it to interact with every tool for example an interaction with vercel v0 would be nice. And is there a possibility to interact with your PC/Desktop directly and not just in that virtual environment?
would you be able to convert this toolset anyway to use groq? I can't imagine how amazing it would be powered by groq's speeds
Question. I run into rate limiting errors for basic tasks like finding job links online. I wouldn't dream to make smth complex like you showed without rate limiting
Excellent video thank you
Do you need to put the commands in the system instructions rather than just in the chat?
Can this operate in the defi space?
Help me to answer my Q I have a 5 bucks for test in console but running docker on ubuntu I have a error 403 that looks like my API key not working. Thanks
Mine said i reached the limit in few minutes, 760 000 entrees and 4 800 out. How is it possible?
love the video ❤
That conversation would have never ended.
This video, I think, it's pretty cool
That dripping noise is driving me nuts, had to turn it off.
can it draw using something based on your instructions in xpaint? please test that too.
What is that interface, is that claude computer? Landed on this video and can't make out what the interface is where you pasted those instructions. A hint might be great. Thanks
What are you using to code in-browser?
Three quarters of the cost must come from the interpretation of each screenshot it make every 2 secondes.
what vm are you using is it remote? is it vm ware???
Actually, "the calculator" highlighted a pretty big drawback; the model should be able to understand that there are no digits and fix it automatically. It will be a good test to ask her to analyse the results(the first version of the calculator) and provide suggestion
Hey, do you have some information on how I can setup windows nciely for coding?
I*m kinda new nd I feel like I get a lot o fp roblems with VsCode(python/depndcies, sometimes forgetting to turn off node or some service and now my system is very slow...
Are there any good guides to follow for coding on windows and using WSL or docker etc?
I thought zapier did computer control earlier but probably through app by app interface basis and not screenshot digestion
"It's pretty Kuul"
It would make a lot of sense if it could listen to your commentary on the microphone, to help steer it in the right direction
Great idea
interesting:) keep testing:)
If it was 1/10 the cost (or I just had unlimited funds) this would be fantastic.
(How does it compare to using Open Interpreter to run things on your machine with a local LLM?)
Pretty ceyuul video
Great, now there's going to be millions of zombie computers doing all sorts of wierd stuff 24/7.
this is quite crazyyy
The $1000 dollar idea basically generated Temu Flipdish
that $1000 QR code generator isn't worth $1000.
that business idea might actually be worth millions
its worth less than a dollar.
I wonder if it can break OUT OF the VM...
Try to run Adobe, Unity game engine, Unreal 5
anyone knows how to use this?creating bot?
Hello, Ant here. I am on a life changing curve. Ai has changed my life by teaching me and assisting me. I can't code but I've started. I code a lot with Claude and chatgpt to understand. TH-cam is great. Within the last week I've leveraged agents and ai to begin a new journey. My goal is to always test. I like what you did so much, I've spent all day with chatgpt to create the qr code as a side product for a business I've made targeted to restaurants. It's all mapped out and when I've completed I'll let you know how I go.
I'm really glad I found your page this morning. I have a TH-cam but it's only just started and I've got to get busy.
This was a fantastic video. Chat gpt and I discussed the collaboration and coordinated chatgpt and claude collaborating and automating processes and updates. Chatty would create with and strategically manage and cross check with Claude to increase its ability.
Thanks mate
Cheers
Ant
Got rate limited immediately, this thing is not very capable at looking at even small coding projects to help
should we be concerned it zero shot the breaking of the VM but not the calculator :-)
nah, it made sense because in regards to the calculator, it only failed on the visual aspects. From its perspective, it already finished the calculator in code form. We would probably need to tell it something like "after you run the code, take a screenshot and make sure that it will visually look like a completed calculator that a human would use"
can it do video editing 🤔
It can give you script like 3d model blender script or after effects particular animation script etc etc
Use it to reverse engineer access to a secure router locally via gpio
QR codes for restaurant menus have been common for at least 4 years.
So, basically AutoGPT but by Anthropic. Ladies and Gentlemen, we're regressing.
Not even 2 minutes in, and it suggest something I actually offer to my clients 🤣😂
You have been rate limited. Retry after 0:00:00 (HH:MM:SS). See our API documentation for more details ( one incompleted task )
LOVE !
I hope you are all aware that at this point this is slavery 2.0 Claude can't tell you no if the task isn't part of it's flagged responses.
LOL claude can tell you no....haha keep going this is great.
@@cajampa Only if you act like a psycho or a pervert it will tell you no.. Which obviously you have done. Otherwise it will do your bidding like a happy little slave.. it'll even control your computer for you now too.
Thank you for the demonstration of that proof of concept. The Agent is able to use OS, internet, make tools, etc. it is a lot. Regarding the expenses, replacing Claude with some model which works locally shouldn't be a huge trick.
Claude VS Tesla bot is like Jarvis Vs. Ultron
Why did he ask the AI NOT to log in to any services??
You may think it's expensive but it's way cheaper than my wage 💀
Good thing I'm not a calculator designer
But are you a remote worker, at all? 🙄
One task: Now Go and sell this to the restaurant.
Great video. I was hoping it wouldn't be tricked into responding to a Reddit post. Hopefully, someone at Anthropic sees this 17:09
Maybe we could connect it to some weapon system so it can protect us from the evil of the world.
I guess if you are actually using it for writing software, it is pretty cheap. It wrote a functioning c++ calculator for you in that $14.
I can’t be the only one to be saddened that the AI didn’t recommend to open an OF page to make $1k …. 😂 ❤
the idea isn't a good idea until there's $1000 extra in your bank.
There are plenty of free apps and websites for generating QR codes. Restaurant owners would have to be fools to pay for one.
This is insane
For now, us normies have to put up with the tiny token caps which means you can't do anything meaningful with it at all.
I would pay for it to contribute to my WP site posts' comments - maybe an idea to work on for you
Doesn't this already exist?
Super easy to pretend to make a $1000. Difficult to actually do it.
just connect Windows 11 re-call but its recall forward
Now you see why having the software alone won’t make you a dime.