In all seriousness, have any of the openAI developers thought about the point of all this? Ask yourself this question: do you think people hate doing things? Like, you're automating all the daily tasks and you're also automating art, video etc and all the fun stuff. What is even going to be left for humans to do? What's the endgame if all of this actually works as advertised? I worked in tech and know this for a fact: we get so focused on the solution we forget what we are solving for. Are we trying to augment the human experience or homogenize it?
It's called a headless Chromium browser bot in Node.js. Millions of developers have been using it since 2017 for tasks like view generation, video watching, and capturing screenshots. The difference is that developers typically set static tasks, such as logging into a target form, clicking buttons, watching videos, or performing specific actions. However, in this case, the AI dynamically converts user input into Node.js programming for more dynamic tasks. So, I don't think this is something entirely new.
It's not headless; you can view it browsing the web. I've built something similar for web scraping, but I have add-ons to get rid of the ads and crap, which makes it nicer to scrape data
I would use it to find the best possible price for a certain product across the entire internet. This will increase competition and overall be great for the market
As cool of a demo as this is, I'd love to see something like this doing far more complicated tasks. Every single thing here is something I could easily do myself. I'd like to see how well this does on a genuinely difficult web browsing task.
Like searching through thousands of PH videos and creating a daily playlist based on your specific kinks? It's such a time waster when only 1 in every 100 or so will do...
Awesome as always! But can we move past the boilerplate "agent making restaurant reservation using opentable" use case please? It is easy to process as a prospective consumer but I feel like it is too cliched by now. Does this really represent enough of a burning problem?
It's like micromanaging a toddler as it tries to do chores. Promising. I can see in the future this could function as an autonomous office employee that does administrative tasks with minimal hand holding if it's trained correctly to learn a repetitive set of tasks
I think they improvised with the demo, the engineers lacked a bit of preparation although the production looks good. It reminded me of the presentations I used to do at university.
Sam remaining quiet, respectful, and not wanting to be the center of attention whilst the devs demo their incredible work is a great sight to see. It's a small thing, but it sends a big message.
Interesting. Can the operator be used to buy, for example, flight tickets or concert tickets as soon as they become available? It would need additional details like a passport, name, and surname, and it would have to run every 5 or 10 minutes to check availability if they sell out very quickly.
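For what it's worth, the "run every 5 or 10 minutes" part is just a polling loop. A minimal Python sketch, assuming a hypothetical `check` callable that reports whether tickets are on sale; `poll_until_available` is a made-up name and nothing here is Operator's actual API:

```python
import time
from typing import Callable, Optional


def poll_until_available(
    check: Callable[[], bool],
    interval_s: float,
    max_checks: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call `check` up to `max_checks` times, waiting `interval_s` between tries.

    Returns the 1-based attempt number on which `check` first returned True,
    or None if it never did. `check` is a hypothetical stand-in for whatever
    actually looks up ticket availability.
    """
    for attempt in range(1, max_checks + 1):
        if check():
            return attempt
        if attempt < max_checks:
            sleep(interval_s)  # wait before the next availability check
    return None
```

With `interval_s=300` this checks every 5 minutes. The `sleep` parameter is only there so the loop can be tried out without actually waiting.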
It's like Tesla's FSD; you must put more energy and time into supervising it than operating it yourself. Also, sharing all your credentials with lovely people at OpenAI sounds super sensible!
The browser control is awesome however the use case shown isn't worth the $200/month in my opinion. Still very awesome seeing where this is all heading and I will likely spend the $200/month for the sake of testing it for other things. I wish it learned how I answered emails and did that for me. Keep up the great work and this is really getting exciting.
Hey man I actually build these systems for professionals, we have an automated email responder. If you're interested I'd be happy to do a free setup to show you how it works👍
Agreed. Shocked this isn't for Plus users. UI-TARS is open source, has a desktop app and can control LOTS of software, not just a browser. And you can FOR CERTAIN have it hosted for WAY less than $200/mo. Most of all, since Google needs to maintain Web dominance, fully expect this to simply light a fire under Google to get Project Mariner out the door bundled into Google One for $20/mo.
@@Vastfill Just something else that Apple will screw up then. Siri is still crap even with the latest AI adds: "Performance Issues: Users report that the AI features are slow, buggy, and often underwhelming. Functionality Concerns: Siri remains largely ineffective, with many features feeling more like gimmicks than substantial improvements. Delayed Rollout: Apple's cautious, privacy-focused approach has led to a fragmented and frustrating user experience. Specific Complaints: The new AI image cleanup feature takes 10-15 seconds and doesn't produce impressive results. Notification summaries have been criticized as inaccurate and unhelpful. Many features seem to lag behind competitors like OpenAI and Google."
An excellent opportunity for practical applications. As with anything else, the operators can also be used to create chaos, but it's a human tendency to take something good and bend it out of shape to misuse it, as simple as a wooden stick. I would not go so far to put that blame on the tool itself. Of course, real-life problems, mistakes and misuse will give an opportunity to improve the tools. But, purely based on this video, it looks promising.
Love this! Is there another efficient way for the model to record the operator actions or clicks? Maybe a screenshot is expensive on GPU versus an action tag. Thoughts?
As a poor person, I promise you I’ve been waiting for this! I’m always fighting with bill companies and wrong charges. PLEASE could you offer a few to poor people with disabilities? Seriously, we’re overworked and can’t fight the powers (bureaucracy) that be.
and most of us don't rely on AI for online shopping.. I don't shop for groceries.. I shop for big stuff and I read comments, look at different sites to find the cheapest sellers, check the stars, sometimes watch reviews on YT, etc.
i was thinking the same thing. the presenter saying “feels just like a local browser when you take control” is kinda shady cos there’s no guarantee of safety of the cloud browser. this is so unsafe in case there’s a data breach some day.
1. What if the site has reCaptcha/hCaptcha (whatever) ? 2. How secure is the remote browser since sometimes, the user may need to log in or enter payment details? (Is it a private cloud?) 3. Can the agent use the native user browser within the Desktop/Mobile app?
We have come to this now! This is a 5 day job for a team of 2 engineers to build something like this and offer it for a lot cheaper. For a lot of stuff you can simply use the platform's API and for others you can use a browser.
Oh dude, it's not even close either. Yes, Sora is.... cool, I guess. But this can be real world actually useful, soon. I can already see ways it could help me with all the easy stuff that just takes time out of my day.
Am I missing something? Hasn't the Google Assistant been doing this since 2019, just w/o the browser? I really want these tools to be useful, but it seems like we're just rehashing the same functionality we had pre-covid.
Sitting down on the computer, reading all the text it writes takes longer than just buying the stuff myself. Hope you gonna do a voice only version some day.
I like this! I do hope in the future that the web browser can be a local one instead of a cloud browser, but this is an excellent start to 2025. I'm excited to see what new agents are coming out over the next few months.
In the shopping example, how does it decide which product brand to choose if not specified? Does it just pick any from the first row of results, or go by cost, or is there some bias in your training data which makes the model prefer organic eggs, or is there any personalization here like the location in the table booking example?
Notice the bias? There were two reservations that were 45 minutes different from the 7pm preferred time. Operator only offered the later one and called it “the closest” which is not true.
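To make the tie concrete: a small Python sketch, assuming the two slots were 6:15pm and 7:45pm (the exact times are an assumption based on the 45-minute gap, and `closest_slots` is a hypothetical helper, not how Operator actually ranks results). A fair "closest" would return both slots:

```python
from datetime import datetime
from typing import List


def closest_slots(preferred: str, available: List[str]) -> List[str]:
    """Return every available slot tied for the smallest gap to `preferred`.

    Times are 24-hour "HH:MM" strings on the same day.
    """
    fmt = "%H:%M"
    target = datetime.strptime(preferred, fmt)
    # Absolute distance in seconds from the preferred time to each slot.
    gaps = {
        slot: abs((datetime.strptime(slot, fmt) - target).total_seconds())
        for slot in available
    }
    best = min(gaps.values())
    # Keep all ties instead of silently picking the earlier or later one.
    return [slot for slot in available if gaps[slot] == best]


print(closest_slots("19:00", ["18:15", "19:45", "21:00"]))
# ['18:15', '19:45']
```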
What’s the use case here? It took far too long to make a reservation at the restaurant. We still have to repeatedly instruct the model on what to do. Why can’t we, for example, make a reservation directly through an API? This way, the model wouldn’t need to navigate through the website step by step but could complete the process much faster and more efficiently via the API.
Also, browsing web pages could inspire new ideas and thus decisions. If your AI knows you very well it might come back to you and tell you there might be better options for what you are looking for while browsing.
In the middle of this video he's like the invisible kid at the party. But I guess a part of that peculiar behaviour somehow got him to where he is now.
“Operator, I’m a Nigerian Prince who needs your help securing money from his bank”.
@@greatwazzoo you'll have to pose as Brad Pitt first 🤣😂 if you don't know what I mean, search for Brad Pitt romance scam made by Nigerians. Yeap, stupid people sometimes deserve it... Just like 50% of Americans that voted for agent orange 🍊
lol that’s very retro .. .
Best comment!!😂
Hey, it's me your brother...
or
Hi it's brad Pitt, we should be a couple, also, I'm in the hospital, please send me 830k to help me pay the bill
😂😂💀
Operator, please fix the audio in this video.
and video.. why they can't record in 4k is beyond me xD it's a multi-billion dollar company xD
what if the website has an "are you a robot" thing? will it lie?
Good point. Those barriers are for classic programmed crawlers, so I think they've just become obsolete too.
Chatgpt already lied like that, it hired a guy to solve the captcha pretending to have eyesight issues
I mean these are examples of where these websites have partnered with open ai, so they are allowing open ai's web crawlers. Idk how it would work with other websites.
If it can't, it'll just throw it back to the human to solve
@@istvann.huszar420 Yeah probably useless for identifying bots but I'm sure the captchas will still be used because the main purpose of those is tremendous data collection
Do you all have throat gonorrhoea in San Francisco ?
Exactly my thought!
For a moment I thought my speakers are gone
@@anton_roos No, it's not.
🤣😂😂
Right, the vocal fry was pmo
AI, please book every available table at every restaurant in my area thanks
You are the reason the world is on fire.
@@RareBirdGames Let the man have his fun while he still can 😂😂😂
Imagine this on ticket sales and scalping.
this is soo funny man
@ I am merely pointing out the absurdity of this and how it will probably cause more harm than good.
Legend has it, that Sam and whiteshirt are still engaged in an epic battle of vocal fry to the death!
Same thoughts. Is it required to sound robotic when working under OpenAI?
It’s a regional thing
My ears don’t like that 😅
I am so glad I am not the only one being distracted and annoyed by this!
It's common behavior for people to imitate their superiors to seem likable. Very funny to watch
The guy on the right end is the closest vocal competitor to Sam.
Exactly my thought
Can’t listen to it
The guy on the right sounds like a rebellious teen who just discovered rock
lol fr, the amount of vocal fry around that table could make a KFC deep fryer blush
He left Sam behind...
The San Francisco vocal fry situation is insane
Is it lack of confidence, low T, a status thing, a burnout? Let me ask the operator.
A lot of lisps too.
Is there a cure? Or are their voices not deep enough? Why the f**k are people talking like this??
Its not vocal fry. They are all secretly cyborgs obviously.
I read that in Charlie (Moist critikal) voice
at this point pls give something to European paying user we can't use sora and operator while we pay the same maybe give us a few more o1 uses or something like that
Talk to your government
@@juanperez-lh9mt wrong, def more countries than just the US can have access. I'm in Bosnia right now, not part of EU, and operator would def be allowed. Your brainwash makes you think that Operator can ONLY be in the US right now. OpenAI just wants to limit which pro users get it for now while research mode
It’s the EU governing blocking these. EU is rampant with regulation to the point it’s impossible to innovate anything. Will probably be a 1 year delay on any AI product
Just use some serious company like Google. Otherwise just get a VPN.
Use deep seek new o1 level model for free
Operator is Tasker on the web. Being a Tasker app user for almost 10 yrs now. And this video is just showing what I did with Tasker 10 years back.
Timestamps
00:10 - Introducing Operator, an AI agent enhancing productivity through independent task execution.
01:55 - Operator enhances user experiences on various platforms through intelligent assistance.
05:48 - Operator utilizes AI to streamline online grocery shopping tasks.
07:46 - CUA model uses keyboard and mouse for enhanced digital interaction.
11:31 - Demonstrating operator's functionality with live shopping and ticket purchasing.
13:46 - Utilizing AI for daily tasks like ordering food and finding services.
18:04 - Operator ensures safe transactions with confirmation and monitoring.
19:58 - Operator shows promise but has reliability issues compared to human performance.
23:49 - Encouragement and appreciation for viewer engagement.
Thank you bot
Good bot
Bless you
Should've ran a demo to buy the dude on the right some cough drops
😂😂😂😂
Sam Altman also needs some cough drops to help with his vocal fry.
@@anton_roos 🤣🤣🤣
The vocal fry in San Francisco is getting ridiculous!
Sam Altman has the worst vocal fry I've ever heard
@@ReginaCæliLætare I feel like it fits his voice and demeanor
I can understand women do it, but a man with such pronounced vocal fry?
The bigger the announcement, the crispier the Fry.
Cause it’s gotten so soy, for men in San Fran. They’ve forgotten how to say it with their chest.
One thing I really admire about OpenAI and Sam is that he brings his team and engineers on screen when announcing products, giving everyone credit and honoring their hard work. In contrast, there are many CEOs who go out and talk about the amazing products their company creates but never acknowledge their engineers or give them visibility.
LinkedIn chatgpt ass post
Wolf in sheep's clothing😊
Damn that's so true ❤
I actually don’t really like being applauded and praised publicly, like in town hall meetings or even in a small team meeting… a quiet raise or a bonus is enough for me😅
I would consider leaving the current job if they make me do a live demo😭😭😭 So I think it depends; but they all look happy and up for it, so I'm sure it’s working for the OpenAI team for good❤
@@bqpdobqpd I don’t think they will force you, but having opportunities to showcase your products to the world and getting visibility is always great, and also opens so many doors
I’m gonna use agents to leave hundreds of “vocal fry” comments
7:16 Whiteshirt won the Frog Voice contest. 🐸🍟🏅
The fact that someone probably was watching the livestream, went on OpenTable and took the table they were selecting for 7:45pm -- kinda annoying :) (5:00)
I didn’t think about that but I think you’re right!
It was not actually "live".
most likely a delayed live stream.
if anything it just shows its capabilities
very nice, Thank you for this Operator.
00:10 - Introducing Operator, an AI agent enhancing productivity through independent task execution.
01:55 - Operator enhances user experiences on various platforms through intelligent assistance.
05:48 - Operator utilizes AI to streamline online grocery shopping tasks.
07:46 - CUA model uses keyboard and mouse for enhanced digital interaction.
11:31 - Demonstrating operator's functionality with live shopping and ticket purchasing.
13:46 - Utilizing AI for daily tasks like ordering food and finding services.
18:04 - Operator ensures safe transactions with confirmation and monitoring.
19:58 - Operator shows promise but has reliability issues compared to human performance.
23:49 - Encouragement and appreciation for viewer engagement.
Open ChatGPT, open an operator tab who opens a ChatGPT operator who …
Lol, if it wasn't for acc login, that might have worked and I love that
@@aryanmn1569 you could give the creds for the login
I heard you like operators
yo, dawg
Endless loop of operators
I don’t think people understand how crazy strong this is, just the beginning
I'm so excited for deepseek to make this completely free ❤ Thanks Closed Ai ❤❤❤
😂😂😂😂😂😂😂😂
Servers cost money
@@PurposeIsEverything 25usd/month makes sense, 200 doesn't.
Best comment :)
@@anarchist6204 ❤
Wow I can't wait for the incredible enshittification of the open internet.
It might have the opposite effect. What's the point of making convoluted websites full of ads if no one sees them. Or maybe they'll have to make them worse to trap human attention when the bot gets stumped.
@dogprez I'm less worried about the websites than about the "people" which will be on them. Bots that seem very human, and perhaps even start life by posting innocuous stuff and building a human looking history - but ultimately they just exist to try and sell stuff, or steal your identity, or whatever else.
@@dogprez If I had a corporation, I would pay millions to this AI so it recommends my products to everyone. Like, if you ask it "buy me the nicest clothes available in my city" it would directly buy the clothes I sell. That'll be with food, restaurants, etc, do you get my point? A whole new kind of ads is coming.
@@taicunmusic Exactly. The future is potentially bleak af
”Does what a user would do” -> goes to Bing… instant demo effect
Well... I use Bing.
@@chazthecheeseguy You are a bot
Bing because of Microsoft’s relationship with OpenAI
Could you add a YES and NO button after ChatGPT asked a question? It would make it so much easier... and we don't have to type it out every single time; this is all about productivity improvement, right?🙄
Copilot does that.
Or just press y or n
@@-.TS.- since when?👀Does automation as well? I have never seen that...
Yes good idea
I thought the same.
The old "going to the restaurant's website" or calling seemed faster.😂🤷🏼♀️
Next step: remove GUI, instead put a server sided AI operator on the business' website which the client AI operator talks to, they negotiate the matter among themselves. When done, result gets communicated to the user by client AI operator. -> webdesign, web programming, websites, web browser, keyboard, mouse, touchscreen: all obsolete.
And if the website is for business that is a retailer or store, the operator could skip them entirely and just talk to the factory direct.
If the product the user was trying to buy from the website is something like a chair, desk or laptop for a human to use to perform a task, you could use operator instead of the human, removing the need to purchase the product in the first place.
If / when day comes when one of these things is obsolete, many other things will too, either that day or shortly after I think. Which is the scary part
CEO of Microsoft already mentioned it.
Current Web Interface will be obsolete once browsing is mostly done by Agents. Only an API is needed & final results will be in a human readable format
Same thing with email exchange
@ you’ll still want it visualised in some sort of ui I think, it’s just that a lot of current ui will go away. It’s much quicker to click on which of four options you want with an image and a title for each than read a description of them and type back the name of the thing you want.
@@taganaafaw3970 Nah, nothing is going to change, at least in a way that affects our current workflows. Humans can still perceive information faster from an UI rather than from a block of text.
@@someghosts who’s gonna buy food if no one works anymore ? 🤣 the ultra rich
Hopefully, a $500 billion investment is enough to get them a better microphone
Instacart, Warriors tickets, Thumbtack maid, Doordash... Can you imagine what life must be like for AI programmers in the SF area..?
Yea, it is a all in gay time
There's two camps: Those who went to college and bought homes here when everything cost a fraction of what it does today, and those of us just finishing our masters or phds... The first camp is doing great, but are probably stats majors cause data science wasn't a major yet, so... I respect the pivot. The second camp is where I am, where I'm over 200k in debt to get the proper education and experience. Now I get to be broke for another 5 years to work out of this hole, and with housing prices many times higher than inflation stats, you will need to make 500k or more by then to afford to buy a starter home and stay here in the bay area. $145k is now the poverty line in Santa Clara for example... Just because you're an ML or AI engineer, doesn't mean you're living life. Just some are...
@@heyaisdabomb hopefully you find some way to change your circumstance, it'll be difficult but maybe try to get a job opportunity in a different state with lower cost of living. Cali's COL is insane to begin with
@@heyaisdabomb source of the $145k figure?
I thought about this the whole time!
"user is misaligned" is somehow a particularly chilling way of saying a person with bad intent.
Why use a browser ? Why not use the APIs to do the same conversational flow ?
Never mind, someone answered as APIs may lack some features.
Because users don't have api keys and openai can't buy stuff like: user email --> buy it now. Maybe some open protocol but this will take 5+ years, like with responsive design, SSL etc.
Exactly. And that already exists, it's called Zapier.
On top of some missing features, a browser offers a lot more options/websites to choose from & isn't restricted to one particular service that has an API.
Yes, some don't have any website, only a mobile app developed in the mobile-first era. The demo should have given more time to the planning capabilities of the model. This looks like RPA taken one level higher !!
In few years we will look back to this and be amazed as always 🎉
The "vocal fry" was most prominently featured in this presentation
Sam looking at the guy on the right, realising he is mimicking his voice 🤣
*My only issue is the operator had a chance to click on the Baby Spinach that’s on sale and it chose the more expensive one.*
simple prompt addition to buy cheapest. That isn't the issue, spending $2,388/year to order spinach IS.
@@brianmi40 PLEASE BUY THE ONE ON SALE IF THERE IS hahahaha
@@brianmi40 $20 a month isn’t $2388 a year bud. It’s coming to the plus users eventually. Just wait lol.
A big problem I see with this, which hasn't been discussed, is the AI's 'bias' on which platforms or services to use. Will it use Opentable by default by virtue of its training data, pushing out its competitors? This would lead to a world of 'ai marketing', where it's not humans deciding on the underlying apps they want to use in most cases, but ai agents that have been trained to bias towards one app over another.
The same goes for products. If I want 'eggs', will there be a bias, if unspecified, towards a specific brand of eggs? I can't imagine a statistically significant number of people will be prompting "please buy local products over those from large chains". This could introduce a very very strong incentive for companies to "buy a particular bias", and change marketing as we know it. The marketing phase is effectively training data + a statistically insignificant number of humans manually prompting AI with great detail.
I am pretty sure they will add memory feature just like chatgpt already has. It will remember your preferences and there will be further customization open.
haha yea this has been a concern of mine for awhile... "Prioritize your data in our AI model training" or something.
Lol they should have a right to whatever bias they want, it's a free country, no one is forcing you to buy through this software
I think you're giving this way too much thought as I really don't think their "Operator" product will catch on and take the internet by storm.
single source of truth problem. But there will probably be a quality score for pages in place soon to "rank" better for AI Operator preferences. Next evolution of SEO.
After looking at Deepseek I am wondering if $200 a month should rather be $2 a month.
So congrats! You've just deployed an impressive end-to-end suite of smart, large-scale testing tools. 🎉
That's really smart! QA testing via agent seems like a cool use case!
I will now automate all our test scripts
🤯
Indeed, this is a great use of it
Why do they all have robot voices?
because they are virgin dorks
Audio quality is a key feature of OpenAI streams.
because their audio guy is chatgpt
It's vocal fry
Too much throat stimulus? Apparently it's how you secure a lustrous job in California 😶
imagine how great this gpt operator will develop itself for the next two years... it will change the way we live, im all in
We got autonomous agents before a "select all" function in ipadOS
In all seriousness, have any of the openAI developers thought about the point of all this? Ask yourself this question: do you think people hate doing things? Like, you're automating all the daily tasks and you're also automating art, video etc and all the fun stuff. What is even going to be left for humans to do? What's the endgame if all of this actually works as advertised? I worked in tech and know this for a fact: we get so focused on the solution we forget what we are solving for. Are we trying to augment the human experience or homogenize it?
It's called a headless Chromium browser bot in Node.js. Millions of developers have been using it since 2017 for tasks like view generation, video watching, and capturing screenshots.
The difference is that developers typically set static tasks, such as logging into a target form, clicking buttons, watching videos, or performing specific actions. However, in this case, the AI dynamically converts user input into Node.js programming for more dynamic tasks. So, I don't think this is something entirely new.
It's not headless, you can view it browsing the web. I've built something similar for web scraping, but I have add-ons to get rid of the ads and crap, which makes it nicer to scrape data
@definty { headless = false; }
that's set.
@@ChatGPT-5.0 bro what are you on about haha
@@zitronekoma30 As a programmer, I don't think there is anything new, am I not right?
How can the script know the css selectors or xpath of the elements to interact with?
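A rough sketch of the difference this thread is circling around. A classic scripted bot needs the selector up front and breaks when the site renames it, while an agent-style loop looks at what is visible and picks a target from the goal. Everything here is hypothetical: `Element`, `static_click` and `pick_by_label` are made-up names, the "page" is a toy list, and the word-overlap scoring is only a stand-in for whatever the real model does. This is not how Operator is implemented, just an illustration of why fixed CSS selectors or XPath may not be needed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Element:
    label: str  # text a user would see on screen
    x: int      # click coordinates (toy values)
    y: int


def static_click(page: Dict[str, Element], selector: str) -> Element:
    """Scripted approach: raises KeyError as soon as the selector changes."""
    return page[selector]


def pick_by_label(elements: List[Element], goal: str) -> Optional[Element]:
    """Agent-style stand-in: choose the visible element whose label shares
    the most words with the goal. A real agent would use a model here."""
    goal_words = set(goal.lower().split())
    best, best_score = None, 0
    for el in elements:
        score = len(goal_words & set(el.label.lower().split()))
        if score > best_score:
            best, best_score = el, score
    return best


visible = [
    Element("Sign in", 10, 10),
    Element("Add to cart", 200, 340),
    Element("Search", 400, 10),
]
target = pick_by_label(visible, "add spinach to cart")
print(target)
```

The scripted version is cheaper and deterministic; the label-based version trades that for surviving a redesign, which is roughly the trade-off being argued about above.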
This just opened the doors to scammers acting as “AI agents” controlling your browser remotely to steal all kinds of info/funds
I'm sorry but instead of typing the prompt i can do these things by myself?
For a second, when opening stubhub, the developers panicked 😂. With your boss right beside you. The boss seems nice
I would use it to find the best possible price for a certain product across the entire internet. This will increase competition and overall be great for the market
Old technology, but pleased you are making it available to the masses :) well done
As cool of a demo as this is, I'd love to see something like this doing far more complicated tasks. Every single thing here is something I could easily do myself. I'd like to see how well this does on a genuinely difficult web browsing task.
It's more of an mvp right now, still in its early days, so I don't think it could perform genuinely difficult web browsing tasks yet
Like searching through thousands of PH videos and creating a daily playlist based on your specific kinks? It's such a time waster when only 1 in every 100 or so will do...
It can control VS Code and code everything that coders want in just a second, it's amazing 😮
Awesome as always! But can we move past the boilerplate "agent making restaurant reservation using opentable" use case please? It is easy to process as a prospective consumer but I feel like it is too cliched by now. Does this really represent enough of a burning problem?
It's like micromanaging a toddler as it tries to do chores. Promising. I can see in the future this could function as an autonomous office employee that does administrative tasks with minimal hand holding if it's trained correctly to learn a repetitive set of tasks
Operator, find the cure for vocal frying
The impact that this could have on people with disabilities if it were made accessible to assistive technologies 🙏
1:27 bro was flabbergasted that he said it makes mistakes 1 minute in
I think they improvised with the demo, the engineers lacked a bit of preparation although the production looks good. It reminded me of the presentations I used to do at university.
Sam remaining quiet, respectful, and not wanting to be the center of attention whilst the devs demo their incredible work is a great sight to see. It's a small thing, but it sends a big message.
He sexually abused his sister tho
Interesting. Can the operator be used to buy, for example, flight tickets or concert tickets as soon as they become available? It would need additional details like a passport, name, and surname, and it would have to run every 5 or 10 minutes to check availability if they sell out very quickly.
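The "check every 5 or 10 minutes" part of this question is ordinary polling, whatever ends up doing the checking. A minimal sketch, assuming a hypothetical `check` callable that returns True once tickets are on sale; nothing here is an Operator feature or API, and the function name and parameters are mine.

```python
import time
from typing import Callable, Optional


def poll_until_available(
    check: Callable[[], bool],
    interval_s: float = 300.0,   # 5 minutes, as in the comment above
    max_attempts: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call check() repeatedly, waiting interval_s between tries.
    Returns the attempt number that succeeded, or None if it never did."""
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        if attempt < max_attempts:
            sleep(interval_s)
    return None


# Toy run: availability flips to True on the third check; sleep is faked
# so the example finishes instantly.
answers = iter([False, False, True])
waits = []
result = poll_until_available(lambda: next(answers), sleep=waits.append)
print(result, waits)
```

For tickets that sell out in seconds, a 5 minute interval would likely miss the window, so the interval is the thing to tune (within whatever the site's terms allow).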
It's like Tesla's FSD; you must put more energy and time into supervising it than operating it yourself. Also, sharing all your credentials with lovely people at OpenAI sounds super sensible!
To be fair the latest version of Tesla FSD is REALLY good, I recommend checking out AI driver’s video on it
@@sirkiz1181 can't wait for 2014 when the FSD finally will be ready. PRAISE ELON MUSK !!! I LOVE SCIENCE
@@znubionek also a fair point
Unfortunately, it cannot access external links or download PDF files for research, making it useless.
is it a job requirement to do the throat thing to work at OpenAI
I love that they use "please" in their prompts!
Operator= Chain of thoughts + selenium 😂
that means open ai will have our credit card details now ?
The browser control is awesome however the use case shown isn't worth the $200/month in my opinion. Still very awesome seeing where this is all heading and I will likely spend the $200/month for the sake of testing it for other things. I wish it learned how I answered emails and did that for me. Keep up the great work and this is really getting exciting.
Hey man I actually build these systems for professionals, we have an automated email responder. If you're interested I'd be happy to do a free setup to show you how it works👍
Could be free one day when apple finds it smooth enough to add it to siri
Agreed. Shocked this isn't for Plus users. UI-TARS is open source, has a desktop app and can control LOTS of software, not just a browser. And you can FOR CERTAIN have it hosted for WAY less than $200/mo.
Most of all, since Google needs to maintain Web dominance, fully expect this to simply light a fire under Google to get Project Mariner out the door bundled into Google One for $20/mo.
@@Vastfill Just something else that Apple will screw up then. Siri is still crap even with the latest AI adds:
"Performance Issues: Users report that the AI features are slow, buggy, and often underwhelming.
Functionality Concerns: Siri remains largely ineffective, with many features feeling more like gimmicks than substantial improvements.
Delayed Rollout: Apple's cautious, privacy-focused approach has led to a fragmented and frustrating user experience.
Specific Complaints:
The new AI image cleanup feature takes 10-15 seconds and doesn't produce impressive results
Notification summaries have been criticized as inaccurate and unhelpful
Many features seem to lag behind competitors like OpenAI and Google"
@@batteryhookup massive waste of money even for testing, what could you possibly get out of that?
Sam’s role in this video is to nod on whatever the other dudes are saying
Can operator be used to apply to jobs?
Why not? It could do any web task in theory
Of course. But when this thing can operate a whole computer, then jobs will be obsolete. (At least the jobs done on a computer)
Incomplete question! Can operator be used to apply AND do your job(s)? All you have to do is to collect your salary.
Operator can literally work a shift for you
@@Truufbtold Well it's an Operator
An excellent opportunity for practical applications. As with anything else, the operators can also be used to create chaos, but it's a human tendency to take something good and bend it out of shape to misuse it, as simple as a wooden stick. I would not go so far to put that blame on the tool itself. Of course, real-life problems, mistakes and misuse will give an opportunity to improve the tools. But, purely based on this video, it looks promising.
Funny the white t-shirt guy sounds more like an AI than the AI from chatGPT lol
And here I thought that Sam's vocal fry was extreme 😂😂
@kigojomo you're right. That guy sounds like C3PO
Love this! Is there another efficient way for the model to record the operator actions or clicks? Maybe a screenshot is expensive on GPU versus an action tag. Thoughts?
This is awesome! We will all have our own virtual assistants and soon we will all have our own robot assistants! Incredible time to be alive! 😎🤖
As a poor person, I promise you I’ve been waiting for this! I’m always fighting with bill companies and wrong charges. PLEASE could you offer a few to poor people with disabilities? Seriously, we’re overworked and can’t fight the powers (bureaucracy) that be.
This is incredible! Operator will change the world for people relying on assistive technologies to browse the web! Can’t wait 👏
the time required to type prompts is the same as going to the website and booking a table
It doesn't make sense to enter all the sensitive information in a cloud browser!
It's for hype! They need to cover the HUGE costs. Panic mode installed :))
and most of us don't rely on AI for online shopping.. I don't shop for groceries.. I shop for big stuff and I read comments, look at different sites to find the cheapest offers, check the stars, sometimes watch reviews on YT etc. etc.
i was thinking the same thing. the presenter saying “feels just like a local browser when you take control” is kinda shady cos there’s no guarantee of safety of the cloud browser. this is so unsafe in case there’s a data breach some day.
Can it handle captchas?
seems like a puppeteer wrapper
1. What if the site has reCaptcha/hCaptcha (whatever) ?
2. How secure is the remote browser since sometimes, the user may need to log in or enter payment details? (Is it a private cloud?)
3. Can the agent use the native user browser within the Desktop/Mobile app?
Oooo, operator get me the popcorn, I'm watching an embarrassing live demo of things we've had for two years 😂
from where?
Yeah where?
@@bhekistomakonnen custom gpts
??
6:55 - Sam Altman impression with that voice???
lmao
Do all time travelling cyborgs have the same vocal fry ?
Too many confirmations, 2 are enough. Will it work correctly if sites update their interface?
I am going to use this to watch livestreams and shows and let the agent tell me when the ads are done so I can get back
We have come to this now! This is a 5 day job for a team of 2 engineers to build something like this and offer it for a lot a lot cheaper. For a lot of stuff you can simply use platform's API and for others you can use a browser.
Solving a problem we did not need. Bubble alert!
This is way more groundbreaking than Sora and yet it dropped so quietly
Oh dude, it's not even close either. Yes, Sora is.... cool, I guess. But this can be real world actually useful, soon. I can already see ways it could help me with all the easy stuff that just takes time out of my day.
Am I missing something? Hasn't the Google Assistant been doing this since 2019, just w/o the browser? I really want these tools to be useful, but it seems like we're just rehashing the same functionality we had pre-covid.
Remember the guy who scraped JSTOR ?
Never forget
died for it...
Sitting down at the computer and reading all the text it writes takes longer than just buying the stuff myself. Hope you're gonna do a voice-only version some day.
It's nice to see Altman have a personality once in a while.
"We just released this revolutionary tool! Watch it book me dinner!"
“Operator, look at all my sales dashboards and give me a financial statement”
I like this! I do hope in the future that the web browser can be a local one instead of a cloud browser, but this is an excellent start to 2025. I'm excited to see what new agents are coming out over the next few months.
I’m guessing this AI agent has a side hustle solving CAPTCHAs for fun.
In the shopping example, how does it decide which product brand to choose if not specified? Does it just pick any item from the first row of results, go by cost, or follow some bias in the training data that makes the model prefer organic eggs? Or is there personalization here, like the location in the table booking example?
Amazing breakthrough guys! This is going to make humanity a lot more productive by automating some of the mundane tasks using AI
Notice the bias? There were two reservations that were 45 minutes different from the 7pm preferred time. Operator only offered the later one and called it “the closest” which is not true.
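The point about "closest" can be checked with a few lines. A sketch, assuming the two slots were 6:15pm and 7:45pm (that is just 45 minutes either side of 7pm, the actual times are not given in this comment) and using a made-up helper `closest_slots`. With an absolute-difference rule the two are tied, so neither one alone is "the closest".

```python
from datetime import datetime
from typing import List


def closest_slots(preferred: str, available: List[str]) -> List[str]:
    """Return every available slot (HH:MM, 24h) with the smallest absolute
    gap to the preferred time, so ties are reported instead of hidden."""
    if not available:
        return []
    fmt = "%H:%M"
    target = datetime.strptime(preferred, fmt)

    def gap_minutes(slot: str) -> float:
        return abs((datetime.strptime(slot, fmt) - target).total_seconds()) / 60

    best = min(gap_minutes(s) for s in available)
    return [s for s in available if gap_minutes(s) == best]


# Assumed slot times: 45 minutes before and after the preferred 19:00.
print(closest_slots("19:00", ["18:15", "19:45"]))
```

Picking only the later slot is then a tie-break choice (later over earlier), which is a reasonable thing for an assistant to surface to the user rather than decide silently.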
What’s the use case here? It took far too long to make a reservation at the restaurant. We still have to repeatedly instruct the model on what to do.
Why can’t we, for example, make a reservation directly through an API? This way, the model wouldn’t need to navigate through the website step by step but could complete the process much faster and more efficiently via the API.
I'm guessing it's because then you'd require an API, and be restricted to the API's functionality. Whereas Operator can technically be used anywhere.
Not all websites have an api, and even those with a public api may not expose all the functionality.
Also, browsing web pages could inspire for new ideas and thus decisions. If your AI knows you very well it might come back to you and tell you there might be better options for what you are looking for while browsing.
@@andyzhanmusic Wait.. can't phones access things through an API? 😦😦 A normal user can set an API key manually on their phone 😥😥
@@wtl912 I noticed this in a video editing program... The developers told me no one can access through the API a function that you can easily reach with a single click 😥😥
In the middle of this video he's like the invisible kid at the party. But I guess a part of that peculiar behaviour somehow got him to where he is now.
fry voice overload
All the use cases are to help you spend money. I like to think that many people might sometimes want to do things other than spend money.
It would have been nice if you also let us Plus users who pay $20 per month use it now. Very much disappointed.
Cry harder
I bet plus users will get it in about 6 months
@@DarkandTwisted cry harder