OpenAI's Agent 2.0: Excited or Scared?

AI Jason

Views: 64,060

Published on 22 Oct 2024

Comments • 143

@quickcinemarecap 8 months ago ⁺⁹
00:05 OpenAI is developing a new type of agent to automate tasks.
02:02 Simulating real human interaction on computer devices unlocks everyday personal assistant use cases.
05:59 Training Agent 2.0 requires more steps and reasoning ability, but the potential market opportunities are exciting.
07:59 Using an HTML- or XML-based approach to provide context for web agents.
12:03 Improving accuracy of self-operating computer vision tasks
14:03 Using multiple models together to interact with GUI screenshots
17:47 WebQL basics allow easy web automation
19:27 Setting up WebQL and Playwright for web browser interaction
22:40 Developed a script for universal e-commerce product information scraping.
24:32 WebQL allows you to build powerful web agents for various tasks.
@thesvenni 8 months ago ⁺³¹
Thank god, I thought you were gone. Been waiting for another video. You are going places Jason, incredibly talented at teaching complex subjects in easy to understand digestible videos.
Thank you!
@canadianblackops2412 8 months ago ⁺¹
this is such a sweet comment :-)
@ryzikx 8 months ago ⁺¹
he doesn't upload often anyway, he's never truly gone. the quality is among the best though
@gokulakrishnanr8414 8 months ago
Thanks for the kind words! Glad you're enjoying the content!
@ScottBrooks415 8 months ago ⁺⁶
Hi Jason, I subscribed a few weeks ago and have been really appreciating your videos! As others have said, your teaching style is very thorough and contains enough depth without being overwhelming. Thanks for keeping us all up to date on the latest AI tech and for taking the time to break it down for easy understanding. Wondering if you have a community?
@Jim-ey3ry 8 months ago ⁺¹⁰
The breakdown of different methods to get a vision model to identify UI elements to interact with is very useful;
I just started imagining: if we have super powerful web agents and spin up 1000+ virtual machines and let them complete web tasks simultaneously, it's gonna be so powerful
@FintechFatherAI 8 months ago
True. The limit will be: A lotta people use Windows, but Windows is extremely resource heavy, even for VMs. *I guess people could try to use Linux and "Wine", but I'm not sure how good that will be, and it won't be pure AI. You can spin up say 100 VMs of Linux, and because Windows is so much heavier and requires licenses and stuff, that's like probably 10 Windows machines only (just an off-the-cuff estimate, but it is probably something like that)
Anyway TL;DR: hopefully Windows can deal with these limitations so we can all have this be effective at scale
@FintechFatherAI 8 months ago
Oh, you said WEB tasks. Yeah pure web is different, since you could use any OS for that probably and use a "headless browser". Great point.
@silent.-killer 8 months ago
What kinds of tasks do you envisage?
@ayushmansingh1470 7 months ago
This is my favourite AI channel, perfect mix of theory and practical application. Would love to see an in-depth video on this. Please keep it up, these videos help a lot!
@EachDayForever 8 months ago ⁺²
I’m only a minute into this video, but I just want to say- what amazing visuals (like the diagrams)!
Also, this is such a great premise for a video, and you are great at explaining complex things in a way such that people of varying levels of technical knowledge can understand them.
@AIJasonZ 8 months ago
Thanks for the kind words, I will keep it up!
@starmap 8 months ago ⁺¹
Great video, well researched. This is what I expect from a quality channel. You got yourself a subscriber!
@PseudoProphet 5 months ago ⁺¹
I also worked on a similar project right after they announced the Rabbit R1; it used OCR + YOLO + an LLM to control the computer.
I was able to get it to click wherever I wanted, but failed to build the backend for the LLM to orchestrate the high-level tasks.
It was simply too much work for me. 😅😅
@AIGooroo 8 months ago ⁺¹
Jason, you are a hero! This is great. Please make a video about an agent that can browse the web, discover websites that could be useful, and organize the URLs in a spreadsheet. Thank you!!!
@gabrieleguo 8 months ago
DAYUMMM man you are always dropping fire content. I'm always curious about what you will bring to the table with each upload, and the quality always amazes me. Thanks a lot, much appreciated and keep it up Jason! Big fan right here :)
@JoaquinTorroba 8 months ago
When I watch your videos, I feel I'm watching the future!
Thanks Jason for your content 👏🏼
@free_thinker4958 8 months ago ⁺⁶
High quality video from Jason as always ❤
@TheSacredGrove 8 months ago ⁺¹⁰
Yes sir, I would love a GitHub link to check out the code of your scraper please.
@matten_zero 8 months ago
Same here
@thesvenni 8 months ago
Very interested in the "in-depth" video you mentioned at the end, looking forward to seeing you add GPT Vision with WebQL
@thesilentcitadel 8 months ago ⁺²
Looking forward to the in-depth video. 👍🏻
@iakov_volf 8 months ago ⁺¹
Thanks a lot for this video!
I am also actively exploring these tools. So far I have used the Self Operating Computer approach with modifications: instead of using Selenium or Playwright and struggling with web element locators, I just define the coordinates of those elements and interact with them. WebQL looks really interesting and I will definitely try it. I think its real potential can be realized in a multi-agent team, like AutoGen or CrewAI. Thanks again!
@theBLAMfam 8 months ago
Thanks Jason. I’ve only just found your channel but I’m glad I did. This is great content!
@gofastandfar 7 months ago
Great video Jason, really informative, thank you
@kayodeejisun2211 8 months ago ⁺⁴³
I'm convinced OpenAI has an alien locked deep underground spilling all this black magic technology.
@sedat4151 8 months ago
Demonolatry practitioners and others are known to channel demons (defined in many more ways than most realize) to write books through them and develop technological/scientific advancements.
@sedat4151 8 months ago
Demonolatry practitioners (and others) are known to channel demons to write books through them and accelerate the advancement of science and technology.
@MrAndyisReal 8 months ago ⁺¹³
babe wake tf up ai jason posted
@MartinPleasant-ty1rw 8 months ago
This is incredible, this will definitely change the way people interact with apps and the internet
@devfromthefuture506 8 months ago ⁺²
I believe that a viable solution would be to create a new city designed specifically for autonomous cars to flow smoothly, rather than trying to adapt autonomous cars to cities designed for human drivers. Similarly, I propose creating a dedicated part of the internet for interactions between artificial intelligences, where everything could be automated or accessed by voice commands. This would even include developing an operating system that accepts these types of interactions. For example, apps like Uber could be entirely accessible by artificial intelligences, allowing users to request services through voice commands. This could simplify tasks such as requesting a ride to downtown New York, where artificial intelligence could perform various actions within the app to meet the user's needs. Web pages designed that way would be the solution for agents trying to do many tasks on a website.
@jbo8540 8 months ago ⁺¹
If we start building whole cities to accommodate AI, then we are working for AI, AI is not working for us
@eightysevenmoore 8 months ago
Mmmmm… I like the approach… like a hybrid environment. Not necessarily force AI into ours or us into AI… but two distinct environments designed for human in one environment and AI in another to optimize performance in either environment with freedom to access the other environment knowing well that performance will be limited when not in the “native environment”.
@chrisoffersen 8 months ago
I think at this point, for practically speaking all of us, _how_ it works no longer matters. It’s now much more pressing to know what to do with it, as well as with ourselves.
@jasonfinance 8 months ago ⁺²
This is awesome. A few months ago web agents still felt unusable; I didn't know they had come so far.
The demo at the end is super practical. A universal web scraper by itself will unlock lots of use cases, trying it out tonight!!
@lawalexlaw 8 months ago
3 minutes in, I already knew how this will help my actual work
@oooooooo347 8 months ago ⁺¹
Great video thanks Jason
@joeternasky 8 months ago
Fantastic introduction to WebQL and what it can do. Well done!
I'm curious about the name change to "AgentQL" and what it portends. Lots of agent frameworks use image snapshots, possibly with overlaid annotations, then drive the desktop or browser with basic actions (click, text entry, etc). Maybe generating WebQL is a better approach to driving the browser?
@carstenli 8 months ago
Super interesting topic. I'd really like to see GPT-4V working with WebQL/AgentQL. 🎉
@twokayoh9347 8 months ago
Wow this information is so well done. Thank you
@sitedev 8 months ago
Brilliant video - thanks for this. It also makes me concerned for the future of web security - we're almost at the point where an AI tool can be built that will call your mobile, speak to you in a real voice and extract enough information from you to enable it to then browse the web, log into your bank account and ... well, you get the rest.
@RoadTo19 8 months ago
🏆 Well done! Happy to see your subscriber count is growing as your videos are quite valuable!
I look forward to an AI agent where a topic/task can be given with little to no guidance. It uses a swarm to find relevant websites, grabs relevant info and creates a spreadsheet to compare the websites found. Use cases: Best price for an item you want to purchase. Best features on offer for an online tool/service to meet one's needs. Performing competitor analysis as part of a business idea validation process. Many others I'm sure you can think of, too.
@MagagnaJayzxui 8 months ago
Love your videos Jason. You are one of the few guys who make good content for someone who is not entirely new in the space. Greatly appreciated! You are a GOAT in my opinion 🙌🐐🤩
@siper1686 8 months ago
You are the only real pioneer of AI education that is readily available to people who are not directly involved in this newly developing area of CS.
All others are just baiting for views and scamming saying "make xyz $$$ with my shitty code that I just copied from somewhere else" lol
@righttiming 8 months ago
Incredible video Jason. When do you think they'll release it approximately? Does this kill all agent startups?
@JoshuaGottlieb-oz4er 8 months ago
Great analysis, thanks for sharing
@DannyGerst 8 months ago
Great video!! Thx for your regular insights, the perfect balance between new tech and tutorial. In your video you said that WebQL is open source. Where do I find the source?
@aerocodes 8 months ago ⁺²
WebQL seems pretty nice, will give it a shot. Is there a JavaScript version of it?
@PavelDudka 8 months ago
Not yet, but it's on the roadmap
@wxcarbon months ago
Nice video. BTW, do you have the new link for WebQL? It does not work
@saivamsi441 8 months ago
It was a great explanation...
I have been using the LangChain shell tool to perform various actions on my desktop that can be done from the command line. I believe a mix of PowerShell, HTML parsing and OCR could make a good model for production.
So based on the given prompt, an LLM master agent can decide which of these paths to use. If it fails after 3-4 tries, it can come back and get redirected to another path to fulfil the task.
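The retry-then-switch idea in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: `run_with_fallback`, the strategy names, and the stand-in functions are hypothetical and are not part of LangChain or any other tool mentioned in this thread.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: a strategy takes the task prompt and returns a result string,
# or raises an exception when it cannot complete the task.
Strategy = Callable[[str], str]


def run_with_fallback(
    task: str,
    strategies: Sequence[tuple[str, Strategy]],
    max_tries: int = 3,
) -> Optional[tuple[str, str]]:
    """Try each strategy up to max_tries times, then move on to the next one.

    Returns (strategy_name, result) for the first success, or None if every path fails.
    """
    for name, strategy in strategies:
        for _attempt in range(max_tries):
            try:
                return name, strategy(task)
            except Exception:
                continue  # retry the same path until max_tries is exhausted
    return None


# Stand-in strategies for illustration only; real ones might call a shell tool,
# an HTML parser, or an OCR pipeline.
def shell_path(task: str) -> str:
    raise RuntimeError("shell command not available")


def html_path(task: str) -> str:
    return f"handled via HTML parsing: {task}"


if __name__ == "__main__":
    outcome = run_with_fallback(
        "read the page title",
        [("shell", shell_path), ("html", html_path)],
    )
    print(outcome)
```

In a real agent the ordering of the paths would presumably come from the LLM's decision rather than a fixed list; the fixed list here just keeps the sketch self-contained.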
@xonack 8 months ago
these are the best AI Agent tutorials
@anubisai 8 months ago
Simple solution going forward would be to add comments to all UI elements when designing a website. Describe exactly what they do, i.e. //This is the element to submit the login form. Etc. Would take a while to catch on in the web dev community. I for one will be doing this on any of my sites going forward. Accessibility for AI is a genuine concern at this point. 🎉
@AIJasonZ 8 months ago
Oh that’s a great point! Yea some portal around it will be great
@VaibhavShewale 8 months ago ⁺²
i wanna know how ya brainstorm the thumbnail idea?
@anfiiaidev 8 months ago
Dude, it's amazing, love it. Keep it up
@j_s_h9 8 months ago
This is gold. Thanks!
@DeepfriedBaby 8 months ago
How does it deal with hint modals that pop up, like at 23:56? If it's a screenshot, it'll obscure things.
@tsenri2743 8 months ago ⁺¹
Thank you very much!
@AIJasonZ 8 months ago
Thank you 🙏
@nathank5140 8 months ago
If Sora has a universal model of real world physics, it sure could have a model of the universe of web browser interfaces. That would make all these hacky workarounds redundant. OpenAI could have been using GPT-4 to build this training data for ages and be streaks ahead. If agents can learn to play video games, Sora has a multi-world model of physics, and GPT-4 can reason better than most humans…. Wow
@mikew2883 8 months ago
Great breakdown! 👍
@thesilentcitadel 8 months ago ⁺²
Thanks
@AIJasonZ 8 months ago
Thank you 🙏
@RichardGetzPhotography 8 months ago ⁺¹
Thanks Jason!!
What is the cost of AgentQL?
@AIJasonZ 8 months ago
They are beta testing now so don’t think they finalised pricing yet!
@realCleanK 8 months ago
Thank you!
@venugopalt6861 8 months ago
Hey Jason, this was amazing. Can you please prepare a hands-on tutorial on CogAgent?
@fulowa 8 months ago
great content sir
@webdancer 8 months ago
This is good stuff, thanks.
@dawid_dahl 8 months ago
That WebQL sounds crazy. I searched for it but couldn’t find their site, just some placeholder. You know what happened?
@AIJasonZ 8 months ago
I’ve added link in description!
@far.k.3112 7 months ago
thank you
@stanleylu3625 7 months ago
Can you use an LLM to talk to the editor instead of coding?
@drustan6890 7 months ago
thoughts on $OLAS? its a framework to develop AI agents
@skateking8 8 months ago
Before agents were a thing I was using another GPT to determine which tools were needed, and it would then format the request for me.
@georgestander2682 8 months ago
amazing content!
@bertstevens245 8 months ago
I tried tools like MultiOn but no no-code tool seems to work well yet. Open to suggestions.
@whatyoumissed9994 8 months ago
we want another autonomous agent part 4 with LangGraph
@oleksandrsova4803 8 months ago
Isn't it just a fancy Selenium WebDriver?
@mazkaibil9108 8 months ago
Chrome extension - You need to request permission to download the extension. Not sure how long it will take to get access.
@Livanback 8 months ago
Can you test Gemini 1.5 with RPA?
@alden6321 8 months ago ⁺¹
when it needs to complete a captcha to log in for you 0_0
@brianmi40 8 months ago
Already AI able at 97%.
@darkbelg 8 months ago
Could you make a QA bot that is given a scenario with steps to test happy flows and maybe also negative flows?
@AIJasonZ 8 months ago
Totally can, it is actually a perfect use case
@Bt_allen22 8 months ago
I was using ChatGPT to build an agent exactly like this. Does OpenAI have access to the contents of ChatGPT chats? This is odd timing for this to all of a sudden be announced💀 I’d be more inclined to call it a coincidence if the company that’s building this wasn’t the same company I was using to develop this. Not making any claims but I’m curious now. Does OpenAI have access to user chat data?
@MeatCatCheesyBlaster 8 months ago
That’s just parallel thinking Amy Schumer
@74Gee 8 months ago
Yeah, Agents are great but what happens when someone prompts for their agent to find their banking password on the computer, log into the bank and transfer all funds to xxx - on someone else's computer - on an entire botnet of computers (millions)
@DeepfriedBaby 8 months ago
omg, this means that designers have to design for another viewport/user agent... AI
@errmmm 8 months ago ⁺¹
Why am I watching this at 2am. I don't even know how to code 😭
@semosemo3827 8 months ago
I can work on that and fix the problems, I need a research center to work in
@rahuldinesh2840 8 months ago
It exists now: MultiOn, UiPath.
@GvRy8_5x46o7yXgSGaaJ. 8 months ago
It’s only a matter of time. Soon AI will be walking on CPUs.
@jirivchi 8 months ago
Do we need a WebQL API key to try this?
@AIJasonZ 8 months ago ⁺¹
Yes I believe you do, but they are planning to open source it too
@neponel 8 months ago
can anyone recommend similar channels around the web?
@hqcart1 8 months ago
what the hell is webQ??? where is the AI scraper????
@PavelDudka 8 months ago ⁺¹
good point. We renamed it to AgentQL :)
@hbruceweaver 8 months ago ⁺¹
Release the scraper code ❤
@AIJasonZ 8 months ago
Added GitHub link in description! But you need an API key first
@skateking8 8 months ago
ChatGPT + Selenium = scary
@theWebViking 8 months ago
How does OpenAI keep pushing shitc?
@anatolydyatlov963 8 months ago
Why should I be excited or scared? I've had this thing for over a year now - built not long after GPT-3.5 was released. It has full control over my Linux machine and works fairly well.
@JamesHoffmannLover 6 months ago
Did it also write this comment because that would explain a lot
@anatolydyatlov963 6 months ago
@@JamesHoffmannLover Why so cheeky? Sure, they're doing a bit more than simply allowing an LLM to control your OS through the CLI, but do you really think the leap is that big? Like... I'm genuinely curious about your opinion on this. Which unique feature that we didn't already have in similar open-source projects is so exciting or fear-inducing?
@JamesHoffmannLover 6 months ago
@@anatolydyatlov963 Just keep playing with your Linux toy while the rest of us keep an open mind on new technology advancements 👍. But please try to show some respect for people like AI Jason who cover these topics for us
@anatolydyatlov963 6 months ago
@@JamesHoffmannLover Why do you refuse to acknowledge the hard work of numerous software developers from the whole world who have created similar projects, FAR exceeding what I'm describing here? Have you even heard of the Self-Operating Computer Framework by OthersideAI? You're treating them like ghosts who don't even exist, and when a big corporation creates something similar, you're cheering as if they made a groundbreaking discovery. Own up to it.
@JamesHoffmannLover 6 months ago
@@anatolydyatlov963 lol since when am I doing any of that? Maybe re-read the comments and think about it for a while.
@Dron008 8 months ago
Can it solve any CAPTCHA?
@AIJasonZ 8 months ago
Yes, I believe so for simple ones
@alfonsopayra 8 months ago
I would put it to play poker 😂
@whosaidthat2201 8 months ago
This is just going to end up with a bunch of AI talking to each other lol
@brando2818 8 months ago
🔥🔥🔥🔥🔥/🔥🔥🔥🔥🔥
@ahsin.shabbir 8 months ago
Self operating computer doesn't actually work...
@Mehrdadkh87 8 months ago
To be honest Selenium is still easier.
@mixmax6027 8 months ago
AutoHotkey has been doing this for years. Requires basic programming skills.
@MeatCatCheesyBlaster 8 months ago
lol at the soy face thumbnails
@lexchirita 8 months ago
Did you try CogAgent?
@scaledeals-io 8 months ago
This is insane!
