00:05 OpenAI is developing a new type of agent to automate tasks.
02:02 Simulating real human interaction on computer devices unlocks everyday personal assistant use cases.
05:59 Training Agent 2.0 requires more steps and reasoning ability, but the potential market opportunities are exciting.
07:59 Using HTML or XML based approach to provide context for web agents.
12:03 Improving accuracy of self-operating computer vision tasks
14:03 Using multiple models together to interact with GUI screenshots
17:47 WebQL basics allow easy web automation
19:27 Setting up WebQL and Playwright for web browser interaction
22:40 Developed a script for universal e-commerce product information scraping.
24:32 WebQL allows you to build powerful web agents for various tasks.
Thank god, I thought you were gone. Been waiting for another video. You are going places Jason, incredibly talented at teaching complex subjects in easy to understand digestible videos.
Thank you!
this is such a sweet comment :-)
He doesn't upload often anyway, so he's never truly gone. The quality is among the best, though.
Thanks for the kind words! Glad you're enjoying the content!
Hi Jason, I subscribed a few weeks ago and have been really appreciating your videos! As others have said, your teaching style is very thorough and contains enough depth without being overwhelming. Thanks for keeping us all up to date on the latest AI tech and for taking the time to break it down for easy understanding. Wondering if you have a community?
The breakdown of different methods to get a vision model to identify UI elements to interact with is very useful.
I just started imagining: if we have super powerful web agents and spin up 1000+ virtual machines and let them complete web tasks simultaneously, it's gonna be so powerful.
True. The limit will be: a lotta people use Windows, but Windows is extremely resource heavy, even for VMs. *I guess people could try to use Linux and "Wine", but I'm not sure how good that will be, and it won't be pure AI. You can spin up, say, 100 Linux VMs, but because Windows is so much heavier and requires licenses and stuff, that's probably like 10 Windows machines only (just an off-the-cuff estimate, but it is probably something like that).
Anyway, TL;DR: hopefully Windows can deal with these limitations so we can all have this be effective at scale.
Oh, you said WEB tasks. Yeah pure web is different, since you could use any OS for that probably and use a "headless browser". Great point.
What kinds of tasks do you envisage?
This is my favourite AI channel, a perfect mix of theory and practical application. Would love to see an in-depth video on this. Please keep it up, these videos help a lot!
I’m only a minute into this video, but I just want to say- what amazing visuals (like the diagrams)!
Also, this is such a great premise for a video, and you are great at explaining complex things in a way such that people of varying levels of technical knowledge can understand them.
Thanks for the kind words, I will keep it up!
Great video, well researched. This is what I expect from a quality channel. You got yourself a subscriber!
I also worked on a similar project right after they announced the Rabbit R1. It was using OCR + YOLO + an LLM to control the computer.
I was able to get it to click wherever I wanted, but failed to build the backend for the LLM to orchestrate the high-level tasks.
It was simply too much work for me. 😅😅
Jason, you are a hero! This is great. Please make a video on an agent that can browse the web, discover useful websites, and organize the URLs in a spreadsheet. Thank you!!!
DAYUMMM man, you are always dropping fire content. I'm always curious about what you'll bring to the table with each upload, and the quality always amazes me. Thanks a lot, much appreciated, and keep it up Jason! Big fan right here :)
When I watch your videos, I feel I'm watching the future!
Thanks Jason for your content 👏🏼
High quality video from Jason as always ❤
yes sir i would love a github link to check out the code of your scraper please.
Same here
Very interested in the "in-depth" video you mentioned at the end, looking forward to seeing you add GPT Vision with WebQL
Looking forward to the in depth video. 👍🏻
Thanks a lot for this video!
I am also actively exploring these tools. So far I have used the Self Operating Computer approach with modifications: instead of using Selenium or Playwright and struggling with web element locators, I just define the coordinates of those elements and interact with them. WebQL looks really interesting and I will definitely try it. I think its real potential is in a multi-agent team, like AutoGen or CrewAI. Thanks again!
Thanks Jason. I’ve only just found your channel but I’m glad I did. This is great content!
Great video Jason, really informative, thank you
I'm convinced open AI has an alien locked deep underground spilling all this black magic technology.
Demonolotry practitioners and others are known to channel demons (defined in many more ways than most realize) to write books through them and develop technological/scientific advancements.
Demonolatry practitioners (and others) are known to channel demons to write books through them and accelerate the advancement of science and technology.
babe wake tf up ai jason posted
This is incredible, this will definitely change the way people interact with apps and the internet
I believe that a viable solution would be to create a new city designed specifically for autonomous cars to flow smoothly, rather than trying to adapt autonomous cars to cities designed for human drivers. Similarly, I propose creating a dedicated part of the internet for interactions between artificial intelligences, where everything could be automated or accessed by voice commands. This would even include developing an operating system that accepts these types of interactions. For example, apps like Uber could be entirely accessible to artificial intelligences, allowing users to request services through voice commands. This could simplify tasks such as requesting a ride to downtown New York, where the artificial intelligence could perform various actions within the app to meet the user's needs. Web pages designed that way would be the solution for agents trying to do many tasks on a website.
If we start building whole cities to accommodate AI, then we are working for AI; AI is not working for us.
Mmmmm… I like the approach… like a hybrid environment. Not necessarily forcing AI into ours or us into AI's… but two distinct environments, one designed for humans and one for AI, each optimized for its own users, with freedom to access the other environment knowing well that performance will be limited when not in the “native environment”.
I think at this point, for practically speaking all of us, _how_ it works no longer matters. It’s now much more pressing to know what to do with it, as well as with ourselves.
This is awesome. A few months ago web agents still felt unusable; I didn't know they had come along so far.
The demo at the end is super practical. A universal web scraper by itself will unlock lots of use cases. Trying it out tonight!!
3 minutes in, I already knew how this will help my actual work
Great video thanks Jason
Fantastic introduction to WebQL and what it can do. Well done!
I'm curious about the name change to "AgentQL" and what it portends. Lots of agent frameworks use image snapshots, possibly with overlaid annotations, then drive the desktop or browser with basic actions (click, text entry, etc). Maybe generating WebQL is a better approach to driving the browser?
Super interesting topic. I'd really like to see GPT-4V working with WebQL/AgentQL. 🎉
Wow this information is so well done. Thank you
Brilliant video - thanks for this. It also makes me concerned for the future of web security - we're almost at the point where an AI tool can be built that will call your mobile, speak to you in a real voice and extract enough information from you to enable it to then browse the web, log into your bank account and ... well, you get the rest.
🏆 Well done! Happy to see your subscriber count is growing as your videos are quite valuable!
I look forward to an AI agent where a topic/task can be given with little to no guidance. It uses a swarm to find relevant websites, grabs the relevant info and creates a spreadsheet to compare the websites found. Use cases: best price for an item you want to purchase; best features on offer for an online tool/service to meet one's needs; performing competitor analysis as part of a business idea validation process. Many others I'm sure you can think of, too.
Love your videos Jason. You are one of the few guys who make good content for someone who is not entirely new in the space. Greatly appreciated! You are a GOAT in my opinion 🙌🐐🤩
You are the only real pioneer of AI education that is readily available to people who are not directly involved in this newly developing area of CS.
All others are just baiting for views and scamming saying "make xyz $$$ with my shitty code that I just copied from somewhere else" lol
incredible video jason. When do you think they'll release it approximately? Does this kill all agent startups?
Great analysis, thanks for sharing
Great video!! Thx for your regular insights, the perfect balance between new tech and tutorial. In your video you said that WebQL is open source. Where do I find the sources?
WebQL seems pretty nice, will give it a shot. Is there a JavaScript version of it?
Not yet, but it's on the roadmap.
Nice video. BTW, do you have the new link for WebQL? The current one does not work.
It was a great explanation...
I have been using the LangChain shell tool to perform various actions on my desktop that can be done from cmd. I believe a mix of PowerShell, HTML parsing and OCR can make a good setup for production.
Based on the given prompt, an LLM master agent can decide which of these paths to use. If it fails after 3-4 tries, it can come back and get redirected to another path to fulfil the task.
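The retry-then-switch idea in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the "paths" are stand-in functions rather than real PowerShell, HTML-parsing or OCR tools, the name `run_with_fallback` is hypothetical, and a fixed ordering replaces the LLM master agent that would choose the path in the commenter's design.

```python
from typing import Callable, Optional, Sequence, Tuple

# A "path" is a (name, function) pair. The function takes the task text and
# returns a result string, or None / raises if it could not complete the task.
Path = Tuple[str, Callable[[str], Optional[str]]]


def run_with_fallback(task: str, paths: Sequence[Path], max_tries: int = 3):
    """Try each path up to max_tries times, then fall through to the next one."""
    for name, attempt in paths:
        for _ in range(max_tries):
            try:
                result = attempt(task)
            except Exception:
                result = None  # treat any error as a failed try
            if result is not None:
                return name, result
    return None, None  # every path was exhausted


# Hypothetical stand-ins for real tools, used only to exercise the loop.
def shell_path(task: str) -> Optional[str]:
    raise RuntimeError("command not available")


def html_path(task: str) -> Optional[str]:
    return f"done via HTML parsing: {task}"


if __name__ == "__main__":
    print(run_with_fallback("open settings", [("shell", shell_path), ("html", html_path)]))
```

In a fuller version, the ordering of `paths` would presumably come from the master agent's decision for each prompt instead of being hard-coded.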
These are the best AI agent tutorials.
A simple solution going forward would be to add comments to all UI elements when designing a website, describing exactly what they do, e.g. //This is the element to submit the login form. It would take a while to catch on in the web dev community. I for one will be doing this on all of my sites going forward. Accessibility for AI is a genuine concern at this point. 🎉
Oh that’s a great point! Yea some portal around it will be great
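One way to picture the suggestion in this thread is a small sketch of how an agent might read such descriptions. It is an assumption on my part to use the existing `aria-label` attribute in place of the code comments the commenter proposes; the class name, function name and sample markup are all made up for illustration, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class LabeledElementCollector(HTMLParser):
    """Collect (tag, description) pairs for interactive elements that carry a label."""

    INTERACTIVE = {"a", "button", "input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INTERACTIVE:
            label = dict(attrs).get("aria-label")
            if label:
                self.elements.append((tag, label))


def describe_ui(html: str):
    """Return the labeled interactive elements found in an HTML string."""
    collector = LabeledElementCollector()
    collector.feed(html)
    return collector.elements


# Made-up sample page fragment.
SAMPLE = (
    '<form>'
    '<input type="text" aria-label="Username field">'
    '<button aria-label="Submit the login form">Go</button>'
    '</form>'
)

if __name__ == "__main__":
    for tag, label in describe_ui(SAMPLE):
        print(f"{tag}: {label}")
```

An agent could pass this compact list to a language model instead of the full page source, which is the kind of "accessibility for AI" the comment describes.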
i wanna know how ya brainstorm the thumbnail idea?
Dude its amazing love it. Keep it up
This is gold. Thanks!
How does it deal with hint modals that pop up, like at 23:56? If it's a screenshot, they'll obscure things.
Thank you very much!
Thank you 🙏
If Sora has a universal model of real-world physics, it sure could have a model of the universe of web browser interfaces. That would make all these hacky workarounds redundant. OpenAI could have been using GPT-4 to build this training data for ages and be streets ahead. If agents can learn to play video games, Sora has a multi-world model of physics, and GPT-4 can reason better than most humans…. Wow
Great breakdown! 👍
Thanks
Thank you 🙏
Thanks Jason!!
What is the cost of AgentQL
They are beta testing now so don’t think they finalised pricing yet!
Thank you!
Hey Jason, this was amazing. Can you please prepare a hands-on tutorial on CogAgent?
great content sir
This is good Stuff, thanks.
That WebQL sounds crazy. I searched for it but couldn’t find their site, just some placeholder. You know what happened?
I’ve added the link in the description!
thank you
Can you use an LLM to talk to the editor instead of coding?
thoughts on $OLAS? its a framework to develop AI agents
Before agents were a thing I was using another gpt to determine which tools that were needed and then would format the request for me.
amazing content!
I tried tools like MultiOn but no no-code tool seems to work well yet. Open to suggestions.
we want another autonomous agent part 4 with langgraph
Isn't it just a fancy Selenium Webdriver?
Chrome extension - You need to request permission to download the extension. Not sure how long will it take to get the access.
Can you test Gemini 1.5 with RPA?
when it needs to complete a captcha to log in for you 0_0
Already AI able at 97%.
Could you make a QA bot that is given a scenario with steps to test happy flows and maybe also negative flows?
Totally can, it is actually a perfect use case
I was using chatGPT to build an agent exactly like this. Does OpenAI have access to the contents of chatGPT chats? This is odd timing for this to all of a sudden be announced💀 I’d be more inclined to call it a coincidence if the company that’s building this wasn’t the same company I was using to develop this. Not making any claims but I’m curious now. Does OpenAI have access to user chat data?
That’s just parallel thinking Amy Schumer
Yeah, Agents are great but what happens when someone prompts for their agent to find their banking password on the computer, log into the bank and transfer all funds to xxx - on someone else's computer - on an entire botnet of computers (millions)
omg, this means that designers have to design for another viewport/user agent... AI
Why am I watching this at 2am. I don't even know how to code 😭
I can work on that and fix the problems, I need research center to work in
It exists now: MultiOn, UiPath.
It’s only a matter of time. Soon AI will be walking on CPUs.
Do we need a WebQL API key to try this?
Yes, I believe you do, but they are planning to open source it too.
can anyone recommend similar channels around the web?
what the hell is webQ??? where is the AI scraper????
good point. We renamed it to AgentQL :)
Release the scraper code ❤
Added the GitHub link in the description! But you need an API key first.
Chat GPT + selenium = scary
How does OpenAI keep pushing shitc?
Why should I be excited or scared? I've had this thing for over a year now - built not long after GPT-3.5 was released. It has full control over my linux machine and works fairly well.
Did it also write this comment because that would explain a lot
@@JamesHoffmannLover Why so cheeky? Sure, they're doing a bit more than simply allowing an LLM to control your OS through the CLI, but do you really think the leap is that big? Like... I'm genuinely curious about your opinion on this. Which unique feature that we didn't already have in similar open-source projects is so exciting or fear-inducing?
@@anatolydyatlov963 Just keep playing with your Linux toy while the rest of us keep an open mind on new technology advancements 👍. But please try to show some respect for people like ai Jason who cover these topics for us
@@JamesHoffmannLover Why do you refuse to acknowledge the hard work of numerous software developers from the whole world who have created similar projects, FAR exceeding what I'm describing here? Have you even heard of the Self-Operating Computer Framework by OthersideAI? You're treating them like ghosts who don't even exist, and when a big corporation creates something similar, you're cheering as if they made a groundbreaking discovery. Own it up.
@@anatolydyatlov963 lol since when am I doing any of that? Maybe re-read the comments and think about it for a while.
Can it solve any CAPTCHA?
Yes, I believe so for simple ones
I would put it to play poker 😂
This is just going to end up with a bunch of ai talking to each other lol
🔥🔥🔥🔥🔥/🔥🔥🔥🔥🔥
Self operating computer doesn't actually work...
To be honest selenium is still easier.
AutohotKey has been doing this for years. Requires basic programming skills.
lol at the soy face thumbnails
Did you try CogAgent?
This is insane!