MASSIVE Step Allowing AI Agents To Control Computers (MacOS, Windows, Linux)

  • Published Jun 4, 2024
  • OS World gives agents the ability to fully control computers, including MacOS, Windows, and Linux. By giving agents a language to describe actions in a computer environment, OS World can benchmark agent performance like never before.
    Try Deepchecks LLM Evaluation For Free: bit.ly/3SVtxLJ
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewberman.com
    Need AI Consulting? 📈
    forwardfuture.ai/
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    👉🏻 Instagram: / matthewberman_ai
    👉🏻 Threads: www.threads.net/@matthewberma...
    Media/Sponsorship Inquiries ✅
    bit.ly/44TC45V
  • Science & Technology

Comments • 282

  • @donrosenthal5864
    @donrosenthal5864 months ago +54

    OSWorld project video? Yes, please!!!

    • @reidelliot1972
      @reidelliot1972 months ago +4

      Yes, tutorial please! Please elaborate more on the relationship to CrewAI-like frameworks and potential implications for the rumored YAML endpoints!

    • @user-wz3qe3vw6h
      @user-wz3qe3vw6h months ago +3

      @@reidelliot1972 Yes Matthew, pls!

  • @jamesheller2707
    @jamesheller2707 months ago +66

    Please make more videos testing and running this yourself 🙏🏼, it'll be great

    • @reidelliot1972
      @reidelliot1972 months ago

      Yes, tutorial please! Please elaborate more on the relationship to CrewAI-like frameworks and potential implications for the rumored YAML endpoints!

  • @Carnivore69
    @Carnivore69 months ago +64

    User: What happens between the steps in these Ikea instructions?
    Agent: A fuckton of swearing!
    User: Test passed.

    • @BangaloreYoutube
      @BangaloreYoutube months ago +1

      Legit laughed for 10 mins 😂

  • @KCM25NJL
    @KCM25NJL months ago +55

    It's great and all, but I kinda think one of two things will end up happening:
    1. An AI layer will become a standard for interoperability as part of the OSI and App Dev Stacks
    2. A whole new OS will be developed that serves this very purpose.
    I suspect we may start with 1 and end up with 2 in the longer term.

    • @theterminaldave
      @theterminaldave months ago +5

      When I was helping to write test steps for an automated software testing app, I was required to basically open up the developer tools and get the name of the object that needed to be interacted with: the HTML code/name for a particular button, or a certain drop-down textbox.
      I don't understand the whole "lay a grid over the screen and guess the coordinates" approach. That's just the user interface; the computer has all the underlying code available. I don't get why the AI isn't navigating by looking at the underlying code for the page instead of the graphical output of the page.

    • @DaveEtchells
      @DaveEtchells months ago

      @@theterminaldave Interesting point. I'd say, though, that the point is to have the AI interact with the UI based on what a human would see. On a related note, there have been tools for doing software regression testing dating back many years that would let you interact with UI elements, but it was a PITA to write the scripts for them, and they were very fragile in that tiny changes could send them off the rails.

    • @Daniel-jm8we
      @Daniel-jm8we months ago +1

      ​@@theterminaldave Would the AI always have access to the code?

    • @ich3601
      @ich3601 months ago

      @@Daniel-jm8we Almost. When using RPA tools you're scanning the HTML, the OS events, or the application events. It would be great if an AI would eat this stuff, because nowadays RPA tools are very sensitive to changes.

    • @theterminaldave
      @theterminaldave months ago

      @@Daniel-jm8we Open any webpage, press F12, and click on the Inspector tab; that's the code I'm referring to.
      It's basically the code for the graphical interface, so yes, the AI would always have "access", because if it doesn't have access, that's because the element isn't appearing on the page.
      After you open the inspector, click on any line and hit delete, and it will disappear from the page. If you hit refresh, it will come back.
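      As a minimal sketch of that idea (the page URL and element id are hypothetical, and Selenium here is just one way to drive the DOM; the video doesn't show OSWorld using it): the agent targets the button by the same identifier you'd see in the inspector, with no pixel coordinates involved.

```python
# Sketch only: click a button by the id visible in the browser inspector,
# instead of estimating its on-screen position from a screenshot.
# Assumes Chrome + chromedriver are installed; URL and id are made up.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/form")              # hypothetical page
button = driver.find_element(By.ID, "submit-btn")   # same name you'd read in dev tools
button.click()                                      # no coordinate guessing
driver.quit()
```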

  • @haroldpierre1726
    @haroldpierre1726 months ago +28

    It would be helpful to have a catalog of pre-built open source AI agents that can be easily downloaded and used for specific tasks. My brain shuts off trying to follow video tutorials on programming my own AI agent from scratch.

  • @ScottzPlaylists
    @ScottzPlaylists months ago +15

    Yes please 👍 Need lots of OSWorld videos ❗❗❗
    We need a video-tutorial-watching AI that creates a training-set item for OSWorld on how to do X by watching a video on how to do X (and fills in missing details not shown). 🤯🤯🤯🤯❗❗❗❗

    • @AGIBreakout
      @AGIBreakout months ago +8

      Great Idea!!!!

    • @CryptoMetalMoney
      @CryptoMetalMoney months ago +7

      YT tutorial videos would be a huge ready-to-go dataset... Great idea

    • @CryptoMetalMoney
      @CryptoMetalMoney months ago +5

      Continuous learning will be huge in the future, and using computers will be a big part of that.

    • @NWONewsGod
      @NWONewsGod months ago +5

      YT is a treasure trove for more advanced forms of AI training, and even for training now.

  • @BlankBrain
    @BlankBrain months ago +5

    The most difficult part of making something like OSWorld is security. When you open your OS to computer manipulation, it's a lot easier for computers to manipulate it.

  • @threepe0
    @threepe0 months ago +4

    Really look forward to your videos. You’ve helped me get the gist of developments as they come out and determine which technologies are useful and worth spending my time on, and which ones I am equipped to handle, for my personal use-cases.
    I have and will continue to recommend your channel to friends and co-workers.
    Seriously man when I see your name, I click. Thank you for continuing to do what you do.

  • @alanhoeffler9629
    @alanhoeffler9629 months ago +2

    This was a good video showing what has to be done to make LLMs agentic using computer OSes. It showed me two things. The first was why autonomous cars are so hard to set up: the auto system has to know not only what the "rules of the road" are, what the automobile's driving characteristics are, and how to make the car do what it needs to do, but it also has to correctly parse, at high speed, a situation it has never encountered before, decide what the correct action is, and pull off executing it in real time. The second is that a system that can do that well is way closer to AGI than any LLM.

  • @justjosh1400
    @justjosh1400 months ago +1

    Can't wait for the tutorial. Wanted to say thanks for the videos Matthew.

  • @reidelliot1972
    @reidelliot1972 months ago +1

    Yes, tutorial please! Please elaborate more on the relationship to CrewAI-like frameworks and potential implications for the rumored YAML endpoints!

  • @jimbo2112
    @jimbo2112 months ago +2

    Yes please! Tutorial on this would be great. I see agents as being a driving force behind vast amounts of commercial AI adoption. Companies want greater efficiency and agents are the tools to bring this.

  • @JandJActionPlay
    @JandJActionPlay months ago +2

    Always awesome and informative videos, Matt, love it brother. I feel that much smarter after watching them. Keep up the awesome work!

  • @pvanukoff
    @pvanukoff months ago +51

    It won't be long before we have Star Trek-style computers, where we just say "computer... do x, y, and z for me".

    • @theterminaldave
      @theterminaldave months ago +2

      That's the goal. Agentic AI.

    • @ericspecullaas2841
      @ericspecullaas2841 months ago +2

      You can do that now. Although food replicators and holodecks are still far off.

    • @shooteru
      @shooteru months ago +6

      Working on it, many of us

    • @JBulsa
      @JBulsa months ago

      2 - 9 years

    • @tomaszzielinski4521
      @tomaszzielinski4521 months ago

      Who do you mean by "we"?

  • @AhmedMagdy-ly3ng
    @AhmedMagdy-ly3ng months ago +1

    I would be more than happy to see you testing it on real-world examples, not complex tasks but just everyday tasks, like summarizing a bunch of PDFs or doing some research, and things like that.
    And I also need to say that I really appreciate your work ❤

  • @marshallodom1388
    @marshallodom1388 months ago +7

    Computer! Computer?
    [Handed a mouse, he speaks into it]
    Hello, computer.
    The Dr. says just use the keyboard.
    Keyboard. How quaint.

  • @PhoebusG
    @PhoebusG months ago +1

    Yes, def set it up; that would be a good video. Keep up the cool videos :)

  • @darwinboor1300
    @darwinboor1300 months ago

    Thanks Matt.
    The change-the-background task is like an Optimus real-world task. Using the mouse requires a collection of basic motion skills (e.g. move in XY, click right/left, scroll up/down, etc.). Moving and activating the mouse on a screen are simple subtasks necessary to build actual real-world tasks (on the PC these basic skills and subtasks, and more, can be accomplished using AutoHotkey). The reactive sequence of mouse subtasks (including motions) is the equivalent of FSD navigating from location A to B in the real world, or Optimus stepping through a set of real-world subtasks to complete a real-world task. The advantage for a change-the-background task AI is the paucity of edge cases that make real-world tasks so difficult for Optimus and for FSD. All three AI systems need to evaluate the real-world changes they evoke before executing the next subtask. Optimus and FSD repeatedly face infinite real-world variations between subtasks. These variations are introduced by independent external agents (cars, animals, fallen trees, etc.). The change-the-background task AI will mostly face changes due to software upgrades and different starting states. Most computer issues can be resolved by deeper searches on the web. AutoHotkey can programmatically solve simple issues (hiding open windows). Having an AI to navigate the process would fundamentally change the ability to execute complex computer tasks based upon simple sequences of verbal commands.
    Here is an example: Convert the most recent Matt Berman YouTube video to mp4, then extract unique screenshots to a PowerPoint file and the YouTube transcript without timestamps to a text file. The filename for each file is MB1.
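    A rough Python sketch of the kind of basic mouse/keyboard subtasks described above, chained into a change-the-background-style sequence (pyautogui stands in for AutoHotkey here; every coordinate, label, and wait time is made up, and a real agent would re-derive them from a screenshot or accessibility tree before each step):

```python
# Sketch only: composing low-level input primitives into one subtask chain.
# Coordinates and timings are hypothetical placeholders.
import time
import pyautogui

pyautogui.moveTo(960, 540, duration=0.3)    # move over an empty desktop area
pyautogui.click(button="right")             # open the desktop context menu
time.sleep(0.5)                             # give the menu time to render
pyautogui.moveTo(1010, 690, duration=0.2)   # hover a "Change Background" entry (made-up position)
pyautogui.click()                           # select it
time.sleep(1.0)                             # wait for the settings dialog
pyautogui.click(1200, 700)                  # pick a wallpaper thumbnail (made-up position)
pyautogui.press("enter")                    # confirm
```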

  • @BThunder30
    @BThunder30 months ago +1

    This is amazing. I think you need a team to help you set it up fast. We want to see a demo!

  • @wardehaj
    @wardehaj months ago

    Great explanation video. Thanks a lot!

  • @rupertllavore1731
    @rupertllavore1731 months ago

    Nice to see you getting brand deals! May your channel keep getting more brand deals!

  • @nqnam12345
    @nqnam12345 months ago +1

    Great! Please, more on this topic.

  • @iwatchyoutube9610
    @iwatchyoutube9610 months ago +5

    I was waiting for your own test the whole video. Git'r done son!

  • @LauraMedinaGuzman
    @LauraMedinaGuzman months ago

    Amazing! I want to try it for Revit, a software package for architecture. Actually, I did try something that worked! However, I truly need more knowledge, so your help is very, very appreciated! Thanks!

  • @DefaultFlame
    @DefaultFlame months ago +1

    Nice! I'd love to see you test it out.

  • @tigs9573
    @tigs9573 months ago

    Yes, I would like to learn more about OSWorld. Keep up the great content.

  • @arinco3817
    @arinco3817 months ago

    This is really interesting. I've been thinking for ages about how to go from a VLM to action. It's a bit like someone sitting in front of your computer and you describing what you want to happen.

  • @dilfill
    @dilfill months ago

    Would love to see you test this out doing a few different tasks! Also curious if this could run someone's social media etc.

  • @2106chrissas
    @2106chrissas months ago

    Great project!
    It would be interesting to have a video on RAG and the programs available for RAG (for example, H2OGPT).

  • @timduck8506
    @timduck8506 months ago

    Are we able to program new actions, or create new connections, like what we can already do with macros?

  • @EduardoJGaido
    @EduardoJGaido months ago

    Great video!

  • @kevinehsani3358
    @kevinehsani3358 months ago

    Can a multimodal model scroll up or down on a screen and see more than just what is displayed? Can it actually read the text in a cmd terminal and then act on it, instead of us copying and pasting the reply into an input context?
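    For the terminal part of that question, an agent doesn't strictly have to read pixels at all; it can run the command itself and capture the text. A minimal sketch (the command and the "act on it" check below are just placeholders):

```python
# Sketch only: capture a command's output programmatically instead of
# copy-pasting it from a terminal window. "pip list" is an example command.
import subprocess

result = subprocess.run(["pip", "list"], capture_output=True, text=True)
terminal_text = result.stdout + result.stderr    # this string can be fed to a model
if "selenium" not in terminal_text:              # placeholder "act on it" logic
    print("selenium not installed; an agent could decide to install it next")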

  • @alpineparrot1057
    @alpineparrot1057 months ago

    I enjoy your content, Matt. You put me on to LM Studio, then Ollama, then CrewAI. CrewAI has excellent use cases, so thank you so much. Could you please do some more stuff with CrewAI? I have mine set up in the one-file approach, but I'm not too sure how to set it up with multiple files and calls back and forth (I'm not too familiar with Python; ChatGPT is an excellent help, but it still only goes so far).

  • @camilordofficial
    @camilordofficial months ago

    This video was great, thanks. Could this work with IoT-like devices?

  • @marcfruchtman9473
    @marcfruchtman9473 months ago

    Thanks for the video! Yes, this seems like it will be very useful.

  • @luxaeterna00
    @luxaeterna00 months ago

    Any link to the presentation? Thanks!

  • @roharbaconmoo
    @roharbaconmoo months ago

    Does anything change for your video with their addition of memory sharing?

  • @CharlesFinneyAdventure
    @CharlesFinneyAdventure months ago

    I would love to watch you set up OSWorld on your own machine, test it out, and use it to create a tutorial from it.

  • @scottwatschke4192
    @scottwatschke4192 months ago

    Very interesting. I would love a testing video.

  • @yugowatari2935
    @yugowatari2935 months ago

    Yes, please do a tutorial on OSWorld. I have been waiting for this for some time.

  • @ThinkAI1st
    @ThinkAI1st months ago

    You are a very good teacher…so keep teaching.

  • @joe_limon
    @joe_limon months ago +11

    How long until I have a locally run agentic system that can install all future improved agentic systems and/or GitHub projects autonomously?

    • @fullcrum2089
      @fullcrum2089 months ago +2

      With this, a person's ideas, dreams and personalities can become immortal.

    • @nickdisney3D
      @nickdisney3D months ago

      I'd share my repo, but I think YouTube comments delete it automatically.

    • @electiangelus
      @electiangelus months ago

      Already there. I'm actually past this.

    • @fullcrum2089
      @fullcrum2089 months ago

      @@nickdisney3D Yes, I can't see it; just share it as repo/name.

    • @electiangelus
      @electiangelus months ago

      @@fullcrum2089 Apotheosis was thinking that 6 months ago.

  • @alexalex4192
    @alexalex4192 months ago

    Hello, I've registered on Massed Compute but couldn't find your preinstalled system. Any tips? And maybe you have a tutorial?

  • @moses5407
    @moses5407 months ago

    Great presentation! Too bad the accuracy levels are currently so low but this seems to be a framework that can self-grade and, hopefully, self-adjust for improvement.

  • @AetherTunes
    @AetherTunes months ago

    I've always wondered if you could incorporate vision for an LLM into something like ShadowPlay.

  • @galaxymariosuper
    @galaxymariosuper months ago

    16:40 Think of temperature as maneuverability: the higher it is, the more flexible the system, which is basically a closed-loop control system at this point.

  • @systemlord001
    @systemlord001 months ago

    I think the temp is set to 1 because if it fails and makes another attempt, it will try different approaches. When the temp is set to lower values, it might not get to a working solution because the attempted methods are not divergent enough to contain a valid solution.
    But I think having an LLM fine-tuned on datasets generated by humans in the format of OSWorld (the tree, screenshots, etc.) could improve the success rate.
    If I am not mistaken, this is what Rabbit R1 was doing. It's basically teach mode, but with more examples than just the one you give it.

  • @monnef
    @monnef months ago

    Very nice project. I would find it interesting to see success rates on different OSes (or, in the case of Linux, even different DEs/WMs). Also GUI vs. CLI: I can imagine that on some tasks the CLI would be king, while on others it could fail miserably. Still, it could be useful to see for which use cases different OSes or GUI/CLI are better and might be worth trying to utilize an AI for.

  • @dreamphoenix
    @dreamphoenix months ago

    Awesome Thank you.

  • @gotemlearning
    @gotemlearning months ago

    great vid!

  • @adtiamzon3663
    @adtiamzon3663 16 days ago

    Good start. Excellent. 🤫 🌞👏👏

  • @mshonle
    @mshonle months ago

    16:38 It depends on the specific formula used for the temperature setting, so a 1 here is by no means the maximum. The use of top-p implies nucleus sampling is being used, which prevents the most improbable completions from even being considered. They are looking for a wider sampling to establish a baseline, and setting the temperature too low would create more repetitive results (repeats across different runs, and also repeating the same phrase in a single run until the context is full) and thus would be too easy to dismiss as a strawman.
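    For anyone curious what temperature and top-p actually do to the next-token distribution, here is a small self-contained sketch (made-up logits for a five-token vocabulary; this is the generic formulation, not necessarily the exact formula any particular vendor uses):

```python
# Sketch only: temperature scaling followed by nucleus (top-p) filtering.
import numpy as np

def sample(logits, temperature=1.0, top_p=0.9, rng=np.random.default_rng(0)):
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())            # softmax, numerically stable
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                  # most probable tokens first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) # smallest set covering top_p mass
    keep = order[: cutoff + 1]
    kept = probs[keep] / probs[keep].sum()           # renormalize over the nucleus
    return int(rng.choice(keep, p=kept))

# Higher temperature flattens the distribution; top-p drops the improbable tail.
print(sample([2.0, 1.0, 0.5, 0.1, -1.0], temperature=1.0, top_p=0.9))
```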

  • @beckettrj
    @beckettrj months ago

    OSWorld project videos, please! This could be a series of videos.
    I could see this helping me do my job five times faster! For example, a helpdesk support tool to check and update an XYZ application user account, then email the user letting them know we have updated their account and that they should be able to log in. Complicated processes, such as opening a VPN connection and checking Active Directory account settings, and then logging into administrative program(s) to search and open the user's account to check their settings. The user account settings in Active Directory must match the user login settings in the application(s). Email the findings and let them know what was altered or changed, etc.

  • @settlece
    @settlece months ago

    I would definitely like to see more OSWorld.
    Thanks for bringing this exciting news to us.

  • @japneetsingh5015
    @japneetsingh5015 months ago

    I am already waiting for a Linux where I could enter commands in natural language, the LLM generates a set of possible correct commands, and I just have to choose one or make a minor change.
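    Something close to that can already be approximated with a thin wrapper around a chat model. A minimal sketch, assuming the OpenAI Python client with an API key configured and a placeholder model name, that proposes a few candidate shell commands and only runs the one you pick:

```python
# Sketch only: natural language in, candidate shell commands out, human picks one.
import subprocess
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
request = input("What do you want to do? ")
reply = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "Suggest 3 candidate shell commands, one per line, no commentary."},
        {"role": "user", "content": request},
    ],
)
candidates = [c.strip() for c in reply.choices[0].message.content.splitlines() if c.strip()]
for i, cmd in enumerate(candidates):
    print(f"[{i}] {cmd}")
choice = int(input("Run which one? "))
subprocess.run(candidates[choice], shell=True)  # only after explicit confirmation
```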

  • @OSWALD569
    @OSWALD569 months ago

    For performing actions on desktops, a macro recorder is already available and suitable.

  • @ma77yg
    @ma77yg months ago

    It would be interesting to have a tutorial on this setup.

  • @yenielmercado5468
    @yenielmercado5468 months ago

    Excited for the coming Humane AI Pin agents feature.

  • @gatesv1326
    @gatesv1326 months ago

    Very similar to the RPA (Robotic Process Automation) that I've been developing for 10 years now. Nothing new, but being able to do this with a typed or vocal prompt is what's going to be interesting once it gets as good as a human can do (which is what RPA has been successful at for a long time), especially considering that RPA licences are expensive.

  • @awakstein
    @awakstein months ago

    Good! So, how do we test it?

  • @spikezz29
    @spikezz29 months ago

    Do you have plans for talking about DSPy?

  • @ericgoz3858
    @ericgoz3858 months ago

    What Python version is required for OSWorld to launch in an Arch Linux (Zen kernel) environment?

  • @Treewun2
    @Treewun2 months ago

    Please do a series on Fine Tuning open source models!

  • @andreluistomaz3930
    @andreluistomaz3930 months ago

    Ty!

  • @francoislanctot2423
    @francoislanctot2423 months ago

    Thanks! Yes, please install it and show us the procedure. I think it is going to be useful for a lot of people.

  • @BelaKomoroczy
    @BelaKomoroczy months ago

    Yes, test it out, go deeper, it is a very interesting project!

  • @AGI-Bingo
    @AGI-Bingo months ago +1

    A new golden age of open source is upon us ❤

  • @davidhoracek6758
    @davidhoracek6758 months ago

    This only needs to work once and you've basically built the universal installer. Soon you'll just tell a computer, "Make the latest Stable Diffusion (or whatever) work on my computer, including all the hardware-specific optimizations that apply to my specific system." Then it just needs to bootstrap in the newest interaction AI for my OS, have a little conversation with the system, try promising settings, and if they fail, come up with others, and (importantly) update the weights of the remote installer system based on the successes and errors of this particular interaction.

  • @scotter
    @scotter months ago

    With regard to the difficulty of an AI accessing the desktop, is there an exception if we are talking about just manipulating a browser window through the use of Selenium?

    • @byrnemeister2008
      @byrnemeister2008 months ago +1

      You can build tools for an agent using Selenium as a browser automator. There are also RPA apps like Power Automate.

  • @christopheboucher127
    @christopheboucher127 months ago

    Of course we want to see more about that ;) thx 4 all

  • @Maisonier
    @Maisonier months ago

    This is great! I'm going to wait for a Linux distro that has these agents built-in to automatically configure Wi-Fi, printers, drivers, or even VMs with Windows (for specific programs that don't work in Wine).

  • @ThomasEWalker
    @ThomasEWalker months ago

    Cool - This is moving SO fast! I think we will get AIs with the ability to recognize what is on the screen more directly, much like a self-driving car sees the world. This would become 'go click the button that does X', without screenshots. I bet that happens this year. Real world agents with AGI for a Christmas present!

  • @ScottSummerill
    @ScottSummerill months ago

    Actually, your video, specifically the table, convinced me that agents, at least in this iteration, are not all that spectacular. They will likely get there, but right now it's a lot of hype.

  • @buggi666
    @buggi666 months ago

    Soooo we basically arrived at reinforcement learning using LLMs? That sounds so awesome!

  • @paketisa4330
    @paketisa4330 months ago

    Considering a project where a person documents daily experiences, thoughts, feelings and personal history in a diary specifically for a future AGI’s learning. Do you think such a personalised dataset could enhance an AGI’s ability to understand and interact with individuals on a deeper level? And lastly, is it feasible to expect an AGI to become a close, personal companion based on this method, or would it somehow be redundant useless data? Thank you for the answer.

  • @gokudomatic
    @gokudomatic months ago

    Nice, but does it support Ollama?

  • @jamalnuh8565
    @jamalnuh8565 months ago

    Keep updating us like this, especially on the new research papers.

  • @ayreonate
    @ayreonate months ago

    I think they set the temp @ 1.0 to test how hard it will hallucinate if given more creative freedom, then added it to the presentation just to show off

  • @oratilemoagi9764
    @oratilemoagi9764 months ago

    So which team are you on:
    OSWorld or Open Interpreter 01lite?

  • @canadiannomad2330
    @canadiannomad2330 months ago

    In Linux there is the X server... I've been thinking it would be neat to plug a system into the X server backend and have an LLM communicate with that directly... It somewhat bypasses most visual interpretation, except for what is actually rendered as graphics.
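    On X11 you can already read that layer from user space. A minimal python-xlib sketch that lists top-level windows and their geometry, the kind of structured state an LLM could consume instead of pixels (window managers differ, so treat this as illustrative only):

```python
# Sketch only: read window titles and geometry straight from the X server,
# producing structured text a model could reason over instead of a screenshot.
from Xlib import display

d = display.Display()
root = d.screen().root
for win in root.query_tree().children:
    name = win.get_wm_name()
    if not name:
        continue                     # skip unnamed helper windows
    geom = win.get_geometry()
    print(f"{name!r}: {geom.width}x{geom.height} at ({geom.x}, {geom.y})")
```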

  • @user-lb5cp5mw4u
    @user-lb5cp5mw4u months ago

    Often, restricting the model to output only code reduces accuracy, especially on complex tasks. It's worth trying to allow it to print a chain of thought (even better if there is a self-critical inner dialogue loop) and then output the final code piece.
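    One cheap way to get both, assuming the prompt asks the model to think out loud and then put the final answer in a fenced code block, is to let it ramble and extract only the last code block for execution:

```python
# Sketch only: let the model print its reasoning freely, then pull out just
# the final fenced code block from its reply for execution/evaluation.
import re

def final_code_block(response: str) -> str | None:
    blocks = re.findall(r"```[a-zA-Z]*\n(.*?)```", response, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else None

fence = "`" * 3  # avoid literal fences inside this example string
reply = (
    "Let me think step by step... the button is in the top-left corner.\n"
    f"{fence}python\npyautogui.click(32, 48)\n{fence}\n"
)
print(final_code_block(reply))  # -> pyautogui.click(32, 48)
```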

  • @mikey1836
    @mikey1836 months ago

    Copilot on Windows already allows control of the OS. For example, you can ask it to switch to night mode and it will.

    • @slomnim
      @slomnim months ago

      That's pretty simple compared to where this project is going. Maybe soon Microsoft will have Copilot do some of this stuff, but so far this seems like the first real attempt.

  • @youjirogaming1m4daysago
    @youjirogaming1m4daysago months ago

    Taking a screenshot and guessing is an impractical implementation. For desktop agents to truly work, we would have to create new APIs that directly alter the desktop state, and the best operating system to do this on right now is Linux. But if macOS and Windows also provide them, I think it is possible for agents to make a significant impact.

  • @minissoft
    @minissoft months ago

    Please do a test. Thanks!

  • @DamielBE
    @DamielBE months ago

    Hopefully one day we'll get agents like the Muses in Eclipse Phase or the Alt-Me in Peter F. Hamilton's Salvation trilogy.

  • @NoahtheGameplayer
    @NoahtheGameplayer months ago

    I have no idea what is going on, especially since I don't know what "agents" means. Is it like another word for ChatGPT, or something else?

  • @ThomasTomiczek
    @ThomasTomiczek months ago

    I think a lot of the current problems come down to training. If GPT-5 is trained on videos from YouTube, and that includes a lot of videos of people USING THE COMPUTER, the AI may be more prepared for this.

  • @DailyTuna
    @DailyTuna months ago

    I think as this evolves, it's time for somebody to create a Linux system that works directly with this. You need an operating system catering directly to the agents.

  • @johnkintree763
    @johnkintree763 months ago

    I want the digital agent in my phone to download my monthly invoice from the electric utility, merge that and other data I want recorded publicly into a decentralized graph representation that is maintained in collaboration with digital agents running in other personal devices to create a shared world model for planning collective action.

  • @Justin-1111
    @Justin-1111 months ago

    Let's see it!

  • @nangld
    @nangld months ago +8

    A 20% success rate is a super impressive start. As soon as they iterate on that and train a proper model, it will reach 99%, leading to all office workers getting fired.

    • @andrada25m46
      @andrada25m46 months ago

      Yeah, probably not.
      I use AI at work; I'm one of the few who do. A lot of data is confidential and extra security measures are needed; something like this breaches contractual agreements, since the AI provider would have access to the data.
      Not to mention proprietary apps running in containers, which the AI wouldn't be able to navigate.

    • @marcussturup1314
      @marcussturup1314 months ago +6

      @@andrada25m46 Local LLMs could fix the data access issue.

    • @WolfeByteLabs
      @WolfeByteLabs months ago +1

      This.

    • @stefano94103
      @stefano94103 months ago

      @@andrada25m46 All of the big players (Microsoft, IBM, Google) have enterprise software that is data-privacy compliant. The price varies with the solution. The only problem with the enterprise LLMs is that they do not move at the speed of other models, for obvious reasons. But open source or enterprise is the way to go if your company has compliance requirements.

    • @greenleaf44
      @greenleaf44 months ago +1

      ​@@marcussturup1314 I feel like people underestimate how possible it is for large businesses to run their own inference

  • @WilsonCely
    @WilsonCely months ago

    Please do it! A tutorial for OSWorld.

  • @lilvital
    @lilvital months ago

    Is it absolutely open source?

  • @MeinDeutschkurs
    @MeinDeutschkurs months ago +1

    A temperature of 0.1 could lead to "I cannot click, I'm just an LLM."

  • @SimenStaabyKnudsen
    @SimenStaabyKnudsen months ago

    Yes! Make a tutorial of it! :D

  • @iseverynametakenwtf1
    @iseverynametakenwtf1 months ago

    Why not link the project in your description?

  • @waqaskhan-uw3pf
    @waqaskhan-uw3pf months ago +1

    Please make a video about Romo AI (super AI tools in one place) and Learnex AI (the world's first fully AI-powered education platform). My favorite AI tools.

  • @tanuj.mp4
    @tanuj.mp4 months ago +1

    Please create an OSWorld Tutorial

  • @ayreonate
    @ayreonate months ago

    Maybe the LLMs are vastly better at the daily and professional tasks because that's what's widely available online, a.k.a. their training data, while workflow-based tasks don't have as many resources. Case in point: the example they used (viewing photos of receipts and logging them in a spreadsheet) won't have the same amount of online resources as daily or professional tasks.

  • @xxxxxx89xxxx30
    @xxxxxx89xxxx30 months ago

    Interesting take, but again, it's trying to be too general. I am curious whether there is a team working on a real "AI OS": not using screenshots and these half-solutions, but actually having predefined, built-in functions that control the device through code and track progress in the same way to do the "grounding" step.