It's just beginning to get useful. It would be neat to see an agent community that automates a software development cycle, starting with documentation, then acceptance tests, specification/contract tests, implementation, test verification and debugging. At the moment all the demos of LLMs making code I see are basically just asking it to write snake, and seeing if it does it, or when it doesn't, if it can fix the code given the compiler/runtime error. But I've found with my own projects that if you invest heavily in the textual and programmatic documentation up-front, LLMs are much better at generating code that actually works.
Do you see that happening? I definitely do.
Wow, this is so good. I don't think structured output or JSON mode is available for o1 yet though. Will be even more powerful once those and function calls are available.
Wow, that was more impressive than expected. Kinda crazy, the scaling laws seem even more absurd now.
great video what a cool dude
You're making me feel bad about not burning $1,000 on the OpenAI API.
Did the requests for this video really cost $1000? Sounds expensive. One could buy a decent GPU and run agents forever locally for that money.
OpenAI is releasing to Tier 4 now. OpenAI’s Tier 4 for API usage has specific requirements and benefits:
• Requirements: You must have a payment history of at least 14 days and have spent at least $250 on the API.
@@gramnegrod But you could also use it via openrouter. I did, but so far I haven't seen much improvement over Claude 3.5 in my tests.
just use openrouter
One project that could be interesting to work on would be to think of a bunch of tasks that you do pretty often, use Cursor to make a bunch of scripts for you, and build almost like a home page with all the scripts and a front end for them. So let's say you want to do transcripts for your videos: you could just have a drag and drop where you drag the video file on and it creates a transcript. You could also add a little chat for each tool, so if you want to add custom instructions you could do that as well. With how quick and cheap it is to do stuff like this now, you could quickly build up a lot of tools that are useful.
I apologize if you've covered this in a previous video, but what if you do a video where you have an agent that will create a web application that talks to a database? Will it actually create a new database or tables in an existing database? Will it create the different pages or forms in the app and give you a menu system? Another idea: have it search different websites for new TV shows or movies that are coming out and have it show you only the stuff that matches your tastes. Stuff like this would be a more real-world use case for agents.
AI sailing the 7 seas 😏
For very complex tasks I'd try a manager agent in the middle, which tries to follow the plan and talks to tool agents as needed, but also allows some flexibility, handling blocking issues or even reporting failure if the task is not possible.
Quite excited to see this, thanks a bunch for pioneering and sharing, it saves others a lot of time :) You probably should've explained how crucial the master agent's role is in more detail though. Half of the comments are from those who don't understand the importance of the o1-preview output in your system.
It would be great to see a demo that doesn't start with "do a basic"..., because that is where my struggles start.
Awesome video! THX for all the creative applications! I think getting agents to program and deploy browser actions, e.g. through Selenium, would really open up many valid use cases, basically limitless. I'm talking about having o1 produce workflows similar to what Mullion is trying to do, but instead integrating a library like Skyvern into your agent tools. It is probably a bit grandiose, but it might work.
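A rough sketch of that manager-in-the-middle idea, in plain Python with no LLM calls. Everything here is an assumption made for illustration: the `Manager` and `StepResult` names, the idea that a "tool agent" is just a callable, and the simple retry count used as the "flexibility" for blocked steps. A real system would replace the callables with model-backed agents.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    ok: bool
    output: str = ""
    error: str = ""


# Hypothetical: a "tool agent" is any callable that takes an instruction string.
ToolAgent = Callable[[str], StepResult]


@dataclass
class Manager:
    tools: dict[str, ToolAgent]
    max_retries: int = 2
    log: list[str] = field(default_factory=list)

    def run(self, plan: list[tuple[str, str]]) -> StepResult:
        """Follow the plan in order; retry blocked steps, then report failure."""
        outputs = []
        for tool_name, instruction in plan:
            tool = self.tools.get(tool_name)
            if tool is None:
                # The task is not possible with the tools we have: say so.
                return StepResult(False, error=f"no tool agent named {tool_name!r}")
            for attempt in range(1 + self.max_retries):
                result = tool(instruction)
                self.log.append(f"{tool_name} attempt {attempt + 1}: ok={result.ok}")
                if result.ok:
                    outputs.append(result.output)
                    break
            else:
                # Every attempt failed, so stop instead of pushing on blindly.
                return StepResult(False, error=f"step {instruction!r} failed: {result.error}")
        return StepResult(True, output="\n".join(outputs))
```

The point of the design is that the manager owns the plan and the failure handling, while the tool agents stay simple and do not need to know about each other.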
For the snow, 16F would be snow weather. Not sure what 16C is off the top of my head. Maybe that's why the Bart pic didn't come out as expected
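For reference, the conversion is F = C × 9/5 + 32, so 16C is a mild day and well above freezing. A quick check (the function names are just for illustration):

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def f_to_c(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


print(round(c_to_f(16), 1))  # 60.8
print(round(f_to_c(16), 1))  # -8.9
```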
WOW, you did a great job with your prompting method.
Can you make a web search agent that:
- chats with the user about a problem
- then generates search queries for Google search
- then exports the first 10 websites from every search query
- scrapes all the websites and analyses the data with a web scraping analyst agent
- then saves this data in vectors
- gives the saved data to an agent, along with the main prompt + system prompt, that makes it generate an HTML page of an article that has all the solutions for the problem along with sources and images
It's a really cool project, I am working on it 😅, I am challenging myself to complete it in 1 month.
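The steps in the list above could be wired together roughly like this. This is only a sketch of the orchestration: `research_pipeline` and all of its parameters are hypothetical names, each stage is passed in as a plain function so that no real search or scraping API is assumed, and the vector-store step is left out (pages are handed straight to the writer).

```python
from typing import Callable


def research_pipeline(
    problem: str,
    make_queries: Callable[[str], list[str]],       # problem -> search queries
    search: Callable[[str, int], list[str]],        # (query, limit) -> URLs
    scrape: Callable[[str], str],                   # URL -> page text
    write_article: Callable[[str, list[tuple[str, str]]], str],  # -> HTML
    results_per_query: int = 10,
) -> str:
    """Run queries, collect and scrape unique URLs, then write the article."""
    pages: dict[str, str] = {}
    for query in make_queries(problem):
        for url in search(query, results_per_query):
            if url not in pages:  # do not scrape the same site twice
                pages[url] = scrape(url)
    # Pass (url, text) pairs so the writer can cite its sources.
    return write_article(problem, list(pages.items()))
```

Keeping each stage as an injected function makes it easy to test the flow with stubs first and swap in LLM-backed agents one stage at a time.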
*How long did it take to do the planning?*
Kind of a big step in testing capabilities and it wasn't discussed.
I wonder if the new LLaMa (3.2) models are capable of doing similar things. The smaller models seem better than in the 3.1 version.
Also interesting. At least the model should be able to call functions. Either additional training is needed, or the function call has to be made with a different syntax. There are also studies that say that JSON is not the best IO format for LLMs.
@@silentage6310 LLaMa is capable of function calling. I think AllAboutAI made a video on it half a year ago. Not sure which LLaMa version it was.
Great content! Any chance we can have access to the code you used in the video?
If you really think about it, o1 did not do anything. You had already broken the task into the 15 steps, and you could have just sent these 15 steps to o1 to do everything, except o1 can't generate images (for now), so I do not see what the benefit is...
It's not that, you should just try to create such a sequence of agent instructions with anything else. You'll immediately understand the difference. And the thing is, with a moderately complicated agentic system the user shouldn't even be aware of which agents there are and what functions they have. The user should just generate a task, and the system's mainbrain (o1 in this case) should do all the planning. No other model could do that before: there were lots of tiny (or not so tiny) inaccuracies, and while trying to get rid of them via prompting or structural adjustments you'd generate a thousand more. It was just not viable.
I understand what you mean. The "plan" still comes from him. But think about what OpenAI could possibly do. He did this alone already.
How is this any different from having lambda functions rather than agents?
hey! I just became a channel member. where can I find the github for this project?
very interesting - thank you for sharing:)
I'm gonna hook up to Sonnet and tell it to just spit out the code via API cos I'm fed up copying and pasting to VS Code. It just works!
Can you just ask it to do one task with no details like you did? Like generate a .md which contains 3 images of the weather for the next 3 days in SF?
Could this be done in something like flowise or similar?
Just thinking if there are any glaring limitations of the low-code setups?
I have a feeling the stream of the answer at the end of o1 is faux-stream
Is source for this available? I don't see it on the linked github...
same, i was looking for the files so that i can understand them but they aren't there.
you guys need to be a paid user of his community to check out the code
Can I get this code via AI Rookie?
Is all AI agent code written in Python? Or are there other languages that can be used, like JavaScript?
very good. reverse engineering pub trivia questions
What kind of costs are to be expected with things like this?
Between $1 and $25,000, or more. You could also do it for free.
Need to go and try, but still we're talking about cents here. Unless you get an agent stuck in a loop, it won't generate enough tokens to be a concern.
@@tomaszzielinski4521 Really? Hmm... That makes things more interesting.
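A back-of-the-envelope way to check this for your own run: API cost is roughly tokens times the per-token price, with input and output priced separately. The function below is a sketch, and the prices in the usage line are placeholders, not real rates; look up the current pricing for whichever model you use.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_in: float,
    usd_per_million_out: float,
) -> float:
    """Estimated cost in USD for one run, given per-million-token prices."""
    total = input_tokens * usd_per_million_in + output_tokens * usd_per_million_out
    return total / 1_000_000


# Placeholder prices (assumptions, not real rates): $10 in / $30 out per million tokens.
cost = estimate_cost(20_000, 5_000, usd_per_million_in=10.0, usd_per_million_out=30.0)
print(f"${cost:.2f}")  # $0.35
```

An agent stuck in a loop multiplies the token counts, which is why a hard cap on steps or tokens per task is worth having.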
I don't think you can be wrong, my friend!
You just realised it's using a graph and a router with an intent detector!
And the guardrails!
The model has not changed! They did say they fine-tuned the model on the step by step!
You have produced these works yourself! (You just need a graph.) This is the best way to create an agentic system:
You can also use open router (this can be an agent in the chain to detect which route to pick, i.e. the routes can be graphs).
Each node can be an agent in the graph with its own specific tools!
Graphs are recursive until the problem or step is solved!
It may take a long time, but it is heavily directed while still being quite intelligent, with latitude for the agents to create and perform!
So now we can create many types of graph for various types of task, and the router picks which path to take!
Mastermind! It can be the fat controller!
1 file. Now make it pull any open source repository and fix a bug issue on that repo. If you don't give it steps, it always fails.
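A minimal sketch of the graph-plus-router idea described above, assuming each node is a function that updates shared state and names the next node (or returns `None` when done). The node names, the keyword-based intent check, and `run_graph` itself are all made up for illustration; a real router would more likely use a model to detect intent.

```python
from typing import Callable, Optional

State = dict
# A node does some work on the state and returns the next node's name, or None to stop.
Node = Callable[[State], Optional[str]]


def run_graph(nodes: dict[str, Node], start: str, state: State, max_steps: int = 20) -> State:
    """Walk the graph from `start` until a node returns None or the step cap is hit."""
    current = start
    for _ in range(max_steps):
        current = nodes[current](state)
        if current is None:
            return state
    raise RuntimeError("graph did not finish within max_steps")


def router(state: State) -> str:
    # Toy intent detector: pick a route from keywords in the task.
    return "weather" if "weather" in state["task"].lower() else "fallback"


def weather(state: State) -> None:
    state["answer"] = "weather agent handled: " + state["task"]
    return None


def fallback(state: State) -> None:
    state["answer"] = "no route matched"
    return None


NODES = {"router": router, "weather": weather, "fallback": fallback}
result = run_graph(NODES, "router", {"task": "Weather in SF for 3 days"})
print(result["answer"])  # weather agent handled: Weather in SF for 3 days
```

A node can return its own name to repeat until its step is solved, which is where the `max_steps` cap matters: it keeps a stuck node from looping forever.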
Wow, damn!
You're telling us this for free? Not asking us to join as paid users to learn this?
Either this is not that good or you just wanted to show good will??
Well, seeing your full video after so much time feels good!
These are not agents, they are functions.
so many use cases
Williamson Loaf
Holy s....... im learning to program ... why ?
Did you notice the part where he was looking at the code and validating that it was generated correctly? In real life and with more complex problems - that's how we do our job, still, with AI present everywhere. Because the AI is wrong *a lot*. Still. AI is also usually still pretty shit about connecting multiple parts of a system together when the system is complex.