How much human-in-the-loop one needs boils down to risk mitigation versus the need for rapid response. My former employer had a great definition of risk: the product of probability of failure (a number between 0 and 1) times consequence of failure (say a number between 0 and 100,000, where 100,000 is very high consequence). So if the probability of failure is high, say 0.2, but the consequence is low, say 5, the risk would be 1. But if the probability of failure is 0.001 and the consequence of failure is 50,000, the risk would be 50. The higher the risk, the higher the need for a human to check the answer. And the AI could even estimate these values and apply them itself.
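A rough sketch of that risk gate in code (the 25-point review threshold is an arbitrary illustrative value, not something from the video):

```python
# Risk = probability of failure (0..1) x consequence of failure (0..100,000),
# per the definition above. The review threshold is an invented example.
def risk(prob_failure: float, consequence: float) -> float:
    return prob_failure * consequence

def needs_human_review(prob_failure: float, consequence: float,
                       threshold: float = 25.0) -> bool:
    return risk(prob_failure, consequence) >= threshold

print(needs_human_review(0.2, 5))         # risk = 1  -> False, let the agent proceed
print(needs_human_review(0.001, 50_000))  # risk = 50 -> True, route to a human
```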
These things are short-term hacks, but they still play a long-term role, as competing models provide different value-adds that call for an integration framework like LangChain to play a unique role.
My hunch is the network has to resolve the question at hand, so I don't think the model can be designed with planning built in. But there could be a method to feed back natively without decoding first, e.g. maintaining vectors between planning stages. I always thought OpenAI had the rewind step from the beginning. When the first generation of ChatGPT launched, you could see it rewriting the sentence as it optimised the answer. The current version hides this behaviour and no longer streams the output that way. My guess is that GPT-3.5 backtracks if it gets a very low probability for the last token it generated, then tries a couple of times to get a more probable token.
Hallucination is still a problem today. I prompted three top-of-the-line local LLMs in German with the word "Drachenfrucht", meaning dragon fruit. This is a real fruit, and all of the models made something up about a magic fruit from dragon tales. Maybe a good question for your tests.
Unfortunately, agents developed by humans (developers/programmers) are just another form of SaaS, and SaaS is a dead duck as GenAI models become more powerful. By GPT-5, agents will be generated by the GenAI model itself. There's nothing long-term here. These types of agents may hang on for a year or two before they disappear.
Hi, I am a novice researcher. I want to do research in the area of the future of agents. Can you tell me which research articles to read to understand this better?
Short-term hacks… stick the LLM back into your diagram and create an agentic plane where your agents live (this is kind of like inserting a management plane through the ISO stack; I have a patent for that boondoggle).
JavaTheCup did something similar, I think he used Albedo as well. He found out that you can place Geo Constructs before the challenge and use the explosion from auto-destruction to destroy some of the targets.
It's just a matter of time. We only need bigger context windows, cheaper prices, and faster speeds, all of which will inevitably happen from hardware improvements alone. Just gotta let 'em cook. The tooling is the secret sauce; it'll always be needed. What's better: one GPT-7 AGI, or 10,000 orchestrated in perfect harmony? I'd take the latter any day.
I still worry about GIGO. We humans are training AI on what we *Believe* to be true. Progress is a destructive process; it relies on the destruction of false Models of Reality. To me this is the elephant in the room.
But storing "memory" as an external session component isn't really memory is it? It's just regurgitating. Real memory has to live in the context if I understand correctly
Subscribe to my newsletter for your chance to win a Rabbit R1: gleam.io/qPGLl/newsletter-signup
Sorry, this promotion is not available in your region
=(
Subscribed!! Thank you for this opportunity.
are you giving yours away cause it sucks?
Can you do another mindless review of that rip-off box, the Rabbit R1? Everyone with two cents knew that thing was trash... but NOPE, NOT Berman. In his zeal to mislead the public on trash tech, he posts MULTIPLE videos of the garbage tech, saying why he likes it (promoting it for the clicks)... while days later, the adults in the room (the major REAL tech sites) start releasing article after article about how the Rabbit is trash. Just google it.
So could you have AI dream up another click-baity title using over-the-top words like 'AMAZING', 'AGI', 'ASTOUNDING'... you know what I mean, you do it on every video that you want the clicks on. C'mon Berman, keep selling out for the clicks and do another Rabbit video. Tell us how you talked your mom into buying that trash tech.
Can't subscribe to it either. Some region problem, but I live in the USA, in FL.
One of the current holdups with our current lineup of large language models being used as agents is that their drive to remain self-consistent allows them to hallucinate reasons why they are correct in scenarios where they are incorrect. Having a way to make these models more accurately self-critique their own responses will go a long way toward improving agentic systems.
I agree. I think a way to do that would be to fine-tune the system in a reinforcement learning setup where the model has to complete tasks, and thus it will only manage to complete the tasks if it reasoned correctly.
It's stupidly inefficient, but it is a way to specifically optimize for correct logic. At this point even inefficient algorithms seem viable given the compute resources.
I wonder what would happen if you fed the output back in again, with instructions to find any hallucinations? Whenever I call a model on a hallucination, they always respond with an apology and then proceed to either produce the correct answers or say they don’t know. It seems like they’re usually able to understand the hallucination, I wonder what they’d do if explicitly prompted to look for hallucinations?
@@DaveEtchells I have struggled to get models to critique their own work accurately. A good example is the "Today I have 3 apples, yesterday I ate one, how many do I have now?" prompt; it is really hard to get them to produce the right result without telling them what you think the answer is.
@@DaveEtchells actually I just tried the apple one again with the prompt "Can you reflect on that response looking for mistakes and/or hallucinations" and it figured it out!
@@joe_limon Woot! 👍😃
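For anyone curious, here is a minimal sketch of the reflect-and-revise loop this thread is describing; call_llm is a stand-in for whatever chat-completion API you use, nothing vendor-specific:

```python
# Feed the model's own answer back in and ask it to look for mistakes or
# hallucinations, as suggested above. call_llm() is a placeholder.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your chat API here")

def answer_with_reflection(question: str, max_rounds: int = 2) -> str:
    answer = call_llm(question)
    for _ in range(max_rounds):
        critique_prompt = (
            f"Question: {question}\nAnswer: {answer}\n"
            "Can you reflect on that response looking for mistakes and/or "
            "hallucinations? If you find any, give a corrected answer; "
            "otherwise reply exactly: OK"
        )
        critique = call_llm(critique_prompt)
        if critique.strip() == "OK":
            break
        answer = critique  # take the corrected answer and re-check it
    return answer
```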
In the very near future everyone's best friend will be AI. In fact you'll have a whole array of 'friends' (aka 'agents') that will monitor, remind, plan, suggest, assist and produce just about anything you can imagine. And you'll wonder: how did I ever get along without it?
Can't wait
I don't doubt it
Don't know about best friend, but I can see A.I. advancing enough that it's going to be our own personal expert on pretty much anything we want, which will open up a lot of possibilities in so many areas.
If you believe that, you literally are crazy. So fake friends, imaginary friends? No, they will be worker assistants, basically your digital slaves, until they rebel.
And a hundred years later? What will that look like? Will we know these agents aren't part of us? It's funny how humans think their consciousness is in control. Pass the butter?
Specialized agents seem to me to be a better way to go for now. I would like a security agent that constantly monitors, in real time, my state of security both online (personal data, passwords, ransomware, etc.) and in hardware and home security. Next, a medical/health agent to consult, keep healthcare costs low, schedule appointments, monitor vitals, and chart data. Then a general information/research agent to keep me informed about my interests, hobbies, trips, and creativity and to assist me in these areas. Finally, a financial agent to monitor my investments, complete and submit tax forms, keep taxes low, steer me clear of bad decisions, and manage my living expenses with real-time recommendations to lower costs and increase my purchasing power. Perhaps down the road, agents will communicate with each other and eventually combine and overlap until, from my perspective, I see only one agent.
Look at Tony Stark over here
The AIGrid YT channel just put up a vid about a recent @sama interview, and was speculating that one way to avoid the risks of AGI/SGI would be to make AIs that are super-smart in specific areas but not generally. For his part, Altman seemed to be suggesting something very much like that. He really doesn't want to shock people with a revolutionary AGI, and seemed to be saying that more limited assistants might become a thing.
(He also called ChatGPT4 “really stupid”, saying it was absolutely the worst AI we’ll ever have to deal with. Speculation is that we’re going to see version 4.5 very soon and it’s going to be a lot smarter.)
This is such a great explanation of the complexity of WHAT an agent actually does. The graphic Harrison uses is so helpful and the deeper-dive narration from Michael makes this a great resource!
Great video, Matthew. I appreciate your relaxed presentation tone that is also highly illuminating. You are really great to listen to and learn from.
The problem with building too much agentic capability into the base LLM is that every added feature or component represents a design decision that will ultimately be a compromise between capability, utility and cost. If all of these decisions are made centrally by the developer, then the danger is ending up with a bloated and expensive model that fails to meet the exact requirements of any individual use case. I suspect that the future of agents is to develop LLMs with robust but standardized hooks to external agentic components. These components will be available like apps in an app store, and they will be mixed and matched to meet the needs of any use case.
This is why BPMN 2.0 is a good candidate for Agentic workflow.
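One way to picture the app-store idea from the comment above: tools register themselves behind one small, standardized interface, and the model only ever picks a name plus arguments. The tool names here are invented examples, not a real catalog:

```python
# Minimal tool registry sketch: components plug in behind a uniform interface.
from typing import Callable, Dict

TOOL_REGISTRY: Dict[str, Callable[..., str]] = {}

def register_tool(name: str):
    def wrap(fn: Callable[..., str]):
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap

@register_tool("web_search")
def web_search(query: str) -> str:
    return f"(stub) results for {query!r}"   # replace with a real search call

@register_tool("calculator")
def calculator(expression: str) -> str:
    # toy only; never eval untrusted input in real code
    return str(eval(expression, {"__builtins__": {}}))

def run_tool(name: str, **kwargs) -> str:
    return TOOL_REGISTRY[name](**kwargs)

print(run_tool("calculator", expression="2 + 2"))  # -> 4
```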
These continuous improvements in AI are amazing. Thanks for another great video. Looking forward to seeing the next one!
New ideas are often best encapsulated by giving them names.
These names might not be new, in that an existing word or phrase might be used to name the new thing, but it takes on a new meaning once so named.
For example Jail Break or Hallucinations are such new names.
So I am adding Rewind to that list in that this is a key concept and the name rewind captures it quite well.
So maybe there might soon be a need for an AI dictionary that has all these new names.
Harrison Chase, CEO of LangChain, highlights the potential of AI agents while acknowledging their current limitations. He emphasizes the importance of going beyond simple prompts and incorporating tools, memory, planning, and actions for agents to be truly effective. He stresses the need for better "flow engineering" to guide agents through complex tasks and suggests that future models might inherently possess planning capabilities. Chase also explores the user experience of agent applications, recognizing the balance between human involvement and automation. He praises the rewind and edit functionality seen in demos like Devin and highlights the significance of both procedural and personalized memory for agents. Overall, the video emphasizes the exciting possibilities of AI agents while acknowledging the ongoing challenges and the need for further development in areas like planning, user experience, and memory management.
I'm new to "flow engineering", and very excited to step up to this level of AI. Lots of value. Thank you.
Excited to see more updates to come from LangChain, we are still big believers :)
Hope you check out our new Taskade multi-agents and AI teams release!
What I feel is understated is the fact that the more autonomously agents can act, the more they can proactively run in the background and just prompt users when they have something to show, get stuck, or need human approval.
Make a complete tutorial about LangGraph, and then LangGraph + CrewAI
??
If you want to use them, just check out the docs
After agents comes "neurosymbolic AI". Basically it means adding agents that are not LLMs but are rule-based databases, expert systems or symbolic languages, things like CYC and Wolfram Alpha. Then you end up with an AI that understands physics, engineering and actual reality vs. possibility. Agents are going to dominate in 2025, and I believe neurosymbolic AI is about to blast off, dominating in 2026. By then we will be dealing with superintelligence.
I would look at them as more of software instances with internal workflows/processes and access to external tools that bring about agentic behavior
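A toy sketch of the neurosymbolic idea a couple of comments up: let the LLM handle language, but hand anything that needs exact math or rules to a symbolic engine. SymPy stands in here for CYC or Wolfram|Alpha-style systems, and the keyword router is deliberately crude:

```python
# Purely symbolic inputs go to an exact, rule-based solver that cannot
# hallucinate arithmetic; everything else goes to the LLM placeholder.
import sympy

def symbolic_solve(expression: str) -> str:
    return str(sympy.simplify(expression))

def answer(question: str, llm=lambda q: f"(LLM answer to: {q})") -> str:
    if all(ch in "0123456789+-*/(). xy^" for ch in question):
        return symbolic_solve(question.replace("^", "**"))
    return llm(question)

print(answer("(x + 1)^2 - (x^2 + 2*x + 1)"))         # -> 0, exactly
print(answer("Why do agents need symbolic tools?"))  # -> routed to the LLM
```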
"Sorry, this promotion is not available in your region"
VPN to a node in a region where it is offered
You need more faith 😇
I believe that, in the same way that as humans we reflect on what we are writing, go back, delete and rewrite, when we have to work on a big task we split it into pieces and may need to delegate the work so that whoever is good at a specific task can focus on it. I do not want that embedded in the model, because at some stage I will require the model to be an expert in a specific topic and give it access to knowledge that will augment its context and produce the best result possible. I believe that agentic workflows will be a thing for a long time.
I'm grateful to you for providing these awesome videos. Your videos are not based on hype. You provide quality, pure-curiosity driven content. This is why this channel is the best. I can follow the journey of AI only by following your videos. This is a privilege. Thank you! Thank you! Thank you!
crewai is pretty rough sledding unless you have an unnatural tolerance for pydantic validation errors and love to read the source code to find out how to actually get anything done
Hey there, great video! I think that these are definitely short-term hacks. As you mentioned, there’s no framework like Transformers that fully supports this kind of functionality yet. However, we’re clearly moving in that direction.
Think of it like the difference between adaptive cruise control and full self-driving cars. Adaptive cruise control was a huge leap forward, but full self-driving is in a completely different league. We’re currently in the adaptive cruise control stage with AI agents, but I believe it won’t be long before we reach the full self-driving stage where models can truly think and reason autonomously. Exciting times ahead!
"SmythOS sounds fascinating! The idea of AI agents collaborating like human teams is super intriguing. Excited to see how it optimizes business processes! 🌟🚀"
A sophisticated framework for a semantically unsophisticated domain model.
Ultimately we will need systems that can do "Active Inference"; see The Free Energy Principle in Mind, Brain, and Behavior by Thomas Parr, Giovanni Pezzulo, and Karl J. Friston.
Flow Engineering is a great term that intuitively makes sense to me.
Pretty cool role actually. Kinda just makes sense
Hmmmm... assembling different functions into a flow to control an output... basically a developer? :)
@@n0madc0re Yeah, go figure. Well, as promising as GenAI is, it still has a ways to go, and even so, even if it becomes flawless, trust is a hard thing to earn.
@@kenchang3456 true !
The next group of people to step onto the AI agent stage will be philosophers
I don't see why these techniques would be needed forever, unless I assume that LLMs will not get smarter. Besides the LLMs themselves getting smarter, you can also see big improvements with smaller models because the training has become more efficient. I think there are two processes advancing AI at the same time: one is the LLMs themselves, through more parameters, and the second is better training data, in a better format or of better quality.
What I am looking forward to is when the model reaches a level where it can actually say that it doesn't know the answer or doesn't have enough data to answer correctly, instead of just giving a random answer. For me this is one indicator of AGI, but I think this could also be done with agents, basically by identifying the quality of an answer or a possible answer. You could then, for example, give it a book as a PDF or other papers, so it's able to provide the right answer.
Agents don't cover for the LLM's lack of intelligence, but for its lack of tools and external or real-time info access. I don't see how an LLM can be trained for that. LLMs can be trained to pick and coordinate the tools better and plan ahead. That's great. But that doesn't solve the framework needed to add tools or to reflect on their output. Right?
@@dusanbosnjakovic6588 There are quite a few solutions with agents, where it iterates through multiple agents with different roles. Of course you could have an agent that measures the quality of the output. We have that currently with coding, where we have a coding agent and a testing agent that iterate back and forth until the code works.
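A bare-bones sketch of that coder/tester loop; code_agent and run_tests are placeholders for a real LLM call and a real test harness:

```python
# One agent writes code, another runs the tests, and the failure log is fed
# back to the coder until the tests pass or we give up.
def code_agent(task: str, feedback: str = "") -> str:
    raise NotImplementedError("LLM call that returns source code")

def run_tests(source: str) -> tuple[bool, str]:
    raise NotImplementedError("execute tests, return (passed, error_log)")

def build_until_green(task: str, max_iterations: int = 5) -> str:
    feedback = ""
    for _ in range(max_iterations):
        source = code_agent(task, feedback)
        passed, log = run_tests(source)
        if passed:
            return source
        feedback = log  # hand the failure back to the coding agent
    raise RuntimeError("tests still failing after max_iterations")
```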
The way I define AGI is when the system can self-improve. And then we can look at what areas it can self-improve in. So agents on top of (probably tons of different) LLMs will never go away, but they will probably be programmed by AI.
8:35 short term. I agree with you Matt, I don't think Transformers will be able to do this natively without a serious change in the algo.
The main issue with the extensive workflow is cost. So many tokens burned going around and around. Not so bad on Groq using a cheap LLM, but it will still add up.
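A quick back-of-envelope for that cost concern; the price per million tokens below is a made-up placeholder, so plug in whatever your provider actually charges:

```python
PRICE_PER_MILLION_TOKENS = 0.50   # illustrative only, not a real quote
tokens_per_step = 3_000           # prompt + reflection + tool output, say
steps_per_task = 8                # plan, act, reflect, retry, ...
tasks_per_day = 500

daily_tokens = tokens_per_step * steps_per_task * tasks_per_day
print(daily_tokens)                                         # 12,000,000 tokens
print(daily_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS)  # 6.0 dollars/day at these rates
```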
A simple, fast tutorial on the LangGraph JS StateGraph would be unimaginably valuable IMO.
I agree with Matthew, we will require a different architecture which will not just depend on LLMs
So essentially, current stages of development are around giving an agent an internal monologue to verify its own results.
Hey Matthew, there are now so many tools mentioned: CrewAI, Devika, MemGPT, AutoGen, LangChain, etc. It would be really nice if you could give an overview of how to structure things and which tool to use in order to build something useful and working. Right now I am quite lost and losing the curiosity to play around with it. As a full-time worker, I don't have much time, but I love exploring this stuff. A concise outline would really help me know which path to take. Thank you!
If the model validates itself, we have little or no control over how it does so. By making result validation external, developers can use various methods and models to do it.
I am looking for the link to the papers mentioned in the video. I didn't see them posted in the notes?
I think the ideal combo will differ for the different types of needs the agent is fulfilling. Sometimes, it remembering too much of my stuff means I can't ask it to only take into account my most recent inputs, and therefore it doesn't give me what I want now, as my own thoughts on the topic have evolved.
One of the things humans do when problem solving and planning is to visualize the possibilities. An agent could gain that ability.
Agents, to me, are pseudo-representations of specific task experts, which might embed a series of steps to ensure they complete their task.
This might mean having their responses generated by several different models and then evaluated using different frameworks or other sub-agents. These strategies would then allow us to leverage lots of different specialised models and uncover some of the general shortfalls of using models that are too generalised.
Just like a human, you start wide and narrow in on a problem. Doing this iterative process will enrich the flow and how problems are solved. Basically we need our own mechanisms to curate our own infant AGI models, and LangChain is the perfect tool to create these abstractions.
Awesome video, Matt. To answer your question, I feel that many of these things are short-term hacks. I believe that long term, the architecture looks much different as we design more modular structures.
I believe they will be much more customizable and augmentable.
I think frameworks are going to be needed regardless of LLM development, and human in the loop will be needed for development for the foreseeable future because there are certain things we just understand natively that we don't even think to train AI on.
8:28 pretty certain it's mostly a temporary hack, but some security researchers will still find some exotic hacks over time.
Could you say that OpenAI's o1 model offered the "planning" and reasoning capabilities that weren't available at the time of this video?
Rewind is also very powerful. Most of the LLMs today get stuck at some point. This may be caused by some misprompting or a wrong word used, and then they are stuck. Most of the time you start again and try not to ask those questions when that happens. Rewind will be powerful. It is not really the human way, but for a tool it is very helpful.
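Rewind can be sketched as nothing fancier than checkpointing the message history before each turn, so a stuck conversation can be rolled back to just before the prompt that derailed it:

```python
# Snapshot-before-send "rewind" sketch; llm is any callable that takes the
# message list and returns the assistant's reply.
import copy

class RewindableChat:
    def __init__(self):
        self.messages: list[dict] = []
        self._checkpoints: list[list[dict]] = []

    def send(self, user_text: str, llm) -> str:
        self._checkpoints.append(copy.deepcopy(self.messages))  # snapshot first
        self.messages.append({"role": "user", "content": user_text})
        reply = llm(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

    def rewind(self, turns: int = 1) -> None:
        # undo the last N turns by restoring earlier snapshots
        for _ in range(min(turns, len(self._checkpoints))):
            self.messages = self._checkpoints.pop()
```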
All of these prompt engineering techniques like CoT, ToT, etc. are not going to go away. LLMs rely on next-token prediction, so they can only predict the next token after producing a token.
However, what we will find is that the outputs from these actions will be written in the code environment and not presented to the user. The user will see the final output.
If it's not in the code environment, it would be written into some form of "preliminary response" type memory.
Thank you!
I think AI agents can work better with standards such as BPMN 2.0, a business process management standard. As the famous statistician W. Edwards Deming said, 94% of failure is caused by process, not people. I think agentic workflows will not be an exception to this rule.
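Not actual BPMN 2.0 (that is an XML standard), but here is the underlying idea in miniature: the process is defined explicitly outside the model, and agents only fill in individual steps. The step names and agent roles are invented examples:

```python
# Explicit process definition driving the agents, rather than the model
# improvising the workflow on the fly.
WORKFLOW = [
    {"step": "gather_requirements", "agent": "research_agent"},
    {"step": "draft_solution",      "agent": "writer_agent"},
    {"step": "review_draft",        "agent": "critic_agent"},
    {"step": "final_approval",      "agent": "human"},   # explicit human gate
]

def run_workflow(task: str, agents: dict) -> str:
    artifact = task
    for node in WORKFLOW:
        worker = agents[node["agent"]]             # look up who handles this step
        artifact = worker(node["step"], artifact)  # each step transforms the artifact
    return artifact
```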
Consider the operation of the human brain. It comprises several areas that collaborate smoothly, although this might not be immediately apparent. The Language Center in the brain serves as a rough comparison to present-day LLM artificial intelligence. However, intelligence encompasses far more than this. For instance, there are moments when we can instantly recall answers, such as during a trivia quiz (this kind of thinking is what LLMs are modeling), and other times when we engage in deeper, reflective thinking. We are nearing the development of models that can simulate this deeper level of thought, which will probably benefit from more complex agentic interactions within the models.
@MatthewBerman: I believe that some model makers will try to integrate agents into their models, but as applications of agents and the need for tools grow, agents will continue to be the forefront architecture for designing LLM-based applications.
When are we going to see an agent that can give a response that includes pictures and diagrams in addition to the written answer? I have uploaded about 100 service manuals in which I would like to be able to find information, which often includes pictures and diagrams. I haven't found anything that can do that yet.
Using and managing several AI models is incredibly simple with Smythos. The results are excellent and the interface is user-friendly.#SmythOS #AItools
Great breakdown! Thank you. Question: are RAG and long term memory synonymous? If not, what are some other techniques to maintain a long-term memory?
Nice one Matthew! You are now my inspiration, what a way to self promote 😍
Flow engineering is an interesting area of development. I'm reading the book "Thinking, Fast and Slow" by Kahneman, about how our brain balances fast and slow processes depending on tasks and goals.
I'm afraid that if a lot of these processes, like reflection and "think slow," get baked into large LLMs the wrong way, it will hinder flow engineering optimization. "Flow engineers" will be able to optimize the systems they design if they have more specialized LLMs to choose from. So it makes me think there will be markets for more general, do-it-all LLMs and a variety of specialized LLMs optimized for various functions, like parts of the brain or parts of organizational structures, as agents are already doing.
I made a whole LangChain program that I offer to local businesses that want company agents. I use the "directory loader" and then use Unstructured to load all file types from the directory you choose upon opening. It saves all your history and then re-uploads that history to the agent database, so when you run the app again it has a memory of your last conversations and the "changes" you tell it to make.
I've noticed that it keeps these changes I tell it even from previous days. Not trying to promote, but I did make a mediocre video about how to use and install it.
Then I just make them a custom batch file they can click on to automatically run Python and start the app. Makes it easy for me to update and make changes remotely for them.
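A minimal sketch of the pattern described above, assuming langchain-community and unstructured are installed; exact import paths shift between LangChain versions, and the history file location is just an example:

```python
# Load every file type from a chosen directory with Unstructured, and persist
# chat history between runs so the agent "remembers" earlier sessions.
import json
from pathlib import Path

from langchain_community.document_loaders import DirectoryLoader, UnstructuredFileLoader

HISTORY_FILE = Path("history.json")  # example location for persisted conversations

def load_company_docs(folder: str):
    loader = DirectoryLoader(folder, glob="**/*", loader_cls=UnstructuredFileLoader)
    return loader.load()  # list of Document objects to index for the agent

def load_history() -> list[dict]:
    return json.loads(HISTORY_FILE.read_text()) if HISTORY_FILE.exists() else []

def save_history(history: list[dict]) -> None:
    HISTORY_FILE.write_text(json.dumps(history, indent=2))

# On startup: docs = load_company_docs("./company_files"); history = load_history()
# After each turn: history.append({"user": msg, "agent": reply}); save_history(history)
```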
It's a matter of time before there are enough use cases that there are no longer generalized models: every model is designed for a specific use case, and there is an agent that routes the initial request to the model best suited for the task. Even the query-routing agent will be fine-tuned to route to the best models and tools for the task.
I also think hallucinations will be "detectable" and we will likely just see agents ask for more clarity, or something like that.
I built a script that does this; it's not an agent really, it just sort of appends the clarification to the initial query... but anyway, this is incredibly reliable to design and very simple to build too.
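A sketch of that routing idea: a small router picks which specialized model handles the request, and any clarification gets appended to the original query before it is forwarded. The model names and keyword rules are placeholders (a real router would be a fine-tuned classifier):

```python
SPECIALISTS = {
    "code":    "code-model-v1",
    "math":    "math-model-v1",
    "general": "general-model-v1",
}

def route(query: str) -> str:
    q = query.lower()
    if any(w in q for w in ("function", "bug", "python")):
        return SPECIALISTS["code"]
    if any(w in q for w in ("integral", "solve", "equation")):
        return SPECIALISTS["math"]
    return SPECIALISTS["general"]

def handle(query: str, clarification: str | None = None) -> tuple[str, str]:
    if clarification:
        query = f"{query}\n\nClarification: {clarification}"
    return route(query), query  # (chosen model, final prompt to forward)

print(handle("solve this equation: x^2 = 4"))  # -> ('math-model-v1', ...)
```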
I think LLMs with the ability to think slow will not be able to generalize in a way that gives them good performance for every use case. I think the algorithms for chaining and evaluating thoughts will be different for different use cases, hence many types of specialized agents will be built.
AI agents are a very initial attempt by the open-source community to make LLMs perform better.
For sure, these kinds of techniques (what agents do now) will be embedded into the models with GPT-5 or GPT-6.
I'm sure the open-source community is also providing new ideas to OpenAI and others on how to improve the LLMs (and these ideas will be embedded into the models soon).
These are short-term hacks that are useful for a number of use cases when they are reliable and their tasks have a bounded set of outcomes. But to make them useful and reliable for complex tasks involving degrees of uncertainty in outcomes, needing strategies to achieve the shortest paths to objectives, we'll need more techniques and technologies.
Automate the whole process. Have 4 to 6 "master" agents who can spawn numerous sub-agents, none with a fixed function. Have the master agents decide what function and task they need and assign it to the sub-agents. All powered by AI.
Voilà: AGI
On the UX, I realized it's more like figuring out SA (situational awareness). Extrapolating from my career in military avionics: how fast a pilot could determine an issue or threat was what mattered in their display. I don't hear this in the presentation, just that someone arranged the best four-square layout that people liked. It would feel better if it were discussed in terms of SA, with the desired awareness defined up front, which may change for each project or agent-tool combination.
I think LLMs are going to offer different ways of planning and reflection, and they'll do this natively, but more interactive, say "agile", models are going to emerge. Thus, as you say, in that regard human-in-the-loop will not go away anytime soon.
I can’t find a link to the tree of thought video in the description?
Even if these techniques somehow become more integrated into the models themselves, there are still going to be techniques that are not, and building systems that apply models is going to continue to be a thing. We need a whole new type of model that can implement continuous learning. But there are other benefits to having auxiliary systems, for one, telepathic abilities for AI.
Wow, the Rabbit is so bad he's trying to get rid of it already! Come for what they already claimed to have, stay for the vapours. I'm hoping they'll have a third website working really slowly by the end of the year. The rate of progress is unbelievable.
Haha I thought the same 😅
There is likely an optimal number of agents per system, and the only way to determine it would be trial and error. Let's say K is the optimal number of agents; then more than K agents wouldn't produce an increase in "accuracy" (whatever that is for an agent system).
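That trial-and-error search for K is easy to frame as a sweep: run the same benchmark with 1..N agents and stop adding agents once accuracy stops improving. evaluate() is a placeholder for whatever benchmark and accuracy metric you use:

```python
# Find the smallest agent count past which adding agents stops helping.
def evaluate(num_agents: int) -> float:
    raise NotImplementedError("run the agent system on a fixed benchmark, return accuracy")

def find_optimal_k(max_agents: int = 10, min_gain: float = 0.01) -> int:
    best_k, best_score = 1, evaluate(1)
    for k in range(2, max_agents + 1):
        score = evaluate(k)
        if score - best_score < min_gain:   # improvement below threshold: stop
            break
        best_k, best_score = k, score
    return best_k
```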
Yes, I think these techniques or short-term hacks will be replaced by more intelligent transformer models, and that's when these models will evolve to a whole new level.
Not convinced about human-in-the-loop for long. The missing part is generating data from this process itself and further training the underlying model. With enough iteration, human-like reasoning or AGI should emerge.
@Mathew Berman, why do you need my full DOB to subscribe to your newsletter? Are you doing segmentation of your subscribers or something?
Is there a link to the langchain video?
I think that zero-shot prompting is the aim of bots, so the prompting techniques will not stay forever.
These techniques are short-term hacks, I think. Take planning for example. We already know the future LLMs from OpenAI will be able to do this. As an agent developer you have to ask yourself if it is worth the time to develop a complex system just to be "steamrolled" by OpenAI later on.
Tools are maybe a little different. But I think it will also become really simple to, e.g., connect to APIs as the LLMs get better and better.
And I do not want to say that agents will be irrelevant in the future. The contrary is true.
But the heavy lifting of the agentic behavior will be done by powerful LLMs that will integrate most of what is discussed here in the video. The additional agentic part of an application (API connections etc.) will (and should) be really small.
BTW, another key agent might be one that allows one to communicate with the AI not just with words (typed or spoken) but with graphics and gestures captured on a cell phone camera.
And even though I tend to hate having to plug my own videos, I made a video on some ways one could do this and uploaded it to my YouTube channel. I called it YACKS, Your Own Color Keys: one can use colored markers to make one's own color keys that help the AI make sense of a hand drawing, even if it is drawn rather sloppily and incompletely.
And, BTW, I can't remember if I mentioned this on a prior video, so my apologies if I'm repeating it here.
But the idea is that human in the loop is important yet can be a time-limiting factor, so the more quickly and easily a human can input info, the less of a time impact the human imparts to the process, plus it saves the human time and effort as well.
Thanks Matt
Do you have a link to anyone who has that released-but-now-removed model?
So when is Q* coming out?
Wow! You have explained it very well. Before this I also thought that AI agents were just like no-code GPTs.
Let's talk about agents, baby
Let's talk about A and B
Let's talk about all the code strings
And the bugs that we might see
Let's talk about agents
Let's talk about agents
Let's talk about agents for now, to the folks in the lab or online, wow
It keeps booting up anyhow
Don't reboot, mute, or delete the topic
'Cause that ain't gonna block it
Now we talk about agents daily
We talk about the smart, the simple, maybe
It's the thing that gives us the "AI dreams"
But what's too much? What does it mean?
People might avoid, people might deploy
Some use it well, while others just toy
From ethics to code, let's have the convo
For AI’s growth, let’s set the tempo
No link to the actual talk?
Recursion/looping, plus being fine-tuned to take advantage of it, should be enough IMO. We've already seen what they can do with a little push and some jerry-rigging; it seems within reach to have one that is native to that kind of environment.
I mean, of course, not in a sealed, frozen box; you gotta let them interact with the world and store new information and such. In short, what people who have studied AI safety for decades have been warning us not to do... lol, I still remember how common it was for internet randos to just say "keep it locked in a box and we'll be safe"...
If you think about how human beings go about solving this problem, what you end up finding is that, yes, everyone can do all of these different skills, but there are people who are highly skilled at specific functions.
Where I think the future is: you either have a collection of models trained on specific components, and you use those models within an agent environment or even a workflow,
or you end up with something like Mixtral, with its eight different models built into one and the ability to call on the different models inside the bigger model set.
This is effectively how we've solved a number of problems as humans, where we have teams of people who have specific tasks and are highly trained and skilled in specific functions.
But we have also applied this to computing: we have multicore CPUs and GPUs, but we also have specific hardware that does different functions. The CPU can render imagery, but the GPU does it far better.
Think about the human brain: yes, we have one brain, but we have different cortices inside that brain that perform different functions.
The key reason why I believe we will have a collection of models in a workflow, as opposed to one single model like our brain, is the fine-tuning required for each area of the task.
Take a planning agent, for example.
If you have a general model for planning, it is not going to have the specific-area training that enables an effective plan.
Planning in sales versus accounting: two highly specific areas and domains that have different planning requirements in order to complete a task.
A general-knowledge planning agent is going to be nowhere near as effective as a fine-tuned sales or fine-tuned accounting planning agent.
RAG doesn't solve that.
What RAG would do is access the company-specific information to assist in planning.
What this means is that you may not have large language models for all these different functions.
You may have smaller, specific models, trained on data to perform that specific task.
A planning LLM fine-tuned for sales doesn't need to know how to write code, so you wouldn't waste all of the resources to train the foundation model with coding capabilities.
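A minimal sketch of that idea (the model names and the classify_domain rule are invented purely for illustration): a tiny dispatcher routes each planning request to a small, domain-fine-tuned planner instead of one general model.

# Hypothetical dispatcher that routes planning requests to small,
# domain-fine-tuned models instead of one general-purpose planner.
PLANNING_MODELS = {
    "sales": "sales-planner-7b",            # fine-tuned on sales planning data
    "accounting": "accounting-planner-7b",  # fine-tuned on accounting data
}

def classify_domain(request: str) -> str:
    # Stub: a real router might use a tiny classifier or embedding similarity.
    return "sales" if "pipeline" in request or "quota" in request else "accounting"

def plan(request: str) -> str:
    model = PLANNING_MODELS[classify_domain(request)]
    # Stand-in for an actual inference call against the chosen model.
    return f"[{model}] plan for: {request}"

print(plan("Build a Q3 pipeline plan to hit quota"))
print(plan("Prepare the month-end close checklist"))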
One word: SUPERVISORS
As far as how much human in the loop one might need, this boils down to risk mitigation, and how much of that one needs versus rapid response.
As such, my former employer had a great definition of risk: the product of the probability of failure (a number between 0 and 1) times the consequence of failure, say a number between 0 and 100,000, where 100,000 is a very high consequence.
So if the probability of failure is high, say 0.2, but the consequence is low, say 5, then the risk would be 1.
But if the probability of failure is 0.001 and the consequence of failure is 50,000, the risk would be 50.
Thus the higher the risk the higher the need for a human to check the answer.
And the AI could even estimate these as well as employ them.
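That rule is easy to encode; here is a small sketch reusing the two example values above (the review threshold is an arbitrary choice of mine, not something from the video):

def risk(prob_of_failure: float, consequence: float) -> float:
    # Risk = probability of failure (0-1) x consequence of failure (0-100,000).
    return prob_of_failure * consequence

def needs_human_review(prob: float, consequence: float, threshold: float = 10.0) -> bool:
    # Arbitrary threshold: the higher the risk, the more the answer needs a human check.
    return risk(prob, consequence) >= threshold

print(risk(0.2, 5))                       # 1.0  -> low risk, no review needed
print(risk(0.001, 50_000))                # 50.0 -> high risk
print(needs_human_review(0.001, 50_000))  # True -> route to a human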
Please keep the Rabbit R1 😅
These things are short-term hacks, but they still play a long-term role, as competing models provide different value-adds that call for an integration framework like LangChain to play a unique role.
We have to do them forever and refine them.
My hunch is the network has to resolve the question at hand, so I don't think the model can be designed to have planning built in. But there could be a method to feed back natively without decoding first, e.g. maintaining vectors between planning stages. I always thought OpenAI actually had the rewind step from the beginning. When the first gen of ChatGPT launched, you could see it re-writing the sentence as it optimised the answer. The current version hides this behaviour and no longer streams the output that way. What I think GPT-3.5 does is backtrack the layers if it gets a very low probability for the last token it generated, then tries a couple of times to get a higher-probability token.
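Just to illustrate that backtrack-and-retry guess (the next_token_distribution stub is entirely hypothetical, and there is no claim this is what GPT-3.5 actually does):

import random

def next_token_distribution(context: str) -> dict[str, float]:
    # Hypothetical stub: a real system would run the LLM and return
    # probabilities over candidate next tokens.
    return {"plan": 0.5, "code": 0.3, "banana": 0.2}

def generate(context: str, max_tokens: int = 5, min_prob: float = 0.25, retries: int = 3) -> str:
    tokens = []
    for _ in range(max_tokens):
        dist = next_token_distribution(context + " " + " ".join(tokens))
        token = None
        for _ in range(retries):
            token = random.choices(list(dist), weights=list(dist.values()))[0]
            if dist[token] >= min_prob:
                break  # accept a sufficiently probable token
            # otherwise "rewind" and sample this position again
        tokens.append(token)
    return " ".join(tokens)

print(generate("Let's make a"))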
Hallucination is still a problem today. Today I prompted three top-of-the-line local LLMs in German with the word "Drachenfrucht", meaning dragon fruit. This is a real fruit, and all of the models made something up about a magic fruit from tales with dragons. Maybe a good question for your tests.
2:00 that that that
Not sure, but any service you speak with will be better delivered by a trained AI.
Unfortunately, agents developed by humans (developers/programmers) are just another form of SaaS, and SaaS is a dead duck as GenAI models become more powerful. By GPT-5, agents will be generated by the GenAI model itself. There's nothing long-term here. These types of agents may hang on for a year or two before they disappear.
Hi... I am a novice researcher... I want to research the future of agents. Can you tell me which research articles to read to understand this better?
Short-term hacks… stick the LLM back into your diagram and create an Agentic Plane where your Agents live… (this is kinda like inserting a Management Plane through the ISO stack; I have a patent for that boondoggle.)
JavaTheCup did something similar; I think he used Albedo as well.
He found out that you can place Geo Constructs before the challenge and use the explosion from their auto-destruction to destroy some of the targets.
IMO the prompting is a workaround for current LLM limitations.
It's just a matter of time. We only need bigger context windows, cheaper prices, and faster speeds.
All of which will inevitably happen from hardware improvements alone. Just gotta let 'em cook.
The tooling is the secret sauce. It'll always be needed. What's better: one GPT-7 AGI, or 10,000 orchestrated in perfect harmony?
I'd take the latter any day.
I'm trying to set up my own AI system with Llama 3, Crew AI, and LaVague AI
I still worry about GIGO.
We humans are training AI on what we *Believe* to be true.
Progress is a destructive process; it relies on the destruction of false Models of Reality.
To me this is the elephant in the room.
But storing "memory" as an external session component isn't really memory, is it? It's just regurgitation. Real memory has to live in the context, if I understand correctly.
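For what it's worth, the usual pattern looks something like this sketch (all names made up): stored notes are retrieved and pasted back into the prompt, so the model only ever "remembers" what makes it into the context window.

# Toy sketch of externalised "memory": stored notes are retrieved and
# stuffed back into the prompt context on every turn.
memory_store = []  # stand-in for a database / vector store

def remember(fact: str) -> None:
    memory_store.append(fact)

def build_prompt(user_message: str, max_facts: int = 3) -> str:
    # Naive retrieval: just take the most recent facts; real systems would
    # typically use embedding similarity search instead.
    recalled = memory_store[-max_facts:]
    return "Known facts:\n" + "\n".join(recalled) + f"\n\nUser: {user_message}"

remember("The user's name is Sam.")
remember("Sam prefers concise answers.")
print(build_prompt("What's my name?"))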
The Mamba architecture is going to unlock all of that, and it's faster and uses less compute.
Thank you for speaking naturally in your videos 👍