Wow Sam, you're blowing my mind with this info. You're a content wizard, I swear. This is very informative, please don't stop!
Really the best explanation so far of how LangChain uses ReAct prompting.
Thanks, glad it was helpful.
Once again I appreciate the time and effort you take in these videos to set things out simply so that it is easily understood. Thank you!
Amazing, the only youtuber with depth in the LLM space
Thank you so much Sam. I was stuck on ReAct, trying to understand how it works. It is now much clearer with your very good explanations.
Think, THEN speak. That's a good idea, also for humans.
I just finished reading this paper and your video was exactly what I needed to cement it in there! Thank you!
Brilliant explanation from the paper research up to demonstrating the ReACT CoT. Thank you again. Well done.
Great explanation, thanks for taking the time to post it!
One of the best pieces of video content I have seen. Great work Sam, truly appreciate your efforts and the clarity of your content.
Thank you for explaining the topic. If you want to train a small model to do reasoning, maybe you could look at the paper "Orca: Progressive Learning from Complex
Explanation Traces of GPT-4" or wait until that model is released into the wild.
The number of tools written for use with LLMs is growing fast, so you cannot mention them all in the prompt. Is there already a technique to manage this, like a database approach for searching tools by topic, or something like TreeOfTools?
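For what it's worth, one common pattern here is to retrieve tools the same way you retrieve documents: embed each tool's description, look up the top-k relevant ones per question, and only put those in the agent prompt. A minimal sketch of that idea (the tool names and model choice are illustrative, and it assumes the langchain-community, langchain-openai and faiss-cpu packages plus an OpenAI key):

```python
# Sketch only: retrieve relevant tools by embedding their descriptions,
# instead of listing every tool in the prompt. Tool names here are made up.
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

tool_descriptions = {
    "Search": "Look up a topic on Wikipedia and return a summary.",
    "Calculator": "Evaluate arithmetic expressions.",
    "Weather": "Get the current weather for a given city.",
}

docs = [
    Document(page_content=desc, metadata={"tool": name})
    for name, desc in tool_descriptions.items()
]
store = FAISS.from_documents(docs, OpenAIEmbeddings())

# Pick only the tools relevant to this question and build the agent prompt from them.
relevant = store.similarity_search("What is 23 * 17?", k=1)
print([d.metadata["tool"] for d in relevant])  # likely ['Calculator']
```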
Yep, output parsing is very important. I personally like to use a tool like Jsonformer, with the reasoning included in the given schema.
good work - appreciated the clear explanations and now feel confident in using this part of langchain. cheers.
Great explanation! You keep it clear and simple. Thanks so much.
That is a brilliant explanation! Thank you so much, sir!
Fantastic high quality content. Really appreciate the work you're putting in 🙌
One thing that concerns me is the lack of control over the number of calls to LLMs when using agents. With LangChain, a single question can lead to multiple calls to LLMs, and this process remains somewhat opaque. Am I correct in understanding this?
First 😊. I just saw the notification and ReAct was the one I was waiting for. Thanks as usual.
Hey Sam, love your content. Could you please try out a similar experiment but using Guanaco or Falcon models?
I tried Falcon and it didn't work. I haven't tried Guanaco.
This was super interesting - I wonder how difficult it would be to add this kind of reasoning to an open source model with some fine tuning
Incredibly helpful breakdown of this topic, thanks.
Glad it was helpful!
I am having an issue when I run your code: "LangChainDeprecationWarning: The class `DocstoreExplorer` was deprecated in LangChain 0.1.0 and will be removed in 1.0" on the line
docstore = DocstoreExplorer(retriever). Any clue please?
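For others hitting the same warning, here is one way around it (a sketch, not necessarily the notebook's intended fix): skip DocstoreExplorer entirely and wire up a plain ReAct agent with your own Search tool. It assumes the langchain, langchain-community, langchain-openai, langchainhub and wikipedia packages plus an OpenAI key:

```python
# Hedged sketch: replace the deprecated DocstoreExplorer path with an explicit
# tool and the current create_react_agent API. Details may differ per version.
from langchain import hub
from langchain.agents import AgentExecutor, Tool, create_react_agent
from langchain_community.utilities import WikipediaAPIWrapper
from langchain_openai import OpenAI

wikipedia = WikipediaAPIWrapper()
tools = [
    Tool(
        name="Search",
        func=wikipedia.run,
        description="Look up a topic on Wikipedia and return a summary.",
    )
]

llm = OpenAI(temperature=0)
prompt = hub.pull("hwchase17/react")  # the stock Thought/Action/Observation prompt
agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)

executor.invoke({"input": "How old is the actor who played Maximus in Gladiator?"})
```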
Thanks Sam, I found your videos very informative and helpful.
I appreciate this a lot Sam, really good deep-dive observations.
Wow, thanks so much for this explanation!
Thanks for the vid. What I do not understand yet is why it needs search and lookup actions. It is already trained on a big dataset, right? So it might know Russell Crowe already? How do you specify that it should not use a tool but just use its own training data for answers? (My agent only wants to use tools, even when I don't think it should be necessary.)
I like ReAct. But not sure whether using LangChain makes it simpler or adds a level of abstraction that actually complicates it.
Can you explain what the output parser is doing? I don't understand why it is needed to get whatever the agent searched for, isn't that information already there in the prompt sent to the LLM?
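For anyone else wondering: the search result is indeed in the prompt, but both the prompt and the completion are just text. The output parser's job is to turn the model's free-text completion back into a structured tool call the framework can execute. A stripped-down illustration of the idea (this is not LangChain's actual parser, just the shape of it):

```python
import re

# Illustration only: read the LLM's raw completion and pull out either a
# final answer or the Action / Action Input pair the agent should run next.
ACTION_RE = re.compile(r"Action:\s*(.*?)\nAction Input:\s*(.*)", re.DOTALL)

def parse(llm_output: str):
    if "Final Answer:" in llm_output:
        return ("finish", llm_output.split("Final Answer:")[-1].strip())
    match = ACTION_RE.search(llm_output)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {llm_output!r}")
    return (match.group(1).strip(), match.group(2).strip().strip('"'))

print(parse("Thought: I need to look up Russell Crowe.\n"
            "Action: Search\nAction Input: Russell Crowe"))
# -> ('Search', 'Russell Crowe')
```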
At the end you mentioned changing the CoT prompts. I assume these are embedded in the core agent execution framework in LangChain. How can I change these CoT prompts?
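A hedged sketch of how to get at them (exact attribute paths vary by LangChain version, so treat this as a starting point): with the legacy agents the few-shot ReAct prompt sits on the agent's inner LLMChain and can be printed and edited, while with create_react_agent you pass the prompt in yourself, so you can pull the stock one from the hub and rewrite it.

```python
# 1) Legacy initialize_agent-style agents: the prompt lives on the inner LLMChain.
#    (Attribute path is version-dependent.)
# print(agent.agent.llm_chain.prompt.template)

# 2) create_react_agent: start from the stock hub prompt and edit it.
from langchain import hub
from langchain_core.prompts import PromptTemplate

stock_prompt = hub.pull("hwchase17/react")
print(stock_prompt.template)  # shows the Thought/Action/Observation scaffold

# Assumes the stock prompt contains this exact line; adjust to whatever you see printed.
my_prompt = PromptTemplate.from_template(
    stock_prompt.template.replace(
        "Thought: you should always think about what to do",
        "Thought: reason step by step and state which fact you still need",
    )
)
# ...then pass my_prompt as the third argument to create_react_agent(llm, tools, my_prompt)
```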
Thanks for the great tutorial Sam! One question about output parsers I'm confused about is how the program knows which name to take if there are multiple names showing up in the search result.
For example, if asking about the POTUS, it may return the past few presidents in the text. Does the LLM get involved to figure out which is most related to the question? If not, how does the regex know which name to pass on to the next action?
Thanks!
Thank you Sam. As always your videos are very helpful. I am experimenting with LMQL and LangChain for the ReAct paradigm. LMQL (Language Model Query Language) is a programming language for large language model (LM) interaction. LMQL guarantees that the output text follows the expected format. It would be nice to try it with open-source LLMs.
LMQL is very interesting, I have thought about making a video about it. Do you have it working with an open source model for ReACT?
Not with open-source models, only commercial ones.
This is awesome. I understood that you need to include that standard content in the template for any question you are going to ask. So I guess this content template can only be used to hop to websites and get info, and not to do any calculations. Am I right? BTW, is there any link to understand the content template given in the ReAct paper?
Hey, I'm currently working on this and need to build an actor-critique-style LLMChain, i.e. a tool-less bot that would analyse transcripts. Can you go over a showcase of how to set up output parsers and prompt.format() for an llm_chain use case?
6:32 "a lot of the reasoning stuff doesn't work on the open source models"
so this was a year ago -- I wonder if this is still true for the newer models like llama 3?
Thanks Sam. One year later I came back to watch this video again. Do you think this is still useful with tools and function calling? Most LLMs now support it, including Claude and Gemini
Nothing changes because of the inclusion of function calling in particular LLM models. LLMs use functions as tools. ReAct is a sequential process: thought --> action --> (action_result or observation), and this is looped until a certain number of iterations is met or an answer is reached. Rewatch the video, my friend. One of the main things was how to use functions as tools.
@@TheGenerationGapPodcast Thanks my friend. I might have confused you ^_^... By functions/tools, I actually meant the LLM's native function-calling/tool-use feature, not tools defined in users' prompts or via LangChain. OpenAI first released its function calling feature on 2023-06-13, which this video of Sam's had no way to cover. The point is that there's no way you can insert any Thought-Observation-Action prompt during OpenAI's native function calling flow; OpenAI decides which tool to use and what follows next. Everything is behind the scenes, but it seems to work well. I am not sure if OpenAI is using the ReAct logic internally, but I can't control this flow.
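To make the thought --> action --> observation loop from this thread concrete, here is a toy, framework-free version (the "LLM" and the tool are stand-in functions, purely to illustrate the control flow, not any library's real API):

```python
import re

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: search once, then answer.
    if "Observation:" not in prompt:
        return "I should look this up.\nAction: Search\nAction Input: Russell Crowe"
    return "I now know the answer.\nFinal Answer: Russell Crowe is a New Zealand-born actor."

TOOLS = {"Search": lambda query: f"(Wikipedia summary for {query})"}

def react(question: str, max_iterations: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_iterations):
        step = fake_llm(transcript + "Thought:")              # thought + proposed action
        transcript += f"Thought: {step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        match = re.search(r"Action: (.*)\nAction Input: (.*)", step)
        observation = TOOLS[match.group(1)](match.group(2))   # run the chosen tool
        transcript += f"Observation: {observation}\n"          # feed the result back in
    return "Stopped: max iterations reached"

print(react("Who is Russell Crowe?"))
```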
Is it possible to fine tune a model's 'thoughts'?
Excellent explanation
Thanks for your clear explanation and for showing examples!! It all makes sense to me, but as I am implementing it, I am getting a lot of errors with the REACT_DOCSTORE agent type. In debug mode, I can see that it found the answer but it does not output it and reaches the maximum iterations.
This will depend on the model you are using it with.
Thanks for the great video. Would one be correct to assume this is what AutoGPT is all about, and even HuggingGPT, which probably used technical plugins for ReAct?
Yes, kind of. HuggingGPT and HF Transformers Agents use something similar to this. With AutoGPT it really depends on how it is set up.
Thank you for sharing! At the end of the video you said that it won't work for most of the open-source models. Does that mean we have to use GPT-4? Will Llama 2 or some other model work?
So that video is from a while ago, it will work with some open source models, especially if they are fine tuned for it.
@@samwitteveenai thank you!
Hi, thanks for this nice explanation. I had already seen the benefits of CoT prompting on ChatGPT but was wondering if there was a better way to guide it, or rather make it guide itself, to a better answer. ReAct looks way better. How can one implement this in Flowise? (Not a dev, just comfortable enough to use the drag and drop of Flowise.) My previous method of using prompt chains to trigger the reasoning part now looks stupid lol.
What is the name of the research paper at 5:00?
Thanks for these videos, really helpful.
I'd like to see you use PaLM2 for this
I think the model's tendency to "justify" or "reinforce" its invalid answer is the same kind of issue as the LM repeating itself ad nauseam, the simpler LMs do it on verbatim word sequences, the larger ones on verbatim sentences, and I suspect the really clever ones do it on meanings of sentences (or paragraphs).
The exact reason, as far as I'm aware, has not been truly researched, but the sequence of generated tokens stored in context tends to amplify the probability of the same (!) sequence getting generated again. This is puzzling because this sort of repetition is not really found in the training datasets. So I guess something is fundamentally broken in the (transformer?) algorithm and we are desperately patching our way around it. I suspect that it is also a feature, because the model is supposed to repeat certain words (e.g. refer to the actors of a story repeatedly) while suppressing repetition of others, and it really is not able to distinguish between these categories of "repetition which makes sense / is inevitable" and "repetition because repetition is so much fun".
Another way to put it would be that the problem is that the model gives equal importance to its own generated bs as to external inputs (be it prompts or results of the observations from its senses/tools). Perhaps the solution will be to teach the bastard models some self-criticism and humility (attenuate the probabilities depending on where the in-context tokens came from). There's probably already someone writing a paper about it lol.
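On the "patching our way around it" point: most inference stacks do ship exactly those patches as decoding options. A hedged example with Hugging Face transformers (model choice and parameter values are arbitrary, just to show where the knobs are):

```python
# Decoding-time workarounds for repetition: a repetition penalty and an n-gram block.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The actor Russell Crowe", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    repetition_penalty=1.2,   # down-weights tokens already present in the context
    no_repeat_ngram_size=3,   # hard-blocks any 3-gram from being generated twice
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```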
I'm just throwing things at the wall and seeing what sticks. But in order to use ReAct at a somewhat cheaper level, why not have the OpenAI API do the basic steps (so lay out the general steps) and then pass this info off to an open-source LLM running on a GPU to do the menial tasks? Could that possibly work?
Yeah, I have found it is often better to just fine-tune the open-source model to do it all.
What an incredible and useful channel, your content is awesome ⚡
It would be very nice if you shared a video about Text Generation Web UI and how to use it with MetaGPT or AutoGPT, because I tried really hard to use a drop-in replacement API for OpenAI with those projects but the output was not as expected.
Awesome elaboration!
I am thinking of using RAG as my functions in Tools.
So theoretically, if we fine-tune an open-source model to only perform these thought- and action-generating tasks, a small open-source model could potentially do all these tasks really, really well? The only constraint is the data, right?
One of the biggest drivers is the reasoning capability of the foundation LLM that you are using. The LLM also needs function-calling abilities. Do you mean data for fine-tuning or data for training the LLM?
Hi Sam, thanks for the amazing video. Is it possible to do the same for PDF documents? Looking forward to hearing from you. Thanks, Andy
Not sure what you mean by doing the same for PDFs?
I dream of a day when open source models will be as good as openai models and everyone will have an assistant like this in their pockets
This is almost pseudo-AGI/Agent design. A single prompt / chain forms a linear process but with steps implied at the start. Would this be improved with camel/AGI/Agent-style back-and-forth conversations? Moreover, I'm wondering if the future of success is in balancing the large language models with specialized models for some tasks. So could something like this include OpenAI asking StarCoder-beta or Gorilla for a code or API output, and then using that to build steps, evaluate / improve code, etc...
This is an intuitive idea, but it is a bit at odds with the history of the research field. This "specific model for specific task" approach is what AI researchers (and computer game AI developers) have been dabbling with for decades, for lack of better options. And then, bang, the big unified transformer models came about, and produced general results that far surpassed such handcrafted solutions. In fact, the only reason to mess around with the handcrafting would be if you wanted to save on the hardware. But this might be a "penny wise pound stupid" sort of approach when it comes to AGI (while certainly reasonable if you only want a single task done well), sort of like trying to build a winning race car from off-the-shelf scrap parts in your local hardware shop.
@@clray123 Interesting way of looking at it. Self-referentially (but with human agents) I wonder what other folks think here. Is @clray123 right? Sam? Please explain your reasoning in your answer. :)
If I post the Russell Crowe question to ChatGPT, it returns the correct answer. Is this model able to do ReAct reasoning on its own without the need for ReAct prompting?
It is using the ChatGPT model
works well enough with some llama models
Thank you for sharing!
how does it stop the LLM generation mid stream?
It often doesn't; it just uses the output parser to cut it off.
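For anyone curious what that cut-off looks like, here is a small illustration (not LangChain's actual code): either the completion is trimmed at the first hallucinated "Observation:", or a stop sequence is passed to the API so the model never generates past its action in the first place.

```python
# Illustration only: the model has run past its action and invented an observation.
completion = (
    "Thought: I need to search for Russell Crowe.\n"
    "Action: Search\n"
    "Action Input: Russell Crowe\n"
    "Observation: Russell Crowe is an actor...\n"  # hallucinated; should be discarded
    "Thought: ..."
)

# (a) Parser-side: keep only the text up to the first "Observation:".
#     The real observation will come from the tool, not the model.
trimmed = completion.split("\nObservation:")[0]
print(trimmed)

# (b) API-side: most completion/chat APIs accept stop sequences, e.g.
#     stop=["\nObservation:"], so generation halts before this point at all.
```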
Great video!
I love you man.
useful,subscribed
thank you sir ❤
Why don't you use GPT-3.5 Turbo? Isn't it better and cheaper?
Yeah, I do in a lot of the videos. Two reasons for this one: 1. I didn't want to confuse people with the system prompt stuff etc. 2. The older davinci model actually often does better with ReAct. The best one by far is GPT-4, but it will be a few more months before the price on that one comes down.
@@samwitteveenai Thanks. What are the steps to modify it for Turbo? Which video should I look at? Or would you mind pasting the system prompt modification I need to make? I just want to make it 10 times cheaper! Thanks!
Have you considered that the answers are not actually better, but people perceive them as better because there is "reasoning"?
You would have to create answers up front to test this. Double-blind studies are needed.
Actually, when tested on various datasets the models do much better using reasoning and tools. I do wonder about some of the newer reasoning techniques like Tree of Thoughts and how well they generalize to things that weren't in the paper.
Does the model ignore Wikipedia and just decide Russell Crowe won the Oscar for Gladiator? I don’t see how it made that leap…
Running the doc_store example now consistently returns "Thought: Joe Biden is the 46th and current president of the United States, so the answer is 46." I asked davinci-003 to critique the thought and answer and it says it's OK. GPT-4, on the other hand, pointed out: "No, the thought and answer are not good. The age of the president is not determined by their position in the sequence of presidents. The thought should be about finding out the current age of the president, which can be done by subtracting their birth year from the current year."
Interesting, so perhaps this has changed with the new updates to which fine-tuned model they use as the default. I will try to look at it when I get a chance.
This is awesome! Great work as always Sam and my understanding grows by the day! Thanks to you🥳🤔🦾