Power Each AI Agent With A Different LOCAL LLM (AutoGen + Ollama Tutorial)
- Published on Nov 28, 2023
- In this video, I show you how to power AutoGen AI agents using an individual open-source model per agent. This is going to be the future AI tech stack for running AI agents locally. Models are powered by Ollama, and the API is exposed using LiteLLM.
Enjoy :)
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewberman.com
Need AI Consulting? ✅
forwardfuture.ai/
Rent a GPU (MassedCompute) 🚀
bit.ly/matthew-berman-youtube
USE CODE "MatthewBerman" for 50% discount
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
Media/Sponsorship Inquiries 📈
bit.ly/44TC45V
Links:
Instructions - gist.github.com/mberman84/ea2...
Ollama - ollama.ai
LiteLLM - litellm.ai/
AutoGen - github.com/microsoft/autogen
• AutoGen Agents with Un...
• AutoGen Advanced Tutor...
• Use AutoGen with ANY O...
• How To Use AutoGen Wit...
• AutoGen FULL Tutorial ...
• AutoGen Tutorial 🚀 Cre... - Science & Technology
This is what I've been waiting for, for like a year! Complete localization!!!
I just did this exact thing a few days ago. It's crazy how quickly things are moving and how fast everyone is getting to the same page.
Would love to see a tutorial on how to integrate MemGPT with this multi-agent architecture, and whether it would make more sense to have one memory per model or one centralised memory.
@ thank you for this info, do you know what the use-case would be for MemGPT over Teachable agents in that case? Say I wanted to build a ChatBot that remembered user conversations over a long time period. Would it make more sense to use Teachable agents or MemGPT for the higher memory ability?
@@bensheridan-edwards876 there is a video on the channel about Teachable agents
Yep, would like to see this too
Yes! This is what I wanted to comment!! And also! Fine tuned MODELS!
@ Really?? Any link to get more info about teachable agents?
I really enjoyed the pacing here, just enough detail on bits that may be unfamiliar (installing ollama, etc.) without getting bogged down. Nice video!
Thank you!!
Matthew!
I really love that you are so autogen focussed!
Thank you
just need to run ollama serve, pull the models to the server, and run litellm without any command, then call the models directly from autogen with model="ollama/model_name". You don't need 2 instances of the server
How do you pull the models to the server?
just run 'ollama pull '
Would it keep each model in memory and switch as fast as it would by having them loaded separately ?
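A rough sketch of the single-proxy setup this thread describes, as an AutoGen config (the port, the "NULL" api_key placeholder, and the model names are assumptions for illustration; the "ollama/…" prefix is how LiteLLM routes a request to a local Ollama server):

```python
# Sketch: one LiteLLM proxy (assumed on localhost:8000) in front of a
# single Ollama server, serving different models to different agents.
# Port, api_key placeholder, and model names are illustrative.
config_list_mistral = [{
    "base_url": "http://localhost:8000",  # the one shared LiteLLM proxy
    "api_key": "NULL",                    # placeholder; local models need no key
    "model": "ollama/mistral",            # LiteLLM forwards this to Ollama
}]
config_list_codellama = [{
    "base_url": "http://localhost:8000",  # same proxy, different model
    "api_key": "NULL",
    "model": "ollama/codellama",
}]
```

Each agent then gets its own `llm_config` pointing at its own `config_list`, while everything goes through the single server.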
Great work man, much appreciated. These things evolve so fast.
Thank you so much Matthew, you couldn't have created this at a better time for me. Just want to thank you!! I'm really learning a lot from your tutorials!!!
Great video, your channel is without a doubt one of the best I've found for useful and practical advice on setting up LLMs for local use.
👍👍👍
Hey man, not sure if you'll see this but you are quickly turning into one of my favorite youtube channels! I'm really glad you made this video because it's just about perfect for a project I'm in the brainstorming phase of!
This is awesome. Just what I was looking for to play with this weekend!
I'm just about done getting my daily work stuff done, was about to jump into coding here soon! Listening now and will listen again/follow along soon
Damn bro. You are always reading my mind and coming out with the right video shortly after
At last! Thanks for creating this one:)
Thank you so much for this video. I've been trying to get it to work and kept stumbling. I was able to follow along with you and get it working properly. Looking forward to seeing some real world use cases.
By now I like your videos even before I watch them... always great stuff!
Incredible 🎉 Love the speed of innovation in this field 😊 And the fact that it is open source and being more and more localised 🙌🏻
Please make a video on LLM performance with memory usage, tokens/sec, tokens/sec vs context length. context length stress test.
I find LLM output going out of context with large context lengths.
I'm hooked on these vids
Thank you Matthew for this content! I appreciate your work. Cheers from Argentina
It seems you're always a couple hours ahead of what I'm wanting to do. Super work. Thanks for the vid.
You, by far, have the best AI videos. It would be neat to have a longer video where you orchestrate multiple models building an actual piece of software. For example: Have coder create a node.js website with a basic CSS file, then have a content writer AI write the content for the page.
I agree. I also think that incorporating different general strategies could make sense; he mostly does one-shot, but then it would be nice to see how the model responds to multi-shot. Similarly here, actually making agents instead of just creating a one-shot would have been helpful as it's the whole point of the framework.
We all want to do it but our poor brains need a chance to adapt.
We all want to do it but our poor brains need a chance to adapt. Within 6 months as a team we will have wrapped our minds around it
Most of his videos are just him following tutorials and reading stuff, but I guess you can't realise that just by watching videos. He is not capable of doing anything of use with AutoGen and local LLMs, because almost nobody can.
yeah definitely, I'd say he's way up there, at least from an ops perspective :)
Yeah, for sure do a video optimizing AutoGen for these open-source models. I'm trying to work with them myself and found it very hard to orchestrate them.
You create the best videos. Thanks for taking the time and making an amazing series. For the professional video, I would really enjoy seeing a way to organize the agents into sub-teams.
Thanks. This one seems pretty advanced for me. I will look at your beginner tutorials.
wow @matthewberman i just want to let you know what an amazing job you are doing for all of us. Your channel is my good morning every day before I delve into any other task. It's amazing to see these pieces of tech working together, and furthermore you make it really easy to understand; I can't thank you enough. I will keep coming every day for more, and guess what, your videos get my thumbs up even before I watch them, and that's a testament to the quality of your work!! SALUTE. 🤩💥
You've been kicking ass lately, Mr. Berman.
I can already see the title for the next video "Autogen + MemGPT + Ollama/LiteLLM - Each Agent with its own Local Model + Infinite Context"
Great video as usual mate! I guess many of us wish to see some real world use cases. Hope you will find some time to spend on it, it would be much appreciated
wow, Matthew! another amazing tech review! yes I want another that autogen does something like weather/traffic API and does users scheduling accordingly!
Matthew’s tutorials work very well on my 10-year-old Macintosh 💻 laptop.
I love that I saw this video like two weeks ago and it feels so old.
You need to change the python interpreter in vscode to be the conda one, to remove the error of importing the packages
great video, thanks.
This is awesome
awesome video! Do you have any videos on deploying these LLM agents to a UI?
You are amazing!
Great. I will try. thks!
Wow! You are a mind reader. I wanted this right noooooooow. ❤
You can also use `ollama list` to show the current installed models.
Cool!
Great video! Made me wonder how well agents perform function calls
You can make a powerful multi model AI using zero or few-shot classification of prompts to determine the model to use for the prompt.
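A minimal sketch of that routing idea, using a crude keyword check as a stand-in for the zero-/few-shot classifier the comment describes (the function name, the marker words, and the model names are all hypothetical):

```python
# Toy prompt router: pick a model per prompt. A real version would
# replace the keyword check with a zero- or few-shot classifier call.
def route_prompt(prompt: str) -> str:
    """Return the (hypothetical) model name best suited to this prompt."""
    code_markers = ("python", "function", "script", "bug", "code")
    if any(marker in prompt.lower() for marker in code_markers):
        return "ollama/codellama"  # code-oriented model
    return "ollama/mistral"        # general-purpose fallback

print(route_prompt("Write a python script to sort a list"))  # ollama/codellama
print(route_prompt("Tell me a joke"))                        # ollama/mistral
```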
Great, thanks
I definitely want to see more on fine tuning autogen to use ollama models better.
I think there's a plethora of "first step" videos on youtube because creators are understandably wary about narrowing their audience in an expert level video.
I think if you frame it properly, an expert video can drive even more traffic. If you open with the final result and get people excited about the possibilities, then they would be more likely to marathon your beginner videos, which of course you would link below.
Also, it would fill a purpose that's sorely needed: a next step video for all of us watching hundreds of "beginner videos" looking for a glimpse of where to take it.
The reality is not even 1 in 100 will try even this beginner stuff. You included, because otherwise how would you not realise he is a beginner himself?
It would be stupid of him to put a lot of effort into videos nobody would even watch.
@@sCommeSylvain How're your projects coming along?
I'm not sure why the personal attack was necessary; if you're here, you're also watching and learning. If you have expert knowledge to share with the class, I'd happily subscribe to your channel if you put out quality, useful information.
I want to see the pedestal you look down from.
Thanks!
Another excellent tutorial. Could you plan one where the agents can save their output, for example saving the Python code it generates or outputting the results to pdf/txt? Thanks
would love to see a video on ChatDEV! (curious to see pros and cons vs. autogen)
you did it! thankss
memgpt also please
I want to see more agents building programs. The biggest leap for devs is to code full working programs. That's the true test for any llm group, can they build a program with a beautiful GUI and great functionality?
This was super helpful! Got me started and now I'm wondering if you can help demonstrating how this stack would work with function calling. I'm using autogen+litellm+ollama+mixtral and it's all working great. Then it craps out when I introduce function calling. I can't tell where in these different stacks it may be failing. I believe I've followed all the instructions I could find, but no luck. A video or pointer would be great! Thanks Matthew
I recently came across AutoGen with Ollama/LiteLLM, and it looks quite intriguing. I'm particularly interested in using this technology with Pinokio AI. Can you provide more information or guidance on how to integrate each agent with its own local model in this context?:)
Nice tutorial!! Would be nice to see how the agents connect to a database or json file to retrieve information
Great video!!! Is it possible to use an image-to-text model with a text-to-text model, or is it only one kind of model?
amazing
If there were one more version of this video using the GUI of AutoGen for no-code people like me, that would have been great! Just a wish. Brilliant video BTW!
Great video. I was trying to use an Ollama LLM model for implementing RAG using AutoGen, with the above llm config format, but it says the model is not found.
Excellent material as always!
Could you explain how to do the same with an external GPU, like Runpod? I mean running multiple models on Runpod with ollama/litellm on a single GPU.
Also, what do you think about integrating AutoGen with projects like "Gorilla OpenFunctions" and "Guidance AI" to improve the function calling and response structure of open source LLMs?
Thanks!
this is so fun, ty. and can you tell us how to adjust that to make it work
Revolutionary
Real world scenario I would love to see: I give it a prompt and a repo and Autogen goes to town adding whatever functionality or bug fix I suggested in the prompt and then it creates a pull request in Github. Sure this would involve working with octocat or similar but would love to see a coder agent and a testing agent working hand in hand.
Have you taken a look at Sweep AI? It doesn't use Autogen but has functionality similar to what you are proposing.
🎉❤😂 Amazing! More more more, a full software company or marketing agency, sorry big asks, but happy as heck watching you kill this.😂
Nice tutorial. Have you managed to implement function calling with ollama models?
Thanks
Bahhhh! If only Ollama could run on Windows. Either way great video. I'd love to see how this can be fine tuned.
This is amazing... Is it possible to hook up AutoGen with this and with PrivateGPT?
I love how after "tell me a joke" it went on to a math problem. That shows that it learned how people use LLMs :D Eventually they'll learn your usual test cases, and will give perfect answers, but then fail in everything else :D
I’d like to see the vid of you optimizing Autogen to use these models and be successful with it
I was waiting for this video... now I just have to wait for OLLAMA to be usable under Windows. Thanks a lot
Ollama can be used with WSL on Windows
The bigger question is if Windows itself is useable anymore, haha
Thank you so much, I learned a lot from your videos. So far you give the task in the script or by human input.
How about if we need a UI web application for the end user to send what is needed, like a Flask app, or an API call to give the task? How would that work?
I’ve been running into context length issues with open source models and autogen. Would using a different model for each agent expand context length for each agent?
Great! Is it possible to integrate this with ollama-webui?
Excellent video, thanks. I think the agents have a bit of a problem with passing their names to the manager. I got an error message "GroupChat select_speaker failed to resolve the next speaker's name. This is because the speaker selection OAI call returned:
```". I will need to spend some sweet moments with the pyautogen examples to find the reason for that
I am toying around with building fiction. I am not done tinkering, but I built an OpenAI Assistant with retrieval and a function call. The function call goes to a free Pinecone vector database (which has some 16 "writing thesaurus" books stored in it). Using AutoGen I now have a writer that can use his "magical thesaurus" to build any type of description possible in a relevant format. So... need a description for a circus on the moon and a character with an emotional scar from space clowns?... My AutoGen can write that.
How would you compare using Ollama/LiteLLM versus LM Studio? With so many choices it can be difficult to pick the one that is least likely to result in a dead end or stop being supported.
Good concept; needs more refinement for prime time.
Ollama makes running local llms SO EASYYY!!
Orca 2 is really good at solving math tasks. It solves 3rd-grade math tasks for me.
Like so many others, this video is a true inspiration, thank you.
Very interesting. I can't get llms to use my gpu if they depend on the llama-cpp-python package, so I'm just using the bog standard models with LM Studio, but I'm always looking for alternatives where I can control input and output with python.
can you do this without Ollama? I only have a Windows machine
Would it not make sense to tell an LLM: "given you have access to a coder, a poet, a historian, etc., split the user input into the relevant prompts for each", then parse that and call Ollama with each of the separate parsed inputs and their relevant agents? Then combine all the outputs into one to send back to the user?
AWESOME!!!!!!!!! Pls add MemGpt to this!
I was just wondering if AutoGen supports multimodal models? If it was hooked up with visual input, can it use its agents to identify and sort objects?
Do you have a vid or link pls for your CLI prompt?
So I use an HP laptop, no GPU. Will it run as fast as shown in the video if I run Mistral using Ollama as well?
How do you specify a port when spawning new LiteLLM instance?
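One way, assuming the litellm CLI accepts a --port flag (worth confirming with litellm --help on your version); a small helper that builds the command, with a hypothetical model name:

```python
def litellm_command(model: str, port: int) -> list[str]:
    """Build the CLI invocation for one LiteLLM proxy instance.
    Assumption: the litellm CLI supports a --port flag."""
    return ["litellm", "--model", model, "--port", str(port)]

# e.g. subprocess.Popen(litellm_command("ollama/codellama", 8001))
# would start a second proxy on port 8001 alongside the first on 8000,
# so each agent can point its base_url at a different port.
```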
Thank you very much Matthew for this amazing video. When I run the program I only get this response in terminal:
user_proxy (to Coder):
Write a python script to output numbers 1 to 100
---------------------------------------------------------------------------------------
and it does not continue with the execution of the script.
Do you know why this is happening?
Would taskweaver be set up the same way mostly?
Thanks !
If you have the error: TypeError: 'NoneType' object is not iterable,
then you have to add: "cache_seed": 42,
like this:
llm_config_mistral = {
    "config_list": config_list_mistral,
    "cache_seed": 42,
}
llm_config_codellama = {
    "config_list": config_list_codellama,
    "cache_seed": 42,
}
After that it worked for me
Do we have a model that allows us to upload structured data (csv, xlsx)? We could create an agent that performs data analysis and creates ML models on that data locally.
What's the difference between the teachable agent and the MemGPT agent? How do vector databases help the agents with recall?
Does the command `ollama rm ` actually delete the model from the filesystem?
Can you turn a website hosting server into a local LLM host instead of using an API to connect to GPT? Like a WordPress plugin that isn't connected to ChatGPT but to a local LLM installed on the server hardware.
Is it possible to hook up autogen with custom model created with chatGPT?
Do any of the updates to autogen make your previous videos irrelevant/less accurate?
Matthew - I just saw your video on LM Studio - why are you using ollama instead of LM Studio?
Let's do a live product dev with this one agent - one model setup... with all the bells and whistles possible.
Does Ollama run on windows yet ?
Can you make a video on how to run these models locally on a GPU (like we would any other model alone, without Ollama)? Thanks!