How To Use AutoGen With ANY Open-Source LLM FREE (Under 5 min!)
- Published Oct 16, 2023
- A short video on how to use any open-source model with AutoGen easily using LMStudio. I wanted to get this video out so you all can start playing with it, but I'm still figuring out how to get the best results using a non-GPT4 model.
Enjoy :)
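The basic recipe from the video can be sketched as below. This is an assumption-laden sketch, not the video's exact script: it assumes LM Studio's local server is running on its default port (1234) and a 0.1.x-era pyautogen, which used the `api_base` field (newer versions use `base_url`).

```python
# Sketch: point AutoGen at LM Studio's OpenAI-compatible local server.
# Assumes LM Studio's server is running on its default port 1234 and
# pyautogen ~0.1.x ("api_base"); newer releases renamed this to "base_url".
config_list = [
    {
        "api_base": "http://localhost:1234/v1",  # LM Studio local server endpoint
        "api_key": "NULL",                        # LM Studio ignores the key, but AutoGen wants one
    }
]

# This dict is what you would hand to AssistantAgent / UserProxyAgent as llm_config.
llm_config = {"config_list": config_list, "request_timeout": 600}
```

From there it is the usual AutoGen pattern: build an `AssistantAgent` and a `UserProxyAgent` with this `llm_config` and call `initiate_chat` with your task.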
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewberman.com
Need AI Consulting? ✅
forwardfuture.ai/
Rent a GPU (MassedCompute) 🚀
bit.ly/matthew-berman-youtube
USE CODE "MatthewBerman" for 50% discount
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
Media/Sponsorship Inquiries 📈
bit.ly/44TC45V
Links:
AutoGen Beginner Tutorial - • AutoGen Tutorial 🚀 Cre...
AutoGen Intermediate Tutorial - • AutoGen FULL Tutorial ...
AutoGen - microsoft.github.io/autogen
LMStudio - lmstudio.ai/
Should I do a full review of LMStudio?
Absolutely! Being able to self-host an LLM that exposes an API is amazing!
Absolutely! Seems better than oobabooga/text-generation-webui, doesn't it?
I would like to see if it's possible to make it use AI "character templates" downloaded in json format for instance, or as embedded chunks in an image file. Basically, can it act either directly as a replacement for TavernAI and similar, or can it replace Oobabooga as the server running behind the scenes of TavernAI.
How is it "completely free, completely open source" if LMStudio seems proprietary and Mac/Windows only, with no Linux support?
Yes! Please do!
Absolutely! Questions that interest me: can I host LM Studio on a different machine from the one I'm working on? I have a dedicated Linux machine, but I'd love to work with the LLM via the API from Windows. Also, does it support multi-GPU setups? Data parallelism and inference?
All I can say is thank you for your videos, you give enough information to get things up and running without making it overkill. Please keep making more videos like these I am learning soooo much ...
You are a lifesaver, you give so much top-notch content for free. I am about to start out in a startup where we plan to use a mix of GenAI (driven by stuff like AutoGen) and traditional ML models (I wonder if we will ever need those again in the future) with some RPA to spice things up. These videos of yours have given me full coverage of what I will need to do on the GenAI side of things, which is very new for me.
I have to say, I was struggling with this exact task... getting an open-source model to load up and expose an OpenAI API endpoint. Awesome content as usual!
Thank you for this timely video in my rough AI journey. I feel this is the boost I needed.
Man, thanks! Can't believe how easy it is -- was a great idea to build LMStudio to mimic the OpenAI API. Definitely looking forward to seeing more content on your exploration of this!
You're welcome!
I followed this video guide, and unfortunately I haven't figured out how to fix "KeyError: 'choices'" in completion.py, or "AttributeError: 'str' object has no attribute 'get'".
Seems like the AutoGen files still need some code upgrades (mine is ver 0.1.11)
I followed this, but it fails to generate the .py file in the coding folder. I confirmed the folder name, etc. It seems AutoGen searches for a specific key to decide whether the chat response contains code or text, and that's where the LM Studio API fails.
Same issue I'm getting @@lomek4559 @matthew_berman
Finally something local and totally free, amazing thanks, it's been a long wait!
I was waiting for this ever since Autogen came out. Thanks :)
Hope you like it!
Matthew, thanks for your time. You do an amazing job promoting these things in the way that you do. Thanks again.
You are the best! You sometimes feed us directly what we need and sometimes you teach how to catch the fish. In this video, you did both.👏
Believe it or not. I find myself in the first minute unconsciously pressing the like button. you are my hero. Please keep putting up beautiful content like this.
I would love to see a video about how to fine tune a local model with your own files. Like several text or pdf.
You might just need RAG rather than fine-tuning.
What's a RAG?
Following @@LiberyTree
@@LiberyTree Retrieval-Augmented Generation. Basically embeddings and a vector database.
@@matthew_berman Oh, I see. Thank you, I didn't know about RAG; it looks like exactly what I need.
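For the curious, the RAG idea from the thread above can be sketched in a few lines: embed document chunks, retrieve the most similar one, and prepend it to the prompt. This is a toy sketch; the hand-made vectors stand in for a real embedding model, and a production system would use a vector database instead of a dict.

```python
import math

# Toy retrieval-augmented generation. Real systems use an embedding model
# and a vector database; here hand-made vectors stand in for embeddings.
docs = {
    "AutoGen lets agents chat to solve tasks.": [1.0, 0.0, 0.2],
    "LM Studio serves local models over an OpenAI-style API.": [0.1, 1.0, 0.3],
}

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec):
    # Return the document whose embedding is closest to the query's.
    return max(docs, key=lambda d: cosine(docs[d], query_vec))

query_vec = [0.0, 0.9, 0.4]  # pretend this embeds "how do I serve a local model?"
context = retrieve(query_vec)
prompt = f"Context: {context}\n\nQuestion: how do I serve a local model?"
```

The retrieved chunk then rides along in the prompt, so the model answers from your documents without any fine-tuning.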
I'm so glad I found your channel. When I watch your videos, I feel confident that I know the latest info on AI and how to best utilize these tools. Thank you for doing what you do! You rock! 🤘🤘🤘
Thank you so much for showcasing this. I've been using LM Studio and GPT4All for a few months now and I really like them. One problem: I couldn't get LM Studio to use my GPU, though other people have been successful.
Wow, super nice stuff!! This is what I was waiting for! It makes it so easy to use LLMs basically anywhere. Amazing!! Thanks for sharing this ultra-valuable content with us 🙏🏼🙏🏼🙏🏼
I love the way you teach and explain stuff. It is the right tone for me AND you look like Gale from Breaking Bad.
Amazing work, thank you so much for sharing this! Now let's make this a start of a new era of locally running autonomous assistants which are actually helpful and free to use.
Hell yeah. What a time to be alive!
Thank you for this! WOW, this runs very well on my laptop. I'm playing with Mistral right now, and so far it's great!
This is a great video, makes it very easy to setup. One issue I encountered was a limit of 199 tokens was reached, which seems to be a default. You might want to add "max_tokens": -1, to your llm_config, or to some more reasonable number as it seems 199 is very easy to hit and then the output just stops.
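That commenter's suggestion might look like this in the `llm_config` (a sketch, assuming the 0.1.x-era `api_base` field; `-1` is LM Studio's "no limit" convention, and a finite cap is usually safer given the runaway-generation reports elsewhere in this thread):

```python
# Sketch: raise the completion cap so output doesn't stop at ~199 tokens.
# "max_tokens": -1 disables the limit entirely (LM Studio convention);
# a finite cap like 1024 avoids the endless repetition some commenters hit.
llm_config = {
    "config_list": [{"api_base": "http://localhost:1234/v1", "api_key": "NULL"}],
    "max_tokens": 1024,  # or -1 for unlimited, at the risk of run-on output
}
```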
1. Was hoping to see the chat interface. Wondering why you had to hardcode the initial prompt after the assistants were created.
2. LM studio is cool. I recently had ChatGPT create a streamlit front end for my Autogen app. Would love to see you go through this as well.
I should have sent LM Studio to you a while ago. I thought about it, but I never know what you already know about or how helpful it would be to send stuff to you. Glad you found it though. It has really changed the way I interact with LLMs, not to mention the frequency, because of the ease of use.
I tried to get this done myself so this will save me a lot of time lol thank you!
Glad I could help!
First off, thanks for all the content; it is very evident that you put a lot of research time into each upload. That being said, there's one small suggestion I'd like to make: could you include links to the repositories and tools brought up in your videos? Often I find myself wanting to play around and learn more about a showcased tool, but without direct links it can sometimes be a bit of a hunt. I understand that adding links might take a bit of extra time, but I believe it would improve your channel.
Links to autogen and lm studio are currently in the description.
@@zyxwvutsrqponmlkh Thank you for updating that.
@@zyxwvutsrqponmlkh The problem is that Autogen is changing rapidly, and a number of links in Matthew's descriptions no longer work. So far, on Autogen's site, one link I have found does not work. The code would make it easier to follow along.
This channel has everything an After Effects intro and a pastel hoodie.
LETS GO!!! THE ONE WE NEEDED! MY MAN! THIS IS WHY WE SUB!
Thank you!
BIG thanks, Matt, this is by far one of the most useful videos. Just FYI, I see some strange behavior when I run it: the assistant sends the user_proxy more requests than the ones I make (apart from requesting numbers from 1 to 100, it requests a Fibonacci sequence nobody asked for). Also there is a warning that doesn't interfere with the result, but I wasn't expecting it: "SIGALRM is not supported on Windows. No timeout will be enforced". Again, thanks.
It will be interesting to see which open models work best with this.
I suspect that we soon will run different models for different roles. It could compensate a lot for not having the size of the GPT4.
🎯 (MoE) Mixture of Experts is what that's called. It's documented, and people are starting to realize the benefits of this approach. One barrier to the MoE approach is the amount of memory it costs to keep multiple models hanging around. But overall it's still a huge improvement, and I suspect it will gain even more traction since open-source models keep getting smaller in size and better in quality.
@@tvwithtiffani MoE, cool, I hadn't heard that definition before. Ty for sharing. RAM is luckily not the most expensive or power-hungry part. It might be possible for the project-leader model to decide which models to involve and when, so that some models aren't called until the end of a project. We are truly living in interesting times.
Having great luck with Zephyr Mistral 7B model. My only challenge right now is getting it to terminate once it completes the coding task - it keeps going with its own stuff.
I think this would be the perfect place where chain-of-thought or a system like MoE might help. Before returning the code, pass it through inference one more time, and this time ask it to make the code concise and focused on the user's request. @@IslandDave007
I agree that multi agent structure will become more popular in the medium term because it makes applications more reliable, transparent and allows using smaller, more specialised models.
Thanks. I am struggling to get useful stuff out of AutoGen and local LLMs. The timeout thing seemed useful. I am getting empty strings and run-on LLM sessions. I am about to try a larger model and a higher quantisation for Mistral Instruct. This is my prompt: "Find ways to store and connect arxiv papers programmatically". Keep up the good work.
I don't know how I'm just now stumbling upon your content, damn algorithms. Would LOVE to see this improved! Cheers
lol. welcome!
OK, now I am totally convinced to start playing with AutoGen :D thanks mate
This is awesome! What always deterred me from going all in on the AI-agent world was the cost, so having this completely local is a game changer. I have it working as we speak using Mistral 7B, on my POS Ryzen with 4 GB VRAM and 16 GB RAM. I really didn't think any of this would work, but lo and behold. Thanks man, you made my week with this video.
Have you run into the api_type error code in config?
Keep us posted, awesome job man! :)
His job is incredible.
It is challenging to keep up with all the information presented on this YouTube channel.
Great stuff! I'm waiting for the update of LM studio, so that you can customize the prompt template not only for chat but also for the server. BTW I've just tested autogen with Zephyr locally :) this will save me a lot of $$$ when playing with autogen :)
Did it work well? Did you run into any errors?
"Fully local", I love this phrase ❤. Thank you, bro, for your professional info and tutorials...
My pleasure!
Thanks man. This open up a lot of very interesting stuff to try.
This is amazing. Please make more videos on this. It would be interesting to see a couple of Python (Data Science, ML, Games) projects being created with assistance from AutoGen.
Awesome find, Matt.
FYI, GPT4All has similar functionality and now supports GPU inference too. It might be worth checking that one out again. Thanks for the content!
Well done ✅
Thats great news!
Cool. I've been trying to do this using text-generation-webui. This looks way easier. Awesome, man, thanks!
The reason it fails to run completions is that the output format differs between models. I fixed it by appending "Mimic gpt-4 output format." to the prompt of the UserProxyAgent, and the basic AutoGen example of plotting a chart of NVDA and TESLA worked! The model I used was codellama-13b-q5_0_gguf on an M1 Max/32GB RAM.
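That commenter's workaround, nudging the local model toward GPT-4-style output, can be sketched like this. The wiring is hypothetical (the exact agent API depends on your AutoGen version); only the appended hint string comes from the comment:

```python
# Sketch: append a formatting hint so a local model's replies parse like GPT-4's.
task = "Plot a chart of NVDA and TESLA stock price change YTD."
format_hint = "Mimic gpt-4 output format."  # the workaround from the comment above

message = f"{task} {format_hint}"
# Then kick off the conversation as in the standard AutoGen examples:
# user_proxy.initiate_chat(assistant, message=message)
```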
Your videos are very easy to understand and very helpful. Thank you!
super! so helpful Matthew! thanks!
It sounds like there needs to be a written cut-off, like "end if".
I downloaded LM Studio, and saw that there is a new update right now that makes the end prompt customizable, so maybe that should fix it and terminate it.
Thanks for the videos! I learn so much from them! Could you possibly show how to assign different LLMs to different agents?
That'll be in my advanced tutorial, coming next week most likely.
wow i got it working, and this is Amazing... thank you Matt...
Oh, that looks way better than textgenui. I'll check this out this weekend. I planned to use AutoGen or MetaGPT as soon as I could use self-hosted LLMs, because I have a beefy enough setup (33B models work fine; I have to check 70B Llama, but maybe that's too slow).
Very promising! Thanks. It could be used with DB-GPT as well ❤
This is awesome! Thank you for sharing!
I would be happy if you focus on free open-source models (non-OpenAI) in the future 😊
ngl... I was searching for this for the last two weeks... perfect ❤❤
Enjoy!
Just ran across this as I'm leaving work. Can't wait to see this when I get home!
Have fun!
@@matthew_berman this is certainly a game changer
Oh wow, this is awesome. Thanks for sharing.
Yeah!! Now it's just a matter of time before we have an open-source GPT-4
💯
Unfortunately not. "Just" the prompt template and the model finetuning are 99% of the work. The things in this video are mostly tools to reduce the boilerplate and don't contribute to inference quality by themselves.
I watched the video in hopes it would contain some magic bullet to tackle the core inference quality problem.
Still a good video though.
Great video! Subscribed
Thanks for the great tutorial! 🙏
This is amazing! Thanks so much
Just what I was waiting for. Thanks, Matthew
No problem!
It was a needed utility for sure!
Its amazing! Thank you!
I was waiting for this and bam you delivered! I could not find the intermediate video on your channel you mentioned @ 2:50. Please share a link, thanks for your hard work!
Link is in the description :)
You're a legend my friend. Keep up the amazing work.
would love to see something like this for a linux distro :D
Thank you so much for your video🥰
Thanks for all your great videos about AutoGen, Matthew. I'm wondering if there is a way to use the AutoGen framework with an AWS API Gateway, since my LLM is hosted on an AWS EC2 instance.
Mind Blown!!
Maybe you could do chatdev + LM studio? Great work on this one
Hi Matthew. Using these local models, what's the best way to train one on your own data?
I'm liking this method. Is it better than textgen webui with the same LLM installed? My prompts are working as well; I haven't configured it with AutoGen yet, but it knows what to do when I use my prompts.
Great video!
We need to see it working with Open-Source Model. Thanks Bala-blue
Thanks for your great video. I'm using AutoGen with Dolphin 2 installed locally through LM Studio. I want to understand whether there is a difference between using AutoGen's "send" function and using the chat integrated in LM Studio, because with the same model and the same prompt I get pretty good results in LM Studio's integrated chat and very low-accuracy results using AutoGen's send function. Am I missing something?
In detail: first I use a "UserProxyAgent" to initiate a chat with an AssistantAgent, and then I use the send function with the same AssistantAgent for further interactions with it.
awesome video man !!!
Just a quick question: I tried to run this, but it used the CPU for all the tasks. Any way to make it run on the GPU?
Awesome, thanks so much.
If the goal is to save money (not privacy), perhaps add a GPT-4 agent that only gets involved when Mistral fails.
Reflexion is perhaps the best technique for code gen: test code is generated before the implementation, and the agent runs the tests to ensure the implementation code is correct, up to 10 times before giving up. When it does give up, pass the best attempt to GPT-4 to fix. Fixing existing code should require far fewer tokens than from-scratch generation. Look at how Aider does it.
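The retry-then-escalate loop described in that comment can be sketched as below. All three helpers (`generate_code`, `run_tests`, `escalate_to_gpt4`) are hypothetical stand-ins for whatever your stack provides; only the control flow reflects the comment.

```python
# Sketch of the Reflexion-style loop: generate, test, retry up to 10 times,
# then hand the best local attempt to a stronger model to fix.
# generate_code / run_tests / escalate_to_gpt4 are hypothetical stand-ins.
def reflexion_loop(task, generate_code, run_tests, escalate_to_gpt4, max_attempts=10):
    best_attempt, best_score, feedback = None, -1.0, ""
    for _ in range(max_attempts):
        code = generate_code(task, feedback)       # local model, fed prior test feedback
        passed, total, feedback = run_tests(code)  # pre-generated tests act as the oracle
        if total and passed == total:
            return code  # all tests green: done, no GPT-4 spend
        if total and passed / total > best_score:
            best_attempt, best_score = code, passed / total
    # Give up locally; fixing an existing near-miss should cost GPT-4
    # far fewer tokens than generating from scratch.
    return escalate_to_gpt4(task, best_attempt, feedback)
```

The cost argument is the whole point: GPT-4 only sees the task when the cheap local model has already done most of the work.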
Thanks as always! Can you please share your thoughts on Petals. You mentioned it a long time ago. Has your opinion changed since then?
I need to check it out again. It was awesome but too complicated to set up for most people.
It would be interesting to see how fast the API server is compared to vLLM, which also has an OpenAI API but claims to be much faster than everything else out there.
great
Will you make a follow-up where you add MemGPT to it?
that would be awesome
GPT-4 works well on coding because of byte-pair encoding, compared to pairwise, because of the structure in language. So maybe try dummy caches of random conjunction words (like if, and, or, the, etc.) to confuse the decoding.
Thank youuuuu sooooooo much!
Is it possible to have a tutorial on using AutoGen with remote LLMs on RunPod?
This is awesome! Do you know if there is a way to use local LLMs for Aider?
Legend ❤ Let's see if it works 😃
Thanks so much for the guidance here, Matthew. I've managed to get it stood up, and even running in group-chat mode. I'm noticing, however, that prompts seem to be cut short far too soon, and if I set "max_tokens": -1, they run on indefinitely (I stopped one agent at 6000 tokens after it repeated itself a bunch of times). Is there a clever way to stop this behaviour that you know of?
Fantastic!!
Like it, but can it run on Linux? We do have Ollama for hosting LLMs, but it doesn't have multi-threading support.
Can you connect these AutoGen models to a vector database like the LangChain agents do? To use a tool when needed, and not programmatically force it?
need more detailed review for this :D
Can we use it with Google's PaLM API (text-bison) model? If yes, is it the same as creating a local server that returns responses from the PaLM API?
Can you include a link, like you mentioned in the video to AutoGen?
Also, can you link to the script that you were using in the video?
I'm assuming it's on your GitHub repository.
Hi, is it possible to use two different models simultaneously? For example, GPT-4 and my own model fine-tuned for a special task, and during the group chat use whichever model is appropriate.
Also curious
Sure, it's code, you can code anything
Could you do a video with a comparison of Ollama and LMStudio? For me it seems they both serve the same purpose and pros and cons are unclear to me. Thanks a lot
What would be an alternative to LM Studio if you have a Mac with an Intel chip?
Looks great! But I can't make it work through a proxy... Maybe in a future update? Or is there actually a way to do it?
When using these local LLMs, what would be the best computer set up to make it operate smoothly?
This is what I wonder too. I am trying to decide what to buy to be able to run a setup like this.
Hey Matthew, thanks for the video. Could you tell me how to extract the final output value once the agents agree?
THANK YOU😊
How would you incorporate Ollama into this where you don't launch a server with a specific model but can call the model out in your actual Autogen code?
After my failed attempt to make AutoGen run using Python, this looks very promising, because I am already using LM Studio.
I do a lot of work with books and large texts, I need something with lots of memory, is trainable, and works as well as Claude does with text. Is there such a thing that can be hosted locally?
That's really good. LM Studio is cool and also free. I've wanted to run AutoGen for a long time, but I don't have the money to buy tokens from OpenAI.
Hey, there's a problem with the maximum tokens, as this model only allows up to 2048 tokens of context. Is there a way to overcome this? Like setting a max-token setting so the prompt truncates?
I haven't tried this specific setup yet, but you should be able to set token limits in the OAI setup portion of the agent or group chat. I have had many issues attempting unlimited tokens with actual OpenAI API requests, with failure logs being included in the memory uploaded to the prompt each time. If using AutoGen, you can open the classes and modify things there as well.
The best option is to run a few different LLM models locally, but for this, we need a lot of memory.