Power Each AI Agent With A Different LOCAL LLM (AutoGen + Ollama Tutorial)

  • Published 28 Nov 2023
  • In this video, I show you how to power AutoGen AI agents using an individual open-source model per agent. This is going to be the future AI tech stack for running AI agents locally. Models are served by Ollama, and the API is exposed using LiteLLM.
    Enjoy :)
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewberman.com
    Need AI Consulting? ✅
    forwardfuture.ai/
    Rent a GPU (MassedCompute) 🚀
    bit.ly/matthew-berman-youtube
    USE CODE "MatthewBerman" for 50% discount
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    Media/Sponsorship Inquiries 📈
    bit.ly/44TC45V
    Links:
    Instructions - gist.github.com/mberman84/ea2...
    Ollama - ollama.ai
    LiteLLM - litellm.ai/
    AutoGen - github.com/microsoft/autogen
    • AutoGen Agents with Un...
    • AutoGen Advanced Tutor...
    • Use AutoGen with ANY O...
    • How To Use AutoGen Wit...
    • AutoGen FULL Tutorial ...
    • AutoGen Tutorial 🚀 Cre...
  • Science & Technology
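The stack the description outlines (one Ollama model per agent, each fronted by its own LiteLLM proxy) can be sketched in Python. The port numbers, the helper name, and the placeholder api_key below are illustrative assumptions, not taken from the video:

```python
# Hypothetical per-agent setup: each agent points at its own LiteLLM
# proxy, and each proxy fronts a different local Ollama model.
# Start one proxy per model first (shell, run separately), e.g.:
#   litellm --model ollama/mistral   --port 8000
#   litellm --model ollama/codellama --port 8001

def make_config_list(model: str, port: int) -> list:
    """Build an AutoGen-style config_list entry pointing at a local proxy."""
    return [{
        "model": model,
        "base_url": f"http://localhost:{port}",  # the LiteLLM proxy for this model
        "api_key": "NULL",  # local servers ignore it, but a value is expected
    }]

config_list_mistral = make_config_list("mistral", 8000)
config_list_codellama = make_config_list("codellama", 8001)
print(config_list_codellama[0]["base_url"])
```

Each agent would then receive its own `llm_config` built from one of these lists, so a coder agent and a general assistant can run different local models side by side.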

Comments • 200

  • @chieftron · 7 months ago (+5)

    This is what I've been waiting for, for like a year! Complete localization!!!

  • @supernewuser · 7 months ago (+5)

    I just did this exact thing a few days ago. It's crazy how quickly things are moving and how fast everyone is getting to the same page.

  • @bensheridan-edwards876 · 7 months ago (+88)

    Would love to see a tutorial on how to integrate MemGPT with this multi-agent architecture, and whether it would make more sense to have one memory per model or one centralised memory.

    • @bensheridan-edwards876 · 7 months ago (+9)

      @ Thank you for this info. Do you know what the use-case would be for MemGPT over Teachable Agents in that case? Say I wanted to build a chatbot that remembered user conversations over a long time period. Would it make more sense to use Teachable Agents or MemGPT for the higher memory ability?

    • 7 months ago

      @@bensheridan-edwards876 there is a video on the channel about Teachable agents

    • @leonwinkel6084 · 7 months ago

      Yep, would like to see this too

    • @cesarromero936 · 7 months ago (+2)

      Yes! This is what I wanted to comment!! And also! Fine tuned MODELS!

    • @cesarromero936 · 7 months ago (+1)

      @ Really?? Any link to get more info about Teachable Agents?

  • @sullygoes9966 · 7 months ago (+16)

    I really enjoyed the pacing here, just enough detail on bits that may be unfamiliar (installing ollama, etc.) without getting bogged down. Nice video!

  • @agentDueDiligence · 7 months ago (+1)

    Matthew!
    I really love that you are so autogen focussed!
    Thank you

  • 7 months ago (+26)

    Just need to run ollama serve, pull the models to the server, and run litellm without any command, then call the models directly from autogen with model="ollama/model_name". Don't need 2 instances of the server.

    • @OlivierLEVILLAIN · 7 months ago (+1)

      How do you pull the models to the server?

    • @OlivierLEVILLAIN · 7 months ago (+3)

      just run 'ollama pull '

    • @MrAngeloniStephen · 6 months ago (+1)

      Would it keep each model in memory and switch as fast as it would by having them loaded separately?
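The single-server variant described in this thread (one `ollama serve`, models pulled once, each agent selecting a model via LiteLLM's "ollama/<name>" prefix) might look like this; the helper name and placeholder api_key are illustrative assumptions, and 11434 is Ollama's default port:

```python
# Sketch of the thread's single-server setup. One-time shell steps
# (run separately):
#   ollama serve
#   ollama pull mistral
#   ollama pull codellama

def agent_llm_config(model_name: str) -> dict:
    """Per-agent LLM config selecting one locally served Ollama model."""
    return {
        "config_list": [{
            "model": f"ollama/{model_name}",  # LiteLLM's Ollama routing prefix
            "base_url": "http://localhost:11434",  # Ollama's default port
            "api_key": "NULL",  # unused locally, but the field must exist
        }],
    }

coder_config = agent_llm_config("codellama")
assistant_config = agent_llm_config("mistral")
print(coder_config["config_list"][0]["model"])
```

Whether models stay resident in memory when agents alternate (the question above) depends on the Ollama server's model caching, not on this config.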

  • @santiagomartinez3417 · 7 months ago

    Great work man, much appreciated. These things evolve so fast.

  • @nothing_is_real_0000 · 7 months ago (+1)

    Thank you so much Matthew, you couldn't have created this at a better time for me. Just want to thank you!! I'm really learning a lot from your tutorials!!!

  • @ianparker2238 · 5 months ago

    Great video, your channel is without a doubt one of the best I've found for useful and practical advice on setting up LLMs for local use.
    👍👍👍

  • @ericeide6230 · 7 months ago (+2)

    Hey man, not sure if you'll see this but you are quickly turning into one of my favorite youtube channels! I'm really glad you made this video because it's just about perfect for a project I'm in the brainstorming phase of!

  • @Weeping81 · 7 months ago

    This is awesome. Just what I was looking for to play with this weekend!

  • @93cutty · 7 months ago (+2)

    I'm just about done getting my daily work stuff done, was about to jump into coding here soon! Listening now and will listen again/follow along soon

  • @AndrewPeltekci · 7 months ago (+1)

    Damn bro. You are always reading my mind and coming out with the right video shortly after

  • @orkutmuratyilmaz · 7 months ago

    At last! Thanks for creating this one:)

  • @chrisBruner · 7 months ago

    Thank you so much for this video. I've been trying to get it to work and keep stumbling. I was able to follow along with you and get it working properly. Looking forward to seeing some real-world use cases.

  • @SvennoCammitis · 7 months ago

    By now I like your videos even before I watch them... always great stuff!

  • @MrGaborKukucska · 7 months ago

    Incredible 🎉 Love the speed of innovation in this field 😊 And the fact that it is open source and being more and more localised 🙌🏻

  • @good_king2024 · 7 months ago (+8)

    Please make a video on LLM performance: memory usage, tokens/sec, tokens/sec vs. context length, and a context-length stress test.
    I find LLM output going out of context with large context lengths.

  • @PaulDominguez · 7 months ago (+1)

    I'm hooked on these vids

  • @EduardoJGaido · 7 months ago

    Thank you Matthew for this content! I appreciate your work. Cheers from Argentina

  • @basementadmin · 7 months ago (+2)

    It seems you're always a couple hours ahead of what I'm wanting to do. Super work. Thanks for the vid.

  • @EricWimsatt · 7 months ago (+51)

    You, by far, have the best AI videos. It would be neat to have a longer video where you orchestrate multiple models building an actual piece of software. For example: have a coder agent create a node.js website with a basic CSS file, then have a content-writer AI write the content for the page.

    • @WiseWeeabo · 7 months ago (+1)

      I agree. I also think that incorporating different general strategies could make sense; he mostly does one-shot, but then it would be nice to see how the model responds to multi-shot. Similarly here, actually making agents instead of just creating a one-shot would have been helpful as it's the whole point of the framework.

    • @user-ug3pf3uw6x · 7 months ago

      We all want to do it, but our poor brains need a chance to adapt. Within 6 months, as a team, we will have wrapped our minds around it.

    • @sCommeSylvain · 7 months ago (+1)

      Most of his videos are just him following tutorials and reading stuff, but I guess you can't realise that by just watching videos. He is not capable of doing anything of use with AutoGen and local LLMs, because almost nobody can.

    • @robertotomas · 7 months ago (+1)

      Yeah definitely, I'd say he's way up there, at least from an ops perspective :)

  • @tiagocmau · 7 months ago (+3)

    Yeah, for sure do a video optimizing AutoGen for these open-source models. I'm trying to work with them myself and found it very hard to orchestrate them.

  • @JohnLewis-old · 7 months ago (+6)

    You create the best videos. Thanks for taking the time and making an amazing series. For a professional video, I would really enjoy seeing a way to organize the agents into sub-teams.

  • @stanTrX · 2 months ago

    Thanks. This one seems pretty advanced for me. I will look at your beginner tutorials.

  • @numbaeight · 7 months ago

    Wow @matthewberman, I just want to let you know what an amazing job you are doing for all of us. Your channel is my good morning every day before I delve into any other task. It's amazing to see these pieces of tech working together, and furthermore you make it really easy to understand. I can't thank you enough. I will keep coming every day for more, and guess what, your videos get my thumbs up even before I watch them, and that's a testament to the quality of your work!! SALUTE. 🤩💥

  • @avi7278 · 7 months ago

    You've been kicking ass lately, Mr. Berman.

  • @AndrewPeltekci · 7 months ago (+3)

    I can already see the title for the next video "Autogen + MemGPT + Ollama/LiteLLM - Each Agent with its own Local Model + Infinite Context"

  • @pioggiadifuoco7522 · 7 months ago (+3)

    Great video as usual mate! I guess many of us wish to see some real world use cases. Hope you will find some time to spend on it, it would be much appreciated

  • @taeyangoh7305 · 7 months ago (+2)

    Wow, Matthew! Another amazing tech review! Yes, I want another one where AutoGen does something like calling a weather/traffic API and scheduling users accordingly!

    • @ajarivas72 · 6 months ago

      Matthew's tutorials work very well on my 10-year-old Macintosh 💻 laptop.

  • @EffortlessEthan · 7 months ago

    I love that I saw this video like two weeks ago and it feels so old.

  • @neocrz · 7 months ago (+5)

    You need to change the Python interpreter in VS Code to the conda one to fix the package import errors.

  • @rafael.gildin · 7 months ago

    great video, thanks.

  • @SethuIyer95 · 7 months ago

    This is awesome

  • @jasonsalgado4917 · 7 months ago (+1)

    awesome video! Do you have any videos on deploying these LLM agents to a UI?

  • @forcanadaru · 7 months ago

    You are amazing!

  • @leandroimail · 7 months ago

    Great. I will try. Thanks!

  • @mcusson2 · 7 months ago

    Wow! You are a mind reader. I wanted this right noooooooow. ❤

  • @lemonkey · 7 months ago (+1)

    You can also use `ollama list` to show the currently installed models.

  • @curtkeisler7623 · 7 months ago (+2)

    Cool!

  • @qwertyuuytrewq825 · 7 months ago

    Great video! Made me wonder how well agents perform function calls

  • @MagusArtStudios · 7 months ago

    You can make a powerful multi model AI using zero or few-shot classification of prompts to determine the model to use for the prompt.
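The routing idea in the comment above can be sketched as follows. A real version would use a zero- or few-shot classifier; simple keyword matching stands in for it here, and the category-to-model mapping is an illustrative assumption:

```python
# Toy prompt router: classify the prompt, then pick a model per category.
# Keyword matching is a stand-in for a real zero-shot classifier.

ROUTES = {
    "code": "codellama",   # programming questions
    "math": "orca2",       # arithmetic / reasoning
    "default": "mistral",  # general chat
}

def pick_model(prompt: str) -> str:
    """Return the name of the model that should handle this prompt."""
    lowered = prompt.lower()
    if any(w in lowered for w in ("function", "python", "bug", "code")):
        return ROUTES["code"]
    if any(w in lowered for w in ("sum", "solve", "equation", "math")):
        return ROUTES["math"]
    return ROUTES["default"]

print(pick_model("Write a python script"))  # codellama
```

The returned name would then be plugged into the per-agent config (e.g. an Ollama model name) before dispatching the prompt.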

  • @pavellegkodymov4295 · 7 months ago

    Great, thanks

  • @user-sl5je7mt1t · 5 months ago

    I definitely want to see more on fine tuning autogen to use ollama models better.

  • @southVpaw · 7 months ago (+6)

    I think there's a plethora of "first step" videos on youtube because creators are understandably wary about narrowing their audience in an expert level video.
    I think if you frame it properly, an expert video can drive even more traffic. If you open with the final result and get people excited about the possibilities, then they would be more likely to marathon your beginner videos, which of course you would link below.
    Also, it would fill a purpose that's sorely needed: a next step video for all of us watching hundreds of "beginner videos" looking for a glimpse of where to take it.

    • @sCommeSylvain · 7 months ago

      Reality is, not even 1 in 100 will try even this beginner stuff. You included, because how else would you not realise he is a beginner himself.
      It would be stupid of him to put a lot of effort into videos nobody would even watch.

    • @southVpaw · 7 months ago (+1)

      @@sCommeSylvain How're your projects coming along?
      I'm not sure why the personal attack was necessary; if you're here, you're also watching and learning. If you have expert knowledge to share with the class, I'd happily subscribe to your channel if you put out quality, useful information.
      I want to see the pedestal you look down from.

  • @forcanadaru · 7 months ago

    Thanks!

  • @thefutureisbright · 7 months ago

    Another excellent tutorial. Could you plan one where the agents save their output, for example saving the Python code they generate or outputting the results to PDF/TXT? Thanks

  • @javi_park · 7 months ago (+1)

    would love to see a video on ChatDEV! (curious to see pros and cons vs. autogen)

  • @InsightCrypto · 7 months ago

    You did it! Thanks!

    • @InsightCrypto · 7 months ago

      memgpt also please

  • @punishedproduct · 4 months ago

    I want to see more agents building programs. The biggest leap for devs is to code full working programs. That's the true test for any LLM group: can they build a program with a beautiful GUI and great functionality?

  • @yourbuddyles · 7 months ago

    This was super helpful! Got me started, and now I'm wondering if you can help by demonstrating how this stack would work with function calling. I'm using autogen+litellm+ollama+mixtral and it's all working great. Then it craps out when I introduce function calling. I can't tell where in these different stacks it may be failing. I believe I've followed all the instructions I could find, but no luck. A video or pointer would be great! Thanks Matthew

  • @jeffkimball8239 · 7 months ago (+1)

    I recently came across AutoGen with Ollama/LiteLLM, and it looks quite intriguing. I'm particularly interested in using this technology with Pinokio AI. Can you provide more information or guidance on how to integrate each agent with its own local model in this context?:)

  • @fbravoc9748 · 6 months ago

    Nice tutorial!! Would be nice to see how the agents connect to a database or JSON file to retrieve information.

  • @maalonszuman491 · 7 months ago

    Great video!!! Is it possible to use an image-to-text model with a text-to-text model, or is it only one kind of model?

  • @roboteck-ld7ud · 3 months ago

    amazing

  • @amitjangra6454 · 4 months ago

    If there were one more version of this video using a GUI for AutoGen, for no-code people like me, that would have been great! Just a wish. Brilliant video BTW!

  • @developer8726 · 7 months ago

    Great video. I was trying to use an Ollama model to implement RAG with AutoGen, using the above llm config format, but it says the model is not found.

  • @marianosebastianb · 7 months ago

    Excellent material as always!
    Could you explain how to do the same with an external GPU, like Runpod? I mean running multiple models on Runpod with ollama/litellm on a single GPU.
    Also, what do you think about integrating AutoGen with projects like "Gorilla OpenFunctions" and "Guidance AI" to improve the function calling and response structure of open source LLMs?
    Thanks!

  • @huiping192 · 7 months ago

    This is so fun, ty. And can you tell us how to adjust that to make it work?

  • @ALFTHADRADDAD · 7 months ago

    Revolutionary

  • @tclark · 7 months ago (+2)

    Real world scenario I would love to see: I give it a prompt and a repo and Autogen goes to town adding whatever functionality or bug fix I suggested in the prompt and then it creates a pull request in Github. Sure this would involve working with octocat or similar but would love to see a coder agent and a testing agent working hand in hand.

    • @berubejd · 7 months ago

      Have you taken a look at Sweep AI? It doesn't use Autogen but has functionality similar to what you are proposing.

  • @eugenetapang · 7 months ago

    🎉❤😂 Amazing! More more more, a full software company or marketing agency, sorry big asks, but happy as heck watching you kill this.😂

  • @nikoG2000 · 6 months ago

    Nice tutorial. Have you managed to implement function calling with ollama models?

  • @gw1284 · 7 months ago

    Thanks

  • @MakilHeru · 7 months ago

    Bahhhh! If only Ollama could run on Windows. Either way great video. I'd love to see how this can be fine tuned.

  • @jessem2176 · 7 months ago (+1)

    This is amazing... Is it possible to hook up AutoGen with this and with PrivateGPT?

  • @marcellsimon2129 · 7 months ago

    I love how after "tell me a joke" it went on to a math problem. That shows that it learned how people use LLMs :D Eventually they'll learn your usual test cases, and will give perfect answers, but then fail in everything else :D

  • @proterotype · 6 months ago

    I’d like to see the vid of you optimizing Autogen to use these models and be successful with it

  • @MrSuntask · 7 months ago

    I was waiting for this video... now I just have to wait for Ollama to be usable under Windows. Thanks a lot

    • @OlivierLEVILLAIN · 7 months ago (+1)

      Ollama can be used with WSL on Windows

    • @FloodGold · 7 months ago

      The bigger question is if Windows itself is usable anymore, haha

  • @IlovegyptForall · 7 months ago

    Thank you so much, I learned a lot from your videos. So far you give a task in the script or by human input.
    How about if we need a web UI application for the end user to send what is needed, like a Flask app or sending an API call to give the task? How would that work?

  • @nigeldogg · 7 months ago (+1)

    I’ve been running into context length issues with open source models and autogen. Would using a different model for each agent expand context length for each agent?

  • @luigitech3169 · 7 months ago (+2)

    Great! Is it possible to integrate this with ollama-webui?

  • @slightlyarrogant · 7 months ago (+1)

    Excellent video, thanks. I think the agents have a bit of a problem with passing their names to the manager. I got the error message "GroupChat select_speaker failed to resolve the next speaker's name. This is because the speaker selection OAI call returned:
    ```". I will need to spend some sweet moments with the pyautogen examples to find the reason for that.

  • @daxam008 · 7 months ago

    I am toying around with building fiction. I am not done tinkering, but I built an OpenAI Assistant with retrieval and a function call. The function call goes to a free Pinecone vector database (which has some 16 "writing thesaurus" books stored in it). Using AutoGen, I now have a writer that can use his "magical thesaurus" to build any type of description possible in a relevant format. So... need a description for a circus on the moon and a character with an emotional scar from space clowns?... My AutoGen can write that.

  • @robertvoelk4738 · 7 months ago (+1)

    How would you compare using Ollama/LiteLLM versus LM Studio? With so many choices, it can be difficult to pick the one that is least likely to end up a dead end or stop being supported.

  • @ikjb8561 · 7 months ago

    Good concept; needs more refinement for prime time.

  • @corykeane · 7 months ago

    Ollama makes running local llms SO EASYYY!!

  • @xor2003 · 7 months ago

    Orca 2 is really good at solving math tasks. It solves 3rd-grade math tasks for me.

  • @franciscomagalhaes7457 · 7 months ago

    Very interesting. I can't get LLMs to use my GPU if they depend on the llama-cpp-python package, so I'm just using the bog-standard models with LM Studio, but I'm always looking for alternatives where I can control input and output with Python.

  • @ALTINSEA1 · 7 months ago (+2)

    Can you do this without Ollama? I only have a Windows machine.

  • @blakelee4555 · 7 months ago (+1)

    Would it not make sense to tell an LLM "given you have access to a coder, a poet, a historian, etc., split the user input into the relevant prompts for each", then parse that and call Ollama with each of the separate parsed inputs and their relevant agents? Then combine all the outputs into one to send back to the user?
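The fan-out idea in the comment above can be sketched in a few lines. The `ask` callables here are stand-ins for real per-agent model calls (e.g. to different Ollama models), and all names are illustrative:

```python
# Toy fan-out/combine: send the task to every specialist agent,
# collect each reply, and merge them into one response.
from typing import Callable, Dict


def fan_out(task: str, agents: Dict[str, Callable[[str], str]]) -> str:
    """Send the task to every specialist and combine the labelled replies."""
    parts = []
    for role, ask in agents.items():
        parts.append(f"[{role}] {ask(task)}")
    return "\n".join(parts)


# Stub agents for illustration; real ones would each call their own model.
agents = {
    "coder": lambda t: f"code for: {t}",
    "poet": lambda t: f"poem about: {t}",
}
print(fan_out("a sorting demo", agents))
```

The harder part, as the comment implies, is the first step: having an LLM split the user input into the per-specialist prompts, which this sketch takes as given.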

  • @bingolio · 7 months ago (+1)

    AWESOME!!!!!!!!! Pls add MemGpt to this!

  • @TheSumone · 7 months ago

    I was just wondering if AutoGen supports multimodal models? If it was hooked up with visual input, can it use its agents to identify and sort objects?

  • @themax2go · 7 months ago

    Do you have a vid or link pls for your CLI prompt?

  • @-UE-PR0 · 7 months ago

    So I use an HP laptop with no GPU. Will it run as fast as shown in the video if I run Mistral using Ollama as well?

  • @donaldparkerii · 7 months ago

    How do you specify a port when spawning a new LiteLLM instance?

  • @sasgalileo · 7 months ago

    Thank you very much Matthew for this amazing video. When I run the program I only get this response in the terminal:
    user_proxy (to Coder):
    Write a python script to output numbers 1 to 100
    ---------------------------------------------------------------------------------------
    and it does not continue with the execution of the script.
    Do you know why this is happening?

  • @figs3284 · 7 months ago

    Would TaskWeaver mostly be set up the same way?

  • @joelwalther5665 · 7 months ago

    Thanks!
    If you get the error TypeError: 'NoneType' object is not iterable,
    then you have to add "cache_seed": 42, like this:
    llm_config_mistral = {
        "config_list": config_list_mistral,
        "cache_seed": 42,
    }
    llm_config_codellama = {
        "config_list": config_list_codellama,
        "cache_seed": 42,
    }
    After that it worked for me.

  • @xcalibur1523 · 7 months ago

    Do we have a model that allows us to upload structured data (csv, xlsx)? We could create an agent that performs data analysis and creates ML models on that data locally.

  • @coder0xff · 6 months ago

    What's the difference between the teachable agent and the MemGPT agent? How do vector databases help the agents with recall?

  • @lemonkey · 7 months ago

    Does the command `ollama rm ` actually delete the model from the filesystem?

  • @Derick99 · 7 months ago

    Can you turn a website hosting server into a local LLM host instead of using an API to connect to GPT? Like a WordPress plugin that isn't connected to ChatGPT but to a local LLM installed on the server hardware.

  • @kyusungpark8346 · 7 months ago

    Is it possible to hook up AutoGen with a custom GPT created with ChatGPT?

  • @HunterMayer · 7 months ago

    Do any of the updates to autogen make your previous videos irrelevant/less accurate?

  • @malikrumi1206 · 7 months ago

    Matthew, I just saw your video on LM Studio. Why are you using Ollama instead of LM Studio?

  • @lawyermahaprasad · 7 months ago

    Let's do a live product dev with this one agent, one model setup... with all the bells and whistles possible.

  • @DeeMenace · 7 months ago (+1)

    Does Ollama run on Windows yet?

  • @karlofranic1299 · 7 months ago (+1)

    Can you make a video on how to run these models locally on a GPU (like we would any other model alone, without Ollama)? Thanks!