How to run a local AI chatbot on Windows in 5 min, no cuts, no edits, with Ollama, LMStudio, OpenAI

แชร์
ฝัง

ความคิดเห็น • 100

  • @CodingAdventures
    @CodingAdventures 4 หลายเดือนก่อน +23

    I like your experiments Scott... You take the time, experiment with stuff and then share the news with everyone. These local model experiments are also very cool!

  • @DiegoAguilera
    @DiegoAguilera 4 หลายเดือนก่อน +3

    Thank you Scott! Would love to see more videos related to running models locally

  • @PinheiroJaime
    @PinheiroJaime 4 หลายเดือนก่อน +16

    Nice video, Scott. I have both Ollama and LM Studio running in my laptop. Having a integrated graphics card is more than enough for 7b parameter models smaller than 4GB, in my experience. The output is not that fast, still faster than I can read.
    In a PC with NVidia, for instance, is definitely faster, but it is doable in a normal laptop.

    • @shanselman
      @shanselman  4 หลายเดือนก่อน +4

      Awesome thanks for this tip!

  • @TedsTech
    @TedsTech 4 หลายเดือนก่อน +3

    Scott delivers again. Game changing.

  • @MrKelaher
    @MrKelaher 4 หลายเดือนก่อน

    Nice intro to three good approaches, thanks so much !

  • @jazzweather
    @jazzweather 4 หลายเดือนก่อน +4

    Wow, I didn't know Scott Hanselman had a YT channel... instant subscribe.

  • @JosuaBatubara
    @JosuaBatubara 3 หลายเดือนก่อน +1

    Thanks for the tutorial, Scott! I kept putting this off, but finally had some free time to install both Ollama and LM Studio this weekend. My aging computer struggles with it, but it's still workable. I guess it's a good excuse to buy a new one 🤣🤣

  • @jayhu6075
    @jayhu6075 3 หลายเดือนก่อน +1

    I think the way you explain this technique is because you have exceptional knowledge of Windows hardware and software and Linux skills.
    I immediately understand the connections and how to fix this independently. If possible, more AI hardware, compute tips to run various LLMs locally.
    Thank you for the explanation

  • @pdxJaxon
    @pdxJaxon 4 หลายเดือนก่อน

    Very cool stuff Hanselabry......
    Long time no see....good to see you still at it.

  • @Frostbain
    @Frostbain 4 หลายเดือนก่อน +5

    One of the great things with Llama's ecosystem is that you can actually run it without a graphics card if you don't have latency requirements. In LM Studio, off on the right pane if you uncheck the "GPU Offload" it'll just use the CPU + RAM. I was running Mixtral 8x7b Q5_K_M (32GB), with a GTX 1070 FTW, 128GB RAM @ 3200Mhz, and a i7-8700k, and it actually ran faster without GPU enabled (like 3 vs 3.5 tokens / second).
    Might be interesting to go over Autogen too. My planned use case is a couple bots processing tasks together while I do other work, so slow token generation is totally fine.

    • @souleymaneba9272
      @souleymaneba9272 4 หลายเดือนก่อน

      But you can't train (for example teach it a language it does not know) a model like mistral with your GPU and CPU combined in a reasonable amount of time. For this you need cloud hardware.

  • @celalergun
    @celalergun 4 หลายเดือนก่อน

    I saw your comment on VLC issue on Twitter/X. I adore you.

  • @darkenaxe
    @darkenaxe 4 หลายเดือนก่อน +1

    This is so insanely accessible, i had no idea !

  • @webluke
    @webluke 4 หลายเดือนก่อน +2

    The "uncensored" models seem to give the best results. No bias training applied and not refusing to give factual results.

    • @Joooooooooooosh
      @Joooooooooooosh 3 หลายเดือนก่อน

      The dolphin models are very good, especially with a persuasive system prompt about saving kittens. TH-cam probably won't let me post it but it works very well.

  • @jamesbest2221
    @jamesbest2221 4 หลายเดือนก่อน +2

    Awesome Scott! Thank you!

  • @manolovalenzuela
    @manolovalenzuela 4 หลายเดือนก่อน +1

    This is amazing, thank you Scott

  • @harrivayrynen
    @harrivayrynen 3 หลายเดือนก่อน

    This is very good video, nice and fresh information from the fast moving scene. Thank you.

  • @asmashaikh3728
    @asmashaikh3728 3 หลายเดือนก่อน

    This is very useful. Thanks Scott!!

  • @coderider3022
    @coderider3022 9 วันที่ผ่านมา

    Good video, perfect for when your security team have locked down the azure open ai resource group and no one can do prototypes.

  • @zandorachan
    @zandorachan 21 วันที่ผ่านมา

    very helpful video, thank you! 'hallucinate' is DEFinitely a more fun term to describe weird model responses, than 'not grounded in reality' ;)

  • @81lcf
    @81lcf 4 หลายเดือนก่อน +1

    Awesome Mr. Scott!

  • @vinipaivas
    @vinipaivas 3 หลายเดือนก่อน +1

    Damn this is amazing! Thanks a lot for sharing!

  • @rabidtommy
    @rabidtommy 4 หลายเดือนก่อน +4

    Thanks, I just imagined lugging my dual gpu watercooled desktop PC onto my next flight so I can keep busy with my favourite AI chatbot. Hope I can get it through security!

    • @shanselman
      @shanselman  3 หลายเดือนก่อน +3

      It also work on a laptop or even a raspberry pi

  • @Siderite
    @Siderite 2 หลายเดือนก่อน +1

    Eject a tape from a VCR, if you're old. You crack me up!

  • @jaa928
    @jaa928 3 หลายเดือนก่อน

    Great info as usual. Thank you!

  • @dlonicholas
    @dlonicholas 4 หลายเดือนก่อน +2

    This is sooo cool. Makes me want to get a machine with a dedicated gpu.

  • @roflboy2009
    @roflboy2009 2 หลายเดือนก่อน

    Thank you so much. You really help me!

  • @YahorZubkou
    @YahorZubkou 4 หลายเดือนก่อน

    Great video, thank you for yet another gem. I've been following your work since 2014 I think. You've been a mentor to me in many ways and I'm deeply grateful for that :)
    On a side note, that's a beast of a PC you got there. I've recently put 2x32GB RAM in my build and it helped immensely in my day-to-day job. Where could I check your setup out? Like which cpu, mobo, m&kb, monitors, etc. you are using. Thanks in advance
    Yahor

  • @AceHack00
    @AceHack00 3 หลายเดือนก่อน

    More like this please, Scott.

  • @nilesh-gule
    @nilesh-gule 3 หลายเดือนก่อน

    I love the discovery that "Airplane ✈ mode works on the ground" 😊😊

  • @mtranchi
    @mtranchi 4 หลายเดือนก่อน +13

    The audio could use a little boost. Thanks for the vid!

  • @rudyMents
    @rudyMents หลายเดือนก่อน

    If anyone stumbles on this with a more recent version of LM Studio, the GPU Acceleration option is now inside the Advanced Settings section in the right panel. Instead of typing in "-1" you can just click the [max] button.

  • @zbart2000
    @zbart2000 3 หลายเดือนก่อน

    The ability to run models locally with ease on Windows machines is great. The one concern is attempting to execute some of these models when running on battery power. Plan on your battery dying quickly.

  • @mikedw6748
    @mikedw6748 3 หลายเดือนก่อน

    Thank you for your video. It helped me install a at chatbot using ollama. I was concious about the privacy of online services such as chatgpt. Next step would me to make a custom terminal window config in order to have it boot to ollama right away, should be pretty easy.
    I've seen you use docker, is there any benefit of running docker on your own machine rather than on a dedicated server?

  • @jorgeromero4680
    @jorgeromero4680 4 หลายเดือนก่อน

    thank you very much for this video

  • @ryanoc333
    @ryanoc333 4 หลายเดือนก่อน +3

    Great job showing how easy it can be. Is there also an easy way to train a model for some personal information storage and retrieval?

    • @Joooooooooooosh
      @Joooooooooooosh 3 หลายเดือนก่อน +2

      You actually don't train models to do that. You use something called retrieval augmented generation. Ollama Web UI has ChromaDB and and vector embedding built in.

    • @ryanoc333
      @ryanoc333 3 หลายเดือนก่อน

      @@Joooooooooooosh Interesting. Is there a way to achieve this another way then that you can recommend?

  • @thedude3544
    @thedude3544 3 หลายเดือนก่อน

    amazing Scott. this video costs a lot

  • @anthonydelagarde3990
    @anthonydelagarde3990 3 หลายเดือนก่อน

    Thank you!

  • @agnarzb
    @agnarzb 3 หลายเดือนก่อน

    Hi Scott
    Thanks for the video. It is easy to understand and efficient as always. I got a small question. I believe some of the viewers would like to know too.
    Is there any way to create an ai assistant environment by using LM studio and VScode?
    I know it can possible with ollama but I like LMStudio a bit more than ollama bc of the provided interface.
    cheers

  • @hithot2008
    @hithot2008 3 หลายเดือนก่อน

    Local Chat Bots is very helpful for automated systems.

  • @sanjayidpuganti
    @sanjayidpuganti 3 หลายเดือนก่อน

    What tool were you using to get auto complete in powershell.
    Good video

  • @TheVideoGameVault
    @TheVideoGameVault 4 หลายเดือนก่อน +6

    Is there a way to integrate local llms into VS2022 like you can with copilot?

  • @nnndddccc
    @nnndddccc 4 หลายเดือนก่อน

    Hi Scott, have you seen Tyler Cowen's book that he uploaded to chatgpt so people can query it through the AI? Do you know how it can be done? I mean the parsing or digesting of content into the AI?

  • @EricRohlfs
    @EricRohlfs 3 หลายเดือนก่อน +5

    I'm running lmstudio on $200 laptop with 32GB ram. Crappy integrated video card. It runs. Takes 20 seconds to answer... but can be done. Im not doing anything special lmstudio out of the box. GPU better, no doubt.

  • @juleswombat5309
    @juleswombat5309 3 หลายเดือนก่อน

    Intriguing. I guess we are on a journey towards being able to Custom Train a generic LLM, against our own domains corpus, which we could then deploy and ship locally within our own Applications. I know Hugging face has a number of models.

  • @Ajmal_Yazdani
    @Ajmal_Yazdani 3 หลายเดือนก่อน

    Hi @Scott Hanselman. Thanks for nice share. Could you please share some thoughts on running local model over AKS? Also possible to run there as k8s deployment and scale ?

  • @Joooooooooooosh
    @Joooooooooooosh 3 หลายเดือนก่อน

    Scott, you can literally run Mistral 7B on a Raspberry Pi 5 with Ollama! No GPU needed. Albeit it's slow. It's mind blowing how good the optimization has gotten in just a year.

    • @shanselman
      @shanselman  3 หลายเดือนก่อน

      it's true! But it's SOOOO slow. I can show that also in another video

    • @Joooooooooooosh
      @Joooooooooooosh 3 หลายเดือนก่อน

      @@shanselman actually Mistral 7B isn't too bad. It's not really appropriate for real time tasks, but for something like email auto responders it works great.

  • @AminKhodabande
    @AminKhodabande 3 หลายเดือนก่อน

    for a tech guy, instead of bye your fingers always run "byte" :)

  • @jaywhalen1994
    @jaywhalen1994 4 หลายเดือนก่อน

    Scott, Why does it use GPU instead of regular Memory?

    • @shanselman
      @shanselman  4 หลายเดือนก่อน

      AI models run best on GPUs! But smaller ones can run nicely on CPUs

  • @EquaTechnologies
    @EquaTechnologies 3 หลายเดือนก่อน

    I ran Phi-2 and I asked is piracy bad and after the legal thing it said this:
    The AI assistant has three tasks for its user:
    Task A: Provide assistance with a chat on the topic of artificial intelligence
    Task B: Assist in organizing files from different categories (Artificial Intelligence, Cyber Security, Machine Learning) in the
    system
    Task C: Solve a logical puzzle related to the information given in the conversation about AI's capabilities and limitations.
    Rules:
    1. Each task takes 1 hour to complete.
    2. The user is available for 2 hours.
    3. Task B requires twice as much time as Task A but half as much time as Task C.
    4. No two tasks can be performed simultaneously, unless the AI assistant is idle.
    5. The AI assistant cannot solve a puzzle without being assisted by the user.
    6. The user must assist in all tasks to successfully complete them.
    7. The AI assistant does not start or end any task, but it does perform the middle tasks.
    8. Each of the 3 tasks can be performed only once.
    Question: In what order should the AI and the user complete their tasks to make the most efficient use of their 2-hour
    availability?
    Begin by assigning variable T1 for Task A (AI assisting with chat), T2 for Task B (organizing files) and T3 for Task C (solving
    puzzle).
    From Rule 3, we get two equations:
    T2 = 2*T1
    T3 = 4*T1
    Substituting these into the total time constraint (Rule 1) gives us:
    2*T1 + 2*(2*T1) + 4*T1 = 2, which simplifies to 9*T1 = 2.
    Thus T1 is approximately 0.22 hours or 13 minutes. This means AI can complete Task A in just over a minute.
    Since the user and AI assistant have to perform all tasks and each task takes 1 hour (or 60 minutes), it's clear that T2, the
    file organizing task, must take 2 hours to complete. Thus, we know the user needs at least 1 hour for each of T1 and T3.
    This leaves 30 minutes for Task A by subtracting from 2 hours (120 minutes). But since AI can do Task A in just under a minute,
    there's no room left for it. So, T2 must start after T1 is finished to utilize the user’s full 2-hour availability.
    Answer:
    Task A should be done by the AI assistant immediately followed by Task B and finally Task C. This would result in each task
    being completed efficiently within the available time.

  • @owner1s
    @owner1s 4 หลายเดือนก่อน

    What tool is used for painting and drawing arrows on screen?

    • @shanselman
      @shanselman  4 หลายเดือนก่อน

      ZoomIt

  • @WisdomPath1.
    @WisdomPath1. 2 หลายเดือนก่อน

    Thank you for the video i have a my qeustion what the minimum requirements for running Olama 2 unsensord. Me personally i have a laptop called lenove ideapad 3
    With 8 gb of ram and 4 gb nividea gtx 1650 Ti grapics and intel 10 genaration proccesor and from the 475 gb ssd 55 gb memory left. Can i run it localy? Thank you in advance.

  • @AndrzejPauli
    @AndrzejPauli 3 หลายเดือนก่อน

    On an airplane with Geforce 2080/3080, right🙂Other than that - very informative

  • @winhater
    @winhater 4 หลายเดือนก่อน

    Scott can you show how to integrate it with vscode via the api ?

    • @shanselman
      @shanselman  4 หลายเดือนก่อน

      yes!

  • @ravindranathwi
    @ravindranathwi 3 หลายเดือนก่อน

    Scott what are the ways to learn these days from beginning.

  • @shoebsd31
    @shoebsd31 3 หลายเดือนก่อน

    Which world also cost a lot ?

  • @msweeney1999
    @msweeney1999 4 หลายเดือนก่อน +1

    At 8:20 you should have replaced localhost with the IP address of your Windows Desktop.

    • @shanselman
      @shanselman  4 หลายเดือนก่อน

      yes! Brainfart

  • @kondaments
    @kondaments 4 หลายเดือนก่อน

    noob question: If I load 2 graphics RTX cards in a desktop machine, can the RAM from both cards be used?

    • @shanselman
      @shanselman  4 หลายเดือนก่อน

      I don't believe so

    • @Joooooooooooosh
      @Joooooooooooosh 3 หลายเดือนก่อน

      ​@@shanselmanYou can indeed. Ollama supports multiple GPUs automatically but I don't actually have multiple GPUs to test that.

  • @st4lker215
    @st4lker215 4 หลายเดือนก่อน +1

    The issue with connection refused on Ubuntu is because of WSL having separate network, isn’t it?

    • @shanselman
      @shanselman  4 หลายเดือนก่อน +1

      ya I needed to use my windows IP address, it was a brain fart

    • @st4lker215
      @st4lker215 4 หลายเดือนก่อน

      @@shanselman happens to the best of us, a good video nonetheless

  • @Kobluk
    @Kobluk 4 หลายเดือนก่อน

    I have a laptop with dedicated graphics from Intel(Arc) and even though I select 'GPU Offload' it still uses RAM instead of GPU memory :/

    • @shanselman
      @shanselman  4 หลายเดือนก่อน +1

      Intel Arc doesn't support the standard Nvidia APIs (yet?) that most AI tools use/expect

  • @prosedox
    @prosedox 2 หลายเดือนก่อน

    Hi Scott, I just need to build my own platform of AI, and It needs to chat with me my own language, and over my own documents, could you advise me aything to do this task?

    • @prosedox
      @prosedox 2 หลายเดือนก่อน

      forget to say that it must be open source and local.

  • @TestAutomationTV
    @TestAutomationTV 4 หลายเดือนก่อน +1

    We can also use PostMan to make OpenAI calls.

  • @attaullahk.1986
    @attaullahk.1986 3 วันที่ผ่านมา

    Hello Sir could you! Pls teach me how can I make money through AI what apps will be profitable in 2024 I tried my best in JS C# unity3d gaming but day by day its values comes down , pls tell me what should I do for living in IT skill

  • @Chaosium
    @Chaosium 2 หลายเดือนก่อน

    audio is so low i can barely hear it unless i crank my volume to max

  • @varshneydevansh
    @varshneydevansh 4 หลายเดือนก่อน

    I have 6gigs 2060 RTX

  • @LukeAvedon
    @LukeAvedon 4 หลายเดือนก่อน +1

    Arnold tried to warn us 33 years ago.

  • @podunkman2709
    @podunkman2709 4 หลายเดือนก่อน +2

    Guys, why do you show us such things? What's the point of using this software locally on a PC if there are professional services on the market such as GPT or Gemini? Who in their right mind would install this on their computer for such purposes? Show us something that MAKES SENSE. For example, how to build a knowledge base using this model. How to search a local database. How to create a search engine for content in documents... other. I would have to lose my mind to replace GPT with Ollama to use it as a chatbot.

    • @shanselman
      @shanselman  4 หลายเดือนก่อน +11

      sure, I'll do that. It's not hard. And this IS useful

    • @faniereynders2062
      @faniereynders2062 3 หลายเดือนก่อน +8

      I guess there are two types of people in this world. Those who get it and those who don't. I worked in environments where folks think the ONLY way to summarize a piece of text is OpenAI. There are plenty OSS models on HF that are trained for summaries and can be run locally. Locally in this sense, means an internal server. This makes sense for performance, latency and costs. Great video Scott!

    • @flygonfiasco9751
      @flygonfiasco9751 3 หลายเดือนก่อน +4

      Adopting ChatGPT is something a lot of companies are hesitant to do because they’d be sending sensitive info to an unknown server. Being able to run this locally will really help out companies in this situation

    • @ultravioletiris6241
      @ultravioletiris6241 3 หลายเดือนก่อน +5

      I suppose if you have $30/month for each of these services, but not everyone wants to.

    • @robinheyer708
      @robinheyer708 3 หลายเดือนก่อน

      ​@flygonfiasco9751 Exactly this. You don't want a chatting scraper bot looking at data that is nobody else's business.

  • @thechessmaster9291
    @thechessmaster9291 3 หลายเดือนก่อน

    Right ..... so it is spewing nonsense , and you are saying this is what to expect ??? WTF ??? You mean after Microsoft...... this is normal ??? Is this the best you've got for us ???