Ollama - Local Models on your machine

  • Published on Jun 25, 2024
  • Site: www.ollama.ai/
    My Links:
    Twitter - / sam_witteveen
    Linkedin - / samwitteveen
    Github:
    github.com/samwit/langchain-t... (updated)
    github.com/samwit/llm-tutorials
    Timestamps
    00:00 intro
    01:07 What is Ollama
    01:30 Ollama Models
    02:19 Installing Ollama
    03:13 Running Ollama
    06:17 Customizing Ollama
  • Science & Technology

Comments • 114

  • @VincentVonDudler
    @VincentVonDudler 8 months ago +35

    Thanks, Ollama.

  • @chautauquatrail
    @chautauquatrail 8 months ago +16

    I just did the ollama install yesterday, you are awesome for being able to produce these so quickly.

  • @kenchang3456
    @kenchang3456 8 months ago

    Thanks Sam, very interesting. It's amazing how fast the whole LLM ecosystem is moving.

  • @DanielSpringer
    @DanielSpringer 8 months ago +6

    Definitely has a docker vibe. I like it!

  • @nexuslux
    @nexuslux 8 months ago +1

    Nice and to the point video. Appreciate it!

  • @paulmiller591
    @paulmiller591 6 months ago

    Thanks Sam, great video. These are some of the best videos on AI tools. I need to master this for my work, and your approach to communication really works for me. Cheers. Keep up all things LangChain, please.

  • @kp_kovilakam
    @kp_kovilakam 7 months ago +1

    Thank you for the introduction!

  • @user-ut8ts5gv2g
    @user-ut8ts5gv2g 3 months ago +1

    nice video. I managed to create a customized model by watching this.

  • @sandrocavali9810
    @sandrocavali9810 4 months ago

    Excellent intro

  • @jeffsteyn7174
    @jeffsteyn7174 4 months ago +10

    It's not just about being technical. It's also about being productive. Do you want to spend your time building something useful, or spend it trying to figure out how a badly maintained and documented piece of software works?

  • @sonurocks341
    @sonurocks341 2 months ago

    Great demo! Thank you!!

  • @riflebird4842
    @riflebird4842 8 months ago

    Thanks for the video, keep it up

  • @richardchiodo2200
    @richardchiodo2200 3 months ago +1

    I picked up a 12GB 3060 from my local Microcenter for a pretty good price, and am now running Ollama with Open WebUI for the frontend; they have a community repository for prompts and modelfiles. The biggest hurdle was passing the GPU through my Proxmox host to my VM.

  • @theh1ve
    @theh1ve 8 months ago +7

    Another great flag, Sam. I would be interested to see this running so you can make API calls. Currently using Text Gen Web UI as a server, and this looks like it would be a good alternative.

  • @sitrakaforler8696
    @sitrakaforler8696 8 months ago +1

    Llama 2 uncensored was quite surprising for me x)
    By the way, THANK YOU FOR YOUR VIDEO.
    Every time I need to use Ollama I'm using your video to be sure of the command "ollama run" haha

    • @willi1978
      @willi1978 6 months ago

      It looks like the uncensored versions are a lot better. Then it is not always giving you a paragraph about why it can't do what you ask it to do.

  • @nembalage
    @nembalage 6 months ago

    super helpful.

  • @mohamed_elmardi
    @mohamed_elmardi 8 months ago +6

    Windows users can use WSL.

    • @fontende
      @fontende 8 months ago +3

      Win 10 was the last Windows for me; I found the perfect Linux in PikaOS, an Ubuntu without Ubuntu snaps or other crap.

  • @jidun9478
    @jidun9478 8 months ago +2

    Nice, thank you. It runs so much faster than Text Gen Web UI! I wish they'd make it easier to add a custom choice of models though (that is a real drawback).

    • @samwitteveenai
      @samwitteveenai 8 months ago

      I have a video coming that shows how to do exactly that. It's actually pretty easy.

  • @FreakyStyleytobby
    @FreakyStyleytobby 8 months ago +2

    Fantastic video Sam, thank you! Ollama looks great, but the big 70B models still remain beyond the reach of typical RAM. Do you know of any way (be it API or other) to get access to Llama 70B and be able to run arbitrary tokens on the model? There are some APIs like TogetherAI, but they only let you run endpoints like /prediction, not much more.

  • @tusharbokade8378
    @tusharbokade8378 8 months ago

    Interesting!

  • @AlphaSynapse
    @AlphaSynapse 3 months ago +1

    Ollama is now available for Windows (Windows 10 or above).

  • @twobob
    @twobob 8 months ago

    nice one
    thanks

  • @zacharymacaroni7649
    @zacharymacaroni7649 1 month ago

    good video :)

  • @BionicAnimations
    @BionicAnimations 2 months ago +1

    Hi. How do I uninstall one of them off my MacBook Pro? I am using it in terminal.

    • @samwitteveenai
      @samwitteveenai 2 months ago

      ollama rm llama3
      If you just type ollama in the command line you should be able to see all the commands

  • @morespinach9832
    @morespinach9832 4 months ago

    If we get these locally in our cloud, is there a best practice to keep them updated?

  • @attilavass6935
    @attilavass6935 8 months ago +1

    What are the pros and cons of using such "local" Ollama models on Colab Pro with 2 TB of Drive?

  • @photorealm
    @photorealm 2 months ago

    Ollama for Windows is out and available for download. I am testing it and it works fabulously, but it's very slow for me on Windows. I don't think it's using my Nvidia GPU, and I can't seem to find a way to hook the GPU in under Windows. But I just got started. I love the fact that it is serving a local HTTP port as well as the command line.
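
    For reference, a minimal sketch of calling that local HTTP endpoint from Python (the default port is 11434; the requests dependency, model name and prompt are illustrative and assume the model has already been pulled):

    import requests  # third-party HTTP client, assumed installed

    # Ollama's local server listens on http://localhost:11434 by default.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
    )
    print(resp.json()["response"])  # non-streaming responses return the text in "response"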

  • @Leonid.Shamis
    @Leonid.Shamis 8 months ago +2

    Great video, as usual :) I have been using Ollama on Linux and it has been working great. I know that Ollama can be used via API but I was wondering whether its API is compatible with OpenAI API and can be used as a replacement for OpenAI API inside LangChain. Looking forward to more videos about Ollama. Thank you.

    • @IanScrivener
      @IanScrivener 8 months ago +2

      There are dedicated LangChain and LlamaIndex connectors for Ollama.
      Ollama's API is different to OpenAI's... better IMO.
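
      A minimal sketch of the LangChain connector mentioned above (assuming the langchain-community package; in older LangChain releases the class lived in langchain.llms, and the model name is just an example):

      from langchain_community.llms import Ollama

      # Points at the local Ollama server (http://localhost:11434 by default).
      llm = Ollama(model="llama2")
      print(llm.invoke("Name three uses of a local LLM."))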

  • @nelavallisivasai8740
    @nelavallisivasai8740 4 months ago

    It's taking more time for me to get responses from the local model, not as fast as yours. Can you please tell me what processor you are using?
    What are the minimum hardware requirements to run LLM models and get faster responses?

  • @VaibhavPatil-rx7pc
    @VaibhavPatil-rx7pc 8 months ago

    Thanks

  • @franciscojlobaton
    @franciscojlobaton 8 months ago

    Please, more. More pleeease!

  • @user-di5tl3iv8l
    @user-di5tl3iv8l 3 months ago

    Thanks for the video! How can I make Ollama run the 13 GB tar file I downloaded locally?

  • @ronelgem23
    @ronelgem23 4 months ago

    Does it require VRAM or just regular RAM?

  • @AndyAinsworth
    @AndyAinsworth 8 months ago +3

    LM Studio for Windows and Mac is a great way to achieve the same with a lot less setup! Also has a great internal model browser which suggests what models might run on your machine.

    • @AndyAinsworth
      @AndyAinsworth 8 months ago +1

      It can also run as an API with a click in the UI. Definitely been the easiest way for me to test out a load of different LLMs locally, nice user interface with history and markdown support.

    • @alx8439
      @alx8439 8 months ago

      LM Studio is proprietary software. God only knows what else it is doing on your PC - gathering and sending out your data while you are sleeping, mining bitcoins, using your PC as an exit node for Tor, keylogging everything you type - you can only guess.

    • @IanScrivener
      @IanScrivener 8 months ago +1

      Agree, LM Studio is great.
      It can be run in OpenAI API mode, which replicates OpenAI's API format, and so can easily be used with LangChain, LlamaIndex, etc.

    • @AndyAinsworth
      @AndyAinsworth 8 months ago +2

      @@IanScrivener Yeah, I'm hoping to get it set up to use the API via LM Studio with Microsoft AutoGen, which provides a multi-agent workflow with a code interpreter.

    • @scitechtalktv9742
      @scitechtalktv9742 8 months ago

      @@AndyAinsworth This is what I want to do also! Have you had any progress and success with this?

  • @Knowledge_Nuggies
    @Knowledge_Nuggies 4 months ago +1

    I'd be interested to learn how to build a RAG system or local LLM agent with tools like Ollama, LM Studio, LangChain etc.

  • @mbottambotta
    @mbottambotta 8 months ago +1

    Thank you Sam for posting this video. Very accessible, clearly explained. Question: what I could not see is whether Ollama enables you to choose the model size, i.e. whether you want Llama 2 7B, 13B or 70B, for example.

    • @brando2818
      @brando2818 8 months ago +2

      3:37
      You can; specify it with:
      ollama run llama2-uncensored
      Just go to the models page, then click one. It'll tell you the command if you're using the CLI.

    • @samwitteveenai
      @samwitteveenai 8 months ago +2

      Yes, you can pick this; take a look at the models page.

  • @pensiveintrovert4318
    @pensiveintrovert4318 6 months ago

    Any idea how to load a model that is already on my disk?

  • @xdasdaasdasd4787
    @xdasdaasdasd4787 7 months ago

    Awesome! I was hoping to use a custom model but didn't fully understand :(

  • @iainattwater1747
    @iainattwater1747 8 months ago

    I used the Docker container that was just released and it works on Windows.

    • @VijayChintapandu
      @VijayChintapandu 2 months ago

      Can you provide the Docker container link? Where did you download it from?

  • @liji8672
    @liji8672 8 months ago +1

    Hi Sam, good video. My little question is whether your llama2 model ran on your CPU?

    • @samwitteveenai
      @samwitteveenai 8 months ago +2

      Pretty sure it was running on Metal and using the Apple Silicon GPUs. It is certainly a quantized model though, which helps.

    • @IanScrivener
      @IanScrivener 8 months ago +1

      You CAN run any llama.cpp tool on CPU… though it is MUCH slower than GPU.
      macOS Metal GPU is surprisingly fast…

  • @Thelgren00
    @Thelgren00 21 days ago

    Can I use this to install AI Town? The default method was too complex for me.

  • @Gerald-iz7mv
    @Gerald-iz7mv 2 months ago

    What port does the web server run on? Can I set that port?

  • @Shawn-lk2ze
    @Shawn-lk2ze 6 months ago

    I'm new to this topic and I just binged your videos. How does this compare to vLLM from your previous video? I get that Ollama is more user-friendly, but I'm more curious about the performance.

    • @samwitteveenai
      @samwitteveenai 6 months ago +1

      vLLM is more for serving full-resolution models in the cloud, and Ollama is more for running things locally. vLLM shines when you have some strong GPUs to use, etc.

    • @Shawn-lk2ze
      @Shawn-lk2ze 6 months ago

      @@samwitteveenai Got it! Thanks!

  • @user-ex4mf6ky6x
    @user-ex4mf6ky6x 7 months ago

    Hi, I have been using Ollama for the past 2 months. Yes, it's giving good results, but what I need to know is whether it is possible to set a configuration file for Ollama, i.e. setting the parameters for Ollama to get the most accurate results. Can you make one video about how to set custom parameters?
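
    For anyone after the same thing, this is what Ollama's Modelfile covers. A minimal sketch (the model name, parameter values and system prompt are only examples):

    # Modelfile
    FROM llama2
    PARAMETER temperature 0.3
    PARAMETER num_ctx 4096
    SYSTEM """You are a concise assistant."""

    Then build and run the customized model:

    ollama create my-llama2 -f Modelfile
    ollama run my-llama2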

  • @GrecoFPV
    @GrecoFPV 2 months ago

    Can we give this power to n8n? Connect our local Ollama with our self-hosted n8n?

  • @ghrasko
    @ghrasko 6 months ago +1

    In fact, it was quite easy to install Ollama on Windows 10 using Windows Subsystem for Linux (WSL). In a Windows command prompt:
    wsl --install -d Ubuntu (this downloads and runs the Ubuntu distribution, giving a Linux prompt)
    ollama pull llama2:13b (this downloads the selected model)
    ollama run llama2:13b (this runs the selected model)
    At this point you can write user text that will be sent to the model. This did not work for me; the keyboard input is not correctly directed to the application. This is possibly a compatibility issue with this Linux emulation. But I could fully use the downloaded models from simple Python programs directly or through LangChain.

  • @MarcellodeSales
    @MarcellodeSales 7 months ago

    It seems like it's Docker :D Same feeling. Ollama will capitalize on cloud-native software engineers.

  • @rookandpawn
    @rookandpawn 9 days ago

    I'm coming from text-generation-webui; how can I use that model folder for Ollama?

  • @alx8439
    @alx8439 8 months ago +2

    There are a bunch of similar tools (simple to use for non-technical people). The most prominent is GPT4All. Yeah, from the guys who fine-tuned the first LLaMA back in March/April on their own handcrafted datasets.
    These guys from Ollama were definitely inspired by Docker, based on the syntax and architecture :)

    • @technovangelist
      @technovangelist 8 months ago +2

      a few of the maintainers were early Docker employees

  • @guanjwcn
    @guanjwcn 8 months ago +1

    Thanks, Sam. Do you know what tricks ollama uses to make it run so smoothly locally?

    • @samwitteveenai
      @samwitteveenai 8 months ago

      They are using quantized models, and on macOS they are using Metal, etc.

    • @IanScrivener
      @IanScrivener 8 months ago +2

      Ollama uses llama.cpp under the hood… the fastest LLM inference engine. Period.
      Many other apps also use llama.cpp: Kobold, Oobabooga, etc.
      Many other apps use Python inside… easier to build, but much, much slower performance.

    • @samwitteveenai
      @samwitteveenai 8 months ago +1

      @@IanScrivener They have llama.cpp running on Metal on Macs, right. It feels like it is more than just CPU, etc. Honestly, I haven't looked under the hood much.

  • @abhijitkadalli6435
    @abhijitkadalli6435 8 months ago +2

    feels like docker

  • @NoidoDev
    @NoidoDev 8 months ago +1

    # New Software
    In the past: it only runs on Windows, but maybe in a few years it will be available on macOS, and one day (but probably never) on Linux.
    Today: at the moment it supports macOS and Linux, but apparently Windows support is coming soon as well.

  • @samyio4256
    @samyio4256 4 months ago

    Also, another question: do you really run this on a Mac mini? If so, how much RAM does your machine have?

  • @XiOh
    @XiOh 8 months ago +1

    When is the Windows version coming out? O.o

  • @wendten2
    @wendten2 8 months ago +1

    "Its Llama for those who dont have technical skills" .. the PC version is currently only available on Linux... xD

  • @samyio4256
    @samyio4256 4 months ago

    If the model used talks to an API, how is this local usage?
    I'd like to know where the prompt data goes. Will it go to a database that the model loads it from afterwards? Or is the model hosted separately in a monitored environment?
    My basic question is: who gets the data from the input prompt?

    • @samwitteveenai
      @samwitteveenai 4 months ago

      The data is only on your machine; it is all running locally. It can run an API on your machine, and you can then expose that if you want to use it from somewhere else. If you are just using it on your machine, all data stays on your machine.

    • @samyio4256
      @samyio4256 4 months ago

      @@samwitteveenai Wow! That's a complete game changer! Thanks! I'll sub, insane content!

  • @Ryan-yj4sd
    @Ryan-yj4sd 8 months ago +1

    Can you run fine-tuned models?

    • @IanScrivener
      @IanScrivener 8 months ago +1

      Yes, and your own LoRAs… 😊

  • @merselfares8965
    @merselfares8965 3 months ago

    Would an 11th-gen i3 with 8 GB of RAM and UHD 630 graphics be enough?

    • @samwitteveenai
      @samwitteveenai 3 months ago

      Honestly not sure. It will probably run, but you may get very slow tokens per second.

  • @VijayChintapandu
    @VijayChintapandu 2 months ago

    My system is very slow when I am running Ollama. My system is a Mac M2. Is this an issue?

    • @samwitteveenai
      @samwitteveenai 2 months ago

      Depends which model you are trying to run. The video was done on an M2 Mac Mini.

    • @SharePointMaster
      @SharePointMaster 2 months ago

      @@samwitteveenai Ohh, thanks for the reply. Mine is also a Mac Air with the M2 chip, but it was slow. I will check.

  • @abobunus
    @abobunus 7 months ago

    How do you make your own language model? For example, I want to take some texts and force the AI to use only this text to answer my questions.

    • @volkanazer9997
      @volkanazer9997 3 months ago

      Let me know when you've got it figured out. I'm curious about this as well.

  • @stanTrX
    @stanTrX 2 months ago

    Can I upload and work with documents with Ollama?

    • @samwitteveenai
      @samwitteveenai 2 months ago

      Yes, you will need to code it to do a custom RAG.

    • @stanTrX
      @stanTrX 2 months ago

      @@samwitteveenai Thanks, good man, but what's a custom RAG?
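
      (For context: RAG = retrieval-augmented generation - fetch passages from your own documents that are relevant to the question, then pass them to the model as extra context. A rough Python sketch against the local Ollama API; the word-overlap "retrieval" is deliberately simplified, and a real setup would use embeddings and a vector store:)

      import requests  # assumes a local Ollama server with a llama2 model pulled

      docs = [
          "Ollama runs large language models locally and serves an HTTP API on port 11434.",
          "A Modelfile lets you customise a base model with parameters and a system prompt.",
      ]

      def retrieve(question, k=1):
          # Toy retrieval: rank documents by word overlap with the question.
          words = set(question.lower().split())
          return sorted(docs, key=lambda d: -len(words & set(d.lower().split())))[:k]

      def ask(question):
          context = "\n".join(retrieve(question))
          prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
          r = requests.post("http://localhost:11434/api/generate",
                            json={"model": "llama2", "prompt": prompt, "stream": False})
          return r.json()["response"]

      print(ask("What port does Ollama serve on?"))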

  • @kunalr_ai
    @kunalr_ai 8 months ago

    Why this new model?

  • @foolcj9999
    @foolcj9999 3 months ago

    Can you make a video of interacting with Ollama using voice input, where it replies back, like Whisper?

  • @HitopFaded
    @HitopFaded 3 months ago

    I'm trying to run it in a Python environment, if possible, to build on top of it.

    • @samwitteveenai
      @samwitteveenai 3 months ago +1

      I have another vid there on Ollama's Python SDK etc

    • @HitopFaded
      @HitopFaded 3 months ago

      @@samwitteveenai Thanks, I'll check it out.

  • @kevinehsani3358
    @kevinehsani3358 8 months ago

    I am sure Windows users can probably install it under WSL.

    • @samwitteveenai
      @samwitteveenai 8 months ago

      I was wondering about this. I asked one of my staff to give it a quick try, but he couldn't get it working.

  • @LITTLEFREDOX2
    @LITTLEFREDOX2 3 months ago

    The Windows version is here.

  • @DaeOh
    @DaeOh 8 months ago +2

    Would you consider not referring to models like Llama and Mistral as "open-source?" It sets a precedent. "Freeware," maybe?

    • @alx8439
      @alx8439 8 months ago

      It's a good question how we should refer to such models. It's not 100% FOSS-compliant because of the restrictions which come into place if you have something like 700 million users, if my memory serves me well. But this is more like a restriction for a couple of companies like MS, Google, TikTok. Who cares about them? Or am I missing something bigger?

    • @spirobel2.0
      @spirobel2.0 8 months ago +2

      Mistral is completely open

    • @clray123
      @clray123 8 months ago +1

      Do not mix Llama and Mistral together. Mistral has a truly open license; Llama is the Facebook/Meta poison.

    • @DaeOh
      @DaeOh 8 months ago

      It's not open-source because you can't reproduce it without the source (training data)... Just making the equivalent of binaries available for commercial use doesn't make something "open-source..."

  • @fontende
    @fontende 8 months ago +5

    I run only locally, and cloud services are blocked in our region anyway (quite a lot of people don't have access to them, more than 2 billion: China plus a dozen other countries, mostly for political, non-scientific reasons). And the hardware allows it, thanks to China, which recycles servers and brings to market some fairly obscure Intel chips, like a 22-core Xeon that was never released outside the enterprise market and costs only 150 bucks. My motherboard, an ASRock X99 Extreme4, has become the de facto standard in China for that socket, also 150 bucks, and can be filled with 256 GB of RAM. I bought mine in 2020, during the GPT-2 days, when it was impossible to run locally at its max 1558M-parameter size; there weren't any of the current tools, and I was able to run the 774M version on GPU from the terminal, and it was a mess of text.

  • @anispinner
    @anispinner 8 months ago +1

    Obama

  • @antonpictures
    @antonpictures 5 months ago

    ~ % ollama pull 01-ai/Yi-VL-6B
    pulling manifest
    Error: pull model manifest: file does not exist

  • @astronosmage3722
    @astronosmage3722 8 months ago +3

    would say oobabooga is still the way to go

    • @alx8439
      @alx8439 8 months ago +1

      Yeah. Or h2oGPT / LLM Studio.