Just found this channel, you are the G.O.A.T sir!
Have wondered several times how to do this. Thanks!
One area I'm interested in is local function-calling LLMs (ideally with Ollama). Would love an explanation of how to get those working, if there are any reasonable solutions yet
That’s a great idea. Thanks
I should also mention that typescript + typesafety would be a big plus as well, perhaps with zod. But really any info on plumbing open-source solutions would be great
I’m a much bigger fan of typescript. It’s so much easier for me to write than python.
Thank you for the video. It's easy to get the GGUF format now, and I was stupid enough to miss the fact that you can go with just a Modelfile and have it all in Ollama. Thanks.
This video is the solution to exactly what I was looking for. And I'm also glad to see someone else using the terminal like me :P
I wonder if it would be possible to have very small LLMs for particular use cases, for example Java, Ruby, or Python, almost like modules. That way, if only a particular programming language is required, you would only have to load that particular language model.
That is exactly the scenario I am most excited about
How do you produce a Modelfile for embedding models?
thank you so much this was really helpful
really good video 👍
What markup do I need to add to the title or description to get the blue bubbles with the tag label like "7B", "70B", etc.?
The one thing that brings me anxiety is creating and using the dataset. A video on that would be great. Love your videos.
Can you tell me more about what you mean by that? What dataset?
@@technovangelist omg! I just read what I wrote. Sorry. I meant to say: when we are required to fine-tune a model for a specific domain or for required functionality, there is the curation of data, preparing it properly, and ensuring the correct settings are used, as well as the evaluation of the dataset and testing.
What if I created a model with a new architecture or I made an architectural tweak to an existing model? In other words, something that changes the number/type of model layers or the number of training parameters, etc. Is there a path for porting a model with this kind of customized architecture to run on Ollama? What would the process be?
New architecture? No idea
(1:16) What architectures are supported? I only found these options in the comments of a PR..
1. LlamaForCausalLM
2. MistralForCausalLM
3. RWForCausalLM
4. FalconForCausalLM
5. GPTNeoXForCausalLM
6. GPTBigCodeForCausalLM
I see a lot of OCR models use `VisionEncoderDecoderModel`; is that supported?
The ollama/quantize Docker Hub page lists the following model architectures as supported:
1. LlamaForCausalLM
2. MistralForCausalLM
3. YiForCausalLM
4. LlavaLlama
5. RWForCausalLM
6. FalconForCausalLM
7. GPTNeoXForCausalLM
8. GPTBigCodeForCausalLM
9. MPTForCausalLM
10. BaichuanForCausalLM
11. PersimmonForCausalLM
12. GPTRefactForCausalLM
13. BloomForCausalLM
I seem to remember that isn't 100% accurate.
The ollama/quantize Docker Hub page lists the following architectures:
1. LlamaForCausalLM
2. MistralForCausalLM
3. YiForCausalLM
4. LlavaLlama
5. RWForCausalLM
6. FalconForCausalLM
7. GPTNeoXForCausalLM
8. GPTBigCodeForCausalLM
9. MPTForCausalLM
10. BaichuanForCausalLM
11. PersimmonForCausalLM
12. GPTRefactForCausalLM
13. BloomForCausalLM
Looks like you sent that twice. Sorry, comments wait for me to approve them. In the distant past there was a spam problem, and doing it this way I ensure that I can answer every question that comes in. The interface for comments is worse than the Ollama Discord, and I want to make sure I address everything that comes in.
Great video! How do I add tags (what do I have to type in the terminal) so that I can upload different quants to the same Ollama model repo?
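For anyone else with the same question, a rough sketch of the idea: tags are just the part of the model name after the colon, so you build one model per quant and push each under its own tag. The names below (myuser, mymodel, the Modelfile names) are placeholders, and this assumes you are already signed in to ollama.com with your key added:
ollama create mymodel-q4 -f Modelfile.q4
ollama cp mymodel-q4 myuser/mymodel:q4_0
ollama push myuser/mymodel:q4_0
# repeat with a Modelfile pointing at a different quant for myuser/mymodel:q8_0, etc.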
Thanks!
Wow. That’s the first time I have seen one of those. Thanks so much. I don’t know what to say. Thank you.
@@technovangelist I hope, no, - I'm sure you will see more of those. Just continue.
Hi,
I am trying hard to get an embedding model for the German language going on Ollama. The architecture in the config file is named "BertForMaskedLM".
I assume it does not work because the architecture is not supported. I have two questions regarding that:
- can you tell me where I can find a list of architectures supported by ollama? I am unable to find one.
- is there a way to get it working with ollama even with the named architecture?
I don't know where that list is. And there's not a way that I know of.
Hey, thanks for the video. I'm trying to quantize a Llama 3 model with the Docker image shown in the video, but I think it is not supported. Will the Docker image be updated?
Hello Matt. I am working with an already quantized model in EXL2 format. Since it is already quantized, I wanted to make it compatible with Ollama. I created the Modelfile and ran the ollama create command, but I am running into an error: "unsupported content type: unknown". Could you help me out? Or at least let me know if it is even possible to convert from EXL2 to GGUF?
I don’t know of any converter. May be easier just to convert from the original to gguf then quantize. Even really big models take just a few minutes with normal hw.
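A minimal sketch of that convert-then-quantize route, assuming a llama.cpp checkout with its Python requirements installed (the script and binary names have changed across versions, so treat these as approximate):
python convert-hf-to-gguf.py /path/to/original-hf-model --outtype f16 --outfile model-f16.gguf
./quantize model-f16.gguf model-q4_0.gguf q4_0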
I am quite new to all of this and there are some things that I don't understand. First, when I run the command hfdownloader -s : -m it tells me that hfdownloader is an unknown command, even though I downloaded the executable. Secondly, I don't understand what you mean at 1:54 when you say to go to where you want your model to be downloaded, since you don't show any folder being selected, only the terminal. Could you please explain? Thank you in advance.
hfdownloader probably isn't in your PATH. For the second part, you have to decide where to download something, same as with downloading anything from the Internet. Choose that place and run the command there.
@@technovangelist Thanks so much for such a quick answer; however, things are still not so clear for me. How do I put hfdownloader in the PATH?
I suggest you look for tutorials on how to work with your OS, especially the command line.
./hfdownloader
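In other words, either run it with an explicit path from the folder it sits in, or put the binary somewhere already on your PATH. A rough sketch for macOS/Linux (the folder names are just examples):
cd ~/Downloads && ./hfdownloader ...
# or move it onto the PATH so the bare name works from anywhere
chmod +x hfdownloader
sudo mv hfdownloader /usr/local/bin/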
Thank you for this!
This is nuts! How large is the ollama team? How many programmers?
I think it's just this guy
There are a few on the team, but under 10. I'm not one of them.
I wish you could do a video on how to fine-tune TinyLlama for inference in Ollama.
Yes. I really want to do one on this.
@@technovangelist thanks in advance.
Very nice sharing 👍
Have a nice day 😊
Thanks
Thank you so much for such a great video❤❤
Hello sir, whenever I run local models they are not using my VRAM. I checked the usage in Task Manager and it is not much. Does making the GPU the default increase the speed of the LLM, and how can I make my VRAM the default for running any local LLM? Sorry, I only have 4 GB of VRAM on an NVIDIA GTX 1650.
Looks like it supports the 1650 Ti but not the 1650. You'd need to upgrade to get that. Nvidia doesn't support it with their drivers.
Can models of all architectures be converted to GGUF, or is there a specific list?
Not everything but a lot of them can.
@@technovangelist I need speech to text. Do you know of any model that can be converted to GGUF, or can you help if there is one? I would be highly grateful.
Ollama is for text to text and text/image to text. For speech to text take a look at OpenAI’s Whisper models that you can install locally.
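For reference, Whisper runs fine locally outside of Ollama. A minimal sketch, assuming Python and ffmpeg are installed and with recording.mp3 as a placeholder file name:
pip install -U openai-whisper
whisper recording.mp3 --model small
There is also whisper.cpp if you prefer a llama.cpp-style local build instead of the Python package.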
What's the best file format for RAG? Is there a list of what works best?
Where can I find the architectures that are compatible with Ollama?
I don't know if there is any one list anymore
Where can we find compatible models other than HuggingFace? Are TensorFlow HUB formats compatible?
I don’t know about that. But it’s safetensors and PyTorch models that are supported.
This was such a great video just perfect for what I wanted to learn next. And this new model would work automatically with the api/chat and api/generate? Let's say we follow these steps on the Whisper model and somehow it works. The Whisper model has a translate() method. How would we add an api/translate custom endpoint to the Ollama API? Previously I used a nginx container to make a proxy so I could add my custom endpoint to that proxy. The Ollama API is golang. It would be great if there was some kind of plugin folder that I could drop Golang scripts that the API would automatically include. With the docker mount, if there was some predefined named mount point that Ollama monitored to automatically pull in your API additions that would be great!
If a model works on ollama it gets all the endpoints. But a non llava model can’t do image stuff. The whisper model doesn’t have the endpoint. It’s the runner in front that does.
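To make that concrete, an imported model can be hit through the standard endpoints like any other model; a quick sketch assuming the model was created under the name "emoji" (any name you used with ollama create works):
curl http://localhost:11434/api/generate -d '{
  "model": "emoji",
  "prompt": "Summarize this sentence with emoji: I love pizza.",
  "stream": false
}'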
This should really include some resources to go to if one runs into problems. I've noticed that a lot of these videos you do are a bit too narrow in focus to actually help people get into this, unless they were already into something of a similar nature.
Lol, I'm still stuck on a lot of things from these videos. I really just wanted to do one of his recent videos incorporating web search into local Llama responses, but there were no real instructions.
Thanks Matt!
This is way too woofin' hard for me. I thought I'd simply right-click and Save As a file on Hugging Face and save it to whichever directory Ollama wants them in. But I need to convert them to work, and select a quantization type? My thoughts get hazy as soon as simply getting and placing a file requires me to learn CMD commands. = ,=;
It's not for everyone. It's a dev tool first and requires some basic CLI skills...
@@technovangelist That's fine, I'm going to try out LM Studio next, Oobabooga was quick and easy to use but FlashAttention in it only supports GPUs as new as Ampere now, unlike my 1080, so I'm looking for something quicker that I can use.
Omg. Ollama is sooo much easier than either of those. No question
Just get the model you want from ollama. Too much work getting them from hf
@@technovangelist respectfully, I’d disagree. I have some custom finetuned and merged models and corresponding quants that I made but I can’t quite figure out how to convert them to a Modelfile. Maybe it’s the Go syntax throwing me off. But incidentally, since these are GGUF files, couldn’t the relevant instruction metadata be imported automatically? And the GGUF parameters templated into the generated Modelfile? Maybe I’m missing something, but why not just use llama.cpp/GGUF files directly? I’m quite surprised how much trouble I’m having despite Ollama being a quite popular tool.
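For anyone stuck at the same point, a minimal Modelfile wrapping an existing GGUF quant can be very short. This is only a sketch; the file name, template, system prompt, and stop token are placeholders you would adjust to your fine-tune's chat format:
FROM ./my-finetune.Q4_K_M.gguf
TEMPLATE """{{ .System }}

{{ .Prompt }}"""
SYSTEM """You are a helpful assistant."""
PARAMETER stop "</s>"
Then create and run it:
ollama create my-finetune -f Modelfile
ollama run my-finetune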
How do you convert GPT-2?
This guy is a beast, when does he sleep?! Keep it up!!! Thanks!!! I was just wondering today how to do that, especially since many models don't come in this standard format that is easy to import into Ollama.
Will Ollama support the OpenAI HTTP API format, so it can be integrated more easily into AutoGen Studio? ^^
Will Ollama support easier RAG and web requests from the console?
So will it support OpenAI? I doubt it. Will Ollama do RAG on the console? That's a bit out of scope for the project. But there are plenty of extensions that are doing it well.
Re OpenAI: just kidding, watch out for the next release video.
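That tease turned into an OpenAI-compatible /v1/chat/completions endpoint in later Ollama releases. A rough sketch, assuming a local server and a pulled mistral model:
curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "mistral",
  "messages": [{ "role": "user", "content": "Say hello in three languages." }]
}'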
That was good! Thanks! I'd be interested to see how to make quantized LLaVA models. Even better if it could be the new moondream1 vision model by vikhyat.
Interesting. Thanks.
Great video! I followed this video just to try running Ollama with Mistral and the emoji model mentioned here, but it is painfully slow on my Windows 11 machine even though I have 22 GB of RAM and an AMD 6800S video card. Is anyone facing the same issue on Windows? I tried running it using WSL2 and it was a bit faster, but still slow compared to what these videos show. Any suggestions? It ran on my M1 Mac with 16 GB and it's even faster than WSL2.
I assume that's an AMD card. I don't think AMD support is enabled yet on Windows.
@@technovangelist Could be. Also, Windows Defender is going crazy with ollama.exe: it detected Trojan:Script/Wacatac.B!m and will remove Ollama. It is concerning that the installer has a trojan!
The Ollama team is working with Microsoft to get them to fix Defender, because it is broken on this. There is no trojan in Ollama, or in the hundreds of other tools built with the latest Go compiler that this false positive affects.
Issues with architectures that aren't supported.
unknown architecture LlavaQwenForCausalLM
Yup, can't work with architectures it doesn't know about.
This seems like a pain in the neck!
For models not already on Ollama, this process is a single command and done in 5 minutes. It's pretty painless. And this process goes away soon. It's an old video. But it's still faster than anything else out there.
Please keep the boomer jokes to yourself.
Had to watch it again. I don’t have any jokes in this one.
Oh was that the problem? I didn’t include any. Got it
It already was a model. How do you make your own model from scratch? Nobody knows, LOL. And we don't need to do it, because you did it and shared it. Someone takes your model and quantizes it again, LOL. Nobody starts from scratch.
The example I showed was just the model weights. In Ollama, a model is everything needed to make it useful. The weights are just a part of it, along with the system prompt and template. There are plenty of places that show how to make a model from scratch. The downside is that no one has done it for less than $100k.
@@technovangelist It can be done, it just takes long. A PC costs 4k; how many years would it need to train? LOL. What I kind of meant was: how do you make a model where you say "hi mom" and the AI answers "hi son"? That would not take long. I just want to know all the commands, LOL.
Let's train a model: use AI to make the questions and answers, then train on them, lol. I don't get it xD, we already had Jarvis.
PS C:\programmazione\ollama\ollama-ita> docker run --rm -v .:/model ollama/quantize -q q4_0 /model
/workdir/llama.cpp/gguf-py
Loading model file /model/model-00001-of-00004.safetensors
Loading model file /model/model-00001-of-00004.safetensors
Loading model file /model/model-00002-of-00004.safetensors
Loading model file /model/model-00003-of-00004.safetensors
Loading model file /model/model-00004-of-00004.safetensors
params = Params(n_vocab=128256, n_embd=4096, n_layer=32, n_ctx=8192, n_ff=14336, n_head=32, n_head_kv=8, f_norm_eps=1e-05, n_experts=None, n_experts_used=None, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=, path_model=PosixPath('/model'))
Traceback (most recent call last):
  File "/workdir/llama.cpp/convert.py", line 1658, in <module>
    main(sys.argv[1:])  # Exclude the first element (script name) from sys.argv
  File "/workdir/llama.cpp/convert.py", line 1614, in main
    vocab, special_vocab = vocab_factory.load_vocab(args.vocab_type, model_parent_path)
  File "/workdir/llama.cpp/convert.py", line 1409, in load_vocab
    path = self._select_file(vocabtype)
  File "/workdir/llama.cpp/convert.py", line 1384, in _select_file
    raise FileNotFoundError(f"{vocabtype} {file_key} not found.")
FileNotFoundError: spm tokenizer.model not found.
anyone?
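If anyone hits the same error: that convert.py step is looking for a SentencePiece tokenizer.model file, which Llama 3 style repos do not ship (they use a BPE tokenizer in tokenizer.json instead). A possible workaround, assuming a current llama.cpp checkout rather than the old ollama/quantize image (script names vary by version), is the HF-aware conversion script, followed by the same quantize step sketched earlier in the thread:
python convert-hf-to-gguf.py /model --outtype f16 --outfile model-f16.gguf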