After watching this video, I can't stop singing "Das Model" from Kraftwerk. Thanks, Matt; this course is awesome.
Hi Matt, great content.
I loved this format, with the subtitled videos.
Thanks!
Loved the hints on choosing the best model for the problem you want to solve
Thanks for these, Matt. Super useful. I hope you'll continue through to Open WebUI and its more advanced features.
Thanks for taking time to make these videos!
6:10 A video on the world of benchmarks is so necessary
Thanks for this new course
Thanks for this and all of your videos.
How much is “a lot of extra memory”?
Would 32GB of RAM be enough, or do I need 128GB of RAM on a new M4 MacBook?
Llama3.1 runs just fine in 32GB of RAM.
It depends on the size of the model, the size of the max context, and the size of the context you are actually using. There isn't a great calculator for it, either.
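For a rough sense of those numbers, here is a back-of-the-envelope sketch. The layer count, KV-head count, head dimension, and bytes-per-parameter below are assumed values for an 8B model at a 4-bit quant, not Ollama's exact accounting:

```python
# Rough estimate of memory needed to run a model: quantized weights + KV cache.
# The formula and the constants are approximations, not Ollama's real bookkeeping.

def estimate_gb(params_b: float, bytes_per_param: float,
                layers: int, kv_heads: int, head_dim: int, num_ctx: int) -> float:
    weights = params_b * 1e9 * bytes_per_param                  # quantized weights
    kv_cache = 2 * layers * num_ctx * kv_heads * head_dim * 2   # K and V tensors, fp16
    return (weights + kv_cache) / 1024**3

# Assumed shape for Llama 3.1 8B: 32 layers, 8 KV heads of dim 128 (grouped-query
# attention), roughly 0.6 bytes per parameter at a Q4 quant including overhead.
for ctx in (2_048, 8_192, 32_768, 131_072):
    print(f"num_ctx={ctx:>7}: ~{estimate_gb(8.0, 0.6, 32, 8, 128, ctx):.1f} GB")
```

By this rough math, an 8B model fits in 32GB even with a very large context; it is bigger models combined with very large contexts where 128GB starts to matter.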
Hey, Matt. This is a spot-on topic in a highly desirable and necessary course. Thank you. Just one question: you mentioned being careful setting the context size 'cause you might run out of memory. Is that CPU or GPU memory? If you have a bit of GPU VRAM, does the main memory get used for more than just what a program might normally use for program storage and temporary data?
Thank you very much, Matt, this is really helpful.
Great stuff, as usual I'd say. So, other than a ‘hit and miss’ approach… any possible way you might suggest for hunting down the right model to use with Fabric, for instance?
Definitely not hit and miss. Try a lot and be methodical. Find the best one for you.
TBH I thought it would be a boring, basic subject 😅 Boy, was I wrong!
Thanks for the video ❤ keep it up
What does "K_L/M/S" etc mean for quantized models? Why are L larger than M for same quantization?
Matt, thanks for your content. Is there an Ollama model that you can use to check for plagiarism? I am creating short articles using ChatGPT. Another question. Is there a command that can interrupt llama3.1 while it’s outputting an answer? /bye doesn’t work.
Ctrl-C will stop it.
I don’t think a model will check for that, but it seems like a good use for RAG. Do a search for similar content, chunk it up along with your comparison article, then do a similarity search. If your article has a bunch of chunks very similar to content in any one other article, that would be another piece of evidence pointing to plagiarism. But it might still need some assessment to figure it out for sure.
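A minimal sketch of that idea, assuming a local Ollama with an embedding model such as nomic-embed-text already pulled. The chunk size and the 0.9 threshold are arbitrary choices, and the candidate articles would come from your own search step:

```python
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    # Ollama's embeddings endpoint; assumes `ollama pull nomic-embed-text` was run.
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

def chunks(text: str, size: int = 400) -> list[str]:
    # Naive word-based chunking; real pipelines usually overlap chunks.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def suspicious_overlap(article: str, candidate: str, threshold: float = 0.9) -> int:
    """Count chunks of `article` that look very similar to some chunk of `candidate`."""
    cand_vecs = [embed(c) for c in chunks(candidate)]
    hits = 0
    for c in chunks(article):
        v = embed(c)
        if any(cosine(v, cv) >= threshold for cv in cand_vecs):
            hits += 1
    return hits
```

A lot of high-similarity chunks against a single source is evidence, not proof; you would still review the matches by hand, as the reply above says.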
@@technovangelist Matt, I now understand RAG and how you can use it to extend an LLM, but I won't be able to implement your very good idea. But, I see how you think--deep tech. So, what do you think about Grammarly? It will check text, and it's just $12 a month. When I graduated in 1973, they only had mainframes. I worked for Chrysler (MI Tank). And worked with Madonna's father, Tony Ciccone.
I used to use Grammarly until the company I worked at banned it over security issues.
@@technovangelist OMG. I will need to do a search on that. I worry about my solar powered WiFi camera I bought from Amazon and that WiFi power adapter my wife uses to activate our coffee maker in the morning. Thanks.
What would you say is the best model for PDF-to-JSON tasks? :) And is there a way to get the output without line breaks? Greetings
"If, for example, I have more than one model downloaded, and one is chat, another is multimodal, and another generates images, can I make it so that Ollama chooses which model to use based on a prompt, or does it by default use the one you've chosen with the `ollama run` command?"
It doesn’t do that. But you could build an app that does that.
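A sketch of what such an app could look like on top of the Ollama API. The category names and model names here are placeholders for whatever you have pulled locally, and a small model is used to classify the prompt before routing:

```python
import requests

OLLAMA = "http://localhost:11434"

# Placeholder mapping from task type to a locally pulled model.
MODELS = {"chat": "llama3.1", "vision": "llava", "code": "deepseek-coder-v2"}

def classify(prompt: str) -> str:
    """Ask a small model which kind of request this is."""
    r = requests.post(f"{OLLAMA}/api/generate", json={
        "model": "llama3.1",
        "prompt": ("Answer with exactly one word, chat, vision, or code, "
                   f"for this request: {prompt}"),
        "stream": False,
    })
    r.raise_for_status()
    word = r.json()["response"].strip().lower().strip(".")
    return word if word in MODELS else "chat"

def route(prompt: str) -> str:
    """Pick a model based on the prompt, then answer with it."""
    model = MODELS[classify(prompt)]
    r = requests.post(f"{OLLAMA}/api/chat", json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    })
    r.raise_for_status()
    return r.json()["message"]["content"]

print(route("Write a Python function that reverses a string"))
```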
@@technovangelist OK. 100 thanks
Can I do this in a modelfile?
FROM llama3.1
PARAMETER num_ctx 130000
Or should I set that in an environment variable instead?
Yup. That goes in the modelfile.
@@technovangelist Hm... so... how do I check if this custom model is successfully using the 130k context rather than the 2k default context?
I'm wondering this because... here is the story:
I was trying the Zed code editor and loaded deepseek-coder-v2; as expected, Zed showed a 2k context length (I believe that is the deepseek default in... Ollama).
Then I did ollama create mydeepseek with max_ctx 130k specified in the modelfile.
Back in Zed, I loaded that mydeepseek and... it still showed a 2k maximum context length.
I rechecked the modelfile and it was still set at 130k.
Scratching my head, I then decided to edit Zed's configs.json, or is it settings.json, I forget the file name, but anyway in there I specified that mydeepseek in Ollama should have a maximum of 130k tokens.
Then I reopened Zed and voila... it was 130k max.
So now I wonder how to check mydeepseek's max context. I believe Zed has a default max of 2k tokens in its global Ollama setting unless the user specifies it, or... my modelfile was mistyped.
The parameter is num_ctx, not max_ctx as you show in this text.
You can also set it in the API. Maybe Zed is overriding what is set in the modelfile.
But the best way to set it for Ollama is in the modelfile. It's not an environment variable thing.
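One way to check what the custom model actually has baked in, and to set the context per request, is through the API. A sketch, using the mydeepseek name from the comment above:

```python
import requests

OLLAMA = "http://localhost:11434"

# Show what was baked into the custom model: the PARAMETER lines from the
# modelfile appear in the "parameters" field of the response.
show = requests.post(f"{OLLAMA}/api/show", json={"name": "mydeepseek"}).json()
print(show.get("parameters", ""))   # expect a line like: num_ctx  130000

# Clients can also override the context window per request via options,
# which is likely what Zed is doing with its own settings.
resp = requests.post(f"{OLLAMA}/api/generate", json={
    "model": "mydeepseek",
    "prompt": "Say hi",
    "stream": False,
    "options": {"num_ctx": 130000},
}).json()
print(resp["response"])
```

From the CLI, `ollama show mydeepseek --modelfile` prints back the modelfile the model was created from, which is another quick way to confirm the parameter took.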
Hello sir, can you explain to me how to install CUDA drivers and make Ollama use the GPU for running models?
Follow the NVIDIA instructions.
Ollama will use the GPU automatically if it's supported. If you have a very old GPU it won't work. What GPU do you have?
@@technovangelist I have an NVIDIA GTX 1650 4GB, sir. Thank you very much for responding so quickly. I also have an issue with an antimalware executable running on my Windows laptop that is consuming a lot of memory. How can I fix that?
Easy. Remove that software and don't do anything silly with your computer.
I don’t see the 1650 being supported. The Ti version is.
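If you want to confirm whether a loaded model actually ended up on the GPU, the running-models endpoint reports how much of it sits in VRAM. A sketch, assuming a reasonably recent Ollama build that exposes /api/ps:

```python
import requests

# List models currently loaded by Ollama and how much of each is in GPU memory.
ps = requests.get("http://localhost:11434/api/ps").json()
for m in ps.get("models", []):
    size, vram = m["size"], m.get("size_vram", 0)
    pct = 100 * vram / size if size else 0
    print(f"{m['name']}: {pct:.0f}% of {size / 1024**3:.1f} GB in GPU memory")
```

0% means the model fell back to the CPU; watching nvidia-smi while a prompt runs tells the same story.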
YEAH!!!!!!!
How can I download a model in .gguf format locally? My reason is that I am transferring the model to a computer being used remotely in a health facility with no phone or internet network.
You want to download the model from HF and then add it to Ollama? Or you want to download it with Ollama and then transfer it to a different computer? Ollama uses GGUF, but I don't understand exactly what you want.
It's too bad; we used to be able to filter by the newest models, including the user-submitted ones. It was fun discovering new user models, but now there's no way to do that.
"Latest" meaning the most popular tag is dumb; hopefully this gets fixed in the future. It would make a lot more sense for latest to mean the most recent version added and popular to mean the most popular.