You are underrated, Matt! They didn't sponsor you because they just wanted to get the people who are spewing hype! You go into such detail! Your content should be #1 in any Ollama tutorial.
I've tested the 3B parameter model and I'm very impressed with it. Speed and accuracy. I use it for improving descriptions in our catalog. And it looks like we don't have to pay OpenAI anymore. Thank you for your video!
The first really helpful video about llama 3.2! Thanks Matt!
Hi Matt, I upvoted as usual. Two notes:
Ollama HW resources calculations (proposal for a new Ollama video): In this video, you thankfully show how easy it is to set the context length in the model file, bypassing Ollama's default. How does the context length influence the RAM usage of the host? In general, it would be great to dedicate a video to hardware resource calculations based on model size, quantization, context size, and possibly other macro parameters. It would also be helpful to discuss how CPU, and especially GPU, can improve latency times (especially in a multi-user environment).
You mention "your" function call method. I know you've already done a video on this topic, but since it's very useful in practice, maybe you could create a new video with code examples (Python is welcome).
Other viewers: If you agree, please upvote my comment. Community thoughts are welcome!
Thanks again,
Giorgio
Agreed! A deeper dive into context size would be very helpful: HW resource allocations/calculations (even if "just" some heuristics), the impact of larger context size on in/out eval rates (again, if not accurate calculations then some heuristics), potential problems with changing context size, and what one needs to know about a model when doing this (e.g. what happens if you set a context size larger than what the model allows?).
Edit to add: I'd also be interested in a potential context size video touching on how context size affects/interacts with parallel requests, and how to set up multi-user environments that share the model's context size.
Thank you for an, as usual, excellent tutorial!
Cheers, Arthur
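On the heuristics question above, here is a very rough back-of-envelope sketch of how context size alone affects memory: the KV cache grows linearly with context. The Llama 3.2 3B config numbers below are my own assumptions and should be checked against the model card, and this ignores weights, activations, and runtime overhead.

    # Very rough KV-cache size heuristic -- not an exact figure.
    # The model config values are assumptions for llama3.2:3b; verify against the model card.
    n_layers = 28        # assumed number of transformer layers
    n_kv_heads = 8       # assumed grouped-query KV heads
    head_dim = 128       # assumed dimension per head
    bytes_per_elem = 2   # fp16 KV cache; a quantized KV cache would shrink this

    def kv_cache_gb(context_tokens: int) -> float:
        # 2x for keys and values, per layer, per KV head, per head dimension
        per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
        return context_tokens * per_token_bytes / 1024**3

    for ctx in (2048, 8192, 32768, 131072):
        print(f"{ctx:>7} tokens -> roughly {kv_cache_gb(ctx):.2f} GB of KV cache, on top of the weights")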
@@arthurhjorth1490 Yes. Just one more note, about evaluation criteria. Matt and other people use a short list of "trusted" questions to evaluate a model. That's perfectly fine, and maybe it would be useful to have some way to automate the evaluation of a model, perhaps by scripting the list of questions and having an automatic way to evaluate the responses (by another LLM?). Just food for thought / a possible video topic for the future?
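A minimal sketch of that automated-evaluation idea using the ollama Python package; the question list, the judge model, and the pass/fail prompt are just placeholder assumptions:

    import ollama  # pip install ollama

    questions = ["Which number is bigger, 9.9 or 9.11?"]  # your "trusted" list goes here

    for q in questions:
        answer = ollama.chat(model="llama3.2:3b",
                             messages=[{"role": "user", "content": q}])
        answer_text = answer["message"]["content"]
        # Ask a second (ideally stronger) model to grade the response.
        verdict = ollama.chat(model="llama3.1:8b", messages=[{
            "role": "user",
            "content": f"Question: {q}\nAnswer: {answer_text}\n"
                       "Reply with only PASS or FAIL depending on whether the answer is correct.",
        }])
        print(q, "->", verdict["message"]["content"].strip())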
I love your content and learn every time I watch, thank you
Loving the Companion Plugin for Obsidian with llama3.2:1b. Completing my thoughts in the persona of William Shakespeare by changing the User prompt was a lot of fun.
Fair Companion App, thy wondrous might,
Doth streamline notes and tasks with ease and light.
In Obsidian vault, thou dost thy work impart,
With real-time updates, a digital heart.
Really, this channel deserves way more exposure! Love the content and the host! Keep up the good work, thanks.
Thanks Matt! Everything is interesting and clear as always!
Software engineer for many, many years here... I have recently stumbled upon Ollama and your videos. I have done several integrations of AI into clients' apps using Azure AI services (mostly speech and some OpenAI), but the MS Azure services keep you away from the nuts and bolts via abstractions. This is fine for productivity but not great for understanding. Your videos and GitHub repo have crystallized many concepts for me and opened my eyes and brain to a better and deeper understanding. Thank you, sincerely, for what you are doing here. It is priceless. BTW, I adore Ollama and the smaller models... however my wife would not be pleased with my recent PC build cost (two 4090s lol). But hey, this is addictive! Isn't it amazing to see the realization of what we all dreamed of when we started in the programming/IT realm!
Yup. Compared to what I was doing in my Intro to AI programming class I took in 1989 at FSU this is science fiction.
Ha! That's how I felt when people ask which number is bigger, 8.8 vs 8.21! It depends on the context! And that's what I noticed when I test models: most people only run them once. The models do not always give the right answer the first time, sometimes only the second time, etc. Great video.
Hi Matt very clear presentation yet again. I also really enjoy your polished style, so I subscribed even though I do so rarely. Cheers !
Wondering what the success rate of the tool call would be if wrapped in a framework like Yacana?
Because only using Ollama's function calling by itself is IMO not representative of the potential of any model, as local LLMs need some level of guidance to really achieve anything.
The older approach works fantastically with every model
@@technovangelist I completely agree with you that Ollama's function calling is excellent. Forcing LLMs to output JSON the way it does is very impressive! However, we use Phi models in production, and only using Ollama didn't yield the expected results... The JSON was valid, but the arguments generated were not. That's why we had to switch to an agent framework, because developing our own overlay would have been too complex. With the agent framework we now have enhanced tool calling and a way to do multi-turn chat so multiple Phis can brainstorm together. Also, we needed control loops at each step to ensure we get a computable output in the final phase.
We had a success rate of around 30% when calling tools with correct values using only the "/generate" endpoint. Now, we're at about 70%. Additionally, the team has improved at prompt engineering, which has been a huge factor!
Also, thanks for making videos, they're always a great help. Keep it up! ^^
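For anyone wondering what such a control loop can look like in practice, here is a rough sketch with the ollama Python package. The tool, the expected JSON shape, the model name, and the retry count are made-up illustrations, not the actual setup described above:

    import json
    import ollama  # pip install ollama

    def get_weather(city: str) -> str:  # hypothetical tool
        return f"Sunny in {city}"

    def call_tool_with_retries(prompt: str, retries: int = 3):
        for _ in range(retries):
            resp = ollama.generate(model="phi3",      # assumed model name
                                   prompt=prompt,
                                   format="json")     # constrain output to valid JSON
            try:
                data = json.loads(resp["response"])
            except json.JSONDecodeError:
                continue                              # malformed JSON: retry
            # Validate the arguments, not just the JSON shape.
            if data.get("tool") == "get_weather" and isinstance(data.get("city"), str) and data["city"]:
                return get_weather(data["city"])
        return None                                   # give up after the retries

    print(call_tool_with_retries(
        'What is the weather in Paris? Answer only with JSON like {"tool": "get_weather", "city": "..."}'))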
Thank you, Matt, for your videos. I was not aware of the hardcoded context window in Ollama; it may explain why I was so confused by the models claiming to have a large one. Why is that? I'd expect Ollama to adapt to the capabilities of the model it's running! Do I really need to manually create a custom model file each time just to benefit from the model's native context size? Have you already posted a video answering these questions? Thank you so much and keep up the good work! Cheers from France!
@JeromeBoivin-tx7fm Also interested in the context question, and whether the prompt template, end token, etc. also get set in that model file.
Context takes a lot of memory. And it's hard to put rails around it so it doesn't fully crash the machine. I've had the machine reboot when it takes too much. And lots of folks have tiny GPUs, so we got lots of support requests. So it went to a blanket 2k unless you specify the size. But since it's so easy for most devs to create that file, and since Ollama is intended as a dev tool first, it seemed like a good decision.
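For anyone following along, the file Matt is referring to is only a couple of lines; the model name and the 8192 value below are just example choices:

    # Modelfile
    FROM llama3.2
    PARAMETER num_ctx 8192

    # then build and run it:
    #   ollama create llama3.2-8k -f Modelfile
    #   ollama run llama3.2-8k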
Your delivery is great, had no idea you were a founder... that's awesome. 🎉 Meta, where's the ❤!!
Dude ollama is great. Thank you.
Love the breath holding tangent!
8:12 I'd set the temperature to zero, in which case everyone using the model will get the same answer every time for the same inputs. Setting it to zero makes it default to always responding with whatever answer is most likely the accurate one. It's a predictive language model after all.
The higher the temperature, the more varied the responses. Setting it to zero disables the randomness parameter and thus removes variety completely. Very useful when benchmarking models against each other.
Setting temp to zero will not get you the same answer every time. You would have to set temp and seed and you will reduce the variations but it may not be the best answer.
@@technovangelist Right, I assume the seed could also be random. It might be different across different models. I ran llama3.1:70b with temperature at zero, and that seemed to get me the same story from the same Minecraft chat logs along with its system prompt. LLMs have different architectures though, and it doesn't seem like all support all parameters. Taking away randomness is highly advantageous when optimizing system prompts though, being able to compare static responses.
i don't know if that’s true. testing a system prompt on a limited version of the model helps if you only use the model in that limited way every time.
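A small sketch of the earlier point that you need to pin both temperature and seed for repeatable output, using the ollama Python package; the model, seed value, and prompt are arbitrary:

    import ollama  # pip install ollama

    resp = ollama.chat(
        model="llama3.1:8b",
        messages=[{"role": "user", "content": "Tell a short story about a llama."}],
        options={"temperature": 0, "seed": 42},  # per the reply above, set both for repeatable output
    )
    print(resp["message"]["content"])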
Matt, I'm a non-CS/SWE guy trying to hack away at a tool that uses LLMs to add some NL to the UI of an app I'm building. It's b/c of you I chose Ollama over other options. Thanks, brother.
Well explained especially the 1B
I've been using llama 3.1 8b on my 4050 laptop very comfortably for AI-assisted tasks in Obsidian, and I can't wait to see if these smaller 3B models are a better fit. You get a sub from me, I'm all aboard the self-hosted train, next stop AI station, let's gooo
If you have a cat, it stole the water bottle! Thanks for the review too. I will be interested to try this on my usual M1, might remove the need to run models on another more powerful machine.
Good one. I saw someone training the 1B model on their iPhone😮
These smaller models are great for research, particularly as they are fairly easy to modify in code. In actual use cases, they are somewhat over-censored, but I suspect it's just a matter of time before a fine-tuned uncensored version is created.
There is an awesome video on YouTube talking about this specific model and how censored it is. I wouldn't be surprised if people find the older model to be more capable.
I love small LLM's. I don't think people realize the power they have to simplify their lives. I love to use Obsidian for note taking. Using a local LLM, I can have it easily summarize my notes, giving me an at-a-glance view of each daily note. Have a long meeting? Transcribe it and summarize it, then stuff it in your notes. When we get agents then they will become vastly more useful.
People put too much emphasis on the ability of models to do analytical tasks with great accuracy. They don't understand that the power of AI is the ability to have the AI write a program on-the-fly to do that kind of analysis, and then give you the result. AI will never be 100% accurate. It's like a human. We make mistakes too. We are spoiled with computers and the ability to have 100% reliable answers, but an AI can give that to you, just through the extra step of creating a program to accomplish the task.
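For anyone who wants to try the note-summarizing idea, a tiny sketch with the ollama Python package; the note path and model are placeholders:

    from pathlib import Path
    import ollama  # pip install ollama

    note = Path("vault/daily/2024-09-26.md").read_text()  # hypothetical daily note
    resp = ollama.chat(
        model="llama3.2:3b",
        messages=[{"role": "user",
                   "content": "Summarize this note in three bullet points:\n\n" + note}],
    )
    print(resp["message"]["content"])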
Your channel's so nice I wish I could sub twice. Keep up the great work.
Thanks for this great video explaining how to use these small LLMs! I will be waiting for your video about the vision model. Maybe compare llama 3.2 vision with Pixtral?
Respect and appreciation for what you do. Any chance of using an LLM on the new Samsung Galaxy S23?
That's great. Hey, do you know if we will ever get AI running locally on our Android and iOS devices?
Hey Matt, thanks for a great video - do you keep the code featured in your videos in public repos?
Just a question: what is the best model for supporting me in Python programming that I can use with Ollama?
What is your use case? llama3 is a good default.
First time seeing your vid. Interesting take. I know you have demonstrated some use cases of the models, but for most people there are way too many models to pick from. Do you have a guide on that? To narrow down which model to use? I know it will always be biased, but I would like to hear your take.
This was the first, maybe second, time I looked at one model. I would like to do more of them.
Amazing video, thank you. Is Companion the only AI plugin you use in Obsidian? Looking forward to seeing more practical AI Obsidian applications.
interesting video. thank you
1st. As always, thank you Matt. Question: I've been testing 3B since launch on my laptop with an NPU. WebUI on a server and Ollama on my laptop. The thing is, my laptop has an NPU and Ollama is not taking advantage of it; the 3B model is using the small GPU and the CPU. The results with a RAG (first with the WebUI interface and then with Flowise) give me good results. I'm trying to find out how to activate the NPU on my laptop so Ollama can use it. It would be great. I think LM Studio does it? Also, testing with large files (>15MB), the embedding from Ollama on my computer, again with WebUI and Flowise, fails. The computer can't handle it. It would be great to have you do it with files that are closer to real company files. I think the 3B model is great. The last test I did was using it with Groq, and of course, that's where I can really test it 🙂
Could you explain what the generation completion hotkey does in the Companion plugin for Obsidian? When I use the Companion, it automatically generates text, completes it, and streams the response. So, in what situation would I need to use this hotkey? I'd appreciate it if you could clarify this because I was confused by this.
well, matt you are our fav choice for all things AI!
I would really like to know if these models are any good for entity extraction, like for GraphRAG, or as chunk generators for traditional RAG... That would be a great topic!
ollama makes it so easy
...worth a thumb up...
Thanks
What??? You are too kind... a member AND a tip. Thanks so much.
@@technovangelist I just love the simple and yet comprehensive way you explain the subjects. Keep up the good work ❤
How much VRAM is required for the 3B model?
Thank you for the video. What is the tool you use for writing?
Obsidian. And the plugin for it was companion
Thanks!
Just tried the 3.2:3b. I said hello and got a reply blazingly fast, so I asked if it was on meth or something. Got the standard "I'm just a model, I can't human", so I said I was just surprised to see such fast answers on a local model. And this is where things got confused.
Apparently, Llama3.2:3b thinks it's working off a cloud-service. It refused the notion that I'm running this locally.
Just to be sure, I pulled the ethernet cable, restarted the terminal, and it worked just as well without it (well... duh).
I just find it fascinating that the model itself almost recoils at the notion of being local.
Good stuff
Can llama 3.2 1b do function calling? I don't know if it's smart enough to format output
Depends. Using the newer more official format, I don't think it does a good job. Using the older approach I have documented in a few videos on this channel I expect it would be great at it.
@@technovangelist I will try to look into the original method. I still don't think many people explain how it all works in plain English. I will look at your earlier videos about functions
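In plain English, the older approach boils down to: describe the available functions in the prompt, ask for JSON only, and parse it yourself. A rough sketch of that idea (not necessarily exactly how Matt does it in his videos; the function and schema here are made up):

    import json
    import ollama  # pip install ollama

    # Describe the function in the system prompt and ask for JSON only.
    system = ('You can call one function: get_time(city). '
              'Answer ONLY with JSON like {"function": "get_time", "city": "<city>"}.')

    resp = ollama.chat(
        model="llama3.2:1b",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": "What time is it in Tokyo?"}],
        format="json",  # have Ollama constrain the output to valid JSON
    )

    call = json.loads(resp["message"]["content"])
    if call.get("function") == "get_time":
        print("Would call get_time with city =", call.get("city"))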
Lol the ending 😂😂😂
Great content. Could you briefly describe the machine you use for this task? You mentioned 3 seconds…
I usually do and forgot this time. M1 Max MacBook Pro with 64gb. A machine you can get for about 1500 usd today.
@@technovangelist thank you! 64GB? Impressive. Please keep it up! You are a great communicator
All things local AI and I just subscribed that’s what I need
Thanks for the great content. What is missing in Ollama is support for vision models like Florence-2 and SAM 2. If it had a nice API for that, which could be used with curl or so... dreams.
Raspberry pi with vision models must be so incredibly overpowered, I prefer not thinking about it too much
Raspberry pi overpowered???? way underpowered is more accurate, especially considering the cost of them. Physical size is the big benefit these days. But Florence2 looks like an older model that didn't get much love. Some of the other vision models on Ollama got a lot more coverage. And hadn't heard of sam2 either. Both architectures aren't supported so would require a lot of work to get working.
@@technovangelist thanks for taking the time to reply, appreciated. Underpowered _is_ the point: if vision models run sufficiently fast on _that_ hardware, it enables vision on edge devices. Florence-2 was released months ago, and the combination of selecting pixels by typing, and segmenting and tracking over time with SAM 2, is an incredibly powerful concept. I needn't ask any layman to get creative; the usefulness of text-driven vision perception seems insane.
I've just tried this with Fabric. It's as good at summarising YouTube videos as Mixtral 8x7B. It still gets confused occasionally with certain tasks.
I've got a large-ish DB of technical documents for an obscure bit of research kit and I'm going to try swapping out my current LLM and see if this performs better. It'll probably be about 10x faster.
Cool video!
llama 3.2 1B Q4:
To count the number of 'F's and 'R's, I'll go through the word "strawberry" carefully.
Here's the breakdown:
* 'F': 1
* 'R': 2
So, there are 1 'F' and 2 'R's in the word "strawberry".
or:
please create a java program to count the number of 'F's and 'R's in the word 'strawberry'
public class Main {
    public static void main(String[] args) {
        // Define the word
        String word = "strawberry";
        // Initialize counters for 'F's and 'R's
        int countF = 0;
        int countR = 0;
        // Loop through each character in the word
        for (char c : word.toLowerCase().toCharArray()) {
            // Check if the character is 'F' or 'R'
            if (c == 'f' || c == 'r') {
                // Increment the counter
                if (c == 'f') {
                    countF++;
                } else {
                    countR++;
                }
            }
        }
        // Print the results
        System.out.println("Number of 'F's: " + countF);
        System.out.println("Number of 'R's: " + countR);
    }
}
output:
Number of 'F's: 0
Number of 'R's: 3
sorry, they left me home alone...
Lovin' the channel. 👍👍It'll be great once Ollama supports vision
Ollama does support vision today. Llama 3.2 vision support should come very soon.
I would request you to test out LLMs on some more complex tools (even something as simple as a file-creation tool fails on the 3B model). I assume that if I give a proper function description, it might not fail. Need to experiment.
Can we simply download the GGUF file and use it with GPT4All?
Good video
Awww yeah!
What if you set the temperature to 0, do the tool-function tests succeed better?
Hey Matt, I have an Intel-based MacBook, and I want Ollama to utilize my GPU, but I don't know how to go about it. I have searched all around but to no avail. Could you offer some pointers, or resources where I can succeed?
Unfortunately there are no options. Well except buying an apple silicon MacBook or switching to a pc.
Which hardware setup do you have?
I'm on an M1 Max MacBook Pro with 64GB RAM
Why am I getting Error: error loading model for all the 3.2 downloads?
Have you updated ollama?
@@technovangelist yes that fixed it, thanks… it’s sooo damn fast!!
when is ollama getting the vision models anyone know?
The team is working on it.
@@technovangelist awesome, thanks Team!
Matthew Berman’s review of the vision models show them to be censored to the point of uselessness. Hopefully someone will bring out uncensored versions
Reviews on the vision stuff seem to be mixed so far. It could be that some of the folks who have it have a bad implementation. I look forward to finding out.
My favorite hallucination : I work for the Jill Stein for President campaign so I use info about her to test AIs. She's a public figure, but a minor one, and I know more about her than most sources.
When playing around with WebSim, I asked it to make me a Jill Stein fan site.
The results were shockingly accurate, history, platform, etc... except the photos. The photos were all of Jill Biden.
For all the normies, Jill Biden is Joe Biden's wife. Jill Stein was running against Joe Biden.
I pointed this out to the AI, and it replaced them with photos of Kyrsten Sinema. If you don't know who that is, I really don't have the bandwidth to explain, but it is simultaneously shockingly wrong and quite funny. Like, I know what it is getting at.
Whoever took the sponsorship from Meta, I don't think they asked for it. But in case you haven't noticed, they have more subscribers than you.
Some have 1/3 the number of subs compared to me. So that’s not it.
I tried to run Llama 3.2 1B on a Samsung S20 Plus. Error: no suitable llama servers found. And I am running ollama serve.
Just use Layla Lite and then import the model. Yep, it's a hassle getting your llama.cpp to work.
I don't know what you are talking about.
Llama 3.1 & 3.2 are unfortunately very poor in Dutch language usage
Meta Matt!
ollama run llama3.2:1b
Error: llama runner process has terminated: signal: abort trap error:done_getting_tensors: wrong number of tensors; expected 147, got 146
any idea about this error?
You need to update ollama. You should always update whenever there is a new version.
OK, I will try it. Maybe it is a GFW issue, thanks.
I asked it to create a component in JavaScript, at which llama3.1 8B and mistral-nemo greatly succeeded. But llama3.2 3B failed miserably, mixing up different libraries unintelligently. It's just fast, but a random word generator is fast too.
But a random word generator wouldn't be anywhere near as good as llama3.2 3b.
Man you forgot your cup!
I want you to count the number of r's in Strawberry.
To do so I want you to go Letter by letter and every time you find one r I want you to count up
Gets it right every time...
>>> I want you to count the number of r's in aferradfafa;;rr;r;adfafaf'''.. To do so I want you to go letter by letter and every time you find one r I want you to count up.
Let's start counting:
a - no R (0)
f - no R (0)
e - no R (0)
r - found an R! (1)
r - found another R! (2)
a - no R (2)
d - no R (2)
f - no R (2)
a - no R (2)
Continuing...
;; is a semicolon, not a letter to count
rr - found two Rs! (4)
; - no R (4)
r - found another R! (5)
; - no R (5)
a - no R (5)
d - no R (5)
f - no R (5)
a - no R (5)
f - no R (5)
a - no R (5)
f - no R (5)
So, there are 5 Rs in the given text.
First test I did was "what number is larger 9.9 or 9.11?" and it insisted 9.11 was bigger. When is 2.3 out?
Why did you quit Ollama 😢😢😢
Are you asking about quitting the app? Or why I left the company? That second thing is not something for this comment thread.
@@technovangelist Due to your hesitance on commenting, we'll just assume they were having Diddy parties until you clear it up
@@emmanuelgoldstein3682 did you just say diddy party brah? jajajajaja
Company
There's 4 killers in the room. Since when does dying make you not a killer?
Good point.
Can someone point me to something from Matt on running Ollama and RAG locally?
Hey Matt, nice video. But I don't think it's as impressive as you make it out to be. I am sure llama3.1's performance was comparable.
It wasn't available in 1B and 3B models.
The vision portion isn't too great.
Microsoft GRIN MoE: A Gradient-Informed Mixture of Experts MoE Model 6.6b
Ranks better
In benchmarks? Or in real tests. One is useful the other has zero real value.
I tested it and the vision abilities suck and are way overrated.
It's amazing how such a small model is smarter than you?