I would love to see a simple example of how to fine-tune a vision model with Ollama.
OMG, this is exactly what I need. Thanks so much.
Cool to see how you approached NER using an LLM. I've been using spaCy.
I normally use spaCy for anything at scale. You can use LLMs to make good datasets for custom entities and then use those to train the spaCy model.
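For anyone curious what that pipeline looks like in practice, here's a minimal sketch, assuming the LLM has already labelled some sentences as (text, character-offset spans) pairs. The PRODUCT label, the example sentences, and the file names are made up for illustration.

```python
import spacy
from spacy.tokens import DocBin

# Hypothetical LLM-labelled examples: (text, [(start_char, end_char, label), ...])
examples = [
    ("The RTX 4090 sold out in minutes.", [(4, 12, "PRODUCT")]),
    ("Ollama runs Llama 3.2 Vision locally.", [(12, 28, "PRODUCT")]),
]

nlp = spacy.blank("en")
db = DocBin()

for text, spans in examples:
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in spans:
        span = doc.char_span(start, end, label=label)
        if span is not None:  # skip spans that don't align with token boundaries
            ents.append(span)
    doc.ents = ents
    db.add(doc)

# Serialise the corpus, then train as usual with spaCy's CLI, e.g.
#   python -m spacy train config.cfg --paths.train train.spacy --paths.dev dev.spacy
db.to_disk("train.spacy")
```

The LLM only runs offline to build the corpus, so inference stays as fast as any other spaCy pipeline.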
Thanks for these updates; it's quite difficult to keep up with all the new releases nowadays.
Miles and AI? I'm all for it!
Really appreciate your channel! Could you make a video to help us better understand what specs are required for using LLMs locally?
Great videos, Sam. Learnt so much from them. RE: the Llama vision model on Ollama, I have been trying to get it to work with both pictures and tools, but it looks like it can only do pictures and structured output, with no tool-calling support yet. Any idea how to get around this limitation?
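One possible workaround while there's no native tool calling (just a sketch, not an official Ollama feature): describe your "tools" in the prompt and use structured output to force the reply into a tool-call-shaped JSON, then dispatch the call yourself. The model tag, the ToolCall schema, and the tool names below are assumptions.

```python
import ollama
from pydantic import BaseModel

# Hypothetical tool-call schema: the model must pick a tool and its arguments.
class ToolCall(BaseModel):
    tool: str        # e.g. "describe_image" or "lookup_price"
    arguments: dict

prompt = (
    "You can call one of these tools: "
    "describe_image(detail: str), lookup_price(item: str). "
    "Look at the image and reply ONLY with the tool call you want to make."
)

resp = ollama.chat(
    model="llama3.2-vision",  # assumed model tag
    messages=[{"role": "user", "content": prompt, "images": ["photo.jpg"]}],
    format=ToolCall.model_json_schema(),  # structured output instead of native tool calling
)

call = ToolCall.model_validate_json(resp["message"]["content"])
print(call.tool, call.arguments)  # dispatch to your own function registry from here
```

The catch is the model can't decline to call a tool unless you add a "none" option to the schema, and the arguments still need validating before you execute anything.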
very useful
Very informative video regarding vision-based models with structured outputs. If possible, could you also make a video on a simple LangChain or LangGraph app that uses Ollama's vision-based models to read all the images in a document (say, a PDF) and describe them as structured outputs? Thanks in advance.
Check out ColPaLi.
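Not a full LangChain/LangGraph app, but the core loop could look roughly like the sketch below, using the plain ollama client plus PyMuPDF: pull each embedded image out of the PDF and ask the vision model for a structured description. The ImageInfo schema, the model tag, and report.pdf are placeholders.

```python
import fitz  # PyMuPDF
import ollama
from pydantic import BaseModel

# Hypothetical target schema for each image found in the PDF
class ImageInfo(BaseModel):
    caption: str
    objects: list[str]

doc = fitz.open("report.pdf")  # assumed input file
results = []

for page in doc:
    for img in page.get_images(full=True):
        image_bytes = doc.extract_image(img[0])["image"]  # img[0] is the xref
        resp = ollama.chat(
            model="llama3.2-vision",  # assumed model tag
            messages=[{
                "role": "user",
                "content": "Describe this image and list the objects in it.",
                "images": [image_bytes],  # the Python client accepts raw bytes
            }],
            format=ImageInfo.model_json_schema(),
        )
        results.append(ImageInfo.model_validate_json(resp["message"]["content"]))

for r in results:
    print(r.caption, r.objects)
```

Wrapping the same loop in LangGraph nodes is mostly a matter of moving the loop into graph state; the model call itself stays the same.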
Do you know if this model would be good for getting the coordinates of things in images? For example, I would like to get the coordinates of a dog in an image, and the model might return a bounding box [[x1, y1], [x2, y2]].
These models are probably not good enough for that at the moment, but certainly things like the new Gemini model can do that kind of task.
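If you want to experiment anyway, a structured-output schema at least guarantees the reply is shaped like a box, even if the coordinates from small local models may be unreliable. Rough sketch; the model tag and dog.jpg are placeholders.

```python
import ollama
from pydantic import BaseModel

# Hypothetical schema for one detection: corners in pixel coordinates
class Point(BaseModel):
    x: int
    y: int

class BoundingBox(BaseModel):
    label: str
    top_left: Point      # [x1, y1]
    bottom_right: Point  # [x2, y2]

resp = ollama.chat(
    model="llama3.2-vision",  # assumed model tag
    messages=[{
        "role": "user",
        "content": "Return the bounding box of the dog in pixel coordinates.",
        "images": ["dog.jpg"],
    }],
    format=BoundingBox.model_json_schema(),
)

box = BoundingBox.model_validate_json(resp["message"]["content"])
print(box.top_left, box.bottom_right)
```

Sanity-check the result against the actual image size; these models will happily return plausible-looking but wrong coordinates.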
I literally thought of this yesterday and was using a system prompt to force it to respond as a dictionary
Wtf is up with 2025 being perfect, and what's the catch?
I have tried it; it depends on the model itself.
The amount of hacking you have to do to just get "ok" results says it all. Not production quality, and won't be any time soon.
Have you tried it with better models than were used here?
That's not to be expected from models of that size.
@@brando2818 The whole point of using Ollama is to run open-source models on your own hardware. OpenAI, Anthropic, and Google already offer structured output.
Why is the Hindi audio track not available? 😢
How do I turn this on? I'll look into it.
None of you actually do anything 'challenging' or intriguing with all this 'OCR' garbage. Try it with an image of ornate, unstructured table data. Instead you get the same thing all the influencers do: "hey, look at me, I'm wasting a bunch of NVMe, SSD, or HDD disk writes" and a bunch of code to do what any 8-year-old can do: open the file, copy/paste the content into Notepad, and feed that to the LLM.
Most of you act like figuring out how to extract a SIMPLE document with a model is an accomplishment (an 8-year-old can do that with copy/paste), but try it with largely unstructured table data and I'd put serious money on any of you 'influencers' falling flat on your face. You still need to use expressions to structure the data BEFORE you, as this video says, 'put it into a database'. LOL, 'put it into a database'... like what? Chunked randomly? And that will create GOOD training data? Or a RAG? That's a joke.
But the show must go on!!! It's all about the clickbait! Get those clicks, man.
Seriously, don't post another 'OCR this or that' video until you 'OCR' something that an 8-year-old couldn't just copy/paste without an LLM.