I think it's good enough, but what if it was uncensored?
Can you make a video comparing LLaVA 7B vs Llama 3.2 11B Vision? Please!
Not sure what it is with YouTubers and vision models. These are not the use cases for business or even pet projects. If we want OCR, we'll use OCR. What you want is to intelligently answer questions. E.g.:
What level is the water at in this reservoir? (The image should have a measuring stick in the water.)
How many boxes are in this image?
Does this look safe or dangerous?
Etc.
Oh interesting, those are good ideas. I suppose you could also do how many parking spaces are free, or something like that.
I do use ChatGPT for the first three examples, although admittedly not the last one comparing the footballers. This is the first open model (that I've tried) that pulls code out of an image - all the other ones I've tried start hallucinating at some point.
The YouTube thumbnail critique as well - I think that's more than just OCR?
I do sometimes use vision models to interpret graphs/charts and compare them to each other, but I haven't tried that with Llama 3.2 Vision.
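For anyone who wants to try this kind of image question themselves, here's a minimal sketch using the Ollama Python client, assuming you've pulled the llama3.2-vision model locally; the prompt and image path are just placeholders, not what was used in the video.

```python
# Minimal sketch: asking Llama 3.2 Vision a question about a local image.
# Assumes the `ollama` Python package is installed and the model has been
# pulled first, e.g. with `ollama pull llama3.2-vision`.
import ollama

response = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        "role": "user",
        "content": "How many parking spaces are free in this image?",  # placeholder question
        "images": ["car_park.jpg"],  # hypothetical image path
    }],
)
print(response["message"]["content"])
```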
OCR still sucks. Always gets it wrong in real life. But good point!
Face identification is on the list of things the tech companies don't want you doing with the models. It ties into too many of the dystopian futures depicted in fiction.
I found most of the LLaVA models were able to identify famous people at least! I mean, it doesn't really matter, but it's interesting that they seem to censor it like this.
Facial identification would be moving into their lane. How do you think they make all their money with free products? You are the product. Google DeepFace for one example related to Meta. (Google will autocorrect it to deepfake, so search for DeepFace -deepfake.)
What is the hardware configuration you are using?
I have a Mac M1 Max with 64GB of RAM, which gets split between the GPU and CPU.
As for the model repeatedly failing to identify Ronaldo in the picture, perhaps lowering the temperature would be an idea? EDIT: I've tried playing with the temperature (same LLM and same image), but it doesn't seem to have a significant effect on the results (except when temp=0, of course). After several runs I'd say Ronaldo is identified about 33% of the time.
What prompts did you use for the 30% success? Did temp change any behaviour/success?
@RuairiODonnellFOTO The prompt I used was a double question, something like "Can you describe the picture? Who is the person depicted?" I've tried a few more runs and now I'd say it's less than 33%, maybe 10% success. Most of the time the model says it cannot provide names of people based on their photograph. As I said, changing the temperature doesn't seem to have a measurable effect on the answers.
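In case anyone wants to reproduce these numbers, here's a rough sketch of the repeat-run test, assuming the Ollama Python client and a local llama3.2-vision model. The prompt is the double question quoted above; the image path and the simple "ronaldo" substring check are my own assumptions.

```python
# Rough harness for estimating how often the model names Ronaldo at a given
# temperature. Counts a run as a hit if "ronaldo" appears in the reply.
import ollama

PROMPT = "Can you describe the picture? Who is the person depicted?"

def identification_rate(image_path: str, temperature: float, runs: int = 10) -> float:
    hits = 0
    for _ in range(runs):
        response = ollama.chat(
            model="llama3.2-vision",
            messages=[{"role": "user", "content": PROMPT, "images": [image_path]}],
            options={"temperature": temperature},
        )
        if "ronaldo" in response["message"]["content"].lower():
            hits += 1
    return hits / runs

for temp in (0.0, 0.4, 0.8):
    print(f"temperature={temp}: {identification_rate('ronaldo.jpg', temp):.0%}")  # hypothetical image
```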
It's weird that it can't pick him up. IIRC all the LLaVA models were able to identify him, and for every other example that I tried, Llama 3.2 Vision is better than LLaVA.
I wonder whether it's deliberately not identifying people. It's sometimes even reluctant to say anything at all about a photo, e.g. when I give it photos of myself.
@learndatawithmark I think you're right. It looks like they've done something during training (or after) that makes it behave that way. Obviously, it's not been a total success. No matter how well trained LLMs are, they are still hard to tame!
Has anyone tried the 90B model to see if it can name Messi or Ronaldo?
I think that would be insanely slow on my machine so I haven't tried it!
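If anyone with more memory wants to check, the only change to the earlier sketches should be pointing at the 90B tag (assuming the same Ollama setup); expect it to be far more demanding than the 11B.

```python
# Same kind of call as the 11B examples, just using the 90B tag
# (pull it first with `ollama pull llama3.2-vision:90b`).
import ollama

response = ollama.chat(
    model="llama3.2-vision:90b",
    messages=[{
        "role": "user",
        "content": "Who is the footballer in this photo?",  # placeholder question
        "images": ["ronaldo.jpg"],  # hypothetical image path
    }],
)
print(response["message"]["content"])
```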