Yes please on a tutorial building even more functionality onto this example! 😀
You can save this text metadata back into the image files themselves as EXIF, so it always goes hand in hand with the image without extra files lying around.
Great insight! Can you please provide more details for those of us getting started? Many thanks in advance!
A quick search shows that "EXIF metadata is restricted in size to 64 kB in JPEG images, because according to the specification, this information must be contained within a single JPEG APP1 segment." The relevant metadata tag is ImageDescription.
@@WhySoBroke
import piexif
import piexif.helper  # the UserComment helper lives in a submodule, so import it explicitly

def add_description_to_exif(image_file, description):
    # Load the existing EXIF data
    exif_dict = piexif.load(image_file)
    # Add or update the EXIF tag with your description
    # For example, using the UserComment tag
    exif_dict['Exif'][piexif.ExifIFD.UserComment] = piexif.helper.UserComment.dump(description)
    # Write the modified EXIF data back to the image
    exif_bytes = piexif.dump(exif_dict)
    piexif.insert(exif_bytes, image_file)

# Usage example
description = "Generated description of the image."
image_file = "path_to_your_image.jpg"
add_description_to_exif(image_file, description)
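If the tools you use display ImageDescription (the tag mentioned above) rather than UserComment, a minimal variation of the same piexif call writes to the 0th IFD instead; worth checking which field your particular viewer actually reads:

import piexif

def add_image_description(image_file, description):
    exif_dict = piexif.load(image_file)
    # ImageDescription lives in the 0th IFD and is an ASCII field
    exif_dict['0th'][piexif.ImageIFD.ImageDescription] = description.encode('ascii', 'replace')
    piexif.insert(piexif.dump(exif_dict), image_file)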
I am really struggling to get the data into fields that the tools I use will actually read or display. Plus it appears that the Windows thumbnail uses up most of the available EXIF space, so I will need to drop that piece. On top of all that, the libraries like to decompress the images, which I really don't like.
These are the 4 questions I ask llava, and then I put the results manually in the comment section of the EXIF metadata:
describe this image in great detail
write the 10 most relevant questions for this image
answer the 10 above questions in the correct order
write the 20 most relevant tags for instagram
I will try to automate this workflow to keyword my photo collection. Thanks for this tutorial!
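A rough sketch of automating that workflow with the ollama Python client and piexif, if it helps; the model name, tag choice, and file handling below are just assumptions, not anything from the video:

import ollama
import piexif
import piexif.helper

PROMPTS = [
    "describe this image in great detail",
    "write the 10 most relevant questions for this image",
    "answer the 10 above questions in the correct order",
    "write the 20 most relevant tags for instagram",
]

def keyword_photo(path, model="llava:13b-v1.6"):
    with open(path, "rb") as f:
        image_bytes = f.read()
    # Run the four prompts in order and collect the answers
    answers = [ollama.generate(model=model, prompt=p, images=[image_bytes])["response"]
               for p in PROMPTS]
    # Store the combined text in the EXIF UserComment field
    # (keep an eye on the 64 kB APP1 limit mentioned above)
    exif_dict = piexif.load(path)
    exif_dict["Exif"][piexif.ExifIFD.UserComment] = piexif.helper.UserComment.dump("\n\n".join(answers))
    piexif.insert(piexif.dump(exif_dict), path)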
I am trying to do this myself. I am struggling with the EXIF writing. I keep getting space limitation errors. I think it's due to the Windows thumbnails.
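One hedged workaround for those space errors, assuming the embedded thumbnail really is what fills the 64 kB APP1 segment: drop it before dumping (untested against Windows-generated files):

import piexif

image_file = "path_to_your_image.jpg"
exif_dict = piexif.load(image_file)
exif_dict["thumbnail"] = None  # discard the embedded thumbnail data
exif_dict["1st"] = {}          # and the IFD entries that describe it
piexif.insert(piexif.dump(exif_dict), image_file)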
This is great! Combined with the idea of putting the result into the EXIF metadata, this would be awesome 😎
Good job, and thank you for again sharing your knowledge showing us how to do useful stuff. I'd also be interested in seeing how you create a professional web user interface for this and other projects going forward. What are some good ways of doing this which are easy to make look good and modern, and which run on all major browsers?
Normally I use NextJS for that kind of thing.
Good to know this. I look forward to an example of how to create a professional front end using NextJS if you'd like to recommend a tutorial or create one here @@samwitteveenai.
This is right into the awesomeness space! Thanks for sharing this project! (yesterday I was working on a similar solution using ComfyUi + Python exporting but this is way cleaner)
I really like the look of ComfyUI; I need to try to make some time to play with it.
Very interesting and insightful. Thank you very much, Sam.
Great video, really what I was looking for: some useful real-world cases of how to use LLM models locally (instead of paying a company to do this for us; of course more secure and private). What I would love to see is how to extend this example to create a tweet about the image, store it in the CSV file, and then post the image with that tweet at intervals, maybe using Twitter's API. Not very tech savvy myself, but very interested in putting LLMs to some real-world use and automation. Thanks for making these videos.
Love it!!! Great content and super Ollama in action!
Thanks!!
Awesome video. Appreciate you demystifying the process and tying in the queuing, dataframe, and RAG concepts; some powerful stuff. It will be interesting to do an apples-to-apples comparison with GPT Vision and Gemini Vision functionality.
Next vid will be some cool Gemini stuff
Thanks for sharing. This is very useful, and it's a good source that I keep coming back to.
Excellent!!! Was just playing around with moondream. Perfect timing ;)
I was hoping they would put Moondream in here as well. I also played with that and was impressed by what it could do for its size.
That is exactly what I was looking for, thanks a lot.
Did you add custom RAG? Could you capture snapshots from a webcam? I've been trying to learn for days and keep getting stuck on RAG. 🎉
Thanks a lot. Great content!
Please do some more examples of identifying difficult screen shots.
Have you also thought about how boxing could improve this process?
boxing meaning bounding boxes?
@@samwitteveenai - Great minds think alike! ;-)
@@christopherd.winnan8701 I haven't tried it with this model, but I tried it using the Moondream model with red bounding boxes and it was able to work out what was inside. I've been working on getting it to give me bounding-box coordinates for things.
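For anyone curious, a rough sketch of that red-box trick with Pillow and the ollama client; the box coordinates, file name, and model name are placeholders:

import io
import ollama
from PIL import Image, ImageDraw

img = Image.open("path_to_your_image.jpg").convert("RGB")
draw = ImageDraw.Draw(img)
draw.rectangle((100, 100, 300, 300), outline="red", width=5)  # region to ask about

buf = io.BytesIO()
img.save(buf, format="JPEG")
response = ollama.generate(model="moondream",
                           prompt="What is inside the red box?",
                           images=[buf.getvalue()])
print(response["response"])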
@@samwitteveenai - Thank you for the great research you are doing. Always looking forward to more of your excellent vids!
Great video, Thanks very much
Can you do a tutorial about AI agents for image and video?
Maybe it would be useful to save the generated description in the image file itself, for example in the EXIF/IPTC description field, if the image format supports it.
I can't wait until these multimodal local models can read charts and graphs reliably.
llava:34b-v1.6 runs very slowly and doesn't use the GPU, whereas llava:13b-v1.6 works fine.
My system specs:
RAM: 32 GB
GPU: Nvidia 3060, 12 GB
Any tips on getting a more consistent response with only the necessary text I want extracted from an image? I’ve played around with the prompt quite a bit and even provided an output example.
I have a loop where I generate a response, then have another prompt ask if this response is correct for the image. If no, try again. I like the big llava for the first writing and a smaller llava or moondream for the checking. It can take a couple minutes for the multiple attempts, but that's ok.
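Something like this sketch, in case it helps anyone; the model names, prompts, and attempt count are arbitrary, and the ollama Python client is assumed:

import ollama

def describe_with_check(image_bytes, writer="llava:34b-v1.6", checker="moondream", attempts=3):
    draft = ""
    for _ in range(attempts):
        # The big model writes the description
        draft = ollama.generate(model=writer,
                                prompt="Describe this image in detail:",
                                images=[image_bytes])["response"]
        # A smaller model checks it against the same image
        verdict = ollama.generate(model=checker,
                                  prompt="Is the following description correct for this image? Answer yes or no.\n\n" + draft,
                                  images=[image_bytes])["response"]
        if verdict.strip().lower().startswith("yes"):
            break
    return draft  # last attempt if nothing passed the check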
Why pass the file to Ollama as bytes and not as an image file? Is it faster that way? Also, do you know any hacks for getting Ollama to return precisely a specific number of words (or a range) every time?
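On the word-count part: as far as I know there is no option that guarantees an exact word count, but you can ask for a range in the prompt and hard-cap the length with num_predict (which counts tokens, not words). A small sketch, with the model name and numbers as placeholders:

import ollama

with open("path_to_your_image.jpg", "rb") as f:
    image_bytes = f.read()

response = ollama.generate(model="llava:13b-v1.6",
                           prompt="Describe this image in 30 to 50 words:",
                           images=[image_bytes],
                           options={"num_predict": 80})  # hard ceiling in tokens, not words
print(response["response"])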
Is this supported via the OpenAI Compatible client?
I can't seem to get it to work. Anyone had any luck?
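For what it's worth, newer Ollama builds do accept images through the OpenAI-compatible /v1 endpoint (older ones ignored them, which might be why it seems broken). A hedged sketch, with the model name and path as placeholders:

import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
with open("path_to_your_image.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="llava:13b-v1.6",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)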
just what the doctor ordered.
Thank you for sharing, Sam! I tried the same thing here, but nothing happens; the process seems to be stuck and shows only:
Processing ./images\1.png
Any idea why?
Oh, my bad. I updated Ollama and it works.
Is there any way to indicate the base model? It is not on localhost in my case... Thanks
Ollama normally supports the instruction-tuned models rather than the base models. You can do a custom install for any model, including base models, if they are converted to the right format. If you mean the model that gets loaded, yes, you can specify that in the API.
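For reference, the custom-install route looks roughly like this: write a Modelfile pointing at a GGUF conversion of the model, then register it with the CLI (the names and paths below are placeholders):

# Modelfile
FROM ./my-base-model.gguf

# then, from the same directory:
ollama create my-base-model -f Modelfile
ollama run my-base-model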
Are you using GPU? Or all on CPU RAM?
He is using a Mac mini which has a unified memory architecture. So, while the GPUs are used they do not have their own dedicated memory.
@mshonle is totally right; no NVIDIA GPU is used, just the built-in Mac one.
Ollama's llava and bakllava handle PNG. What are you gaining by converting to bytes?
For me, I was having issues getting it to work with PNGs; that is the reason I added it. I will have another look and see if maybe I just had something set wrong the first time.
Can you try recording a video of an application and asking Gemini to code an application with similar functionality and design? Something simple.
Ur welcome 😂
Microsoft stole your idea
Just FYI for others needing to reference their local Ollama over the network:

from ollama import Client

client = Client(host='192.168.0.25:11434')
# image_bytes loaded earlier, as in the video
response = client.generate(model='llava:34b-v1.6',
                           prompt='describe this image and make sure to include anything notable about it (include text you see in the image):',
                           images=[image_bytes])
Being a Windows user... I am still waiting...
Windows sucks. It really really sucks 😂
Get off that Microsoft telemetry machine while you still can. (All in jest I don't actually care which OS you use. )
The Windows version is just out, at least as a beta.