Yes please on a tutorial building even more functionality onto this example! 😀
You can save this text metadata back into the image files themselves as EXIF, so it always goes hand in hand with the image without extra files lying around.
Great insight! Can you please provide more details for those of us getting started? Many thanks in advance!
A quick search shows that "EXIF metadata is restricted in size to 64 kB in JPEG images, because according to the specification, this information must be contained within a single JPEG APP1 segment." The relevant metadata tag is ImageDescription.
@@WhySoBroke
import piexif
import piexif.helper  # the UserComment helper lives in a submodule, so import it explicitly

def add_description_to_exif(image_file, description):
    # Load the existing EXIF data
    exif_dict = piexif.load(image_file)
    # Add or update the EXIF tag with your description
    # For example, using the UserComment tag
    exif_dict['Exif'][piexif.ExifIFD.UserComment] = piexif.helper.UserComment.dump(description)
    # Write the modified EXIF data back to the image
    exif_bytes = piexif.dump(exif_dict)
    piexif.insert(exif_bytes, image_file)

# Usage example
description = "Generated description of the image."
image_file = "path_to_your_image.jpg"
add_description_to_exif(image_file, description)
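If the tools you use display ImageDescription (the tag mentioned above) rather than UserComment, a minimal variation of the same piexif call writes to the 0th IFD instead; worth checking which field your particular viewer actually reads:

import piexif

def add_image_description(image_file, description):
    exif_dict = piexif.load(image_file)
    # ImageDescription lives in the 0th IFD and is an ASCII field
    exif_dict['0th'][piexif.ImageIFD.ImageDescription] = description.encode('ascii', 'replace')
    piexif.insert(piexif.dump(exif_dict), image_file)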
I am really struggling to get the data into fields that the tools I use will actually read or display. Plus it appears that the Windows thumbnail uses up most of the available EXIF space, so I will need to drop that piece. On top of all that, the libraries like to decompress the images, which I really don't like.
These are the 4 questions I ask llava, and then I put the results manually in the comment section of the EXIF metadata:
describe this image in great detail
write the 10 most relevant questions for this image
answer the 10 above questions in the correct order
write the 20 most relevant tags for instagram
I will try to automate this workflow to keyword my photo collection. Thanks for this tutorial!
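A rough sketch of automating that workflow with the ollama Python client and piexif, if it helps; the model name, tag choice, and file handling below are just assumptions, not anything from the video:

import ollama
import piexif
import piexif.helper

PROMPTS = [
    "describe this image in great detail",
    "write the 10 most relevant questions for this image",
    "answer the 10 above questions in the correct order",
    "write the 20 most relevant tags for instagram",
]

def keyword_photo(path, model="llava:13b-v1.6"):
    with open(path, "rb") as f:
        image_bytes = f.read()
    # Run the four prompts in order and collect the answers
    answers = [ollama.generate(model=model, prompt=p, images=[image_bytes])["response"]
               for p in PROMPTS]
    # Store the combined text in the EXIF UserComment field
    # (keep an eye on the 64 kB APP1 limit mentioned above)
    exif_dict = piexif.load(path)
    exif_dict["Exif"][piexif.ExifIFD.UserComment] = piexif.helper.UserComment.dump("\n\n".join(answers))
    piexif.insert(piexif.dump(exif_dict), path)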
I am trying to do this myself. I am struggling with the EXIF writing. I keep getting space limitation errors. I think it's due to the Windows thumbnails.
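One hedged workaround for those space errors, assuming the embedded thumbnail really is what fills the 64 kB APP1 segment: drop it before dumping (untested against Windows-generated files):

import piexif

image_file = "path_to_your_image.jpg"
exif_dict = piexif.load(image_file)
exif_dict["thumbnail"] = None  # discard the embedded thumbnail data
exif_dict["1st"] = {}          # and the IFD entries that describe it
piexif.insert(piexif.dump(exif_dict), image_file)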
This is great! Combined with the idea of putting the result into the EXIF metadata, this would be awesome 😎
Good job, and thank you for again sharing your knowledge showing us how to do useful stuff. I'd also be interested in seeing how you create a professional web user interface for this and other projects going forward. What are some good ways of doing this which are easy to make look good and modern, and which run on all major browsers?
Normally I use NextJS for that kind of thing.
Good to know this. I look forward to an example of how to create a professional front end using NextJS if you'd like to recommend a tutorial or create one here @@samwitteveenai.
This is right into the awesomeness space! Thanks for sharing this project! (yesterday I was working on a similar solution using ComfyUi + Python exporting but this is way cleaner)
I really like the look of ComfyUI; I need to try to make some time to play with it.
Very interesting and insightful. Thank you very much, Sam.
Great video, really what I was looking for: some useful real-world cases of how to use LLM models locally (instead of paying a company to do this for us; of course more secure and private). What I would love to see is how to extend this example to create a tweet about the image, store it in the CSV file, and then post the image with that tweet at intervals, maybe using Twitter's API. Not very tech savvy myself, but very interested in putting LLMs to some real-world use and automation. Thanks for making these videos.
Love it!!! Great content and super Ollama in action!
Thanks!!
Awesome video. Appreciate you demystifying the process and tying in the queuing, dataframe, and RAG concepts; some powerful stuff. It will be interesting to do an apples-to-apples comparison with GPT Vision and Gemini Vision functionality.
Next vid will be some cool Gemini stuff
Thanks for sharing. This is very useful, and it's a good source that I keep coming back to.
Excellent!!! Was just playing around with moondream. Perfect timing ;)
I was hoping they would put Moondream in here as well. I also played with that and was impressed by what it could do for its size.
That is exactly what I was looking for, thanks a lot.
Did you add custom RAG? Could you capture snapshots from a webcam? I've been trying to learn for days and keep getting stuck on RAG. 🎉
Thanks a lot. Great content!
Please do some more examples of identifying difficult screen shots.
Have you also thought about how boxing could improve this process?
boxing meaning bounding boxes?
@@samwitteveenai - Great minds think alike! ;-)
@@christopherd.winnan8701 I haven't tried it with this model, but I tried it using the Moondream model with red bounding boxes and it was able to work out what was inside. I've been working on getting it to give me bounding-box coordinates for things.
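For anyone curious, a rough sketch of that red-box trick with Pillow and the ollama client; the box coordinates, file name, and model name are placeholders:

import io
import ollama
from PIL import Image, ImageDraw

img = Image.open("path_to_your_image.jpg").convert("RGB")
draw = ImageDraw.Draw(img)
draw.rectangle((100, 100, 300, 300), outline="red", width=5)  # region to ask about

buf = io.BytesIO()
img.save(buf, format="JPEG")
response = ollama.generate(model="moondream",
                           prompt="What is inside the red box?",
                           images=[buf.getvalue()])
print(response["response"])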
@@samwitteveenai - Thank you for the great research you are doing. Always looking forward to more of your excellent vids!
Great video, Thanks very much
Can you do a tutorial about AI agents for image and video?
Maybe it would be useful to save the generated description in the image file itself, for example in the EXIF/IPTC description field, if the image format supports it.
I can't wait until these multimodal local models can read charts and graphs reliably.
llava:34b-v1.6 runs very slowly and doesn't use the GPU, whereas llava:13b-v1.6 works fine.
My system specs:
RAM: 32 GB
GPU: Nvidia 3060, 12 GB
Any tips on getting a more consistent response with only the necessary text I want extracted from an image? I’ve played around with the prompt quite a bit and even provided an output example.
I have a loop where I generate a response, then have another prompt ask if this response is correct for the image. If no, try again. I like the big llava for the first writing and a smaller llava or moondream for the checking. It can take a couple minutes for the multiple attempts, but that's ok.
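Something like this sketch, in case it helps anyone; the model names, prompts, and attempt count are arbitrary, and the ollama Python client is assumed:

import ollama

def describe_with_check(image_bytes, writer="llava:34b-v1.6", checker="moondream", attempts=3):
    draft = ""
    for _ in range(attempts):
        # The big model writes the description
        draft = ollama.generate(model=writer,
                                prompt="Describe this image in detail:",
                                images=[image_bytes])["response"]
        # A smaller model checks it against the same image
        verdict = ollama.generate(model=checker,
                                  prompt="Is the following description correct for this image? Answer yes or no.\n\n" + draft,
                                  images=[image_bytes])["response"]
        if verdict.strip().lower().startswith("yes"):
            break
    return draft  # last attempt if nothing passed the check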
Why pass the file to Ollama as bytes and not as an image file? Is it faster that way? Also, do you know any hacks for getting Ollama to return precisely a specific number of words (or a range) every time?
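On the word-count part: as far as I know there is no option that guarantees an exact word count, but you can ask for a range in the prompt and hard-cap the length with num_predict (which counts tokens, not words). A small sketch, with the model name and numbers as placeholders:

import ollama

with open("path_to_your_image.jpg", "rb") as f:
    image_bytes = f.read()

response = ollama.generate(model="llava:13b-v1.6",
                           prompt="Describe this image in 30 to 50 words:",
                           images=[image_bytes],
                           options={"num_predict": 80})  # hard ceiling in tokens, not words
print(response["response"])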
Is this supported via the OpenAI Compatible client?
I can't seem to get it to work. Anyone had any luck?
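For what it's worth, newer Ollama builds do accept images through the OpenAI-compatible /v1 endpoint (older ones ignored them, which might be why it seems broken). A hedged sketch, with the model name and path as placeholders:

import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
with open("path_to_your_image.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="llava:13b-v1.6",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)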
just what the doctor ordered.
Thank you for sharing, Sam! I tried the same thing here, but nothing happens; the process seems to be stuck and shows only:
Processing ./images\1.png
Any idea why?
Oh, my bad. I updated Ollama and it works.
Is there any way to indicate the base model? It is not on localhost in my case... Thanks
Ollama normally supports the instruction-tuned models rather than the base models. You can do a custom install for any model, including base models, if they are converted to the right format. If you mean the model that gets loaded, yes, you can specify that in the API.
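For reference, the custom-install route looks roughly like this: write a Modelfile pointing at a GGUF conversion of the model, then register it with the CLI (the names and paths below are placeholders):

# Modelfile
FROM ./my-base-model.gguf

# then, from the same directory:
ollama create my-base-model -f Modelfile
ollama run my-base-model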
Are you using GPU? Or all on CPU RAM?
He is using a Mac mini which has a unified memory architecture. So, while the GPUs are used they do not have their own dedicated memory.
@mshonle is totally right; no NVIDIA GPU is used, just the built-in Mac one.
Ollama's llava and bakllava handle PNG. What are you gaining by converting to bytes?
For me, I was having issues getting it to work with PNGs; that is the reason I added it. I will have another look and see if maybe I just had something set wrong the first time.
Can you try recording a video of an application and asking Gemini to code an application with similar functionality and design? Something simple.
Ur welcome 😂
Microsoft stole your idea
Just FYI for others needing to reference their local Ollama over the network:

from ollama import Client

client = Client(host='192.168.0.25:11434')
# image_bytes loaded earlier, as in the video
response = client.generate(model='llava:34b-v1.6',
                           prompt='describe this image and make sure to include anything notable about it (include text you see in the image):',
                           images=[image_bytes])
Being a Windows user... I am still waiting...
Windows sucks. It really really sucks 😂
Get off that Microsoft telemetry machine while you still can. (All in jest I don't actually care which OS you use. )
The Windows version is just out, at least as a beta.