OCR Using Microsoft's Florence-2 Vision Model on Free Google Colab
- Published on 25 Jun 2024
- In this video, I demonstrate how to run Microsoft's recently released Florence-2 foundational vision model in a free Google Colab workspace with a T4 GPU. I use Optical Character Recognition (OCR) as the primary use case to showcase the model's capabilities.
You'll learn:
1. An introduction to the Florence-2 Vision Model
2. Loading and configuring the Florence-2 model
3. Implementing an OCR task with this advanced model
4. Evaluating the performance and results of OCR using the Florence-2 Vision Model.
Code Link - colab.research.google.com/dri...
Florence-2 Model - huggingface.co/microsoft/Flor...
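For readers who want the gist without opening the notebook, here is a minimal sketch of steps 2 and 3. It is not the video's code: the checkpoint name `microsoft/Florence-2-large` is an assumption (the link above is truncated, so the exact variant is unknown), and the task prompts, generation settings, and output layout follow the public Florence-2 model card as best I know it. `pair_regions` is a hypothetical helper added for illustration.

```python
# Sketch only: model id, prompts and generation settings are assumptions
# based on the public Florence-2 model card, not the video's notebook.
MODEL_ID = "microsoft/Florence-2-large"  # assumed variant
OCR_TASK = "<OCR>"
OCR_REGION_TASK = "<OCR_WITH_REGION>"


def load_florence(model_id=MODEL_ID):
    """Load model and processor (heavy imports are kept inside the function)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Half precision on a GPU such as the Colab T4, full precision on CPU.
    dtype = torch.float16 if device == "cuda" else torch.float32
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=dtype, trust_remote_code=True
    ).to(device).eval()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor, device, dtype


def run_task(model, processor, device, dtype, image, task=OCR_TASK):
    """Run one Florence-2 task prompt on a PIL image and parse the output."""
    inputs = processor(text=task, images=image, return_tensors="pt").to(device, dtype)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        do_sample=False,
    )
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # post_process_generation comes from the model's remote-code processor.
    return processor.post_process_generation(
        text, task=task, image_size=(image.width, image.height)
    )


def pair_regions(result, task=OCR_REGION_TASK):
    """Hypothetical helper: pair each recognized string with its quad box."""
    regions = result[task]
    return [
        (label.replace("</s>", "").strip(), quad)
        for quad, label in zip(regions["quad_boxes"], regions["labels"])
    ]
```

To use it, call `load_florence()` once, open an image with Pillow (`Image.open(path).convert("RGB")`), and pass it to `run_task` with either `OCR_TASK` for plain text or `OCR_REGION_TASK` for text plus quadrilateral boxes. `pair_regions` then turns the region output into `(text, quad)` tuples.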
#florence2 #vision #multimodal #multimodalai #llm #microsoftai #googlecolab #ocr #machinelearning #ai #tutorial #freeresources #attention #objectdetection #segmentation - Science & Technology
I want to integrate this into an Android app. How do I do it?
Thanks for sharing! very useful
wow... you are super smart..... especially when you change the code for OCR REGION....! Amazing !!!
Glad it helped!
Yes, really. No one does that on YouTube; everyone else teaches only the basics. Thanks, bro.
Thanks
Good video
Any luck with making use of the raw OCR results? I find it picks up more than the ocr_with_region
Any luck fine-tuning the OCR part on a custom dataset in a language other than English?
Haven't tried yet, but will try to make a video on finetuning.
How much RAM does it need to run on a CPU?
In full precision, it would need approximately 10-11 GB of RAM for inference. If you are not able to run it on CPU, you can try a quantized model.
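As a rough way to reason about the RAM question above, weight memory alone can be estimated as parameter count times bytes per parameter. The sketch below assumes the parameter counts listed on the Florence-2 model cards (about 0.23B for base and 0.77B for large). Which variant the video uses is not stated here, so treat the numbers as illustrative.

```python
# Parameter counts assumed from the Florence-2 model cards.
PARAMS = {"Florence-2-base": 0.23e9, "Florence-2-large": 0.77e9}
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n in PARAMS.items():
    for dtype in BYTES_PER_PARAM:
        print(f"{name} {dtype}: {weight_memory_gb(n, dtype):.2f} GB")
```

This counts weights only. Activations, beam-search buffers, image preprocessing, and framework overhead come on top, so actual peak usage during inference is noticeably higher than these figures. It also shows why a quantized model helps: each halving of the bytes per parameter halves the weight footprint.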
Can I run this on CPU?
Yes you can. Change the "device_map" argument to "cpu". And also make sure to not move input tensors to "cuda".
@theailearner1857 thanks 🤜🤛
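The CPU advice in the reply above can be sketched roughly as follows. This assumes the notebook loads the model through `AutoModelForCausalLM.from_pretrained` with a `device_map` argument, as the reply implies. The checkpoint name and the helper names are hypothetical, and passing `device_map` typically requires the `accelerate` package to be installed.

```python
def cpu_load_kwargs():
    """Keyword arguments for a CPU-only load (hypothetical helper)."""
    return {"device_map": "cpu", "trust_remote_code": True}


def load_on_cpu(model_id="microsoft/Florence-2-large"):  # assumed checkpoint
    """Load model and processor on CPU; default dtype stays float32."""
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, **cpu_load_kwargs()).eval()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor


def prepare_inputs(processor, image, task="<OCR>"):
    """Build model inputs; note there is no .to("cuda") call here,
    so the tensors stay on the CPU alongside the model."""
    return processor(text=task, images=image, return_tensors="pt")
```

Expect inference to be much slower than on the T4, and keep the model in full precision unless you know your CPU build handles half precision well.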