There is an update on Phi-3 Vision's Hugging Face page. Now you don't need to comment out lines in the code files to run the model without flash attention; you just need to load the model in eager mode. (huggingface.co/microsoft/Phi-3-vision-128k-instruct#sample-inference-code)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto", _attn_implementation='eager') # use _attn_implementation='eager' to disable flash attention
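For anyone who wants a fuller picture, here is a minimal sketch of how that eager-mode load fits into an inference run. It assumes `transformers` with `trust_remote_code` support and a CUDA GPU; the `<|user|>`/`<|image_1|>` prompt format follows the model card's sample, and the model/processor calls at the bottom are left commented since they download several GB of weights.

```python
# Sketch only: eager-mode loading for Phi-3 Vision, per the updated
# sample code on the model's Hugging Face page (flash attention disabled).
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"

def build_prompt(user_text: str) -> str:
    # Phi-3 Vision chat format: image placeholder, then the question,
    # then the assistant turn the model should complete.
    return f"<|user|>\n<|image_1|>\n{user_text}<|end|>\n<|assistant|>\n"

if __name__ == "__main__":
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="cuda",
        trust_remote_code=True,
        torch_dtype="auto",
        _attn_implementation="eager",  # disable flash attention
    )
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    # With a PIL image loaded as `image`, generation would look like:
    # inputs = processor(build_prompt("Extract the text from this page."),
    #                    [image], return_tensors="pt").to("cuda")
    # out = model.generate(**inputs, max_new_tokens=500,
    #                      eos_token_id=processor.tokenizer.eos_token_id)
```

On CPU-only machines you would swap `device_map="cuda"` for `device_map="cpu"`, but note the comments below about memory.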
I just tried this model on my CPU. It appears the model loads successfully, but it has been stuck running without producing any output so far. My system has 8 GB of RAM. Could that limitation be the reason it isn't working?
Quite enriching video. I will be trying it and letting you know my experience.
How do we get the bounding boxes of the OCR text using Phi-3?
Hey man, thank you!
I mean, cool, but if you really can't run it locally, you likely have bigger issues. The Phi-3 model is small enough that it can run almost anywhere.
Awesome video, but this model is unreliable. It extracts text on some pages; other times it just stops midway or returns a blank output. I thought it was surely due to the low GPU power of the T4, so I tried it directly on Azure, and it produced the same outcome.
Try changing the prompt and testing it out. If it still doesn't work, you might need to fine-tune this model on domain-specific documents.