OCR Using Microsoft's Phi-3 Vision Model on Free Google Colab

แชร์
ฝัง
  • เผยแพร่เมื่อ 13 ต.ค. 2024

ความคิดเห็น • 8

  • @theailearner1857
    @theailearner1857  4 หลายเดือนก่อน +2

    There is an update in Phi-3 Vision's Hugging Face page. Now you need not to comment lines in code files to run model without flash attention. You just need to import model in eager mode. (huggingface.co/microsoft/Phi-3-vision-128k-instruct#sample-inference-code)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto", _attn_implementation='eager') # use _attn_implementation='eager' to disable flash attention

    • @ai_enthusiastic_
      @ai_enthusiastic_ 3 หลายเดือนก่อน

      I just tried this model on my cpu. It appears that the model loads successfully, but it remains in a running state without producing any output thus far. My system's RAM capacity is 8 GB. Could this limitation be the reason for the lack of functionality?

  • @arunbhyashaswi1515
    @arunbhyashaswi1515 4 หลายเดือนก่อน

    Quite enriching video. I will be trying it and letting you know my experience.

  • @phanikrishna8215
    @phanikrishna8215 2 หลายเดือนก่อน

    How do we get the bounding boxes of the OCR text using phi3 ?

  • @d.d.z.
    @d.d.z. 4 หลายเดือนก่อน

    Hey man, thank you!

  • @gabrielesilinic
    @gabrielesilinic 3 หลายเดือนก่อน

    I mean, cool. but if you really can't run it locally you likely have bigger issues. The Phi-3 model is just that small that can run about anywhere.

  • @Cloudvenus666
    @Cloudvenus666 4 หลายเดือนก่อน +2

    Awesome video but this model is unreliable. It extract text on some pages, other times it just stops midway or returns a blank output. I thought, its for sure the low gpu power of the T4, so I tried it directly with azure, and it reproduced the same outcome.

    • @theailearner1857
      @theailearner1857  4 หลายเดือนก่อน +1

      Try to change prompt and test it out. And still if it doesn't work you might need to fine tune this model on domain specific documents.