Best Way to OCR a PDF in Python - spaCy Layout

  • Published on 9 Feb 2025
  • In this video, I'm going to show you the best way to OCR a PDF in Python with the new spaCy Layout package. The best part about this package is that it gives you access to all the important metadata generated from a spaCy pipeline alongside layout detection and OCR. This means you will have bounding boxes for the labeled regions of text on a given image. You can also do table detection.
    spaCy Layout: github.com/exp...
    GitHub Repo: github.com/wjb...
    If there's a specific video you would like to see or a tutorial series, let me know in the comments and I will try and make it.
    If you liked this video, check out www.PythonHumanities.com, where I have Coding Exercises, Lessons, on-site Python shells where you can experiment with code, and a text version of the material discussed here.
    You can follow me at:
    / wjb_mattingly
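The workflow described above (a spaCy Doc whose layout regions carry labels and bounding boxes) could be sketched roughly as follows. This is a minimal sketch, not the video's exact code: it assumes the `spacy-layout` package is installed and that it exposes `spaCyLayout`, a `doc.spans["layout"]` span group, and a `span._.layout` object with `x`, `y`, `width`, `height`, and `page_no`, as its README describes. The helper names `load_layout_doc` and `layout_rows` are hypothetical.

```python
def load_layout_doc(pdf_path, lang="en"):
    """Process a PDF into a spaCy Doc (assumes `pip install spacy-layout`)."""
    # Imported lazily so the helper below can be used without the package.
    import spacy
    from spacy_layout import spaCyLayout

    nlp = spacy.blank(lang)
    layout = spaCyLayout(nlp)
    return layout(pdf_path)


def layout_rows(doc, preview_chars=60):
    """Collect label, page, bounding box, and a text preview per layout span."""
    rows = []
    for span in doc.spans["layout"]:
        # Assumed attributes on the layout object: x, y, width, height, page_no.
        box = span._.layout
        rows.append(
            {
                "label": span.label_,
                "page": box.page_no,
                "bbox": (box.x, box.y, box.width, box.height),
                "text": span.text[:preview_chars],
            }
        )
    return rows
```

With a real file this might be used as `rows = layout_rows(load_layout_doc("example.pdf"))`, where `example.pdf` is a placeholder path. Each row can then be filtered by label, for example to keep only table or formula regions.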

Comments • 16

  • @msickand 2 hours ago

    Man, this is an amazing video. So helpful, a big THANK YOU. A Table video would also be fantastic, and thanks in advance for that!😉

  • @VentureMLops 18 days ago +2

    Interesting. Waiting for the video with tables!

    • @python-programming 17 days ago

      Thanks! I'll work on that table video in the near future. As for the math formulae, I don't work with those often, but I have seen some promising models, specifically fine-tunes of tf-id

  • @sheikhakbar2067 27 days ago +3

    Thanks; I have been looking for such a tool.

  • @flyingzeppo 27 days ago +2

    Very interesting. Thank you.

  • @GuidoAmabili 5 days ago

    Very interesting, thank you! How would you go about improving the accuracy and training your models to work on specific types of documents? What are the main steps using these new capabilities?

  • @kn8u 4 days ago

    I'm working on a small academic helper chatbot. Can I use this to prepare my documents which are just scans of textbooks? I'll be using the output in the RAG workflow.

  • @Osman-dy5br 19 days ago +1

    Would this be able to support extracting mathematical formulae?

    • @python-programming 17 days ago +1

      Good question! Formula is one of the labels. There are a lot of quality models that can convert formulae to LaTeX, so even if the OCR is bad, you could use the bboxes and feed that image to a better-quality model for formulae.
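The bounding-box idea in this reply could look something like the sketch below: scale a region's box from page coordinates to the pixel grid of a rendered page image, then crop that region and pass it to a separate formula model. This is only an illustration under assumptions. The `Box` class is a hypothetical stand-in for whatever bounding-box object the layout step returns, a top-left origin is assumed, and how the page image is rendered and which formula model is used are left open.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for a layout bounding box (top-left origin assumed)."""
    x: float
    y: float
    width: float
    height: float


def crop_box(box, page_size, image_size, padding=0):
    """Scale a box from page coordinates to image pixels.

    Returns (left, upper, right, lower), clamped to the image bounds.
    `padding` adds a margin in pixels around the region.
    """
    page_w, page_h = page_size
    img_w, img_h = image_size
    # Scale factors between the page coordinate space and the rendered image.
    sx, sy = img_w / page_w, img_h / page_h
    left = max(0, math.floor(box.x * sx) - padding)
    upper = max(0, math.floor(box.y * sy) - padding)
    right = min(img_w, math.ceil((box.x + box.width) * sx) + padding)
    lower = min(img_h, math.ceil((box.y + box.height) * sy) + padding)
    return left, upper, right, lower
```

The returned tuple uses the same (left, upper, right, lower) order that Pillow's `Image.crop` expects, so a rendered page image could be cropped with it directly before being sent to the formula model.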

  • @smazorize 16 days ago

    I am struggling to extract tilted and vertical text from PDF documents and embed it back into the PDF so that it is searchable. Do you have a solution for that? The OCRmyPDF library doesn't help. Would spaCy and CV help with this?

  • @nitondeauricergeson 11 days ago +1

    Can you make a table video?

  • @science_electronique 10 days ago +1

    Use Gemini OCR with a good prompt.

    • @traveling-historian 10 days ago

      Thanks for the comment! That’s a good suggestion for some use cases, but not all. If bounding boxes and labels are important, then this is better, assuming you have standard typed text. Also, this approach is faster and local. It also aligns the output as a spaCy Doc, which gives you linguistic analysis too.