Python: Renaming PDFs using text inside a document with regex

  • Published Aug 21, 2024

Comments • 21

  • @stephencodes · 2 years ago

    IMPORTANT:
    If you do NOT CARE about what comes after the keyword for the positive lookbehind expression, use the following instead:
    (?
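As a minimal sketch of how a positive lookbehind works in Python's re module, assuming a hypothetical "Order #:" keyword and sample text, the following captures whatever follows the keyword on the same line without caring what that text looks like. The function name text_after_keyword is also hypothetical.

```python
import re

# Positive lookbehind (?<=...): the match must be preceded by "Order #:",
# but the keyword itself is not part of the match.
# [ \t]* skips spaces/tabs; [^\n]* takes the rest of that line only.
# The "Order #:" keyword is an assumption for illustration.
ORDER_AFTER_KEYWORD = re.compile(r"(?<=Order #:)[ \t]*[^\n]*")

def text_after_keyword(text):
    """Return everything after 'Order #:' on the same line, or None."""
    match = ORDER_AFTER_KEYWORD.search(text)
    return match.group().strip() if match else None

# Hypothetical sample text, standing in for text extracted from a PDF.
sample = "Name: Bob\nOrder #: 123-456 (web)\nPurchase method: Card"
print(text_after_keyword(sample))  # 123-456 (web)
```

Using [ \t]* rather than \s* keeps the match on the keyword's own line, so an empty "Order #:" line does not silently pull in the next line.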

    • @stefaniamarques9979 · 1 year ago

      Hi Stephen, how do I do this, but for the line after?
      I'm a little bit confused...
      Thanks

    • @stephencodes · 1 year ago

      @stefaniamarques9979 Hi Stefania, if we were trying to capture the "Purchase method" line, we would use this regex (I've also included a link at the end of this comment):
      Order #:.*[\n](.+)
      Keep in mind we would have to change our code on line 21 (see GitHub) to use .group(1) instead of just .group() (I think I did this correctly; I did test it a bit).
      Refer to this link, which explains it better: stackoverflow.com/questions/65503992/how-to-extract-the-next-line-after-a-specific-keyword-when-there-are-words-in-be
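Here is a small, self-contained sketch of that next-line pattern in Python, showing why .group(1) is needed. The sample text is hypothetical; real PDF text extraction may lay the lines out differently.

```python
import re

# Hypothetical extracted text; the real layout of the PDF may differ.
sample = "Order #: 123-456\nPurchase method: Credit card\nTotal: $10.00"

# Order #:.*  matches the rest of the "Order #:" line (. stops at newlines),
# [\n]        consumes the line break,
# (.+)        captures the whole next line as group 1.
NEXT_LINE = re.compile(r"Order #:.*[\n](.+)")

match = NEXT_LINE.search(sample)
if match:
    # group() returns both lines; group(1) is just the captured next line.
    print(match.group(1))  # Purchase method: Credit card
```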

  • @TacosYBurritos8P · 2 years ago +5

    Omg the first time my comment is in a video! Thank you so much for this amazing tutorial! When are you going to set up a Patreon?! Or I can pay you back in calculus videos or any higher level math tutoring!

    • @stephencodes · 2 years ago

      Haha, first time for everything! I probably won't set up a Patreon unless I manage to do this a bit more seriously. Fortunately, I've already passed calculus 1 & 2 (somehow, haha). Feel free to make another request; I do enjoy making these videos!

  • @parkourninja21 · 10 months ago

    Thank you, Steve. You saved me from manually renaming nearly 400 PDFs. And many more in the future. I'm a trial attorney who handles big medical files that are often unorganized. My RegEx is \d{1,2}(\/|-)\d{1,2}(\/|-)(\d{4}|\d{2}) then I rearrange, pad the pieces, and add a random 3-digit string to make the filename unique to sort and group date-related records.
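The rearrange-and-pad step could look roughly like the sketch below. It is not the commenter's actual script: it adds capture groups around month, day and year (the regex above only captures the two separators and the year), assumes month-first dates and that two-digit years belong to the 2000s, and the function name date_filename is hypothetical.

```python
import random
import re

# Adapted from the date regex above, with capture groups added around
# month, day and year so the pieces can be rearranged.
# Assumption: dates are month-first (MM/DD/YYYY) and two-digit years mean 20YY.
DATE = re.compile(r"(\d{1,2})[/-](\d{1,2})[/-](\d{4}|\d{2})")

def date_filename(text, suffix=None):
    """Build 'YYYY-MM-DD_NNN.pdf' from the first date found in text, or None."""
    match = DATE.search(text)
    if not match:
        return None
    month, day, year = match.groups()
    if len(year) == 2:
        year = "20" + year  # assumption; older records would need "19"
    if suffix is None:
        suffix = random.randint(0, 999)  # random 3-digit tag for uniqueness
    # Zero-pad so names sort chronologically as plain strings.
    return f"{int(year):04d}-{int(month):02d}-{int(day):02d}_{suffix:03d}.pdf"

print(date_filename("Visit on 3/7/21 at clinic", suffix=42))  # 2021-03-07_042.pdf
```

A random 3-digit suffix makes name collisions unlikely but not impossible, so checking Path.exists() before renaming would be a sensible extra guard.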

    • @stephencodes · 10 months ago +1

      This is awesome! I remember when I first decided to post these videos, I was working a co-op where they had me looking through massive Excel files (12+ tabs & 4,000+ rows) for links to other workbooks. When I first started, it was all done by hand, so I decided to make a Python script to search the Excel files. I remember thinking of other manual processes I could automate, and I heard from a friend who was renaming a massive number of PDFs by hand. That's when I thought to make a video on it. I'm glad it managed to help someone... never thought a trial attorney would be using Python! I've seen people in some of my fourth-year Computer Science classes who didn't even know how to code in it, so congrats! It is a very useful tool that can do so much. Enjoy all that time back :)

  • @giamonioz · 2 years ago +1

    Man,
    Thank you so much, it worked like a charm.

  • @edpereira7767 · 10 months ago

    Awesome code! Thank you!!!

  • @johnnyb79904 · 1 year ago

    This is awesome, man. Nice work. Would it be difficult to edit the code to exclude special characters? It worked perfectly except in cases where I had a "/" in the lookup text.

    • @stephencodes · 1 year ago

      Hi Michael,
      Thanks! I think it should be possible. Just so I understand, say I have the following text (I want to extract what comes after orderID):
      Name: Bob
      Phone #: xxxxxxxx
      orderID: XAOP/1232232
      Email: string@gmail.com
      You would only want to extract XAOP1232232 (without the "/")?
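One way to handle this is to delete problem characters from the captured text before it becomes a file name. A minimal sketch, assuming the orderID layout above; treating "special characters" as the set Windows forbids in file names is an assumption.

```python
import re

# Hypothetical text matching the example above.
sample = "Name: Bob\norderID: XAOP/1232232\nEmail: string@gmail.com"

match = re.search(r"orderID:\s*(\S+)", sample)
order_id = match.group(1) if match else ""  # 'XAOP/1232232'

# Remove characters Windows does not allow in file names: \ / : * ? " < > |
# (assumed definition of "special characters"). "/" is also the path
# separator on macOS/Linux, so it has to go on every platform.
safe_id = re.sub(r'[\\/:*?"<>|]', "", order_id)
print(safe_id)  # XAOP1232232
```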

  • @aayushaggarwal9492 · 1 year ago

    Hey Stephen, I have thousands of PDFs in a folder, with one difference: taking your example, some PDFs have the Order # we want to rename by, but other PDFs in the same folder have a Product # instead of an Order #. How can I rename both within the same code? Does "or" work in regex?
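Regex does have an "or": the alternation operator |. A minimal sketch, assuming both labels are followed by ":" and a value with no spaces (the sample texts and the name extract_number are hypothetical):

```python
import re

# Alternation: (?:Order|Product) matches either keyword; the (\S+) group
# captures the value that follows. Labels and samples are assumptions.
KEY_NUMBER = re.compile(r"(?:Order|Product) #:\s*(\S+)")

def extract_number(text):
    """Return the value after 'Order #:' or 'Product #:', or None."""
    match = KEY_NUMBER.search(text)
    return match.group(1) if match else None

print(extract_number("Order #: A1001\nTotal: $5"))    # A1001
print(extract_number("Product #: P-77\nTotal: $9"))  # P-77
```

A capture group is used here instead of a lookbehind because Python's built-in re module only accepts fixed-width lookbehinds, and "Order #:" and "Product #:" have different lengths, so (?<=Order #:|Product #:) is rejected with a re.error.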

  • @noctischen3253 · 1 year ago

    Hi Stephen, this works like a dream.
    But when I try to change the cr_regex line to suit my case, it does not work.
    The text in my file is B/L番号(1) JBX1A12345. I only want the JBX1A12345, so I tried changing it to cr_regex = r'(?
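The pattern in this comment is cut off, so the actual cause can't be confirmed. One common pitfall with this particular text is that "(1)" contains regex metacharacters: unescaped parentheses create a group instead of matching literal brackets. A minimal sketch with them escaped (the surrounding sample lines are hypothetical):

```python
import re

# Hypothetical text shaped like the commenter's example.
text = "Shipper: ACME\nB/L番号(1) JBX1A12345\nPort: Tokyo"

# "(" and ")" are special in regex (they create groups), so the literal
# "(1)" must be escaped as \(1\). "番号" is matched as ordinary text.
BL_NUMBER = re.compile(r"B/L番号\(1\)\s*(\w+)")

match = BL_NUMBER.search(text)
print(match.group(1) if match else None)  # JBX1A12345
```

The lookbehind style also works once the parentheses are escaped, since the prefix has a fixed width: re.search(r"(?<=B/L番号\(1\) )\w+", text).group() returns the same value.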

  • @greenlight4056 · 2 years ago

    How do we do it if we want to take different pieces of text from the PDF, like case number, document number and name, and save the file with that name?
    For example, using the naming format "C:\...\Case Name\DocumentNumber FilingDate LastName FilingType.pdf":
    "C:\...\Leal v. Bedel et al\#026 2022-07-02 Staedter Motion for Extension of Time to File Answer.pdf."

    • @stephencodes · 2 years ago +1

      Are you trying to create file folders in this example (Leal v. Bedel et al)? Another question: are your Case Name, DocumentNumber, FilingDate, LastName and FilingType on separate lines in the PDF? This shouldn't be too hard to implement; let me know and I can try to make a video.

    • @greenlight4056 · 2 years ago

      @stephencodes Now I want to scrape multiple pieces of text from the PDF (like name and case number)
      and I want to save it like this:
      c:/stephen/45236476/.pdf

    • @stephencodes · 2 years ago +1

      @greenlight4056 Sorry for another question; I want to make sure I understand what you're saying. I'm a bit confused by your backslashes (\).
      Say I have this PDF file:
      Case name: Leal v. Bedel et al
      Document number: #026
      Filing date: 2022-07-02
      Last name: Staedter
      Filing Type: Motion for Extension of Time to File Answer
      I would first create a folder called "Leal v. Bedel et al": prnt.sc/GLgg2tlZbyvm
      Then I would save a pdf like so (inside the above folder): prnt.sc/5b1A2OP6YX_y
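Following the layout sketched above (one "Label: value" pair per line), a minimal Python version might pull each field with its own regex, use the case name as the folder, and join the rest into the file name. The labels, the function name target_path and the "output" base folder are assumptions for illustration.

```python
import re
from pathlib import Path

# Hypothetical field labels, taken from the example layout above
# (one "Label: value" pair per line).
FIELDS = {
    "case": r"Case name:\s*(.+)",
    "doc": r"Document number:\s*(.+)",
    "date": r"Filing date:\s*(.+)",
    "last": r"Last name:\s*(.+)",
    "type": r"Filing Type:\s*(.+)",
}

def target_path(text, base_dir):
    """Return base_dir/<case name>/<doc> <date> <last> <type>.pdf, or None."""
    values = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if not match:
            return None  # a field is missing; leave this PDF alone
        values[key] = match.group(1).strip()
    folder = Path(base_dir) / values["case"]
    name = f'{values["doc"]} {values["date"]} {values["last"]} {values["type"]}.pdf'
    return folder / name

sample = """Case name: Leal v. Bedel et al
Document number: #026
Filing date: 2022-07-02
Last name: Staedter
Filing Type: Motion for Extension of Time to File Answer"""

print(target_path(sample, "output"))
```

To actually move a file, one could call dest.parent.mkdir(parents=True, exist_ok=True) and then Path(pdf_path).rename(dest), where dest is the returned path and pdf_path is a placeholder for the PDF being processed. Windows forbids some characters (such as : and ?) in folder and file names, so extracted values may need cleaning first.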

    • @greenlight4056 · 2 years ago

      @stephencodes Yes.

    • @WAAB101 · 1 year ago

      @stephencodes Hi Stephen! Could you please make a video on how to do it, especially when there is, for instance, both an invoice number and an OCR number, and the OCR number should be picked? Also, could I save it with an underscore between the parts, e.g. OCRnumber_companyname? I really appreciate your support and time!
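A rough sketch of the "prefer OCR, otherwise invoice number" rule with an underscore-joined name. The labels "OCR", "Invoice number" and "Company", the sample texts and the name invoice_filename are all hypothetical; real invoices will likely need different patterns.

```python
import re

# Hypothetical labels ("OCR", "Invoice number", "Company"); real documents
# may word these differently, so adjust the patterns to your PDFs.
OCR = re.compile(r"OCR(?: number)?:\s*(\d+)", re.IGNORECASE)
INVOICE = re.compile(r"Invoice(?: number)?:\s*(\S+)", re.IGNORECASE)
COMPANY = re.compile(r"Company:\s*(.+)", re.IGNORECASE)

def invoice_filename(text):
    """Return '<number>_<company>.pdf', preferring the OCR number over the
    invoice number, or None if no number or company name is found."""
    # `or` falls through to the invoice search only when no OCR match exists.
    number = OCR.search(text) or INVOICE.search(text)
    company = COMPANY.search(text)
    if not (number and company):
        return None
    return f"{number.group(1)}_{company.group(1).strip()}.pdf"

with_ocr = "Company: Acme AB\nInvoice number: 2023-118\nOCR: 4711000118"
no_ocr = "Company: Acme AB\nInvoice number: 2023-118"
print(invoice_filename(with_ocr))  # 4711000118_Acme AB.pdf
print(invoice_filename(no_ocr))    # 2023-118_Acme AB.pdf
```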

  • @Lioneriod · 10 months ago

    Bro is just insane. Thank you so much for this video, man.