Rename PDFs using text content from a document (PYTHON)

  • Published on 1 Jan 2022
  • This tutorial is a response to a comment on the last video asking how to rename a PDF using the title found inside that same PDF.
    If you would like another tutorial, request it in the comments and I will try and get back to you.
    GitHub:
    github.com/steve-codes/PDF-re...
    Last video:
    • Changing PDF file name...
  • Science & Technology
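The idea in the description can be sketched roughly as follows. This is a minimal sketch, not the code from the linked repository: it assumes PyMuPDF is installed (imported as `fitz`), assumes the title is the first non-empty line of text on page 1, and `title_from_text` / `rename_pdf_by_title` are hypothetical helper names.

```python
import os
import re

# Characters Windows rejects in file names, plus control characters.
INVALID_CHARS = r'[<>:"/\\|?*\x00-\x1f]'


def title_from_text(text, max_len=100):
    """Return a filename-safe title: the first non-empty line of `text`, or None."""
    for line in text.splitlines():
        safe = re.sub(INVALID_CHARS, "", line).strip().rstrip(". ")
        if safe:
            return safe[:max_len].rstrip()
    return None


def rename_pdf_by_title(path):
    """Rename the PDF at `path` after the first line of text on its first page."""
    import fitz  # PyMuPDF; imported here so title_from_text works without it

    with fitz.open(path) as doc:
        text = doc[0].get_text()
    # The with-block has exited, so the document is closed before renaming.
    title = title_from_text(text)
    if title is None:
        return path  # no extractable text (e.g. a scanned page): leave it alone
    new_path = os.path.join(os.path.dirname(path), title + ".pdf")
    os.rename(path, new_path)
    return new_path
```

Renaming only after the `with` block has closed the document matters on Windows, as the WinError 32 thread further down shows.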

Comments • 33

  • @abdallahabdelmajeed8120
    2 years ago +2

    Wonderful, thank you very much!

  • @pretro6136
    11 months ago +1

    lmao i actually needed to create a pdf editor recently and thought this was going to be changing the header to a line in the text🤣

  • @dara0013
    2 years ago +2

    Thank you very much!

  • @robertcenusa8636
    2 years ago +2

    Finally the proper tutorial.
    Thank you for your work!
    Could you please help me rename it using the 16th line, for example? Many thanks anyway!

    • @robertcenusa8636
      2 years ago

      I've managed it (I think) with this code:
      from os import rename
      # splitlines() is zero-based, so index 15 is the 16th line
      new_file_name = text.splitlines()[15]
      rename(pdf, new_file_name + '.pdf')

    • @stephencodes
      2 years ago +1

      @robertcenusa8636 Glad you got it working! Yeah, I think that should work. Just keep in mind that splitlines() splits on the characters listed here, not just newline characters: www.programiz.com/python-programming/methods/string/splitlines (it's probably rare that you'll encounter any of the other characters, though).
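A quick way to see the difference Stephen mentions. The sample text is made up, and `nth_line` is a hypothetical helper that also skips blank lines, which `text.splitlines()[15]` does not do.

```python
# str.splitlines() also ends a line at "\r", "\x0b", "\x0c" (form feed),
# "\x1c"-"\x1e", "\x85", "\u2028" and "\u2029", not only at "\n".
text = "Title\x0cpage two\nline three"

print(text.splitlines())  # ['Title', 'page two', 'line three']
print(text.split("\n"))   # ['Title\x0cpage two', 'line three']


def nth_line(text, n):
    """Return the n-th non-empty line (1-based), or None if the text is shorter."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[n - 1] if 0 < n <= len(lines) else None
```

With this helper, `nth_line(text, 16)` returns None instead of raising IndexError when a PDF has fewer than 16 lines of text.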

  • @mrnarason
    2 years ago +4

    Can you give some examples of how to grab, locate, or isolate the text used to rename the PDF, from different sections of the file or under particular conditions? The regular expression method you mentioned on GitHub seems pretty useful for this.

    • @stephencodes
      2 years ago +1

      Yes, I'll try and make a video doing some examples of regular expressions. It might take a week or so since I have to brush up on it.

    • @stephencodes
      2 years ago +1

      Sorry, I haven't forgotten about this request. I've just been really busy with school and job searching for the summer. During reading week (this week) I will try and do a video.

    • @stephencodes
      2 years ago +3

      Again, sorry for the delay but the new video + code is up.

    • @Lioneriod
      9 months ago

      @stephencodes bro is the most responsible YouTuber I've ever seen. Also, thank you a lot for making these videos!
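As a rough illustration of the regular-expression approach (not the code from the follow-up video), the sketch below pulls out text that matches a known format. The invoice format `INV-` followed by six digits, the ISO date format, and the helper `name_from_formats` are all assumptions chosen for the example.

```python
import re

# Hypothetical formats: an invoice number like "INV-004217" and an ISO date.
INVOICE_RE = re.compile(r"\bINV-\d{6}\b")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def name_from_formats(text):
    """Build a file name from the first invoice number and date found in `text`."""
    invoice = INVOICE_RE.search(text)
    date = DATE_RE.search(text)
    if not (invoice and date):
        return None  # a pattern is missing: skip the file rather than guess
    return f"{date.group(0)}_{invoice.group(0)}.pdf"
```

Returning None when a pattern is missing lets the calling loop leave that PDF's name untouched instead of renaming it to something half-built.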

  • @TacosYBurritos8P
    2 years ago +1

    How would you rename the PDF after scanning for specific words or a format? For example, renaming the PDF to whatever comes after the term “invoice#”, or after “due” to rename it to the due date. Or having a list of names to search for within the PDF, and if a particular name is found, the PDF is renamed to that name.

    • @stephencodes
      2 years ago +1

      New video covers this (just uploaded, sorry for delay)
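Both cases in the question can be sketched with Python's `re` module. This is a sketch under assumptions, not the method from the video: it assumes the value is the first whitespace-free token after the keyword (optionally after `:` or `#`), and `value_after` / `first_known_name` are hypothetical helpers.

```python
import re


def value_after(keyword, text):
    """Return the first token after `keyword` (case-insensitive), or None."""
    pattern = r"\b" + re.escape(keyword) + r"\s*[:#]?\s*(\S+)"
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group(1) if match else None


def first_known_name(names, text):
    """Return the first name from `names` that appears as a whole word in `text`."""
    for name in names:
        if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
            return name
    return None
```

`re.escape` keeps characters such as `#` in the keyword from being read as regex syntax, and the leading `\b` stops "due" from matching inside a word like "residue".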

  • @mountainboyindesert2357
    1 year ago

    Thanks for the nice video. Can you suggest how we could modify the code if we have a multipage PDF and only want to extract a few pages? Also, what if the PDF is a scanned document?

    • @stephencodes
      1 year ago +1

      Hi,
      Thanks for the comment! Yes, extracting certain pages from a PDF is possible with PyMuPDF. Extracting text from a scanned PDF is doable but the last time I tried to do it not all the text was extracted properly (it depends on the scan quality of the PDF). I took a brief look online and it does look like it could be done with another library (I would have to do more research).
      Unfortunately, I just started my last semester of university, so I'm a bit swamped with coding projects at the moment, and it might be a while before I can take a better look.

    • @mountainboyindesert2357
      1 year ago

      @stephencodes thank you for the reply.
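Picking specific pages with PyMuPDF, as Stephen mentions, might look like this minimal sketch. It assumes 1-based page numbers for the caller, and `text_from_pages` / `extract_pages` are hypothetical names. For scanned PDFs, `get_text()` will usually return little or nothing because there is no text layer; that case needs OCR and is not covered here.

```python
def text_from_pages(doc, page_numbers):
    """Join the text of the given 1-based pages; out-of-range numbers are skipped.

    `doc` can be a PyMuPDF Document, which supports len() and doc[i].
    """
    parts = []
    for n in page_numbers:
        if 1 <= n <= len(doc):
            parts.append(doc[n - 1].get_text())
    return "\n".join(parts)


def extract_pages(src_path, dst_path, first, last):
    """Copy pages first..last (1-based, inclusive) into a new PDF file."""
    import fitz  # PyMuPDF

    with fitz.open(src_path) as src, fitz.open() as dst:
        # insert_pdf() takes 0-based, inclusive page numbers.
        dst.insert_pdf(src, from_page=first - 1, to_page=last - 1)
        dst.save(dst_path)
```

For renaming, `text_from_pages(doc, [1, 2])` would limit the search to the first two pages, so text on later pages cannot end up in the file name.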

  • @mohamedbouamara689
    2 years ago

    Hi sir.
    Great work, many thanks.
    How can I change the paper size of multiple PDF files with a for loop, without changing their names?
    Thank you very much.

    • @stephencodes
      2 years ago

      Hi Mohamed, you can just call me Stephen. I think this should be doable. You're talking about the print size, right? Like A1, A2, A3... paper sizes?
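Assuming "paper size" means scaling each page onto a standard sheet such as A4, one possible PyMuPDF sketch is below. It is not from the video: `pdf_files`, `resize_all` and the `.tmp` suffix are hypothetical, and it writes a resized copy first and then swaps it in with `os.replace`, so each file keeps its original name.

```python
import os
from pathlib import Path


def pdf_files(folder):
    """Return the PDF files in `folder`, sorted by name (suffix matched case-insensitively)."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")


def resize_all(folder, paper="a4"):
    """Scale every page of every PDF in `folder` onto `paper` size, keeping file names."""
    import fitz  # PyMuPDF

    width, height = fitz.paper_size(paper)  # size in points, e.g. (595, 842) for A4
    for path in pdf_files(folder):
        tmp = path.with_name(path.name + ".tmp")
        with fitz.open(str(path)) as src, fitz.open() as out:
            for page in src:
                new_page = out.new_page(width=width, height=height)
                # Draw the old page scaled to fit the new one (aspect ratio kept).
                new_page.show_pdf_page(new_page.rect, src, page.number)
            out.save(str(tmp))
        # Both documents are closed here, so the original can be replaced on Windows too.
        os.replace(tmp, path)
```

Landscape pages are placed on a portrait sheet as written; passing a landscape name such as `"a4-l"` to `fitz.paper_size` would be one way to handle those.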

  • @KinqNick
    1 year ago

    Nice video. Can you explain why you're using a virtual environment and why you're not using PyPDF2?

    • @stephencodes
      1 year ago

      Hi Kick, the reason for using a virtual environment is to keep packages separate for different projects. Without one, packages get installed into the system-wide Python installation, which can cause issues if different projects need different versions of the same package. Another reason is that I sometimes mess up installations, and it is a pain to fix a package that is installed globally (with a virtual environment I can just delete it and re-create it). Although I only really used one package here, I do it mostly out of habit and convenience. I believe the reason for using PyMuPDF over PyPDF2 was that PyPDF2 messed up the document structure when grabbing the contents of PDFs. I tried a couple of other packages, but PyMuPDF was the one that worked.
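The same setup can be driven from Python's standard `venv` module. This is a small sketch; `in_virtualenv` and `make_env` are hypothetical names, and the `sys.prefix` check is the usual way to tell whether the current interpreter is running inside a virtual environment.

```python
import sys
import venv


def in_virtualenv():
    """True if this interpreter runs inside a venv (its prefix differs from the base)."""
    return sys.prefix != sys.base_prefix


def make_env(path, with_pip=True):
    """Create a project-local environment, roughly what `python -m venv <path>` does."""
    venv.create(path, with_pip=with_pip)
```

Packages installed with that environment's own pip land inside `path` rather than the system-wide site-packages, so deleting the folder and calling `make_env` again is all the "re-create" step takes.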

  • @rsilo718
    2 years ago

    Can you tell me how I can rename using the 2nd element in the file? For example, instead of renaming it to case 7891223, I'd rename it to Larry, from the description.

    • @stephencodes
      2 years ago

      I have a video up that does something like this with regex. If you want to pull a name without using a format structure (grabbing text that comes after a certain word for example), then you may need to use another library. Is this what you're trying to do?
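Grabbing text that comes after a certain word can also be done with plain string handling plus `re`. The sketch assumes the name sits after a label such as `Description`, either on the same line or on the next non-empty line; `value_for_label` is a hypothetical helper, not something from the video.

```python
import re


def value_for_label(text, label):
    """Return the text after `label` on its line, or the next non-empty line."""
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        match = re.match(re.escape(label) + r"\s*:?\s*(.*)", line, re.IGNORECASE)
        if match:
            if match.group(1):
                return match.group(1)
            # Label alone on its line: take the next non-empty line instead.
            return next((nxt for nxt in lines[i + 1:] if nxt), None)
    return None
```

`re.match` only matches at the start of a line, so a label mentioned mid-sentence elsewhere in the PDF is ignored.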

  • @dickyalamsyah7010
    2 years ago

    Hi, I want to know how you handle duplicates, i.e. when the same text would give two PDFs the same new name?

    • @stephencodes
      2 years ago

      I just uploaded a new video where I go over a simple way to catch errors like this.
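One simple way to avoid two PDFs colliding on the same name is to add a counter when the target already exists. This is a sketch of that idea, not necessarily what the video does; `unique_path` and the ` (1)` suffix style are assumptions.

```python
import os


def unique_path(path):
    """Return `path`, or `name (1).pdf`, `name (2).pdf`, ... if it is already taken."""
    base, ext = os.path.splitext(path)
    candidate, n = path, 1
    while os.path.exists(candidate):
        candidate = f"{base} ({n}){ext}"
        n += 1
    return candidate
```

Calling `os.rename(pdf, unique_path(new_name))` instead of renaming straight to `new_name` avoids the collision, which on Windows would otherwise raise `FileExistsError`.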

  • @sopojarwo3483
    10 months ago

    How can I remove a PDF page based on its text content with Python? Is there a tutorial?
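Using the same PyMuPDF library as the video, removing pages by their text might look like the sketch below. It assumes a simple case-insensitive "contains this phrase" test, and `pages_to_keep` / `remove_pages_containing` are hypothetical helpers; the result is saved to a new file rather than over the original.

```python
def pages_to_keep(page_texts, phrase):
    """0-based indices of pages whose text does not contain `phrase` (case-insensitive)."""
    phrase = phrase.lower()
    return [i for i, text in enumerate(page_texts) if phrase not in text.lower()]


def remove_pages_containing(src_path, dst_path, phrase):
    """Save a copy of the PDF without the pages that mention `phrase`."""
    import fitz  # PyMuPDF

    with fitz.open(src_path) as doc:
        keep = pages_to_keep([page.get_text() for page in doc], phrase)
        if keep:  # select() needs at least one page to keep
            doc.select(keep)
            doc.save(dst_path)
    return keep
```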

  • @bonfirehost
    1 year ago

    pls make video on pdf 2 line 3 word

  • @kevindarsono185
    7 months ago

    can u help me, error 'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\PowerShell\1\ShellIds\Microsoft.PowerShell' is denied. To change the execution
    policy for the default (LocalMachine) scope, start Windows PowerShell with the "Run as administrator" option.

    • @stephencodes
      7 months ago

      It looks like you are trying to run a PowerShell script, but Windows is blocking script execution for security reasons (through its execution policy). Out of curiosity, did you change your default terminal in the steps at 0:43?

  • @alexsherwood4551
    1 year ago

    I'm getting this error " [WinError 32] The process cannot access the file because it is being used by another process:". Any ideas?

    • @alexsherwood4551
      1 year ago

      This error was due to a problem with indentation. My rename() call was indented one level too far, placing it inside the "with open" block instead of the for loop block. If anyone else makes this mistake, double check that your indentation matches line 14 at timestamp 6:46 in the video.

    • @stephencodes
      1 year ago +1

      I believe that error happens if you're trying to rename the PDF while it is open. Windows restricts how a file can be accessed or changed while it is in use: if one process has the file open (you viewing the PDF) and another tries to modify it (the Python script changing the filename), the operating system won't allow it. If you open the PDF in any viewer, like Adobe Acrobat, and then run the script, I think you'll get that same error.
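Both explanations come down to the file still being open when `rename` runs. The sketch below shows the indentation Alex describes, with the rename dedented to the loop level so the `with` block has already closed the file. Python's built-in `open` stands in for however the video opens the PDF, and `rename_all` / `make_name` are hypothetical names.

```python
import os


def rename_all(pdf_paths, make_name):
    """Rename each file only after its handle has been closed."""
    renamed = []
    for path in pdf_paths:
        with open(path, "rb") as f:          # the file is open only inside this block
            new_name = make_name(f.read())   # decide the new name while reading
        # Dedented to the for-loop level: the with-block has exited and the file
        # is closed, so Windows will not raise WinError 32 for our own handle.
        new_path = os.path.join(os.path.dirname(path), new_name)
        os.rename(path, new_path)
        renamed.append(new_path)
    return renamed
```

A viewer such as Acrobat holding the file open would still trigger the error, since that handle belongs to another process.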

  • @martinimhoff5973
    1 year ago

    I followed along line by line and I get: File "", line 1, in
    ModuleNotFoundError: No module named 'fitz'

    • @stephencodes
      1 year ago

      Hi Martin, did you install PyMuPDF? The command is on GitHub under "Steps". If you're getting that error, it means the library isn't installed.
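The `fitz` module comes from the PyMuPDF package, so a common cause of this error is installing it into a different interpreter than the one running the script (for example, outside the virtual environment). The hypothetical check below reports which interpreter is missing the module. As far as I know, installing a PyPI package literally named `fitz` gives you an unrelated project, so install `PyMuPDF` instead.

```python
import importlib.util
import sys


def check_pymupdf():
    """Return True if the 'fitz' module (provided by PyMuPDF) is importable here."""
    if importlib.util.find_spec("fitz") is None:
        print("'fitz' not found for interpreter:", sys.executable, file=sys.stderr)
        print("Install it into this interpreter with: python -m pip install PyMuPDF",
              file=sys.stderr)
        return False
    return True
```

Running `python -m pip` with the same `python` that runs the script ensures the package lands in that environment.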