Merge multiple PDF files based on their name using Python (Real-World Example)

  • Published on 25 Nov 2024

Comments • 47

  • @CodingIsFun
    @CodingIsFun  2 years ago +10

    *The task was somewhat specific, but I hope you learned something new! :)*

  • @asankacool1
    @asankacool1 2 years ago +2

    Wow, this video shows a very practical scenario for operational data analytics in science and industry.
    Another suggestion: the same scenario for Excel files, appending them on a 'financial year-month' key, where the key also needs to be converted to a date format for proper analytics, graphs, and correct time-series order.
    Btw, great video Sven!!! 👍👍👍

    • @CodingIsFun
      @CodingIsFun  2 years ago +1

      Glad you find the video helpful. Thanks a lot for watching & your great suggestion.

  • @brazilleros
    @brazilleros 2 years ago

    As always, clean, understandable code and a perfect solution! Thank you, Sven, for your videos and professional attitude.

    • @CodingIsFun
      @CodingIsFun  2 years ago +1

      Happy to hear that you enjoyed this one too! Thanks for the comments and support!

  • @KhalilYasser
    @KhalilYasser 2 years ago +1

    Awesome. I am waiting for your videos day after day.

    • @CodingIsFun
      @CodingIsFun  2 years ago

      Happy to hear that! 😃 As always, thank you very much for your comment. Your support is much appreciated!

  • @dynamics9000
    @dynamics9000 2 years ago

    Wow, this is a wonderful video, and it led me to some other videos on your channel. Great content. Thanks for the uploads.

    • @CodingIsFun
      @CodingIsFun  2 years ago

      Glad you like the videos! Thanks for watching & your comment! :)

  • @torque6389
    @torque6389 2 years ago

    Love the videos! Very helpful. Thank you!

    • @CodingIsFun
      @CodingIsFun  2 years ago

      Happy to hear that. Thanks for watching :)

  • @s33wagz
    @s33wagz 1 year ago +1

    I've been looking into the best way to create a simple GUI that shows a list of PDFs in a folder, has an area for creating output PDFs to combine files into (a list of multiple output files, as these are "PDF packages" that are being built), and has a button to copy a selected PDF into the desired output file (I imagine this would just be the file path of the selected PDF to append or merge into the desired output PDF).
    I was considering doing everything in Excel, but I'm now considering React/JS or maybe Python.
    What would you suggest?

    • @CodingIsFun
      @CodingIsFun  1 year ago

      Thank you for tuning in. If you're interested in building a GUI for your Python project, there are several options available. Personally, I highly recommend "PySimpleGUI". I've already created several tutorials on this library on my channel, so be sure to check them out. Happy coding!

    • @s33wagz
      @s33wagz 1 year ago

      @CodingIsFun I did poke around after my comment and found your stuff on GUIs.
      I just gotta find all the pieces to this puzzle, is all.

  • @tobiewaldeck7105
    @tobiewaldeck7105 7 months ago

    Hi. I have tried PyMuPDF and PyPDF2 to merge forms that contain fillable fields. Either fields are missing from the resulting pages or all the fillable field values are the same. What is going on?

    • @CodingIsFun
      @CodingIsFun  7 months ago

      Thanks for watching. Hard to tell from a distance. Sorry that I cannot help. Cheers, Sven ✌️

  • @diegodanciguer4901
    @diegodanciguer4901 1 year ago

    Very good video. Is there a way to choose the order in which the PDFs are merged?

    • @CodingIsFun
      @CodingIsFun  1 year ago

      Thanks! Try to sort the list with the file names, see the following example: stackoverflow.com/questions/6618515/sorting-list-based-on-values-from-another-list
      Happy Coding!
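
The idea behind the linked answer (ordering one list by the values of another) can be sketched roughly as follows. This is only a minimal illustration, assuming the desired merge order is written out by hand; the names `found_files`, `desired_order`, and `ordered_files` are hypothetical and do not come from the video's script.

```python
# Hypothetical file names found in the input folder (in arbitrary order).
found_files = ["904 notes.pdf", "902 report.pdf", "905 summary.pdf"]

# Assumption: the order in which the files should be merged, written by hand.
desired_order = ["902 report.pdf", "905 summary.pdf", "904 notes.pdf"]

# Map each name to its position once, so every lookup during sorting is cheap.
position = {name: index for index, name in enumerate(desired_order)}

# Names missing from desired_order are pushed to the end; sorted() is stable,
# so those keep their original relative order.
ordered_files = sorted(found_files, key=lambda name: position.get(name, len(position)))

print(ordered_files)
```

The sorted list could then be passed to the merge loop in place of the unsorted file list.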

  • @dule1635
    @dule1635 2 years ago

    Please share a lesson on how to create bookmarks for the combined PDF file. Thank you very much!

    • @CodingIsFun
      @CodingIsFun  2 years ago

      Thanks for watching and your suggestion!

  • @Aditya-mx4gv
    @Aditya-mx4gv 1 year ago

    Hi,
    When I run the program, it scans the already merged files again. I want it to scan only the files newly added to the folder and merge only those. Could you help me with this? Thank you.

    • @CodingIsFun
      @CodingIsFun  1 year ago +1

      Thanks for watching. Sure! To achieve this, you can maintain a record of the merged files in a separate text file. You can then read this file before scanning the folder to exclude any previously merged files. Here's an updated version of your script to do this: pastebin.com/ejfsSAXB

    • @Aditya-mx4gv
      @Aditya-mx4gv 1 year ago

      @CodingIsFun Hey, thanks for this, it worked really well!👍
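
The record-keeping approach described in this thread could look roughly like the sketch below. It is not the pastebin script, whose contents are not reproduced here; it is an independent illustration using only the standard library. The log file name `merged_files.txt` and the helper names `load_processed`, `find_new_pdfs`, and `mark_processed` are assumptions made for this example.

```python
from pathlib import Path
import tempfile


def load_processed(log_file: Path) -> set:
    """Return the file names recorded in the log (empty set if no log exists yet)."""
    if not log_file.exists():
        return set()
    lines = log_file.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def find_new_pdfs(pdf_dir: Path, log_file: Path) -> list:
    """List the PDFs in pdf_dir whose names are not in the log yet."""
    processed = load_processed(log_file)
    return sorted(p for p in pdf_dir.glob("*.pdf") if p.name not in processed)


def mark_processed(files: list, log_file: Path) -> None:
    """Append the names of freshly merged files to the log."""
    with log_file.open("a", encoding="utf-8") as handle:
        for file in files:
            handle.write(file.name + "\n")


# Demo in a throwaway folder, using empty placeholder files instead of real PDFs.
with tempfile.TemporaryDirectory() as tmp:
    pdf_dir = Path(tmp)
    log_file = pdf_dir / "merged_files.txt"  # hypothetical log file name

    (pdf_dir / "902 a.pdf").touch()
    first_run = find_new_pdfs(pdf_dir, log_file)
    mark_processed(first_run, log_file)  # the actual merge would happen before this

    (pdf_dir / "904 b.pdf").touch()
    second_run = find_new_pdfs(pdf_dir, log_file)

    first_names = [p.name for p in first_run]
    second_names = [p.name for p in second_run]

print(first_names, second_names)
```

Recording the names only after a successful merge means a crashed run leaves its files unlogged, so they are picked up again next time.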

  • @manuelbibbes2914
    @manuelbibbes2914 1 year ago

    Thank you for your video, and yes, I already learned something even if it didn't work for me.
    I got an error, "TypeError: unhashable type: 'list'", and don't know how to handle that for now.
    Do you have a tip for me?

    • @CodingIsFun
      @CodingIsFun  1 year ago +1

      Thanks for watching. Can you please clone the repo and try again? Thanks!

    • @manuelbibbes2914
      @manuelbibbes2914 1 year ago

      @CodingIsFun Thank you for your superfast reply. Amazing, it worked just fine after renaming PdfFileReader and PdfFileMerger to PdfReader and PdfMerger. 😁🤙

  • @raajashekaran
    @raajashekaran 2 years ago

    Hi, thank you for your wonderful video. Could you please include adding a header and footer along with the merge?

    • @CodingIsFun
      @CodingIsFun  2 years ago

      Thanks! What exactly do you mean by Header & Footer? Could you please provide more details? Thanks!

    • @raajashekaran
      @raajashekaran 2 years ago

      @CodingIsFun Hi, my work involves combining PDF files with the same date and adding a confidentiality note to the header and footer of each page, for example a yellow/green "confidential" note in all PDF files. I am unable to add it to existing PDF files. Can you help me? 🙏

    • @CodingIsFun
      @CodingIsFun  2 years ago

      @raajashekaran, I do not have the time to code an entire solution for you, but you might want to check out the following blog article on how to add a header/footer to PDFs using Python: dock2learn.com/tech/how-to-add-headers-and-footers-to-existing-pdf-document-using-python/
      I hope it helps! Happy Coding!

  • @dain7787
    @dain7787 8 months ago

    Hi, just one question. I've got an error on the file.name part at line 18 ..... Are there any solutions?

    • @CodingIsFun
      @CodingIsFun  8 months ago

      Hey there, thanks for watching the video! I'm sorry I can't help you with your problem based on the information you provided. To give me a better idea of what's going on, it would be super helpful if you could write down which line of code is causing the error, let me know if you modified the code from the tutorial, and explain in more detail what you did to troubleshoot the problem. Don't forget to also give me some context about your setup and environment.
      If you're having trouble figuring things out, another option is to join our Discord server at pyhtonandvba.com/discord. You can ask your question there and maybe someone in the community can help out.
      Thanks for understanding.

  • @everythinginpython1657
    @everythinginpython1657 1 year ago

    If someone wants to authenticate data, then this code might help them.

    for key in keys:
        merger = PdfMerger()
        base_file_name = None
        for file in pdf_files:
            # Take the first space-separated token of the full path
            str_pdf_file = str(file)
            split_str_pdf_files = str_pdf_file.split(" ")
            if split_str_pdf_files[0].endswith(key):
                # PdfReader's second positional argument is `strict`, not a file mode,
                # so the "rb" argument is dropped here
                merger.append(PdfReader(str(file)))
                if len(file.name) >= BASE_FILE_NAME_LENGTH:
                    base_file_name = file.name
        # Write only if a base file was found for this key
        if base_file_name:
            print(base_file_name)
            merger.write(str(pdf_output_dir / base_file_name))
        merger.close()

    • @CodingIsFun
      @CodingIsFun  1 year ago

      Thanks for watching. Could you please let me know what you mean by authenticating the data? Thanks! :)

  • @gaganrastogi9624
    @gaganrastogi9624 2 years ago

    Please make a video, or just explain or give a clue, on how to convert all PDFs in a folder to Excel, or extract the tables and save an Excel file for each PDF.

    • @CodingIsFun
      @CodingIsFun  2 years ago +1

      Thanks for watching & your suggestion.
      Regarding your request, have a look at the following blog/video:
      1. Extract table from PDF and convert to pandas dataframe: www.geeksforgeeks.org/how-to-extract-pdf-tables-in-python/
      2. Export Pandas DataFrames to new & existing Excel workbook: th-cam.com/video/DroafWQXqDw/w-d-xo.html
      Once you have a working solution, you could iterate over all pdf files in a folder: th-cam.com/video/w6-28jcr09Q/w-d-xo.html
      I hope it helps! Happy Coding!

    • @gaganrastogi9624
      @gaganrastogi9624 2 years ago +1

      Thank you so much for your valuable time.

  • @jastorgallywix4424
    @jastorgallywix4424 1 year ago

    IMPORTANT
    Change all occurrences of "PdfFileMerger" to "PdfMerger" and "PdfFileReader" to "PdfReader",
    then the code will work.
    PdfFileMerger and PdfFileReader are no longer available (removed in PyPDF2 3.0.0).

    • @CodingIsFun
      @CodingIsFun  1 year ago

      Hi Jastor Gallywix,
      Thanks for pointing this out. Feel free to send a pull request in GitHub (github.com/Sven-Bo/merge-pdfs-based-on-name).
      Your support is much appreciated!
      Cheers,
      Sven ✌

  • @dimasramawib
    @dimasramawib 1 year ago

    Hi, Sven! Thank you for such a helpful video! One question though: I have multiple files just like you've shown in the video. My files look something like '001.pdf', '002.pdf', '001 - Content.pdf', '002 - Another Content.pdf', mixed in a single folder just like in the video. However, when I run the code, the merged file has '001 - Content.pdf' on the first page and '001.pdf' on the second page. How can I swap the order so that the merged file has '001.pdf' on the first page and '001 - Content.pdf' on the second page? Cheers

    • @CodingIsFun
      @CodingIsFun  1 year ago

      Thanks for watching. Your script could look something like this:

      from pathlib import Path
      from PyPDF2 import PdfMerger, PdfReader  # pip install PyPDF2 (3.0.0 or later)

      # Define input directory for the pdf files
      pdf_dir = Path(__file__).parent / "pdf_files"
      # Define & create output directory
      pdf_output_dir = Path(__file__).parent / "OUTPUT"
      pdf_output_dir.mkdir(parents=True, exist_ok=True)

      # Determine the file name length of the base file
      # Example of the base files:
      # '902 17.03.2022 2000004496.pdf', '904 17.03.2022 2000004497.pdf'
      BASE_FILE_NAME_LENGTH = 20

      # Define the desired order of the pdf files with specific key
      pdf_order = {
          '902': ['902 17.03.2022 2000004496.pdf', '902 18.03.2022 2000004496.pdf'],
          '905': ['905 17.03.2022 2000004495.pdf'],
          '904': ['904 18.03.2022 2000004497.pdf'],
      }

      for key, files in pdf_order.items():
          merger = PdfMerger()
          base_file_name = None
          for file in files:
              pdf_file = pdf_dir / file
              if pdf_file.is_file():
                  merger.append(PdfReader(str(pdf_file)))
                  if len(file) >= BASE_FILE_NAME_LENGTH:
                      base_file_name = file
          # Write only if a base file was found for this key
          if base_file_name:
              merger.write(str(pdf_output_dir / base_file_name))
          merger.close()
      ___
      Coffee donations are always welcome: pythonandvba.com/coffee-donation
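
The order reported in the question is consistent with plain string sorting of the full file names: a space sorts before a dot, so '001 - Content.pdf' comes before '001.pdf'. Assuming that is the cause (the thread does not confirm it), one alternative to listing every file by hand is to sort on the name without its extension. A small sketch, using only the file names from the question:

```python
from pathlib import Path

# File names taken from the question above.
names = ["001 - Content.pdf", "002.pdf", "001.pdf", "002 - Another Content.pdf"]

# Sorting the full names: " " (code 32) sorts before "." (code 46),
# so "001 - Content.pdf" lands ahead of "001.pdf".
by_full_name = sorted(names)

# Sorting on the stem (the name without ".pdf"): "001" is a prefix of
# "001 - Content", and a prefix always sorts before the longer string.
by_stem = sorted(names, key=lambda name: Path(name).stem)

print(by_full_name)
print(by_stem)
```

With the stem as the sort key, each base file such as '001.pdf' comes directly before its companion file.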

  • @qulinxao
    @qulinxao 2 years ago

    Sorry, but your algorithm is O(n^2). A simple change is to build keys as a dict: keys = {}; set(keys.setdefault(file.name[:3], []).append(file.name) for file in pdf_files)
    Now you don't need to rescan all pdf_files for each key, just:
    for key in keys:
        merger = PdfFileMerger()
        for file in keys[key]:
            merger.append.....

    • @CodingIsFun
      @CodingIsFun  2 years ago

      Thanks, appreciate that! 👍
      Additionally, I also had to convert the file string to a Path object:
      for key in keys:
          merger = PdfFileMerger()
          for file in keys[key]:
              file = pdf_dir / file
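
The single-pass grouping suggested in this thread can also be written with `collections.defaultdict`, which avoids using a `set(...)` call purely for its side effects. This is a sketch under assumptions: the file list is hypothetical (the second entry in particular is invented for the example), the 3-character prefix is taken from the comment above, and `files_by_key` is an illustrative name. Storing the `Path` objects themselves, rather than their names, also removes the need to rebuild the path later.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical file list; in the script this would come from pdf_dir.glob("*.pdf").
pdf_files = [
    Path("pdf_files/902 17.03.2022 2000004496.pdf"),
    Path("pdf_files/902 extra.pdf"),
    Path("pdf_files/904 17.03.2022 2000004497.pdf"),
]

# One pass over the files: group the full paths under their 3-character prefix.
files_by_key = defaultdict(list)
for file in pdf_files:
    files_by_key[file.name[:3]].append(file)

# Each group can now be merged without rescanning pdf_files.
for key, files in files_by_key.items():
    print(key, [f.name for f in files])
```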

  • @alejandramunoz8597
    @alejandramunoz8597 2 years ago

    Hi, could you help me with the error below?
    How can I define Path? Thank you.
    NameError Traceback (most recent call last)
    Cell In [12], line 1
    ----> 1 pdf_dir = Path(__file__).parent / "pdf_files"
    NameError: name 'Path' is not defined

    • @CodingIsFun
      @CodingIsFun  2 years ago +1

      Make sure to import Path from pathlib: from pathlib import Path
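
The `Cell In [12]` line in the traceback above suggests the code was run in a notebook. In some interactive environments `__file__` is not defined either, so a fallback to the current working directory may be useful. The sketch below is one possible way to handle both cases; the helper name `script_dir` is hypothetical and not part of the video's script.

```python
from pathlib import Path


def script_dir() -> Path:
    """Folder of the running script, or the current working directory as a fallback."""
    try:
        return Path(__file__).resolve().parent
    except NameError:
        # __file__ is not defined, e.g. in some interactive sessions
        return Path.cwd()


pdf_dir = script_dir() / "pdf_files"
print(pdf_dir)
```

When the fallback is used, the `pdf_files` folder is expected next to wherever the session was started rather than next to a script file.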

    • @alejandramunoz8597
      @alejandramunoz8597 2 years ago

      @CodingIsFun Thank you, it works! Is there any way to set the order of the PDF pages that we have combined?