How To Read PDF Files in Python using PyPDF2

  • Published on 6 Feb 2025
  • In this video, we will talk about reading PDF files in Python using the PyPDF2 package.
    Blog- learn-automati...
    Prerequisite - Python Basics
    • #1 What Is Python- Pyt...
    Selenium With Python Series
    • Selenium WebDriver Tut...
    Connect with us:
    Linkedin- / mukesh-otwani-93631b99
    Instagram- / mukeshotwani
    Facebook Group- / 256655817858291
    Facebook Page- / seleniumwebdrivermukesh
    Twitter- / mukeshotwani
    Blog- learn-automatio...

Comments • 53

  • @himankshekher8645 4 years ago +5

    Keep doing the great work, Mukesh. I have learnt a lot from your channel and will continue to do so. You upload short videos, which is good for me since I get bored with hour-long videos and lose interest in the topic.
    Thanks for all your efforts.

    • @Mukeshotwani 4 years ago +1

      Hi Himank, thank you so much. I am glad you liked the Python series.

  • @snicket87 2 years ago +1

    Thank you! Helped my project from across the world! Greetings from Brazil!

  • @elizabethhanks1042 3 years ago +1

    YESSSS thank you for your help!!

  • @wendyesquivel8179 2 years ago

    this is exactly what I was looking for! Thanks :D

  • @tenzindorjee7689 2 years ago +1

    Thank you so much, bhaiya, really helped my project 🙏🙏🙏❤️

  • @emanuelalves593 1 year ago

    Very useful! Thank you!

  • @KrishnaReddy-zz4yu 4 years ago

    What is the difference between Jupyter Notebook and PyCharm? In Jupyter, we use pandas to open PDF/CSV/TXT files. Which is more efficient to learn and apply in real-world work?

  • @KhalilYasser 4 years ago +1

    Thanks a lot. Very useful.

  • @nitinkumarshukla6967 1 year ago

    Can we do the same thing with a PDF uploaded by a user?
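
On the uploaded-PDF question, here is a minimal sketch, assuming the upload arrives as raw bytes (for example from a web form). PdfReader accepts a file-like object, so wrapping the bytes in io.BytesIO should be enough. The name text_from_upload and the reader_cls parameter are hypothetical, added only so the helper can be tried with a stand-in reader.

```python
import io


def text_from_upload(data, reader_cls=None):
    """Return the text of every page of a PDF supplied as bytes."""
    if reader_cls is None:
        # Imported lazily so the helper can be exercised without PyPDF2.
        from PyPDF2 import PdfReader as reader_cls
    # Wrap the bytes in a file-like object; no temporary file is needed.
    reader = reader_cls(io.BytesIO(data))
    # extract_text() can come back empty for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]
```

In a web framework, data would typically be whatever the framework returns when the uploaded file is read in binary form.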

  • @verkar1965 3 years ago

    Hi, is there a way to get separate lines instead of just one merged line? Thank you.

  • @KkdvPrasad 4 years ago

    Mukesh, can we have similar logic in Eclipse using Java?

  • @LeZinZin95 2 years ago +1

    It worked, thank you

  • @ahmadrahmatulloyev162 2 years ago

    very useful. Thanks

  • @bigbro1231000 2 years ago

    Hello, thank you for the informative video.
    I have a problem compiling the code: pypdf gives me the error "progressbar not recognised". How do I solve it, please?

  • @freedoom4090 2 years ago +1

    very nice! does it work without java?

    • @Mukeshotwani 2 years ago +1

      Yes. For Java we have a different library.

    • @freedoom4090 2 years ago

      @Mukeshotwani thanks!

  • @centrodoreforco-aulasderef7743 3 years ago

    Perfect!

  • @subhransupanda7052 4 years ago +3

    This will work for a simple PDF file, but for a complex PDF with tables, multiple pages, images, and non-English characters it would not work. Could you please show us how to read a complex PDF file?

    • @Mukeshotwani 4 years ago +5

      Hi Subhranshu, the current approach will fetch the data from tables as well. Multiple pages are already covered in the video. When it comes to images, you can open the PDF in rb (read binary) mode, which will return binary data. For non-English characters you can change the encoding.
      I will try to make a video on this.

    • @subhransupanda7052 4 years ago

      @Mukeshotwani Thanks a lot, Mukesh. Please make a video on this. You are a great inspiration for us, and we are waiting for that video.

    • @NOTHING-j2h 4 years ago +2

      How can we compare two PDFs where the contents of both are the same but positioned in different places? We can't compare line by line.

    • @Mukeshotwani 4 years ago

      Hi Ankur, the above video is for when you need to validate a specific string or keyword in a PDF. When it comes to comparing two PDFs, there are many Python libraries that can help you. Please explore them. One of them is pypi.org/project/diff-pdf-visually/

  • @SandeepKumar-px4kf 2 years ago

    Please can you make a lecture that takes an Excel file as input and produces a docx file as output for that Excel file?

  • @Kenneth-f5d 1 year ago

    How to get the title of the PDF's content such as "A Simple PDF File"?
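
On the title question, here is a minimal sketch, assuming a recent PyPDF2 release where the reader exposes a metadata attribute with a title field. The metadata title may be missing or may differ from the heading printed on the page, so the fallback to the first non-empty line of page one is an assumption rather than a rule. The function name pdf_title is hypothetical.

```python
def pdf_title(reader):
    """Best-effort title: document metadata first, then first line of text."""
    meta = getattr(reader, "metadata", None)
    title = getattr(meta, "title", None) if meta is not None else None
    if title:
        return title.strip()
    # Fallback (an assumption): use the first non-empty line of page one.
    if len(reader.pages) > 0:
        text = reader.pages[0].extract_text() or ""
        for line in text.splitlines():
            if line.strip():
                return line.strip()
    return None


# Intended use (needs PyPDF2 and a real file):
#   from PyPDF2 import PdfReader
#   print(pdf_title(PdfReader("file.pdf")))
```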

  • @en_coded 2 years ago

    Can you do a simple write? I haven't seen a write 'hello world'; it's always read this PDF and write it to another PDF. What if I just want to start a PDF with strings and images?

  • @sidstarsiddhu9275 3 years ago +2

    Sir, I am getting an AttributeError at line 3:
    reader=PyPDF2.pdfFileReader(file)
    AttributeError: module 'PyPDF2' has no attribute 'pdfFileReader'

    • @Mukeshotwani 3 years ago

      Hi Sidstar, it seems you have not installed the library properly. Please try installing it again, and if you are working in PyCharm then add it in PyCharm too.

    • @Kmysiak1 3 years ago

      It's PdfFileReader(), not pdfFileReader().
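
The AttributeError in this thread comes from capitalisation: Python attribute names are case-sensitive. A further wrinkle is that newer PyPDF2 releases name the class PdfReader while older ones use PdfFileReader. A minimal sketch of a lookup that tolerates both follows; the helper name find_reader_class is hypothetical.

```python
def find_reader_class(module):
    """Return the PDF reader class exposed by a PyPDF2-like module."""
    # Prefer the newer name, then fall back to the older one.
    # pdfFileReader (lowercase p) matches neither and is never tried.
    for name in ("PdfReader", "PdfFileReader"):
        cls = getattr(module, name, None)
        if cls is not None:
            return cls
    raise AttributeError("no PdfReader or PdfFileReader in " + repr(module))


# Intended use (needs PyPDF2 installed):
#   import PyPDF2
#   Reader = find_reader_class(PyPDF2)
#   reader = Reader(open("file.pdf", "rb"))
```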

  • @ayushmittal2754 2 years ago +2

    What should I do to extract all pages of a PDF? I have been searching for this solution for the last 24 hours.

    • @Mukeshotwani 2 years ago

      Hi Ayush, you can run a loop which will iterate over all the pages one by one.

    • @wilianuhlmann5284 2 years ago

      @Mukeshotwani Can you give an example, please?
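
The loop suggested in this thread could look roughly like the sketch below. It assumes the newer PyPDF2 interface, where reader.pages is a sequence and each page has extract_text(). The function name extract_all_pages is hypothetical.

```python
def extract_all_pages(reader):
    """Return a list of (page_number, text) pairs, numbering pages from 1."""
    results = []
    for number, page in enumerate(reader.pages, start=1):
        # extract_text() can be empty for pages without a text layer.
        results.append((number, page.extract_text() or ""))
    return results


# Intended use (needs PyPDF2 and a real file):
#   from PyPDF2 import PdfReader
#   for number, text in extract_all_pages(PdfReader("file.pdf")):
#       print("--- page", number, "---")
#       print(text)
```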

  • @srirajid 3 years ago +1

    May I know where you are writing the code?

    • @Mukeshotwani 3 years ago

      Hi Raji, I am using PyCharm: th-cam.com/play/PL6flErFppaj3FhVG-3RGGQx-Mvj7DXrpX.html

  • @suhelmallick 1 year ago +2

    A lot of this is old syntax.
    Mine is the newer syntax:
    import PyPDF2
    # Open in binary mode; the with-block closes the file afterwards.
    with open('Ansible+Roles.pdf', 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        print(len(reader.pages))      # number of pages
        page = reader.pages[1]        # index 1 is the second page (0-based)
        pdfdata = page.extract_text()
    # print(pdfdata)
    assert "PRINCE" in pdfdata
    print("PRINCE" in pdfdata)

  • @nazishsultana5273 3 years ago

    Sir, please help me. My code is not working; it gives the warning "Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]" 😢😢😢😢

  • @kamal3777 3 years ago +2

    I try to extract the text but it just gives an empty string.

    • @Mukeshotwani 3 years ago

      Please debug your code. I have a dedicated video on how to debug your code.
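
One common cause of an empty string, though not confirmed for this commenter's file, is a scanned PDF whose pages are images with no text layer. Text extraction then has nothing to return and OCR would be needed instead. A minimal sketch for spotting such pages follows; the function name pages_without_text is hypothetical.

```python
def pages_without_text(reader):
    """Return the 0-based indices of pages whose extracted text is blank."""
    blank = []
    for index, page in enumerate(reader.pages):
        text = page.extract_text() or ""
        if not text.strip():
            blank.append(index)
    return blank


# Intended use (needs PyPDF2 and a real file):
#   from PyPDF2 import PdfReader
#   print(pages_without_text(PdfReader("file.pdf")))
```

If every page index is reported, the file is probably image-only.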

  • @lasnroo 2 years ago +1

    how can I read line by line?

    • @lokusok5080 2 years ago +1

      from PyPDF2 import PdfReader
      reader = PdfReader("file.pdf")
      # Walk every page and print its text one line at a time.
      for page in reader.pages:
          text = page.extract_text()
          for line in text.split("\n"):
              print(line)

  • @logapriyas6911 3 years ago

    When I follow the above instructions I get a superfluous whitespace error 🙂 Can anyone help me with this issue?

  • @monicalelli5369 2 years ago

    import isn't working, any hint please?

    • @Jason-ot6jv 2 years ago

      Make sure you run 'pip3 install PyPDF2' in the terminal.

    • @thekyreefuller 2 years ago

      @Jason-ot6jv Hi Jason, I did this and in my terminal it says "Requirement already satisfied". I'm still getting the same "No module named PyPDF2" issue. Any thoughts?
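
"Requirement already satisfied" together with "No module named PyPDF2" often means pip installed the package for a different Python interpreter than the one running the script. That is an assumption about this case, not a diagnosis. A minimal sketch that prints which interpreter is in use and whether it can see the module follows; the helper name module_available is hypothetical.

```python
import importlib.util
import sys


def module_available(name):
    """True if the current interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


# Run this with the same interpreter (or IDE run configuration) as the script.
print("Interpreter:", sys.executable)
print("PyPDF2 importable:", module_available("PyPDF2"))
```

If the second line prints False, installing with that exact interpreter, for example by running its executable with -m pip install PyPDF2, targets the right environment.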

  • @tanny_edits 3 years ago

    Bro I can't access my pdf using your code

    • @tanny_edits 3 years ago

      @Xeno The Strange i literally just copied what he does bro

    • @bryanhernandez2861 3 years ago

      I had the same issue. I used r"\\users\\... and "C:"
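
File-access problems like the one in this thread are often down to how a Windows path is written in a Python string. A minimal sketch of three equivalent spellings follows. The path is hypothetical and does not come from the comment, and the helper name check_pdf_path is also hypothetical.

```python
from pathlib import Path

# Hypothetical path, for illustration only.
raw = r"C:\Users\me\Documents\sample.pdf"        # raw string: backslashes kept as-is
escaped = "C:\\Users\\me\\Documents\\sample.pdf"  # each backslash doubled
forward = "C:/Users/me/Documents/sample.pdf"      # forward slashes also work on Windows

# A plain "C:\Users\..." literal fails because \U starts a Unicode escape.


def check_pdf_path(path):
    """Return a Path for an existing file, or raise with a clear message."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No file at {p}")
    return p
```

Calling check_pdf_path before open() separates a wrong path from a problem inside the PDF library.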

  • @johnalbertson4424 2 years ago +1

    What language R U speaking?!?!
    It isn't English

    • @Mukeshotwani 2 years ago +4

      The main thing is: did you get the concept or not?