PyPDF2 and PyPDF4 fail to extract text from the PDF

War · September 11, 2021, 3:38pm

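import PyPDF4 as p2
pdffile = open("XXXX.pdf", "rb")      # open the PDF in binary mode
pdfread = p2.PdfFileReader(pdffile)
print(pdfread.getNumPages())          # prints the correct page count
pageinfo = pdfread.getPage(0)         # first page (zero-based index)
print(pageinfo.extractText())         # prints blank output instead of text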

When I run the code above, the 4th line correctly returns the number of pages in the PDF. However, the 6th line (text extraction) returns about a page of blank output. I've tried both PyPDF2 and PyPDF4, and ran the code in both the Python terminal and Sublime Text; in every case I received a blank page instead of the actual text.

The PDF is a tax return and is entirely text. No images whatsoever. What am I doing wrong?
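To narrow this down, a rough sketch like the one below could check whether every page comes back blank or only the first one. It does not fix the extraction; it only reports which pages yield empty or whitespace-only text. The helper names `blank_page_indices` and `extract_all_pages` are made up for illustration, and the PyPDF4 calls are the same ones used in the snippet above.

```python
def blank_page_indices(page_texts):
    """Return indices of pages whose extracted text is empty or whitespace-only."""
    # `text or ""` also treats a None result as blank.
    return [i for i, text in enumerate(page_texts) if not (text or "").strip()]


def extract_all_pages(path):
    """Extract text from every page with the same PyPDF4 calls as above."""
    # Imported inside the function so blank_page_indices works without PyPDF4.
    import PyPDF4 as p2

    with open(path, "rb") as pdffile:
        pdfread = p2.PdfFileReader(pdffile)
        return [
            pdfread.getPage(i).extractText()
            for i in range(pdfread.getNumPages())
        ]


# Example usage (assumes PyPDF4 is installed and XXXX.pdf exists):
# texts = extract_all_pages("XXXX.pdf")
# print(blank_page_indices(texts))
```

If every index is listed, the problem affects the whole file rather than just page 0, which would suggest the issue lies in how the text is stored in this PDF rather than in the page lookup.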

system · March 13, 2022, 3:38am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.
