PDF to TEXT
PyPDF2: PyPDF2 is a third-party Python library that serves as a PDF toolkit. It is a pure-Python
package, which means it runs on any platform where Python runs, without depending on external (non-Python) libraries.
It is a useful tool for websites that manage PDFs. PyPDF2 is capable of:
· extracting document information
· splitting documents page by page
· merging documents page by page
· cropping pages
· merging multiple pages into a single page
· encrypting and decrypting PDF files
· and more!
Installation: Because PyPDF2 is a pure-Python package, you can install it with the pip command:
pip install PyPDF2
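Method names differ between PyPDF2 releases, so it can help to confirm which version pip actually installed. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is an illustrative choice, not part of PyPDF2.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if PyPDF2 is installed, otherwise None.
print(installed_version("PyPDF2"))
```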
Extracting Text From PDF:
In this article you will learn how to extract text from a PDF using the Python library PyPDF2.
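import PyPDF2

# "8259.pdf" is the name of the PDF file to read.
# PdfFileReader, getPage and extractText are the names used by PyPDF2
# releases before 3.0; from 3.0 on they are PdfReader, pages[...] and extract_text().
with open("8259.pdf", "rb") as pdf_obj:
    pdf_reader = PyPDF2.PdfFileReader(pdf_obj)
    # Page numbers are zero-based, so getPage(0) returns the first page.
    pageObj = pdf_reader.getPage(0)
    pageText = pageObj.extractText()

print(pageText)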
PdfFileReader(): Creates a PdfFileReader object from the PDF, given either as a file object opened in binary read mode ("rb") or as the file's name or path.
getPage(): The getPage() method takes one parameter, the zero-based page number, and returns that page as a page object.
extractText(): Called on a page object, extractText() returns the text of that page as a string. To get the text of the whole document, call it on each page in turn.
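The example above reads a single page. As a rough sketch of how the same idea extends to every page, the code below uses the names introduced in later PyPDF2 releases (PdfReader, the pages sequence, and extract_text()), which replace PdfFileReader, getPage() and extractText() from PyPDF2 3.0 onward. The helper names join_page_text and extract_pdf_text are illustrative assumptions, not part of the library, and the sketch assumes a PyPDF2 version that provides PdfReader.

```python
def join_page_text(pages, separator="\n"):
    """Concatenate the text of every page object in `pages`.

    Each page is expected to offer an extract_text() method; a page that
    yields no text (None or "") contributes an empty string.
    """
    return separator.join((page.extract_text() or "") for page in pages)


def extract_pdf_text(path):
    """Return the text of all pages of the PDF at `path` as one string."""
    # Imported here so join_page_text can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    with open(path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        # reader.pages behaves like a list of page objects.
        return join_page_text(reader.pages)
```

Calling extract_pdf_text("8259.pdf") would then return the text of the whole file rather than only page 0. Text extraction depends on how the PDF was produced, so scanned or image-only pages may come back empty.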