
PDF to Text Converter in Python with Source Code




PyPDF2: PyPDF2 is a free, open-source PDF toolkit for Python. It is not part of Python's standard library, but it is a pure-Python package, which means it runs on any platform without depending on compiled or non-Python libraries.
It is a useful tool for applications and websites that need to process PDF files. PyPDF2 is capable of:
· extracting document information
· splitting documents page by page
· merging documents page by page
· cropping pages
· merging multiple pages into a single page
· encrypting and decrypting PDF files
· and more!

Installation: PyPDF2 is a pure Python package, so you can install it with a single pip command:

pip install PyPDF2

Extracting Text From PDF:
In this article you will learn how to extract text from a PDF file using the Python library PyPDF2.



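import PyPDF2

# "8259.pdf" is the name (or path) of the PDF file to read
with open("8259.pdf", "rb") as pdf_obj:
    pdf_reader = PyPDF2.PdfReader(pdf_obj)
    # Page numbers start at 0, so pages[0] is the first page
    page_obj = pdf_reader.pages[0]
    page_text = page_obj.extract_text()
    print(page_text)

PdfReader():- Creates a reader object from a PDF file opened in binary read mode (“rb”). You can also pass the file name or path directly, for example PyPDF2.PdfReader("8259.pdf").

pages:- A list-like sequence of the document's pages. It is indexed from zero, so pages[0] returns the first page, and len(pdf_reader.pages) gives the number of pages.

extract_text():- Extracts the text of a single page and returns it as a string. To convert the whole file, call it on every page in pdf_reader.pages.

Older tutorials use PdfFileReader(), getPage() and extractText(). These names were deprecated in PyPDF2 2.x and removed in PyPDF2 3.0, so the code above uses their current equivalents.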
