How to extract text from a PDF? [closed]


Closed 6 years ago.

Can anyone recommend a library/API for extracting the text and images from a PDF? We need to get at text contained in predefined regions of the document, so the API will need to give us positional information for each element on the page.

We would like that data to be output in XML or JSON format. We’re currently looking at PdfTextStream, which seems pretty good, but we would like to hear other people’s experiences and suggestions.
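
To make the requirement concrete, here is a rough sketch of the kind of region-based output we have in mind. It is library-agnostic and assumes some extraction library has already produced text elements with bounding boxes; the `TextElement` shape, the region names, and the coordinates are all hypothetical, not taken from any particular API.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TextElement:
    """One piece of text plus its bounding box (hypothetical shape, not a real API)."""
    page: int
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def in_region(el, box):
    """Return True if the element's box lies entirely inside box (x0, y0, x1, y1)."""
    bx0, by0, bx1, by1 = box
    return el.x0 >= bx0 and el.y0 >= by0 and el.x1 <= bx1 and el.y1 <= by1


def extract_regions(elements, regions):
    """Group elements by named region and serialize the result as JSON."""
    result = {}
    for name, (page, box) in regions.items():
        result[name] = [
            asdict(el) for el in elements if el.page == page and in_region(el, box)
        ]
    return json.dumps(result, indent=2)


# Made-up elements standing in for what an extraction library would return.
elements = [
    TextElement(1, "Invoice #1234", 50, 700, 200, 720),
    TextElement(1, "Total: 99.00", 400, 100, 520, 120),
]

# Known regions: name -> (page number, bounding box). Coordinates are assumptions.
regions = {
    "header": (1, (0, 650, 612, 792)),
    "total": (1, (350, 80, 612, 140)),
}

print(extract_regions(elements, regions))
```

Whatever library we end up with would only need to supply the per-element coordinates; the region lookup and JSON serialization are straightforward on our side.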

Are there alternatives (commercial or free) for extracting text from a PDF programmatically?
