Can anyone recommend a library/API for extracting the text and images from a PDF? We need to be able to get at text that is contained in pre-known regions of the document, so the API will need to give us positional information of each element on the page.
We would like that data to be output in XML or JSON format. We’re currently looking at PdfTextStream, which seems pretty good, but would like to hear other people’s experiences and suggestions.
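To make the requirement concrete, here is a rough sketch of the kind of region filtering and JSON output we have in mind. It does not use PdfTextStream or any real PDF library: `TextElement`, `elements_in_region`, `region_to_json`, and the sample coordinates are all hypothetical names and values made up for illustration, and the elements are assumed to have already been extracted with bounding boxes in page coordinates.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TextElement:
    """Hypothetical record for one piece of extracted text and its bounding box."""
    page: int
    text: str
    x0: float  # left edge
    y0: float  # bottom edge
    x1: float  # right edge
    y1: float  # top edge


def elements_in_region(elements, page, region):
    """Return the elements on `page` whose boxes lie fully inside `region`.

    `region` is an (x0, y0, x1, y1) tuple in the same coordinates as the elements.
    """
    rx0, ry0, rx1, ry1 = region
    return [
        e for e in elements
        if e.page == page
        and e.x0 >= rx0 and e.y0 >= ry0
        and e.x1 <= rx1 and e.y1 <= ry1
    ]


def region_to_json(elements, page, region):
    """Serialize the matching elements, with their positions, as JSON."""
    hits = elements_in_region(elements, page, region)
    return json.dumps(
        {"page": page, "region": list(region), "elements": [asdict(e) for e in hits]},
        indent=2,
    )


# Made-up sample data standing in for whatever a real extractor would return.
SAMPLE = [
    TextElement(1, "Invoice No. 1234", 72.0, 720.0, 190.0, 734.0),
    TextElement(1, "Total: 99.00", 400.0, 100.0, 480.0, 114.0),
    TextElement(2, "Page two text", 72.0, 720.0, 160.0, 734.0),
]

if __name__ == "__main__":
    # Only the first element falls inside this region on page 1.
    print(region_to_json(SAMPLE, page=1, region=(50.0, 700.0, 300.0, 760.0)))
```

Any library that can give us the equivalent of those per-element bounding boxes would let us do the region lookup ourselves.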
Are there alternatives (commercial or free) for extracting text from a PDF programmatically?