While importing from machine-readable formats such as RSS is quite straightforward and usually has plenty of tools available, it is a very different story with arbitrary HTML.
If you are not up to coding this from scratch, the closest semi-automated tool I know of is Dapper: it can process HTML pages and, according to rules that you set up in a visual interface, convert them to other formats, including XML and RSS. On the other hand, I am not sure it will be able to handle your requirement of page discovery.
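If you do want to try coding it from scratch, the following is a minimal sketch of the idea in Python, using only the standard library. It is not how Dapper works. The extraction rules are assumptions made for illustration: the page title comes from `<title>`, the body text from `<p>` elements, and page discovery is reduced to collecting every `<a href>`. The names `PageExtractor`, `html_to_rss_item` and `build_feed` are hypothetical, and real pages will almost certainly need site-specific rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class PageExtractor(HTMLParser):
    """Collects the <title> text, paragraph text and link targets of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self.links = []
        self._in_title = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            # Assumed rule: every <p> starts a new chunk of body text.
            self._in_p = True
            self.paragraphs.append("")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.paragraphs[-1] += data


def html_to_rss_item(html, page_url):
    """Return an RSS <item> element plus the absolute URLs linked from the page."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()

    item = ET.Element("item")
    ET.SubElement(item, "title").text = parser.title.strip()
    ET.SubElement(item, "link").text = page_url
    body = "\n\n".join(p.strip() for p in parser.paragraphs if p.strip())
    ET.SubElement(item, "description").text = body

    # Naive page discovery: resolve every href against the page URL.
    discovered = [urljoin(page_url, href) for href in parser.links]
    return item, discovered


def build_feed(items, feed_title, feed_url):
    """Wrap <item> elements in a bare-bones RSS 2.0 channel and serialize it."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_url
    ET.SubElement(channel, "description").text = feed_title
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")
```

Fetching the pages, filtering the discovered links down to the ones that are actually posts, and handling images are left out. Those are the parts that tend to differ most from site to site.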