While importing from machine-readable formats such as RSS is quite straightforward and usually well supported by existing tools, it is a very different story with arbitrary HTML.
If you are not up to coding this from scratch, the closest semi-automated tool I know of is Dapper. It can process HTML pages and, according to rules that you set up in a visual interface, convert them to other formats, including XML and RSS. On the other hand, I am not sure it will be able to handle your requirement of page discovery.
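If you do go the from-scratch route, the core idea is the same as what a rule-based tool does: pick out parts of the HTML by rule and emit them as feed items. Below is a minimal sketch using only the Python standard library. It assumes, purely for illustration, that each article on the page is an `<h2>` title followed by `<p>` text; the names `ArticleExtractor` and `html_to_rss` are made up for this example, and real pages will need their own rules. It does not attempt page discovery.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ArticleExtractor(HTMLParser):
    """Hypothetical rule: every <h2> starts an item, following <p> text is its body."""

    def __init__(self):
        super().__init__()
        self.items = []        # list of {"title": ..., "description": ...}
        self._current = None   # which field is currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.items.append({"title": "", "description": ""})
            self._current = "title"
        elif tag == "p" and self.items:
            self._current = "description"

    def handle_endtag(self, tag):
        if tag in ("h2", "p"):
            self._current = None

    def handle_data(self, data):
        # Text inside nested inline tags (<b>, <a>, ...) is collected too.
        if self._current and self.items:
            self.items[-1][self._current] += data


def html_to_rss(html, feed_title, feed_link):
    """Convert an HTML string to a bare-bones RSS 2.0 document (as a string)."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_title

    for entry in parser.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"].strip()
        ET.SubElement(item, "description").text = entry["description"].strip()

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    sample = "<h2>First</h2><p>Hello <b>world</b></p><h2>Second</h2><p>Bye</p>"
    print(html_to_rss(sample, "Demo feed", "http://example.com/"))
```

The resulting RSS can then go through whatever RSS import tool you already use. The extraction rules are the fragile part, so expect to adjust them for each site.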