How to register rows (resource) from custom table as posts of custom type on-the-fly?

I think you already gave the answer yourself: use custom post types, and define one custom post type per incoming resource type.
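A minimal sketch of that idea, assuming the incoming resource types are available as plain strings. The `build_post_type_args` helper, the `$resource_types` list, and the chosen labels and `supports` values are hypothetical and only for illustration. `register_post_type()` is normally called on the `init` hook; the `function_exists` guard is there only so the helper can also be tried outside WordPress.

```php
<?php
// Hypothetical helper: derive a post type key and arguments for one incoming resource type.
function build_post_type_args( string $resource_type ): array {
    // Post type keys must be lowercase and at most 20 characters long.
    $slug = substr( strtolower( preg_replace( '/[^A-Za-z0-9_]+/', '_', $resource_type ) ), 0, 20 );

    return array(
        'slug' => $slug,
        'args' => array(
            'label'    => ucfirst( $resource_type ),
            'public'   => true,
            'supports' => array( 'title', 'editor', 'custom-fields' ),
        ),
    );
}

// Assumed list of incoming types; in practice you would read it from the custom table.
$resource_types = array( 'Book', 'Author Profile' );

// Only register when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'init', function () use ( $resource_types ) {
        foreach ( $resource_types as $type ) {
            $def = build_post_type_args( $type );
            register_post_type( $def['slug'], $def['args'] );
        }
    } );
}
```

Keeping the slug derivation in its own function means the same mapping can be reused later when rows are inserted as posts of that type.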

Some notes on two-way syncing:

  • Changes to metadata properties don't change the post's modification date, which would be handy to check while syncing. Luckily you can add a callback for this, e.g.:

Example: metadata field changes update the modified date

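    // Register inside the class, e.g. in its constructor. Both hooks pass four
    // arguments: $meta_id, $object_id (the post ID), $meta_key, $meta_value.
    add_action( 'updated_post_meta', array( $this, 'updateModifiedDate' ), 10, 4 );
    add_action( 'added_post_meta', array( $this, 'updateModifiedDate' ), 10, 4 );

    // Bump the post's modified date whenever one of its meta fields changes.
    function updateModifiedDate( $meta_id, $post_id, $meta_key, $meta_value ) {
        $array = array(
            'ID'                => $post_id,
            'post_modified'     => current_time( 'mysql', false ),
            'post_modified_gmt' => current_time( 'mysql', true ),
        );
        wp_update_post( $array );
    }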
  • It's nice to store the state at the last sync in a cache: per item, the unique key, the modified date, and some other attributes. Comparing against this gives you the items that changed since the last sync and the new ones on both sides. Apart from this, do a real-time match of both sides on their shared unique key. Combined, this tells you about new items on either side and/or CRUDN operations on either side: (CRUDN) x (CRUDN) x (side1|side2|both), i.e. 5 x 5 x 3 = 75 combinations you can check for, most of which result in the same action.

This way you can support two-way syncing. Most cases can be handled automagically; in some cases merges will need manual intervention or a choice.
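The comparison against the last-sync cache can be sketched in plain PHP. Everything named here is an assumption for illustration: the functions `classify_side` and `plan_sync`, the array shape (shared unique key => modified date), and the action labels are hypothetical. The sketch keeps one snapshot per side, only distinguishes created, updated, deleted, and unchanged items, and does not cover the extra real-time match of both sides.

```php
<?php
// Hypothetical sketch: compare one side's current state with its snapshot from the last sync.
// Both arrays map the shared unique key to a modified date.
function classify_side( array $snapshot, array $current ): array {
    $state = array();
    foreach ( $current as $key => $modified ) {
        if ( ! array_key_exists( $key, $snapshot ) ) {
            $state[ $key ] = 'created';
        } elseif ( $modified !== $snapshot[ $key ] ) {
            $state[ $key ] = 'updated';
        } else {
            $state[ $key ] = 'none';
        }
    }
    foreach ( $snapshot as $key => $modified ) {
        if ( ! array_key_exists( $key, $current ) ) {
            $state[ $key ] = 'deleted';
        }
    }
    return $state;
}

// Combine the per-side states into one action per key.
function plan_sync( array $snap1, array $side1, array $snap2, array $side2 ): array {
    $s1   = classify_side( $snap1, $side1 );
    $s2   = classify_side( $snap2, $side2 );
    $plan = array();
    foreach ( array_keys( $s1 + $s2 ) as $key ) {
        $a = $s1[ $key ] ?? 'none';
        $b = $s2[ $key ] ?? 'none';
        if ( $a === $b && ( 'none' === $a || 'deleted' === $a ) ) {
            $plan[ $key ] = 'skip';        // nothing to propagate
        } elseif ( 'none' === $b ) {
            $plan[ $key ] = "side1:$a";    // only side 1 changed
        } elseif ( 'none' === $a ) {
            $plan[ $key ] = "side2:$b";    // only side 2 changed
        } else {
            $plan[ $key ] = 'conflict';    // both changed: needs a merge or a choice
        }
    }
    return $plan;
}

// Example data (made up): b changed on side 1, c changed on both, d is new on side 1.
$snap = array( 'a' => '2020-01-01', 'b' => '2020-01-01', 'c' => '2020-01-01' );
$plan = plan_sync(
    $snap,
    array( 'a' => '2020-01-01', 'b' => '2020-02-01', 'c' => '2020-02-01', 'd' => '2020-02-01' ),
    $snap,
    array( 'a' => '2020-01-01', 'b' => '2020-01-01', 'c' => '2020-03-01' )
);
```

Many of the combinations collapse into the same few actions, which is why the plan only needs a handful of labels.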

Generic Advice

  • When batch processing in more complex environments, use a staging environment: tables that contain the data to load, possibly already transformed for easy loading. Then schedule loads that transform and map the data into the production environment, e.g. WordPress custom posts + meta + taxonomies.
  • Always log everything in sync logs, so you can always find out what happened where, especially when syncs run automagically. Put some timers in there so you can see where the bottlenecks are, especially with large jobs that take hours (or weeks) to load. A bottleneck can be, for example, “sanitizing” the data, which could take a third of the total time.
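A minimal sketch of a sync log with timers, in plain PHP. The `SyncLog` class, its method names, and the step labels are hypothetical; a real job would probably write to a file or a database table rather than `error_log()`.

```php
<?php
// Hypothetical sketch of a sync log that times each step of a job.
class SyncLog {
    private $entries = array();

    // Run $step, record how long it took (in seconds), and return its result.
    public function timed( string $label, callable $step ) {
        $start  = microtime( true );
        $result = $step();
        $this->entries[] = array(
            'label'   => $label,
            'seconds' => microtime( true ) - $start,
        );
        return $result;
    }

    // Return the recorded entries, slowest first, to spot bottlenecks.
    public function slowest_first(): array {
        $sorted = $this->entries;
        usort( $sorted, function ( $a, $b ) {
            return $b['seconds'] <=> $a['seconds'];
        } );
        return $sorted;
    }
}

// Usage with two made-up steps.
$log  = new SyncLog();
$rows = $log->timed( 'load', function () {
    return array( ' a ', ' b ' );
} );
$rows = $log->timed( 'sanitize', function () use ( $rows ) {
    return array_map( 'trim', $rows );
} );

foreach ( $log->slowest_first() as $entry ) {
    error_log( sprintf( '%s took %.4f s', $entry['label'], $entry['seconds'] ) );
}
```

Wrapping each step in a closure keeps the timing code out of the sync logic itself, so adding or removing a timer is a one-line change.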