$wpdb -> Batch insert from XML File?

After facing problems with exhausted memory and time-consuming queries, I tried a different way of bulk importing. The idea isn’t new, but to show a possible way I want to share my solution. Keep in mind, I’m not a professional in PHP/MySQL, so maybe there is a more effective way to accomplish this task in a different manner or with better performance.

Main function to BULK INSERT with $wpdb

@kuchenundkakao pushed me towards some new ideas: using a temporary table to import the whole XML file and processing the data from it in a second query. Consequently, the bulk_insert() function doesn’t check for any existing data or update it; the temporary table simply gets truncated every 24 hours after processing is done.
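The 24-hour truncation isn’t part of the functions below; a minimal sketch of how it could be scheduled with WP-Cron might look like this (the hook name my_truncate_temp_price_table is a made-up example):

    // Minimal sketch: truncate the temporary table once a day via WP-Cron.
    // The hook name 'my_truncate_temp_price_table' is hypothetical.
    add_action('my_truncate_temp_price_table', function () {
        global $wpdb;
        $wpdb->query("TRUNCATE TABLE `{$wpdb->prefix}temp_api_price`");
    });

    if (!wp_next_scheduled('my_truncate_temp_price_table')) {
        wp_schedule_event(time(), 'daily', 'my_truncate_temp_price_table');
    }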

It’s used inside a PHP class, so be aware if you just copy and paste – check the syntax!

// Table, including your prefix!
$table = $wpdb->prefix.'table_name';

$rows = array(
    array('price' => '12.00', 'vat' => '7.7'),
    array('price' => '230.00', 'vat' => '7.7')
);

Content of my bulk_insert() function:

    function bulk_insert($table, $rows) {
        global $wpdb;

        /* Extract column list from first row of data */
        $columns = array_keys($rows[0]);
        asort($columns);
        $columnList = '`' . implode('`, `', $columns) . '`';

        /* Start building SQL, initialise data and placeholder arrays */
        $sql = "INSERT INTO `$table` ($columnList) VALUES\n";
        $placeholders = array();
        $data = array();

        /* Build placeholders for each row, and add values to data array */
        foreach ($rows as $row) {
            ksort($row);
            $rowPlaceholders = array();

            foreach ($row as $key => $value) {
                $data[] = $value;

                /* Differentiate value types and set matching placeholders.
                   Also check for a decimal point: numeric strings like
                   '12.00' are not floats and would otherwise be truncated
                   by the %d placeholder. */
                if (is_numeric($value)) {
                    $isFloat = is_float($value) || strpos((string) $value, '.') !== false;
                    $rowPlaceholders[] = $isFloat ? '%f' : '%d';
                } else {
                    $rowPlaceholders[] = '%s';
                }
            }

            $placeholders[] = '(' . implode(', ', $rowPlaceholders) . ')';
        }

        /* Stitch all rows together */
        $sql .= implode(",\n", $placeholders);

        /* Run the query, returning the number of affected rows for this chunk */
        return $wpdb->query($wpdb->prepare($sql, $data));
    }
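Called with the example $table and $rows from above (and assuming the default wp_ prefix), it prepares a single multi-row INSERT and returns the number of inserted rows:

    /* Example call with the $table and $rows defined above */
    $inserted = $this->bulk_insert($table, $rows);
    /* prepares: INSERT INTO `wp_table_name` (`price`, `vat`) VALUES (%f, %f), (%f, %f) */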

Chunking XML Data

How much data you can push to bulk_insert() in one go really varies and depends a lot on your hosting/server – so I’ve made it flexible and easy to adjust by limiting the amount of data before it’s sent to bulk_insert().

What is this function doing?
It’s using XMLReader and SimpleXMLElement to parse the XML document element by element instead of parsing the whole document at once (which mostly ends in exhausted memory). After the number of XML elements in $array[] reaches the amount set by $limit, the function sends this chunk of your XML file as an array to bulk_insert(), which stamps the whole array into your database.

I’m passing a filename like myfile.xml to load_price_xml2mysql() for flexibility purposes.

function load_price_xml2mysql($filename) {
    global $wpdb;

    /* Get the XML file (WP_UPLOAD_DIR is a custom constant, not WordPress core) */
    $file = WP_UPLOAD_DIR . $filename;
    $xml = new XMLReader();
    $xml->open($file);

    /* Set your limit here: define the constant in your plugin or just use $limit = 1000 */
    $limit = MYSQL_PRICE_INSERTLIMIT;

    $array = array();
    $table = $wpdb->prefix . 'temp_api_price';

    /* Counting entries, so we have to set zero before the while loop */
    $i = 0;

    /* Skip ahead to the first <item> element */
    while ($xml->read() && $xml->name != 'item') { ; }

    while ($xml->name === 'item') {
        ++$i;

        $element = new SimpleXMLElement($xml->readOuterXML());
        $array[] = array(
            'sku'           => intval($element->SKU),
            'buying'        => floatval($element->price->BUYING),
            'buying_ex_vat' => floatval($element->price->BUYING_EX),
            'vat'           => floatval($element->price->VAT),
            'selling'       => floatval($element->price->SELLING)
        );

        /* Chunk the while routine by $limit,
           passing the array to bulk_insert() when the limit is reached */
        if (count($array) == $limit) {
            $this->bulk_insert($table, $array);
            $array = array();
        }

        unset($element);
        $xml->next('item');
    }

    /* Don't miss the last chunk: if it didn't reach the given limit, send it now!
       The guard avoids calling bulk_insert() with an empty array when the
       item count is an exact multiple of $limit. */
    if (!empty($array)) {
        $this->bulk_insert($table, $array);
    }

    /* Return the total amount of entries */
    return $i;
}
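For completeness, a minimal usage sketch. Both constants are custom (not WordPress core), so they have to be defined somewhere in your plugin; the values below are only illustrative assumptions, and the XML is expected to carry <SKU> and nested <price> fields as read above:

    /* Illustrative defaults for the two custom constants used above */
    if (!defined('WP_UPLOAD_DIR')) {
        $upload = wp_upload_dir();
        define('WP_UPLOAD_DIR', trailingslashit($upload['basedir']));
    }
    if (!defined('MYSQL_PRICE_INSERTLIMIT')) {
        define('MYSQL_PRICE_INSERTLIMIT', 1000);
    }

    /* Parse and import the whole file, chunk by chunk */
    $total = $this->load_price_xml2mysql('myfile.xml');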

Conclusion

You have to find your own working $limit; just as a reference:

product_item.xml
 - 140 MB
 - 9.1 million rows in the document
 - 142,000 items
$limit = 5000;
$runtime = 43.93240232324 seconds [average]

Runtime includes both parsing and inserting into the MySQL database.

6500 rows per chunk was the maximum working limit, so I decided to leave a little headroom and went down to $limit = 5000, still hitting the same average runtime. I’m pretty sure most of the time is spent parsing the XML document; maybe I’ll do some benchmarking and track the microtime of each function separately, as sketched below.
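A minimal timing sketch for that, simply wrapping the import call with microtime(true):

    /* Measure the combined parse + insert runtime for a given $limit */
    $start   = microtime(true);
    $total   = $this->load_price_xml2mysql('product_item.xml');
    $runtime = microtime(true) - $start;

    error_log(sprintf('Imported %d items in %.2f seconds', $total, $runtime));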

Thanks to @phatskat and @kuchenundkakao for pushing me in the right direction.