
$wpdb -> Batch insert from XML File?

WordPress Development Asked by click here on December 27, 2020

Actually I'm stuck on this!

  • XML file: ~140,000 rows
  • File size: ~9.0 MB
  • LOAD XML INFILE 'file_name' isn't allowed (shared hosting!)

The supplier provides a single XML file with quantity and sku. The XML is formatted like:

<item>
<LITM>0000001</LITM>
<STQU>1</STQU>
</item>

<item>
<LITM>0000002</LITM>
<STQU>4</STQU>
</item>

The following function works, but as you can imagine, it's slow!

function load_xml2mysql($filename,$type) {
    global $wpdb;

    $file = XML_DIR.$filename;
    $xml = new XMLReader();
    $xml->open($file);

    $charset = $wpdb->get_charset_collate();
    $table = $wpdb->prefix.'supplier_stock';

    ignore_user_abort(true); // keep running even if the user aborts the request

    while($xml->read() && $xml->name != 'item'){ ; }

    while($xml->name == 'item')
    {
        $element = new SimpleXMLElement($xml->readOuterXML());

        $query = $wpdb->prepare(
            "INSERT INTO {$table} (sku,stock) VALUES (%d,%d) ON DUPLICATE KEY UPDATE stock = %d",
            strval($element->LITM),
            strval($element->STQU),
            strval($element->STQU)
        );
        $wpdb->query( $query );
        $wpdb->flush();

        $xml->next('item');
        unset($element);
    }
}

I’ve read a lot about batch/bulk inserts and the benchmarks are very clear on this. Inserting 140,000 rows with my function takes about 40 seconds on average. The stock table is the smallest one from a set of three XML files; inserting the main product data with a total file size of ~145.0 MB will probably get me into trouble.

Any ideas on how to improve this or do a batch insert?

3 Answers

After facing problems with exhausted memory and time-consuming queries, I've tried a different way of bulk importing. The idea isn't new, but to show a possible way I want to share my solution. Keep in mind, I'm not a professional in PHP/MySQL, so maybe there is a more effective way to accomplish this task, or one with better performance.

Main function to BULK INSERT with $wpdb

@kuchenundkakao gave me the idea of using a temporary table: import the whole XML file into it and process the data in a second query. In consequence, the bulk_insert function doesn't check for existing data or update it; the temporary table gets truncated every 24 hours after processing is done.
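
Just to illustrate what that second query could look like, here is a rough sketch. The temp table and its columns match the import further down; the live table name and its unique key on sku are my assumption:

// Sketch only: merge the bulk-imported rows from the temporary table into a
// (hypothetical) live table with a UNIQUE key on sku, then empty the temp table.
global $wpdb;

$temp = $wpdb->prefix . 'temp_api_price';
$live = $wpdb->prefix . 'api_price'; // assumed live table name

$wpdb->query(
    "INSERT INTO {$live} (sku, buying, buying_ex_vat, vat, selling)
     SELECT sku, buying, buying_ex_vat, vat, selling FROM {$temp}
     ON DUPLICATE KEY UPDATE
         buying        = VALUES(buying),
         buying_ex_vat = VALUES(buying_ex_vat),
         vat           = VALUES(vat),
         selling       = VALUES(selling)"
);

// After processing is done (e.g. once every 24 hours), clear the temp table again.
$wpdb->query( "TRUNCATE TABLE {$temp}" );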

The bulk_insert function below is used inside a PHP class, so be aware if you just copy and paste it - check the syntax (e.g. the $this-> calls)!

// Table, including your prefix!
$table = $wpdb->prefix . 'table_name';

$rows = array(
    array( 'price' => '12.00',  'vat' => '7.7' ),
    array( 'price' => '230.00', 'vat' => '7.7' ),
);

Content of my bulk_insert function:

function bulk_insert($table, $rows) {
    global $wpdb;

    /* Extract column list from first row of data! */
    $columns = array_keys($rows[0]);
    asort($columns);
    $columnList = '`' . implode('`, `', $columns) . '`';

    /* Start building SQL, initialise data and placeholder arrays */
    $sql = "INSERT INTO `$table` ($columnList) VALUES\n";
    $placeholders = array();
    $data = array();

    /* Build placeholders for each row, and add values to data array */
    foreach ($rows as $row) {
        ksort($row);
        $rowPlaceholders = array();
        foreach ($row as $key => $value) {
            $data[] = $value;

            /* differentiate values and set placeholders */
            if (is_numeric($value)) {
                $rowPlaceholders[] = is_float($value) ? '%f' : '%d';
            } else {
                $rowPlaceholders[] = '%s';
            }
        }
        $placeholders[] = '(' . implode(', ', $rowPlaceholders) . ')';
    }

    /* Stitch all rows together */
    $sql .= implode(",\n", $placeholders);

    // Run the query. Returns the number of affected rows for this chunk.
    return $wpdb->query($wpdb->prepare($sql, $data));
}
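
With the $table and $rows example from above, a call inside the class would look like this; the return value is whatever $wpdb->query() returns for that chunk:

// Number of affected rows for this chunk, or false on error.
$affected = $this->bulk_insert( $table, $rows );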

Chunking XML Data

How much data you can push to bulk_insert() really varies and depends a lot on your hosting/server - so I've made it flexible and easy to adjust by limiting the amount of data before it's sent to bulk_insert().

What is this function doing? It uses XMLReader and SimpleXMLElement to parse the XML document node by node instead of parsing the whole document (which mostly ends in exhausted memory). After the number of XML elements in $array[] reaches the amount set by $limit, the function sends this chunk of your XML file as an array to bulk_insert(), which stamps the whole array into your database.

For flexibility, I'm passing a filename like myfile.xml to load_price_xml2mysql.

function load_price_xml2mysql($filename) {
    global $wpdb;

    /* get xml file */
    $file = WP_UPLOAD_DIR.$filename;
    $xml = new XMLReader();
    $xml->open($file);

    /* set your limit here, define the limit in your plugin or just add $limit = 1000 */
    $limit = MYSQL_PRICE_INSERTLIMIT;

    $array = array();
    $table = $wpdb->prefix.'temp_api_price';

    /* counting entries, so we have to set zero before the while loop */
    $i = 0;

    /* move to the first <item> node */
    while ($xml->read() && $xml->name != 'item') { ; }

    while ($xml->name === 'item') {
        ++$i;

        $element = new SimpleXMLElement($xml->readOuterXML());
        $array[] = array(
            'sku'           => intval($element->SKU),
            'buying'        => floatval($element->price->BUYING),
            'buying_ex_vat' => floatval($element->price->BUYING_EX),
            'vat'           => floatval($element->price->VAT),
            'selling'       => floatval($element->price->SELLING)
        );

        /* chunk the while routine by $limit:
           pass the array to bulk_insert() once the limit is reached, then start a new chunk */
        if (count($array) == $limit) {
            $this->bulk_insert($table, $array);
            $array = array();
        }

        unset($element);
        $xml->next('item');
    }

    /* don't miss the last chunk if it didn't reach the given limit - send it now! */
    if (!empty($array)) {
        $this->bulk_insert($table, $array);
    }

    /* return the total amount of entries */
    return $i;
}
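
A possible way to call it, for example from a daily WP-Cron event (the hook name and the $importer instance are just placeholders for illustration):

// Illustrative only: $importer is assumed to be an instance of the class
// that contains load_price_xml2mysql() and bulk_insert().
add_action( 'supplier_price_import', function () use ( $importer ) {
    $total = $importer->load_price_xml2mysql( 'myfile.xml' );
    error_log( "Imported {$total} price items." );
} );

// Schedule the import once; WP-Cron then runs it daily.
if ( ! wp_next_scheduled( 'supplier_price_import' ) ) {
    wp_schedule_event( time(), 'daily', 'supplier_price_import' );
}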

Conclusion

You have to find your working $limit; just as a reference:

product_item.xml
 - 140 MB
 - 9.1 million rows in the document
 - 142,000 items
$limit = 5000;
$runtime = 43.93240232324 [average]

Runtime includes parsing & inserting into the MySQL database.

6500 rows per chunk was the maximum working limit, so I decided to leave a little more room and went down to $limit = 5000, still hitting the same average runtime. I'm pretty sure most of the time is spent parsing the XML document; maybe I'll do some benchmarking and track microtime for each function separately.
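
A minimal sketch of such a timing check with microtime(), assuming it's run from within the same class:

// Rough timing sketch: measure one full import run (parsing + inserting).
$start = microtime( true );

$total = $this->load_price_xml2mysql( 'myfile.xml' );

$elapsed = microtime( true ) - $start;
error_log( sprintf( 'Imported %d items in %.2f seconds', $total, $elapsed ) );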

Thanks to @phatskat and @kuchenundkakao for pushing me in the right direction.

Answered by click here on December 27, 2020

You can try to mitigate memory issues by using a Generator:

function load_xml2mysql($filename,$type) {
    global $wpdb;

    $charset = $wpdb->get_charset_collate();
    $table = $wpdb->prefix.'supplier_stock';

    ignore_user_abort(true);

    $query_template = "INSERT INTO {$table} (sku,stock) VALUES (%d,%d) ON DUPLICATE KEY UPDATE stock = %d";
    $wpdb->query( 'SET autocommit = 0' );
    foreach ( get_xml_element( $filename ) as $element ) {
        $query = $wpdb->prepare(
            $query_template,
            strval($element->LITM),
            strval($element->STQU),
            strval($element->STQU)
        );

        $wpdb->query( $query );
    }
    $wpdb->query( 'COMMIT' );
}

function get_xml_element( $filename ) {
    $file = XML_DIR.$filename;
    $xml = new XMLReader();
    $xml->open($file);

    // I'm not clear on what exactly this is doing, do you need it?
    // edit: this moves us to the first <item> node 
    while($xml->read() && $xml->name != 'item'){ ; }

    while ( $xml->name === 'item' ) {
        yield new SimpleXMLElement( $xml->readOuterXML() );
        $xml->next('item');
    }
}

Generators tend to be more efficient than just doing a loop, and in a case like this it may be what you need.

We're also doing SET autocommit = 0 before running the queries; this should provide some performance boost, as you're telling MySQL not to commit the changes until we call COMMIT after the loop.

Answered by phatskat on December 27, 2020

First of all: don't flush after every insert. I did not test this, but it should be faster if you import all the data, then do the flush.

Second: what exactly do you want to accomplish? A one-time import, or a regular update of your database? If you only need a one-time import, just accept the wait. Maybe increase the PHP maximum runtime if you need to (see the snippet below).
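
For example (a minimal sketch; whether this is allowed depends on the host configuration):

// Lift the PHP execution time limit for this request only (may be blocked on shared hosting).
set_time_limit( 0 );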

If you want to update your database regularly, then maybe you should go for another approach: first read your XML file and convert it into an array or put it into a temporary database table, then check whether each article must be added or updated, and in a third step complete it by adding the new articles and updating the existing ones.

Happy Coding!

Answered by HU ist Sebastian on December 27, 2020
