Woocommerce CLI import

Goal

The goal in this particular case is to set up an robust, low cpu usage and fast command line interface which will process CSV files and update woocommerce by SKU.

The update by SKU is very important because the source data has no idea which IDs the Woocommerce products are linked to. SKU is unique in both Woocommerce as the source SAP B1 data.

Why? Current situation is unfavorable

At the moment Woocommerce products are updated through an puppeteer headless browser session, which actually does the following actions:

  • Log in to admin with user who only has access to one menu option and has capabilities to update/create products.
  • Open Woocommerce native importer url and upload the CSV file.
  • Click through the import options and run import.
  • Await updating process and give feedback on result.

In my opinion Puppeteer, or PhantomJS which was used before, are by far the ideal way of executing a mission critical system as hourly updates to products.

How? Coming to a solution

Before, lets make sure our data is clean

First I examined a robust way of making sure that the delivered CSV would be pre-processed into an fail safe CSV (or JSON) format.

jq

I tried using jq on the source CSV file to output an valid JSON array. Unfortunately, and rather expected, the CSV file had some oddities which were not conform standards. I found out by using the code below:

#!/bin/sh
jq -R -sr -f csv2json.jq test.csv

jq has an filter possibility in which the CSV can be processed into JSON. I found two filters online who partly did the job but I had to adjust some part of the filter to avoid some errors in this specific CSV format.

The custom parts were removing EOL chars from the header column names and trimming \” from description fields.

# jq filter
# objectify/1 takes an array of string values as inputs, converts
# numeric values to numbers, and packages the results into an object
# with keys specified by the "headers" array
def objectify(headers):
  # For jq 1.4, replace the following line by: def tonumberq: .;
  def tonumberq: tonumber? // .;
  def trimspace: if type == "string" then sub("^ +";"") | sub(" +$";"") else . end;
  def trimquote: if type == "string" then sub("^\"+";"") | sub("\"+$";"") else . end;
  . as $in
  | reduce range(0; headers|length) as $i ({}; .[headers[$i]] = ($in[$i] | trimspace | trimquote | tonumberq) );

#headers
def csv2table:
  # For jq 1.4, replace the following line by:  def trim: .;
  def trim: sub("^ +";"") | sub(" +$";"");
  def trimeol: sub("\r+$";"");
  split("\n") | map( split(",") | map(trim | trimeol) );

def csv2json:
  csv2table
  | .[0] as $headers
  | reduce (.[1:][] | select(length > 0) ) as $row
      ( []; . + [ $row|objectify($headers) ]);

csv2json

The issue with the above filter is that as though it will work great with simple CSV files it is not context sensitive. Think about prices written as: “99,00” these will be cut and results in an corrupt JSON array.

I did not feel like writing an context sensitive filter for jq and left it at that. Which quite bugged me because when the data is in a valid form jq can really make life great in bash.

csvkit

First I was looking at anycsv which is an python library. However implementing it did not feel natural to me. After some searching I came across csvkit which seemed like an extensive multi purpose library.

With the below command I tested the output:

#!/bin/sh
csvjson test.csv

Garbage ensued, I needed to prettify the output to make sense out of it:

#!/bin/sh
csvjson test.csv | jq

Yes, prettified JSON output – AND using jq again. This is going the right way. But hold on, why are my prices changed? csvkit’s csvjson seems to be doing some conversions. Let’s try to avoid those:

#!/bin/sh
csvjson test.csv -I | jq

The -i command disables type inference (and –locale, –date-format, –datetime-format) when parsing CSV input.

Good, now we have valid, robust and if necessary, customizable, CSV to JSON conversion going on. A solid base to work upon.

Updating products

So, now that the data is clean we can examine various ways of updating Woocommerce products.

wp-cli

First off I’d thought it would be a good idea to loop the jq results in bash and update the products one by one using the wp-cli, or in this case the wc-cli. I wrote a quick loop and executed an test for 70 products.

#!/bin/sh

#count records
count=`jq '. | length' data.json`

#loop records
for i in $( seq 0 $count )
do
  echo $i
  product=`jq ".[$i]" data.json`
  name=`jq -r ".[$i] .Name" data.json
  sku=`jq -r ".[$i] .SKU" data.json`

  #get id as we can not update by sku
  id=`wp wc product list --allow-root --sku="$sku" --user=1 --format="json" --fields="id" | jq '.[] .id'`

  result=`wp wc product update $id --name="$name" --allow-root`
done

Bad idea! Each call to wc-cli takes quite some time. Besides we need two calls to wc-cli, one to get the id from sku – because we cant update by sku – and one to do the actual update.

It took about 5 minutes to update 70 products. Besides this it would mean that all keys should be parsed in bash. Lets not go down this road..

wc-cli second aproach

So, we have to reduce the calls to wc-cli. Why not extend wc-cli with our own command and feed the json to it. This would mean one call to wc-cli and it handles the rest. Great, first lets build our own wc-cli command.

<?php

if ( ! defined( 'WP_CLI' ) && WP_CLI ) {
    return;
}

class cli_woo extends WP_CLI_Command {

  /**
   * Display version information.
   * ## OPTIONS
   *
   * [--wponly]
   * : Shows only WP version info, omitting the plugin one.
   */
  function version( $args, $assoc_args ) {
    if ( !empty( $assoc_args['wponly'] ) ) {
      WP_CLI::line( 'Version of WordPress is ' . get_bloginfo( 'version' ) . '.' );
    } else {
      WP_CLI::line( 'Version of this plugin is 0.1-beta, and version of WordPress ' . get_bloginfo( 'version' ) . '.' );
    }
  }

  /**
   * Display the number of plugins.
   */
  function plugins() {
    WP_CLI::line( 'There are a total of ' . sizeof( get_plugins() ) . ' plugins on this site.' );
  }
}

WP_CLI::add_command( 'woo', 'cli_woo' );

?>

Great, that seems fine. Now to test the code on our bash shell. With the following bash command we can see the results.

#!/bin/sh

wp woo plugins
wp woo version
wp woo version --wp-only

For the next step we need to load and loop the json data and update our product. Which can be done with the following code.

<?php
//..

  private $key='';
  private $secret='';

  /**
   * Woo import JSON
   * ## OPTIONS
   *  --file=<name>
   * : JSON file.
   */

  function import( $args, $assoc_args ) {
        if ( ! file_exists( $assoc_args['file'] ) )
                WP_CLI::error('file not found');

        $json = json_decode( file_get_contents( $assoc_args['file'] ), true );

        if ( ! $json )
                WP_CLI::error('incorrect JSON');

        WP_CLI::log('Importing..');

        foreach( $json as $row )
        {
                $id = wc_get_product_id_by_sku( $row['SKU'] );
                WP_CLI::line( $id . ' ' . $row['SKU'] );

                $data_args = array();
                $meta_data_index = 0;

                foreach ($row as $key => $value)
                {
                        switch (strtolower($key))
                        {
                                case 'name':
                                        $data_args['name'] = $value;
                                        break;
                                case 'stock':
                                        $data_args['stock_quantity'] = $value;
                                        break;
                        }
                }

                $api_response = wp_remote_post( 'https://../wp-json/wc/v3/products/'.$id, array(
                        'headers' => array(
                                'Authorization' => 'Basic ' . base64_encode( $this->key.':'.$this->secret )
                        ),
                        'body' => $data_args
                        // more params http://woocommerce.github.io/woocommerce-rest-api-docs/?shell#product-properties
                ) );

                if ( is_wp_error($api_response) ){
                        WP_CLI::line( print_r( $api_response, false ) );
                        //echo $result->get_error_message();
                }
                $body = json_decode( $api_response['body'] );

                if( wp_remote_retrieve_response_message( $api_response ) === 'OK' ) {
                        WP_CLI::line( print_r( $body->name, true ) . ' has been updated' );
                        WP_CLI::line( WP_CLI::colorize( "%GSuccess:%n test" ) );
                }
        }
  }

//..
?>

Let’s again test the import function, hope for the best, cross our fingers – with the following bash command.

#!/bin/sh
wp woo import --file="data.json"

Damn, unauthorized. We need to create a Woocommerce API key in WordPress backend admin Woocommerce > Settings > Advanced > Rest API.

After adding the key and secret the import of 70 products takes about 2 minutes. Which annoys me pretty much because I am aiming for much less than two freaking minutes. On to take three..

wc-cli, take three

So how to shave off time of an single cli command? Well, we are looping the Rest API, would it be possible to offer the data as an batch? Why yes, Woocommerce offers an batch command. Wow!

https://../wp-json/wc/v3/products/batch

With an new wc-cli command (hint: batch) and tweaking the above php code the new batch update takes an mere 15 seconds for 70 products.

On to an more extensive test, let’s fire 3000 products JSON to the batch wc-cli. Hint: it fails – the maximum batch update takes 100 products per run.

Another rewrite to loop data and send 100 instructions per Rest API batch call. Firing again 3000 products, aaaaand SUCCESS.

10 minutes, 3000 products, and with an live progress bar by using the \cli\line, or in this case \cli\bar code.

  \cli\line( 'Parsing '.$total.' CSV lines' );
  $bar = new \cli\progress\Bar('Progress', $total);
  $bar->display();
    //..loop
  $bar->tick();
    //..endloop
  $bar->finish();

The resulting output of the wc-cli batch command.

In comparison to the previous methods the batch update results in an lower cpu usage as you can see in the below screenshot of htop. Average of multiple runs is around 50%. Do note this is an low-end test droplet on digital ocean.

Leave a Reply

Your email address will not be published. Required fields are marked *