# Goal

How to set up a robust, low-CPU-usage and fast command-line interface that batch processes CSV files and updates WooCommerce products by SKU.

Updating by SKU is essential because the source data has no knowledge of the IDs the WooCommerce products are linked to. SKU is unique in both WooCommerce and the source SAP B1 data.

# Why?

The current situation is unfavorable: at the moment WooCommerce products are updated through a Puppeteer headless browser session, which scripts the following actions:

• Open the WooCommerce native importer URL and upload the CSV file.
• Click through the import options and run the import.
• Await the update process and give feedback on the result.

Puppeteer, or PhantomJS which we used before, is far from the ideal way of executing a mission-critical process such as hourly product updates.

# How?

## First, let's make sure our data is clean

First I examined a robust way of making sure that the delivered CSV would be pre-processed into a fail-safe CSV (or JSON) format.

## jq

I tried using jq on the source CSV file to output a valid JSON array. Unfortunately, and rather expectedly, the CSV file had some oddities that did not conform to the standard. I found out by using the code below:

```sh
#!/bin/sh
jq -R -sr -f csv2json.jq test.csv
```
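For clarity, this is what those flags do: -R reads raw text instead of JSON, and -s slurps the whole input into a single string, so the filter receives the complete CSV text to split itself. A minimal illustration (the sample data is made up here):

```sh
#!/bin/sh
# -R: raw input, -s: slurp the whole input into one string.
# The filter then sees the CSV as a single JSON string to split itself.
printf 'SKU,Name\nA-1,Widget\n' | jq -R -s '.'
# → "SKU,Name\nA-1,Widget\n"
```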

jq supports filter files, with which the CSV can be processed into JSON. I found two filters online that partly did the job, but I had to adjust parts of them to avoid errors in this specific CSV format.

The custom parts were removing EOL characters from the header column names and trimming `\"` from description fields.
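As an illustration of those two custom fixes, the same jq `sub` calls can be tried on their own (sample values made up here):

```sh
#!/bin/sh
# Strip a trailing CR (EOL residue) from a header column name:
printf 'Name\r\n' | jq -R -r 'sub("\r$";"")'
# → Name

# Trim surrounding quote characters from a field value:
printf '"some text"\n' | jq -R -r 'sub("^\"+";"") | sub("\"+$";"")'
# → some text
```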

```jq
# jq filter
# objectify/1 takes an array of string values as input, converts
# numeric values to numbers, and packages the results into an object
# with keys specified by the "headers" array
def objectify(headers):
  # For jq 1.4, replace the following line by: def tonumberq: .;
  def tonumberq: tonumber? // .;
  def trimspace: if type == "string" then sub("^ +";"") | sub(" +$";"") else . end;
  def trimquote: if type == "string" then sub("^\"+";"") | sub("\"+$";"") else . end;
  . as $in
  | reduce range(0; headers|length) as $i
      ({}; .[headers[$i]] = ($in[$i] | trimspace | trimquote | tonumberq) );

def csv2table:
  # For jq 1.4, replace the following line by: def trim: .;
  def trim: sub("^ +";"") | sub(" +$";"");
  def trimeol: sub("\r+$";"");
  split("\n") | map( split(",") | map(trim | trimeol) );

def csv2json:
  csv2table
  | .[0] as $headers
  | reduce (.[1:][] | select(length > 0) ) as $row
      ([]; . + [ $row | objectify($headers) ]);

csv2json
```

The issue with the above filter is that, although it works great with simple CSV files, it is not context sensitive. Think of prices written as "99,00": these will be split on the comma and result in a corrupt JSON array. I did not feel like writing a context-sensitive filter for jq and left it at that. Which quite bugged me, because when the data is in a valid form, jq can really make life great in bash.

## csvkit

First I looked at anycsv, which is a Python library. However, implementing it did not feel robust enough and it relied on some odd instructions. After some research I came across csvkit, which seemed like an extensive multi-purpose library. With the command below I tested the output:

```sh
#!/bin/sh
csvjson test.csv
```

Garbage ensued; I needed to prettify the output to make sense of it:

```sh
#!/bin/sh
csvjson test.csv | jq
```

Yes, prettified JSON output AND using jq again. This is going the right way. But hold on, why are my prices changed? csvkit's csvjson seems to be doing some type conversions. Let's try to avoid those:

```sh
#!/bin/sh
csvjson test.csv -I | jq
```

The -I flag disables type inference (as well as --locale, --date-format and --datetime-format) when parsing CSV input. Good: now we have a valid, robust and, if necessary, customizable CSV-to-JSON conversion going on. A solid base to work upon.

# Updating products

So, now that the data is clean, we can examine various ways of updating WooCommerce products.

## wp-cli

At first I thought it would be a good idea to loop over the jq results in bash and update the products one by one using wp-cli, or in this case the wc-cli. I wrote a quick loop and ran a test on 70 products.

```sh
#!/bin/sh

# count records
count=$(jq '. | length' data.json)

# loop records
for i in $( seq 0 $(( count - 1 )) )
do
    echo $i
    product=$(jq ".[$i]" data.json)
    name=$(jq -r ".[$i] .Name" data.json)
    sku=$(jq -r ".[$i] .SKU" data.json)

    # get the id, as we can not update by sku
    id=$(wp wc product list --allow-root --sku="$sku" --user=1 --format="json" --fields="id" | jq '.[] .id')

    result=$(wp wc product update $id --name="$name" --allow-root)
done
```

Bad idea! Each call to wc-cli takes quite some time. Besides, we need two calls to wc-cli: one to get the id from the SKU – because we can't update by SKU – and one to do the actual update.

It took about 5 minutes to update 70 products. On top of that, it would mean that all keys have to be parsed in bash. Let's not go down this road...

## wp-cli, 2nd approach

So, we have to reduce the number of calls to wc-cli. Why not extend wc-cli with our own command and feed the JSON to it? That would mean one call to wc-cli, and it handles the rest. Great, first let's build our own wc-cli command for testing purposes.

```php
<?php

if ( ! ( defined( 'WP_CLI' ) && WP_CLI ) ) {
    return;
}

class cli_woo extends WP_CLI_Command {

    /**
     * Display version information.
     *
     * ## OPTIONS
     *
     * [--wponly]
     * : Shows only WP version info, omitting the plugin one.
     */
    function version( $args, $assoc_args ) {
        if ( ! empty( $assoc_args['wponly'] ) ) {
            WP_CLI::line( 'Version of WordPress is ' . get_bloginfo( 'version' ) . '.' );
        } else {
            WP_CLI::line( 'Version of this plugin is 0.1-beta, and version of WordPress ' . get_bloginfo( 'version' ) . '.' );
        }
    }

    /**
     * Display the number of plugins.
     */
    function plugins() {
        WP_CLI::line( 'There are a total of ' . sizeof( get_plugins() ) . ' plugins on this site.' );
    }
}

WP_CLI::add_command( 'woo', 'cli_woo' );
?>
```

Great, that seems fine. Now to test the code on our bash shell. With the following bash commands we can see the results.

```sh
#!/bin/sh
wp woo plugins
wp woo version
wp woo version --wponly
```

For the next step we need to load and loop over the JSON data and update our products, which can be done with the following code.

```php
<?php
//..

private $key    = '';
private $secret = '';

/**
 * Woo import JSON
 *
 * ## OPTIONS
 *
 * --file=<name>
 * : JSON file.
 */
function import( $args, $assoc_args ) {
    if ( ! file_exists( $assoc_args['file'] ) )
        WP_CLI::error( 'File does not exist' );

    $json = json_decode( file_get_contents( $assoc_args['file'] ), true );

    if ( ! $json )
        WP_CLI::error( 'incorrect JSON' );

    WP_CLI::log( 'Importing..' );

    foreach ( $json as $row ) {
        $id = wc_get_product_id_by_sku( $row['SKU'] );
        WP_CLI::line( $id . ' ' . $row['SKU'] );

        $data_args = array();

        foreach ( $row as $key => $value ) {
            switch ( strtolower( $key ) ) {
                case 'name':
                    $data_args['name'] = $value;
                    break;
                case 'stock':
                    $data_args['stock_quantity'] = $value;
                    break;
            }
        }

        $api_response = wp_remote_post( 'https://../wp-json/wc/v3/products/' . $id, array(
            'headers' => array(
                'Authorization' => 'Basic ' . base64_encode( $this->key . ':' . $this->secret ),
            ),
            'body' => $data_args
            // more params http://woocommerce.github.io/woocommerce-rest-api-docs/?shell#product-properties
        ) );

        if ( is_wp_error( $api_response ) ) {
            WP_CLI::line( print_r( $api_response, true ) );
            //echo $result->get_error_message();
        }

        $body = json_decode( $api_response['body'] );

        if ( wp_remote_retrieve_response_message( $api_response ) === 'OK' ) {
            WP_CLI::line( print_r( $body->name, true ) . ' has been updated' );
            WP_CLI::line( WP_CLI::colorize( "%GSuccess:%n test" ) );
        }
    }
}

//..
?>
```

Let's test the import function again – hope for the best, cross our fingers – with the following bash command.

```sh
#!/bin/sh
wp woo import --file="data.json"
```

Damn, unauthorized. We need to create a WooCommerce API key in the WordPress admin under WooCommerce > Settings > Advanced > REST API.
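The key and secret are sent as plain HTTP Basic auth, which is what the base64_encode in the PHP code produces. A quick way to sanity-check the header value from the shell (the key and secret shown are placeholders):

```sh
#!/bin/sh
# Placeholder credentials; use the consumer key/secret generated
# under WooCommerce > Settings > Advanced > REST API.
key="ck_xxx"
secret="cs_xxx"
printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$key" "$secret" | base64)"
```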

After adding the key and secret, the import of 70 products takes about 2 minutes. Which annoys me pretty much, because I am aiming for much less than two freaking minutes. On to take three...

## wp-cli, take 3

So how do we shave time off a single CLI command? Well, we are looping over the REST API – would it be possible to offer the data as a batch? Why yes, WooCommerce offers a batch endpoint. Wow!

https://../wp-json/wc/v3/products/batch

With a new wc-cli command (hint: batch) and some tweaking of the above PHP code, the new batch update takes a mere 15 seconds for 70 products. Do note that for the batch endpoint you need to offer the data for updated and created products separately. I retrieve the ID of the product by SKU to check whether an update or a create is required.
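The update/create split can be sketched with jq (field names assumed from the earlier loop; in the actual command the ID lookup happens in PHP via wc_get_product_id_by_sku):

```sh
#!/bin/sh
# Rows that resolved to an existing product id go into "update",
# the rest into "create" – the shape the /products/batch call expects.
jq '{update: [ .[] | select(.id != null) ],
    create: [ .[] | select(.id == null) ]}' data.json
```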

On to a more extensive test: let's fire a 3000-product JSON at the batch wc-cli command. Hint: it fails – the batch endpoint accepts a maximum of 100 products per run.

Another rewrite to loop over the data and send 100 instructions per REST API batch call. Firing 3000 products again, aaaand SUCCESS.
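The chunking itself can be sketched in jq (the chunks helper is a hypothetical name defined inline; jq has no built-in for this):

```sh
#!/bin/sh
# Emit the products array in slices of 100, one slice per batch call.
jq -c 'def chunks(n): range(0; length; n) as $i | .[$i:$i+n]; chunks(100)' data.json
```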

10 minutes, 3000 products, and with a live progress bar by using \cli\line, or in this case the \cli\progress\Bar class.

```php
\cli\line( 'Parsing ' . $total . ' CSV lines' );

$bar = new \cli\progress\Bar( 'Progress', $total );
$bar->display();

//..loop
$bar->tick();
//..endloop

$bar->finish();
```

The resulting output of the wc-cli batch command.

# Conclusion

Very nice results, and the third approach results in lower CPU usage, as you can see in the htop screenshot below. The average over multiple runs is around 50%. Do note this is a low-end test droplet on DigitalOcean.

For the complete wp-cli WooCommerce batch import script, check the following git repository: wp-cli-import. There is a lot of room for improvement, like automatic field mapping, which I will be working on every now and then.
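As a very rough sketch of what automatic field mapping could look like (the lookup table and its field names are hypothetical), a jq rename pass over the source keys:

```sh
#!/bin/sh
# Hypothetical lookup table: rename source CSV headers to the
# WooCommerce REST field names; unknown keys pass through unchanged.
jq 'def map_key: {"Name": "name", "Stock": "stock_quantity"}[.] // .;
    map( with_entries( .key |= map_key ) )' data.json
```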