WordPress image CDN and optimization

Goal

The current situation is unfavorable: all images are stored on and served from the webserver. This puts a lot of extra load and bandwidth usage on the webserver, which we want to avoid.

Also, when making snapshots or backing up the server, 10,000+ images (actually 60,000+ including all media sizes) are inconvenient, to say the least.

  • Optimize image load times
  • Reduce webserver load
  • Reduce used bandwidth
  • Reduce webserver disk space

How?

Using plugins?

We have tried most, if not all, plugins to achieve the goal. The issues with most of them are vendor lock-in, pricing, problems with existing media, etc.

Initially we did not want to create our own implementation, so we went ahead with the Spaces Sync plugin, which does one thing well: it hooks into the add_attachment and delete_attachment actions to copy media to, or remove it from, S3-compatible storage.

CDN compatibility with other plugins

First pitfall

Not all media plugins work well with offsite images. Think about regeneration plugins, crop plugins or any plugin which expects the file to be stored locally.

The Fly Dynamic Image Resizer plugin is a great tool to avoid generating every image size up front, especially when dealing with huge banner images. It creates an image size on the fly, only when that size is actually requested. Call me pedantic, but I refuse to add an additional image size to every media attachment when that size is rarely used.

Anyway, when an image is stored offsite, the resizer is unable to create the required size from the source file. One option would be to ask the plugin author to use the file URL instead of the file path. Alternatively, we can hook into the get_attached_file filter and return our full CDN URL instead.

add_filter( 'get_attached_file', 'my_get_attached_file', 1, 2 );
function my_get_attached_file( $file, $attachment_id )
{
     if ( ! file_exists( $file ) ) # we will improve this later in this blog post :)
     {
         //swap the local uploads folder (e.g. /home/.../public_html/wp-content/uploads) for the CDN url
         $uploads = wp_get_upload_dir();
         return str_replace( $uploads['basedir'], 'https://cdn.url.com', $file );
     }
     return $file;
}

Ignore the file_exists check for now; it is sufficient at this stage, and we will improve it later in this post.

Blocking or Non-blocking

Second pitfall

Some plugins block while uploading or removing media. Under normal conditions this should not be an issue, but what if the CDN responds slowly? Everything would be blocked until the full job has completed.

For example, we use a Lightroom export plugin which adds media to and removes media from WordPress. We found that Spaces Sync blocks while it syncs with the S3 Spaces bucket, which leads to huge delays when adding or removing attachments through Lightroom.

This means the CDN synchronization should run as scheduled, non-blocking background tasks. At this point, after testing many plugins, we had had enough. Let's just create our own implementation.

First implementation

Mass upload server side solution

First we need to know how to interact with the CDN. After some research we found s3cmd. There are also newer alternatives, s4cmd and s5cmd, which can move significantly more data in a shorter time frame.

First we created a bash script that mass-processes all existing local files. The full script, which mass-optimizes, syncs and purges local files, is in the GitHub repository below.

#full sync script
https://github.com/opicron/cli-spaces-sync/blob/master/spaceput.sh

I won't go into detail about this script, but we used it as a server-only solution while working on the refined implementation. See it as 1) an initialization script and 2) a brute-force way to keep the offsite images synced. The script runs the following jobs:

  • Optimize all local images and save the paths of the optimized images in a purge list
  • Copy the images in the purge list to the CDN (halt on failure, to avoid removing local images in the last step when there were errors)
  • Purge the CDN cache in batches, based on the purge list
  • Remove the local images

You could add the script to a daily cron job, or call it from the add_attachment WordPress hook to sync new files. Either way, all current and future images will be accessible through the CDN.
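The order of the jobs listed above is what makes the script safe to run unattended: local files are only removed once every upload has succeeded. A minimal sketch of that guard, assuming a purge list with one path per line, relative to the uploads folder. sync_purgelist, upload_file and purge_paths are hypothetical names, and the two stubs only print what they would do; in a real script they would wrap s3cmd put and the CDN purge call.

```shell
#!/usr/bin/env bash
# Hypothetical stubs: replace with the real s3cmd put / CDN purge calls.
upload_file() { echo "upload $1"; }
purge_paths() { echo "purge $*"; }

# sync_purgelist UPLOADS_DIR PURGELIST
# Copies every path in PURGELIST (relative to UPLOADS_DIR) to the CDN,
# purges those paths from the CDN cache and only then removes the local files.
sync_purgelist() {
  local dir="$1" list="$2" path
  local -a paths=()
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] && paths+=("$path")
  done < "$list"
  [ "${#paths[@]}" -gt 0 ] || return 0   # nothing to do

  # 1) copy to CDN; halt on the first failure so no local file is removed
  for path in "${paths[@]}"; do
    upload_file "$dir/$path" || { echo "upload failed: $path" >&2; return 1; }
  done

  # 2) purge the CDN cache for the uploaded paths
  purge_paths "${paths[@]}" || return 1

  # 3) every upload succeeded: remove the local copies
  for path in "${paths[@]}"; do
    rm -f -- "$dir/$path"
  done
}
```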

Optimizing images

The common tools to optimize media are jpegoptim (JPEG) and optipng (PNG). The commands to optimize images and strip unused data are:

optipng -preserve -strip all "image"
jpegoptim -s -p --all-progressive "image"
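Each tool only handles its own format, so a script that walks the uploads folder has to pick the tool per file. A small sketch, assuming files are selected by extension; optimize_image is a hypothetical helper name, and with DRYRUN=1 it only prints the command it would run.

```shell
#!/usr/bin/env bash
# optimize_image FILE
# Runs the optimizer matching FILE's extension with the flags shown above:
#   JPEG -> jpegoptim (strip markers, preserve modification time, progressive)
#   PNG  -> optipng   (strip metadata, preserve file attributes)
# Other files are skipped. With DRYRUN=1 the command is only printed.
optimize_image() {
  local file="$1"
  local -a cmd
  case "$file" in
    *.[jJ][pP][gG]|*.[jJ][pP][eE][gG]) cmd=(jpegoptim -s -p --all-progressive "$file") ;;
    *.[pP][nN][gG])                    cmd=(optipng -preserve -strip all "$file") ;;
    *) return 0 ;;   # not a format we optimize
  esac
  if [ "${DRYRUN:-0}" -eq 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (hypothetical path):
# find /path/to/wp-content/uploads -type f -print0 |
#   while IFS= read -r -d '' f; do optimize_image "$f"; done
```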

Send media to CDN

The command to send a file to the CDN with s3cmd is as follows. Note the --acl-public flag, which makes the image publicly available instead of storing it as private.

Note: s3cmd needs a configuration file (by default ~/.s3cfg, which s3cmd --configure creates) in which the following fields are defined:

website_endpoint
secret_key
host_base
host_bucket
access_key

s3cmd put "file" s3://BUCKET/"file" --acl-public
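The object key determines the public URL, so it helps to mirror the folder structure below wp-content/uploads in the bucket; that is exactly the part of the URL the to_cdn() helper further down keeps. A sketch, assuming UPLOADS_DIR and BUCKET are set to your own values (both are placeholders here); s3_put is a hypothetical helper that prints the s3cmd command instead of running it when DRYRUN=1.

```shell
#!/usr/bin/env bash
# Placeholders: set these to your own uploads folder and bucket name.
UPLOADS_DIR="${UPLOADS_DIR:-/path/to/public_html/wp-content/uploads}"
BUCKET="${BUCKET:-BUCKET}"

# s3_put FILE
# Uploads FILE as a public object whose key is its path relative to
# $UPLOADS_DIR, e.g. uploads/2021/05/photo.jpg -> s3://BUCKET/2021/05/photo.jpg
s3_put() {
  local file="$1" key
  case "$file" in
    "$UPLOADS_DIR"/*) key="${file#"$UPLOADS_DIR"/}" ;;
    *) echo "not inside $UPLOADS_DIR: $file" >&2; return 1 ;;
  esac
  local -a cmd=(s3cmd put "$file" "s3://$BUCKET/$key" --acl-public)
  if [ "${DRYRUN:-0}" -eq 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```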

Purge cache from CDN

In the GitHub script you will see that the images are not only copied to the CDN but also purged from the CDN cache. If the cache is not purged, changes to an image (crops or develop changes) will not be visible until the cached copy expires.

The CDN API only accepts purge requests in batches, so we split the purge list into chunks of 20 files, as shown below.

#group per chunk to avoid timeouts on the api
#expects: $purge (array of file paths to purge) and $DRYRUN (0 or 1)
g=20

for ((i=0; i < ${#purge[@]}; i+=g))
do
  part=( "${purge[@]:i:g}" )

  #json purgelist, e.g. {"files":["2021/05/a.jpg","2021/05/b.jpg"]}
  purgelist=$(printf '%s\n' "${part[@]}" | jq -R . | jq -s -c '{"files":.}')
  echo "$purgelist" | jq .

  #purge cdn api (skipped on a dry run)
  if [ "${DRYRUN:-0}" -eq 0 ]; then
    curl -X DELETE \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer <<YOUR_DO_TOKEN_HERE>>" \
      -s -o /dev/null -w "%{http_code}\n" \
      -d "$purgelist" \
      "https://api.digitalocean.com/v2/cdn/endpoints/<<END_POINT_ID_HERE>>/cache"
  fi
done

Serving the correct image

Now that the media is on the CDN, we need a function that checks whether an image is on the CDN.

function is_cdn_image( $post_id )
{
    //skip if the attachment is not an image
    if ( ! wp_attachment_is_image( $post_id ) )
        return false;

    //does the _isCDN meta field exist?
    $activeCDNmeta = metadata_exists( 'post', $post_id, '_isCDN' );

    //get the CDN status
    $activeCDN = get_post_meta( $post_id, '_isCDN', true );

    //on the CDN if flagged, or if the meta does not exist (default: on CDN)
    if ( ( $activeCDNmeta && $activeCDN == true ) || ! $activeCDNmeta )
    {
        return true;
    }

    return false;
}

Next, we add filters that make WordPress use the CDN URLs whenever images are displayed or requested. We found that the following three filters are required for good compatibility: wp_get_attachment_url, wp_get_attachment_image_src and wp_calculate_image_srcset.

//helper function: rewrite a local uploads url to the CDN url
function to_cdn( $url )
{
    $needle = 'wp-content/uploads/';
    $pos = strpos( $url, $needle );

    //skip if needle is not found
    if ( $pos === false )
        return $url;

    //keep everything after wp-content/uploads/ (e.g. 2021/05/photo.jpg)
    $filepath = substr( $url, $pos + strlen( $needle ) );

    return 'https://bucket.cdn.digitaloceanspaces.com/' . $filepath;
}

add_filter( 'wp_get_attachment_url', 'clrs_get_attachment_url', 999, 2 );
function clrs_get_attachment_url( $url, $post_id )
{
    if ( is_cdn_image( $post_id ) )
        $url = to_cdn( $url );

    return $url;
}

add_filter( 'wp_get_attachment_image_src', 'test_get_attachment_image_src', 10, 4 );
function test_get_attachment_image_src( $image, $attachment_id, $size, $icon )
{
    //$image is false or [ url, width, height, is_intermediate ]
    if ( ! is_array( $image ) || ! is_cdn_image( $attachment_id ) )
        return $image;

    //to CDN; width, height and is_intermediate stay as they were
    $image[0] = to_cdn( $image[0] );

    return $image;
}

add_filter( 'wp_calculate_image_srcset', 'test_calculate_image_srcset', 10, 5 );
function test_calculate_image_srcset( $sources, $size_array, $image_src, $image_meta, $attachment_id )
{
    if ( ! is_array( $sources ) || ! is_cdn_image( $attachment_id ) )
        return $sources;

    //$sources is keyed by width; only the url of each source changes
    foreach ( $sources as $width => $source )
        $sources[ $width ]['url'] = to_cdn( $source['url'] );

    return $sources;
}

Removing images from the Media Library

When using the above script, keep in mind that images removed from the WordPress media library are not removed from the offsite storage. In the second implementation we use the delete_attachment hook to remove the offsite images.

add_action('delete_attachment','schedule_delete_attachment',999 ,1);
function schedule_delete_attachment( $post_id )
{
  //remove image and all media sizes from CDN
}
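Every generated size is its own file, so removing an attachment from the bucket means deleting the original and each size. A sketch of the S3 side, assuming the job receives the original's path relative to uploads plus the file names of its sizes. In WordPress these are the file entry and the sizes[...]['file'] entries of wp_get_attachment_metadata(), which have to be read while the attachment still exists. s3_delete_attachment and the BUCKET placeholder are hypothetical; with DRYRUN=1 it only prints the s3cmd del commands.

```shell
#!/usr/bin/env bash
BUCKET="${BUCKET:-BUCKET}"   # placeholder bucket name

# s3_delete_attachment ORIGINAL_KEY [SIZE_FILE ...]
#   ORIGINAL_KEY  path of the original relative to uploads, e.g. 2021/05/photo.jpg
#   SIZE_FILE     file name of a generated size, e.g. photo-150x150.jpg; WordPress
#                 keeps these in the same folder as the original
# Deletes the original and every listed size. A failed delete does not stop
# the others, but makes the function return 1. DRYRUN=1 only prints.
s3_delete_attachment() {
  local key="$1" dir="" f rc=0
  shift
  case "$key" in */*) dir="${key%/*}/" ;; esac

  local -a keys=("$key")
  for f in "$@"; do
    keys+=("$dir$f")
  done

  for f in "${keys[@]}"; do
    if [ "${DRYRUN:-0}" -eq 1 ]; then
      echo "s3cmd del s3://$BUCKET/$f"
    else
      s3cmd del "s3://$BUCKET/$f" || rc=1
    fi
  done
  return "$rc"
}
```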

Second implementation

Hybrid server side solution

While we will always need the above mass upload script to initially brute-force all local images to the offsite storage, we would like to process only the new or removed images, accurately, through scheduled cron jobs.

Also, with the above server-side implementation it is possible that a file has not been sent to the offsite storage yet while WordPress already outputs the offsite URL. This resolves itself once the daily cron runs the sync, or when WordPress schedules the sync.

We therefore refined the synchronization by combining the WP-Cron schedule with two bash scripts that process images to and from the offsite storage. We think this is a more elegant and robust method; still, both methods are required for a complete custom solution.
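One possible way to keep the WordPress side non-blocking is to have the hook or WP-Cron event only append file paths to a queue file, and let bash scripts run from the system crontab do the slow S3 work. The sketch below assumes such a queue file with one path per line; drain_queue and the handler names are hypothetical, and the same function could drain an upload queue with an upload handler and a delete queue with a delete handler.

```shell
#!/usr/bin/env bash
# drain_queue QUEUE_FILE HANDLER
# Calls HANDLER PATH for every non-empty line of QUEUE_FILE.
# The queue is renamed first (mv within one filesystem is atomic), so paths
# appended while we work go into a fresh queue file for the next run.
# Paths whose HANDLER fails are appended back to the queue for a retry.
# Limitation: a writer that still holds the old file open during the rename
# may append to the work file after it was read; such entries are lost.
drain_queue() {
  local queue="$1" handler="$2" work line failed=0
  [ -s "$queue" ] || return 0              # nothing queued
  work="$queue.$$.work"
  mv -- "$queue" "$work" || return 1
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue             # skip blank lines
    if ! "$handler" "$line" < /dev/null; then
      printf '%s\n' "$line" >> "$queue"    # retry on the next run
      failed=1
    fi
  done < "$work"
  rm -f -- "$work"
  return "$failed"
}

# e.g. from crontab: drain_queue /path/to/upload.queue upload_handler
```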

WordPress cron jobs

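The delete_attachment hook does run, but by the time the scheduled cron job actually runs, the attachment ID no longer exists. Anything needed to remove the offsite files, such as the file paths, therefore has to be collected when the hook fires.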

Cleaning up

Back up CDN

One of the reasons we wanted the images off the webserver was to make snapshots quicker. But now the images are not backed up anywhere. To create a backup by syncing a destination bucket with the origin, we can use rclone. Of course, this requires a destination bucket.

As with s3cmd, rclone needs both the origin and the destination to be configured, in the ~/.config/rclone/rclone.conf file.

[spacesorigin]
  type=s3
  env_auth=false
  access_key_id=
  secret_access_key=
  endpoint=
  acl=private

[spacesdest]
  ..

Once the configuration is done, we run the following command:

rclone sync spacesorigin:bucket spacesdest:bucket --progress ...

It runs pretty slowly, so let's increase the speed of the sync:

--max-backlog= --transfers=100 --checkers=200 ..

Conclusion

None yet =)
