By: Josiah Huckins - 5/21/2019
Caching is used everywhere, from static websites to distributed cloud apps. When it comes to performance, it's still one of the best tools in your toolbox. When it comes to pushing updates live, it can seem like caches are weaponized against you.
Web app devs and architects know the scene too well. You deployed and tested your code in UAT and everything was fine, then you deployed to production only to find it broken, or worse, working right for some users and broken for others. You then spent time debugging, only to find that somewhere along the request transaction highway, a CDN edge node or proxy server just didn't want to give up the old, out-of-date version of your precious code files. Maybe it didn't get the memo about invalidation, maybe it's grown fond of the aging files. In any case, now you've got to try and clear it out...(sigh) again. You run a cache purge and hope that the changes propagate this time. Sometimes you have to repeat this process, or even get server hosting involved for a manual clear. (By the way, I'm not trying to talk smack about caches; they're a necessity for sure. Just want to clarify that. Anyway, moving on.)
What if there was a way to prevent this from ever happening again? What if there was a way to deploy updates and guarantee that every proxy, CDN node, browser, device and underground bunker was forced to update its cache? Furthermore, what if it could be done by targeting only the specific files that were updated, with no need to purge the entire cache? What if you could do this even with Cache-Control headers, ETags and other client-side optimization tricks in the mix?
There is a way, and it's a simple solution: employ strong file versioning. It's that easy.
You see, HTTP-based caches store responses by URL. If the location of the resource changes, the cache has no entry for the new URL, which forces the client to grab the resource again upon the next request. This holds regardless of any caching rules previously implemented and communicated to the client for that resource.
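To make that concrete, here is a minimal sketch of a cache keyed by the full URL. The UrlKeyedCache class and the origin function are made up for illustration, and the sketch assumes the cache includes the query string in its key (some CDNs can be configured to ignore it, so check yours).

```python
from typing import Callable, Dict


class UrlKeyedCache:
    """Toy HTTP-style cache: entries are keyed by the full URL, query string included."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}
        self.origin_hits = 0  # how many requests actually reached the origin

    def get(self, url: str) -> bytes:
        if url not in self._store:
            # Cache miss: nothing is stored under this exact URL yet.
            self.origin_hits += 1
            self._store[url] = self._fetch(url)
        return self._store[url]


def origin(url: str) -> bytes:
    # Hypothetical origin server: returns the current file for the path,
    # ignoring the query string.
    return b"latest bytes for " + url.split("?")[0].encode()


cache = UrlKeyedCache(origin)
cache.get("/images/car.jpg?v=1")  # miss: goes to the origin
cache.get("/images/car.jpg?v=1")  # hit: served from the cache
cache.get("/images/car.jpg?v=2")  # miss again: a new URL is a new cache key
print(cache.origin_hits)  # prints 2
```

The second request never reaches the origin, while the third does, even though no max-age has expired and nothing was purged.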
Here's a simple example.
Let's say you like cars. You have a page that serves an image of a 1969 Camaro called car.jpg, under the following path: /images/car.jpg. It is also the image source (src) on several other pages. This is a static image that doesn't change much, so you've configured the server response headers with: Cache-Control: max-age=63072000, which tells the client to use the cached version for 2 years. (You really like this picture I guess.)
That has been served in the wild for a few weeks, but now suddenly, you're all about drifting. You don't even like classic muscle cars anymore. So now you want to change that car.jpg image to serve up an AE86 (yes, that AE86).
But alas, you've set the car.jpg to be cached for 2 years. You could go through the usual cache invalidation/purge dance, running into the same old issues with stuck cache and having to repeat the process. Also, if this image is linked on other pages on your site or bookmarked, just changing the file name would result in having to update broken links elsewhere and put in 301 redirects for bookmarking users. That seems like a lot of work for one little picture.
However, what if you started with this content structure:
/images/car.jpg?v=1106222509
Now, when it comes time to swap out the image, you change car.jpg to be a different car picture AND you increment the version (v) query string by 1. The result is: /images/car.jpg?v=1106222510
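With that setup, any client that requests the new URL will be forced to grab the new image, even if it still has the old one cached.
Updating links with the new query parameter is preferred but optional. Links without it won't break if you do nothing; they just may keep serving the cached copy until it expires.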
Note that this uses an integer increment. You may alternatively want to generate a unique hash for the version parameter.
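Both options can be sketched in a few lines of Python. The helper names bump_version and content_version are hypothetical, and the choice of SHA-256 truncated to 10 characters is just an assumption for the example.

```python
import hashlib
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def bump_version(url: str, param: str = "v") -> str:
    """Return url with the integer query parameter `param` incremented by 1.

    A missing parameter is treated as 0, so the first bump yields 1.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    current = int(query.get(param, ["0"])[0])
    query[param] = [str(current + 1)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def content_version(data: bytes, length: int = 10) -> str:
    """Derive a version token from the file's bytes (truncated SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()[:length]


print(bump_version("/images/car.jpg?v=1106222509"))  # prints /images/car.jpg?v=1106222510
```

The integer needs to be tracked somewhere between releases, while a content-derived token can be recomputed from the file itself at build time.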
Let's look at another example. You're deploying a single page application. This application uses an SPA framework for request routing and styles the UI using a combination of JavaScript, SCSS and CSS.
Certain UI elements rarely change, so you've set Cache-Control's max-age to a high value. There is also a JSON Web Token containing the logged-in user's access level, so you've set the private directive for its Cache-Control header.
A number of things could go wrong here. For one, if a bug is discovered with the framework, you may have to update and clear out old versions of front-end items from caches. Another more serious scenario comes if a vulnerability is found with the web tokens. In that situation, you may need to invalidate existing tokens before their expiration. If you've versioned your code files and tokens, when critical updates are pushed, you can ensure the client is updated too. Just increment that version number.
Automatic Versioning
There are a few different options to automate the process of versioning files.
Gulp has a handy package called gulp-version-number.
Grunt offers tools like grunt-bump.
I typically use Maven to package and deploy releases, as it can be used to automate the building of both back-end Java and front-end JS, SCSS, LESS and other files.
Maven has a plugin called Copy Rename that allows you to rename files during a specified phase of the build process.
Using this, you could rename individual files, or the target directory where your files are stored.
Renaming the directory provides the benefit of versioning all the files within. That way you can version all or a subset of your code, and increment it at deploy time.
To use this plugin, the following goal should be present in your module's pom.xml:
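A minimal sketch of such a configuration, assuming the copy-rename-maven-plugin published under com.coderplus.maven.plugins and its rename goal. The plugin version, execution id, phase, paths and the hyphen separator are illustrative assumptions, so check them against the plugin's documentation and your own layout.

```xml
<plugin>
  <groupId>com.coderplus.maven.plugins</groupId>
  <artifactId>copy-rename-maven-plugin</artifactId>
  <!-- Assumed version; use whichever release you have verified -->
  <version>1.0.1</version>
  <executions>
    <execution>
      <!-- Hypothetical execution id -->
      <id>version-frontlibs</id>
      <!-- Assumed phase; it must run after frontlibs has been generated -->
      <phase>prepare-package</phase>
      <goals>
        <goal>rename</goal>
      </goals>
      <configuration>
        <!-- Hypothetical paths: point these at your own source and destination -->
        <sourceFile>${project.build.directory}/frontlibs</sourceFile>
        <destinationFile>${project.build.directory}/frontlibs-${project.version}</destinationFile>
      </configuration>
    </execution>
  </executions>
</plugin>
```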
Adjust the source and destination directory names to fit your code structure. That ${project.version} string is a Maven project property. It allows you to change to a new release version in one place and have it update for all plugins that use this property (more on this in a moment).
Depending on your setup, you may have to change the build phase.
With this configuration in place, "frontlibs" will have a version number appended to its name when you run the Maven build. When the build is deployed, the location of all files within frontlibs will change to reflect the new name of the parent folder.
You can run the build using the following command (where 1.0.0 is incremented with each run):
mvn clean install -Dversion=1.0.0
Note also that you can use this same Maven property definition in other areas of the POM. This is often used for sharing a version number among multiple build modules.
Taking this even further, if you use a build orchestration server like Jenkins, you can configure a Maven job as a parameterized build, allowing you to inject the version number when you run the job.
Closing Thoughts
With the techniques mentioned, you can guarantee caches are updated across the board whenever you deploy updates and still maintain complex caching rulesets.
You can version individual files or entire directories.
Versioning directories is best used when you can easily update links. Certain app frameworks will rebuild the anchor links for you when they change. Without a framework there are ways to automate this as well (a discussion for another post).
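As one rough illustration of the no-framework route, a post-build step could rewrite references in your markup to point at the versioned folder. The version_links function below is hypothetical, and it assumes the folder is named with a hyphen and a version that starts with a digit (for example frontlibs-1.0.1).

```python
import re


def version_links(html: str, folder: str, version: str) -> str:
    """Point every reference to /folder/ (or /folder-<old version>/) at /folder-<version>/."""
    # The optional group matches an existing "-<version>" suffix, so running
    # the rewrite on already-versioned markup swaps the old version for the new.
    pattern = re.compile(r"/" + re.escape(folder) + r"(?:-[0-9][\w.]*)?/")
    # A function replacement avoids any backslash handling in the new path.
    return pattern.sub(lambda _match: f"/{folder}-{version}/", html)


page = '<script src="/frontlibs/app.js"></script><link href="/frontlibs-1.0.0/site.css">'
print(version_links(page, "frontlibs", "1.0.1"))
# prints <script src="/frontlibs-1.0.1/app.js"></script><link href="/frontlibs-1.0.1/site.css">
```

A regex pass like this is only suitable for simple, predictable paths; a template variable or build-time placeholder is usually sturdier for anything bigger.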
Some may be thinking of using hashes as an alternative to version numbers. If these are hashes of the folder or file names being cached, the downside is that unless you constantly salt your hashes or create them based on new file names, the computed hashes will not change. (A hash of the file contents avoids this, since it changes whenever the file does.) If you are constantly salting, you might as well use versioning instead.
I've been using the folder versioning technique on a client project with great success. Post deployment, I haven't had to touch an invalidation or purge tool for the cached items in over a year!
This simple solution has saved so much time.
Thank you for reading! I hope you enjoyed this post and it was helpful to you.