Preventing Caching of Images, JavaScript, and CSS

If you ever use tools like YSlow or PageSpeed Insights for Chrome, you know that caching images, style sheets, and JavaScript is an important part of optimizing your website's performance. Both Yahoo and Google recommend that you set up "far-future expiration dates" via HTTP headers.

In Apache, you can accomplish this pretty easily with the following configuration in your site's vhost_example.com.conf file:

# enable expirations
ExpiresActive On
ExpiresByType image/gif      "access plus 1 month"
ExpiresByType image/png      "access plus 1 month"
ExpiresByType image/jpeg     "access plus 1 month"
ExpiresByType image/x-icon   "access plus 1 month"
ExpiresByType text/html               "now"
ExpiresByType text/css                "access plus 1 month"
ExpiresByType application/javascript  "access plus 1 month"

The directives above tell Apache to send expiration dates one month into the future for images, CSS, and JS files, while HTML pages expire immediately. We generally set expiration dates relative to either the last access or the last modification date of the file in question, and as you can tell, the format is in plain English (see the ExpiresByType options in the mod_expires documentation).
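To make "access plus 1 month" a little more concrete, here is a minimal PHP sketch of the two caching headers such a rule amounts to. It is an illustration only: Apache generates these headers on its own, the function name expiry_headers is made up for this example, and the 30-day month is an assumption about how the module counts.

```php
<?php
// Illustration only: Apache's mod_expires produces these headers itself.
// expiry_headers() is a hypothetical name; the 30-day month is an assumption.
function expiry_headers(int $access_time, int $lifetime = 2592000): array
{
    return [
        // Absolute expiry date in the HTTP date format, always in GMT
        'Expires'       => gmdate('D, d M Y H:i:s', $access_time + $lifetime) . ' GMT',
        // The same lifetime expressed as a relative number of seconds
        'Cache-Control' => 'max-age=' . $lifetime,
    ];
}

// Print the headers a file requested right now would carry
foreach (expiry_headers(time()) as $name => $value) {
    echo $name . ': ' . $value . "\n";
}
```

Because the lifetime is counted from the moment of access, every visitor gets their own expiry date, one month after they first fetched the file.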

After you change the configuration and restart Apache, you get headers like this:

These are the response headers from a JavaScript file that runs a web application we developed. Note that the "Expires" date is one month after the current date (when we last accessed it). This means that, unless you clear your browser's cache, it will not attempt to reload this file again until a month from today. Subsequent requests for that particular file will be served from the browser's cache, without even sending a request to the server to check whether it has changed.

Setting these "expire" headers can greatly reduce the overhead associated with opening a new socket for every single file that a page needs—files that are often called on every single page within a site. But this can also create a problem if you need to push an update. Once the browser has an expires date for a file, until that date passes, each subsequent request of that file will come from cache, regardless of whether the file has changed on the server—the browser doesn't ask the server whether it has been modified. So if you update a JS file that has a far-future expiration date, unless your users empty their caches, they're going to be looking at old files. The result can be as innocuous as a color change on a background image sprite, or as devastating as a broken web application, due to incorrect form field ids in your JavaScript code.

If you haven't planned well (and it happens to all of us at some point), you're going to start hearing from clients because the site doesn't look or act like it's supposed to. And at this point, the only thing you can do to fix it is to rename all files that you think might be cached. For example, rename sprite.png to sprite2.png, and change every instance of that image URL in your style sheets. And since your CSS files are cached, you'll need to rename those files too, and change the HTML of every page that references the style sheet. It quickly becomes quite a pain in the butt.

Breaking the Cache

So before you jump into reconfiguring Apache, I suggest having a plan to override the cache automatically, forcing updated files to be reloaded. To do this, I use a simple PHP function and a mod_rewrite trick within a .htaccess file. The goal here is to automatically change the URL of any of our cached files that have changed. But we don't want to have to actually change the filenames on the server.

There are a couple ways that we can alter the URL of a file in order to force the browser to reload. If our file is named app.js, we can append a query string: app.js?timestamp=20150324. In this instance, I just appended a timestamp in the form YYYYMMDD. Instead of a timestamp, you could use a version number, like app.js?v=1.4.
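For comparison, the query-string variant could be generated automatically along these lines. This is only a sketch: append_query_version is a hypothetical name, and it uses the file's modification time as the version value instead of a hand-typed date.

```php
<?php
// Hypothetical helper for the query-string variant described above.
function append_query_version(string $url, string $local_path = ''): string
{
    $path = $local_path . $url;
    if (!is_file($path)) {
        return $url; // file not found on disk: leave the URL alone
    }
    // The modification time only changes when the file itself changes
    return $url . '?v=' . filemtime($path);
}

// Prints js/app.js?v=<mtime> if js/app.js exists, otherwise js/app.js
echo append_query_version('js/app.js'), "\n";
```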

That's pretty simple, but I don't really like relying on query strings. In theory, most browsers should understand that a file with a different query string from the previous one needs to be reloaded, but you don't know how many firewalls and caching servers might lie between the client (browser) and the web server (Apache). I'd rather be absolutely sure and not rely on the query string. And it really doesn't add any extra complexity, so why not?

Here are the goals of my file renaming scheme:

  • It needs to be automatic: it updates itself whenever a file has changed;
  • the actual filenames must remain the same; and
  • it doesn't use a query string.

It needs to be said that our goal here is not to add a time/version stamp that changes every time a page is loaded. That would completely defeat the purpose of caching; you might as well turn caching off at that point. What we really want is a timestamp that changes only when the file has been modified. Note that there's no relationship between the timestamp we're adding to the filename and the "expires" date that we saw earlier in the HTTP response headers. We're adding the timestamp to create a new, unique virtual filename, which the browser has never seen and therefore cannot serve from its cache.

Here is the filename format I've chosen: app.v1427224101.js. This same basic format will work for images and style sheets, as well.

OK, here's how to do it. First, you'll need a PHP function to alter URLs, inserting the modification string, based on the file's current modification time (filemtime() in PHP). We will call this function anywhere we link to our JS file within our site (built in PHP):

function append_cache_breaker ($filename, $local_path = '') {

    // $local_path covers the case where the local filesystem path isn't the same as the relative url path,
    // e.g. append_cache_breaker('images/popup_arrow.png', PATH_ROOT . 'html/');
    // it is only used to find the file on disk; it doesn't affect output

    if (APPEND_CACHE_BREAKER) {
        $path = $local_path . $filename;
        // assumes file extension of 2-4 characters at the end of the name
        if (is_file($path) && preg_match('/^(.*)\.([a-zA-Z0-9]{2,4})$/', $filename, $matches)) {
            return $matches[1] . '.v' . filemtime($path) . '.' . $matches[2];
        }
    }

    // cache breaker turned off, file not found, or no usable extension: leave the URL alone
    return $filename;

} // function append_cache_breaker

That function accepts the file URL (with path, relative to your current PHP page), and the local file path (if it's different from the file URL). $local_path defaults to '', and we just prepend it to the filename. It could be relative or absolute.

You would call that function from within a PHP page, like this:

<?php echo '<script src="' . append_cache_breaker('js/app.js') . '"></script>'; ?>

This assumes that our JS file is named app.js, and that it lives in the js/ directory, relative to our current page. Note that I added a constant called APPEND_CACHE_BREAKER. That constant is set in PHP, in the site config file. The config file lives outside any source control like git or svn. (This allows us to turn the cache breaker on or off, per site, without altering code.)

The output of the echo line above is:

<script src="js/app.v1427224101.js"></script>
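The stamp in that URL comes from the file's modification time, so it stays the same from one page load to the next and only changes when the file itself does. The short demo below tries to show this with a scratch file; stamped_name is a hypothetical stand-in for the stamping step, and the scratch file name is made up.

```php
<?php
// Hypothetical stand-in for the stamping step, kept short for the demo.
function stamped_name(string $path): string
{
    clearstatcache(true, $path); // PHP caches stat results within a request
    $info = pathinfo($path);
    return $info['filename'] . '.v' . filemtime($path) . '.' . $info['extension'];
}

$file = sys_get_temp_dir() . '/cache_breaker_demo.js'; // scratch file for the demo
file_put_contents($file, 'var a = 1;');
touch($file, 1427224101);          // pretend it was last modified at this time

$first  = stamped_name($file);     // two "page loads" with no change in between
$second = stamped_name($file);

touch($file, 1427224101 + 3600);   // the file is edited an hour later
$third  = stamped_name($file);

var_dump($first === $second);      // bool(true): same URL, still served from cache
var_dump($first === $third);       // bool(false): new URL, browser must fetch it
unlink($file);
```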

Next, we'll need to tell Apache that a filename of the format app.vXXXXXXXXX.js needs to be rewritten to app.js before serving it. We can do that with a rewrite rule in the site's .htaccess file:

<IfModule mod_rewrite.c>
    RewriteEngine on

    # break cache via mod date
    RewriteRule ^(.*)\.v([0-9]+)\.([a-zA-Z0-9]{2,4})$ $1.$3 [L]

</IfModule>

The RewriteRule above removes the timestamp via a regular expression match. I wrote the regular expression to be very broad (including file extensions from 2-4 characters), so you might find you need to make it a bit more specific. But I know on this particular site that any filename with a ".vXXXXXXX" between the name and the extension is a cached file.
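Before relying on a rule like this, it can help to check which URLs it would touch. The sketch below applies a pattern equivalent to the rule's (with the literal dots escaped) in PHP; strip_cache_breaker is a hypothetical name, and the sample paths are made up.

```php
<?php
// Hypothetical helper that mirrors the rewrite pattern so URLs can be checked offline.
function strip_cache_breaker(string $path): string
{
    // name, then ".v" plus digits, then a 2-4 character extension
    return preg_replace('/^(.*)\.v([0-9]+)\.([a-zA-Z0-9]{2,4})$/', '$1.$3', $path);
}

foreach (['js/app.v1427224101.js', 'js/app.js', 'js/lib.v2.js'] as $path) {
    echo $path . ' => ' . strip_cache_breaker($path) . "\n";
}
// js/app.v1427224101.js => js/app.js
// js/app.js => js/app.js
// js/lib.v2.js => js/lib.js
```

The last line shows why a broad pattern may need narrowing: a file that is genuinely named lib.v2.js would be rewritten to lib.js. Requiring a longer run of digits, for example [0-9]{8,}, is one way to avoid that on a site where such names exist.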

Other Instances, Outside PHP

There are times when I need to call a cached file from a static, non-interpreted (non-PHP) file, like a CSS file (e.g. image sprites). In this instance, I manually insert a timestamp in the CSS. So, for example:

.lang-eng #menu li a {
    background-image: url(images/sprites.v20150216.png);
    background-position: -1px -41px;
    background-repeat: no-repeat;
}

In this example, I inserted the timestamp (vYYYYMMDD) in the image URL. This is a bit of manual labor, not to mention a little taxing on my diminishing memory, but it doesn't happen too often. If it did, I'd probably come up with a better solution. But remember, if you go to the trouble of creating an interpreted CSS file, like style.css.php, that file is usually not going to be cached. So there's really no need to bother with the timestamp in the first place.
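If the manual stamping did become a chore, one possible direction would be a small build step that stamps the url() references in a CSS file automatically. The following is only a sketch of that idea, not something from the setup described here: stamp_css_urls is a hypothetical name, and it assumes the image paths in the CSS are relative to a directory you pass in.

```php
<?php
// Hypothetical build-time helper: inserts .v<mtime> into relative url() references.
function stamp_css_urls(string $css, string $base_dir): string
{
    $pattern = '/url\(\s*([\'"]?)([^\'")]+)\1\s*\)/';
    return preg_replace_callback($pattern, function (array $m) use ($base_dir) {
        $url = trim($m[2]);
        // Leave data URIs, absolute URLs, and root-relative paths untouched
        if (preg_match('#^(data:|https?:|/)#i', $url)) {
            return $m[0];
        }
        $path = $base_dir . '/' . $url;
        // Skip files that are missing on disk or have no 2-4 character extension
        if (!is_file($path) || !preg_match('/^(.*)\.([a-zA-Z0-9]{2,4})$/', $url, $parts)) {
            return $m[0];
        }
        $stamped = $parts[1] . '.v' . filemtime($path) . '.' . $parts[2];
        return 'url(' . $m[1] . $stamped . $m[1] . ')';
    }, $css);
}

// Demo with a scratch directory standing in for the site's CSS folder
$dir = sys_get_temp_dir() . '/css_stamp_demo';
if (!is_dir($dir . '/images')) {
    mkdir($dir . '/images', 0777, true);
}
touch($dir . '/images/sprites.png', 1427224101);
echo stamp_css_urls('a { background-image: url(images/sprites.png); }', $dir), "\n";
```

The stamped CSS would be written to a separate output file, leaving the source style sheet with its plain, unstamped URLs.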

Or, Using Compass and SASS with Inline Images

Another solution to this would be to use SASS and inline images, as follows:

#page {
    background-image: inline-image("bg_page.png");
    background-position: left top;
    background-repeat: repeat-x;
    background-color: #bbbdbc;
}

Using this method, you don't need a timestamp for the image. Instead, Compass/SASS inserts the image data directly into the compiled CSS file, so the compiled file changes whenever the image does. As long as you're appending a cache breaker to the CSS file itself, you're good to go.

One additional benefit to using inline images with SASS is that the browser never has to open a socket to get the image, even the first time you load a page.
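To give a rough idea of what an inlined image looks like in the compiled CSS, the sketch below builds a base64 data URI by hand in PHP. It is an approximation of the idea rather than Compass's actual implementation; inline_image_css is a hypothetical name, and the MIME type is passed in rather than detected.

```php
<?php
// Hypothetical helper that approximates an inlined image as a data URI.
function inline_image_css(string $path, string $mime = 'image/png'): string
{
    // The file's bytes are embedded in the CSS as base64 text
    $data = base64_encode(file_get_contents($path));
    return "url('data:" . $mime . ";base64," . $data . "')";
}

// Demo with a tiny scratch file standing in for a real image
$image = sys_get_temp_dir() . '/inline_image_demo.png';
file_put_contents($image, 'abc');
echo inline_image_css($image), "\n"; // url('data:image/png;base64,YWJj')
unlink($image);
```

Base64 encoding makes the data about a third larger than the original file, so this approach tends to suit small images such as background tiles and icons.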

Conclusion

I hope these hints get you started on a good caching scheme for your web application. As always, if you have any suggestions for improvements, leave a comment.

Good luck!