Manually invalidate URLs cached in Nginx's reverse proxy cache

This is part 2 of 3 in a series of blog posts:

  1. Boosting Djangos performance with Nginx reverse proxy cache
  2. Manually invalidate URLs cached in Nginx's reverse proxy cache
  3. Invalidate the whole Nginx reverse proxy cache in production

So you have now set up your Nginx reverse proxy cache as described in my previous blog post: Boosting Djangos performance with Nginx reverse proxy cache. Very nice.

But what about invalidation? It is really great that Nginx can cache our web pages and serve them very fast. But if you change an object in Django, Nginx will still serve the old version of the page with the old data. Until the configured timeout of the cached item is reached and Nginx refreshes its cache by fetching the page from Django again, the user will see the old data.

I will explain how you can invalidate a cached item manually, so that when you change a Django object you can tell the cache to fetch the new data for that object.

There are three steps to accomplish this:

  1. Install the ngx_cache_purge module into Nginx
  2. Configure a URL for cache invalidation in Nginx
  3. Write a Django function that invalidates the cached item

1. Installing the ngx_cache_purge module into Nginx

Cached items in the Nginx reverse proxy cache are identified by their URL. Nginx has no out-of-the-box feature that lets us invalidate a single URL, so we have to install a third-party Nginx module called 'ngx_cache_purge', developed by FRiCKLE. The Nginx version used here (1.6.2) only lets you add modules at compile time, so you have to compile your own Nginx. (Have no fear, this is easy!)

You can download the module or get more information about it from FRiCKLE's website or from GitHub.

Here are the shell commands I used to compile my Nginx with the new module:

# first: download and extract ngx_cache_purge
cd /tmp/
wget http://labs.frickle.com/files/ngx_cache_purge-2.1.tar.gz
tar -xzvf ngx_cache_purge-2.1.tar.gz

# second: download and extract Nginx
wget 'http://nginx.org/download/nginx-1.6.2.tar.gz'
tar -xzvf nginx-1.6.2.tar.gz
cd nginx-1.6.2/

# third: set compilation configuration
# we want to install Nginx to /opt/nginx
./configure --prefix=/opt/nginx \
        --add-module=/tmp/ngx_cache_purge-2.1 

# last: compile Nginx
make -j2
sudo make install

Hint: You may need more modules for your live system (like ngx_http_ssl_module). Please see the Nginx module documentation so that you compile in all the modules you need.
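
After installing, it is worth checking that the purge module really ended up in the binary. `nginx -V` prints the configure arguments the binary was built with, so a small Python 3 check could look like the sketch below. The binary path assumes the /opt/nginx prefix used above, and both helper names are made up for illustration.

```python
import subprocess


def configure_arguments(nginx_binary='/opt/nginx/sbin/nginx'):
    """Return the text printed by `nginx -V` (version and configure arguments)."""
    result = subprocess.run([nginx_binary, '-V'],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    # Nginx writes the -V output to stderr; stdout is included just in case
    return result.stdout + result.stderr


def has_module(version_output, module_name='ngx_cache_purge'):
    """True if an --add-module argument in the output mentions the module."""
    return any(arg.startswith('--add-module=') and module_name in arg
               for arg in version_output.split())
```

On the server, `has_module(configure_arguments())` should then return True. If it returns False, the `--add-module` option was probably missing from the `./configure` call.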

2. Configure a URL for cache invalidation in Nginx

Now we need to tell our Nginx when it should invalidate a given URL in the cache. For this we create a location entry in our Nginx config like this:

server {
    listen 80;
    server_name  localhost;

    # log configuration
    access_log /var/log/nginx-access.log;
    error_log /var/log/nginx-error.log;

    # set up location for cache invalidation.
    # make a GET request to                
    # http://localhost/invalidate_cached_url/?url=[your_url]
    # to remove page from nginx reverse proxy cache.
    location ^~ /invalidate_cached_url/ {
        allow 127.0.0.1;
        deny all;
        proxy_cache_purge nginx_cache $arg_url;
    }
}

This creates a location /invalidate_cached_url/ that is only available from localhost and that expects a GET parameter url containing the URL that should be invalidated in the cache.

Important: If your Nginx web server and your application server do not run on the same host, you have to change 127.0.0.1 to your application server's IP.
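
To make the expected request format concrete, here is a small Python sketch that builds the invalidation URL for a given page. It assumes the location defined above; the function name and the example.com URL are placeholders. The target URL is appended unencoded on purpose: as far as I know Nginx does not decode `$arg_url`, and the value has to be identical to the key the page was cached under (whatever `proxy_cache_key` you configured).

```python
# the location we have defined in our Nginx conf
INVALIDATE_URL = 'http://localhost/invalidate_cached_url/'


def build_purge_url(url_to_invalidate, invalidate_url=INVALIDATE_URL):
    """Return the URL to GET in order to purge one page from the cache."""
    # Query arguments are split at '&' and a '#' starts a fragment,
    # so such a URL would reach $arg_url truncated
    if '&' in url_to_invalidate or '#' in url_to_invalidate:
        raise ValueError('cannot pass %r as a single url argument'
                         % url_to_invalidate)
    return '%s?url=%s' % (invalidate_url, url_to_invalidate)


print(build_purge_url('http://example.com/blog/my-first-post/'))
# http://localhost/invalidate_cached_url/?url=http://example.com/blog/my-first-post/
```

Refusing URLs that contain `&` is a deliberate limitation of this sketch: pages whose cache key includes several query parameters need a different way of passing the key.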

3. Writing a Django function that invalidates the cached item

OK, our Nginx is set up and ready to invalidate individual URLs from its cache. To invalidate a URL we just have to make a simple HTTP GET request to the invalidation location, passing the URL we want to invalidate as a parameter.

Here is some Django pseudocode that does this using the beautiful Requests library:

import requests
from django.urls import reverse  # django.core.urlresolvers before Django 1.10

# the url we have defined in our Nginx conf
INVALIDATE_URL = 'http://localhost/invalidate_cached_url/'


def invalidate_blog_page(request, blog):
    # the absolute url (including domain) we want to invalidate
    url_to_invalidate = request.build_absolute_uri(
        reverse('blog:detail', args=[blog.slug]))

    # assemble url to call
    call_url = '%s?url=%s' % (INVALIDATE_URL, url_to_invalidate)

    try:
        # make HTTP GET request; the timeout keeps a hanging Nginx
        # from blocking the Django request forever
        requests.get(call_url, timeout=5)
    except requests.RequestException:
        print('Could not invalidate cache for url: %s' % call_url)

For really nice code it would probably be good to have a model method that returns the absolute URL of your object, including the domain.

You could also set up a Django 'post_save' signal for this model to automatically call the code that invalidates the absolute URL of the saved model.

So whenever you call my_object.save(), the cached item for this object will be invalidated.

As said, this would be nice, but it is beyond the scope of this blog post. So go and write this nice code!
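
If you want a starting point, here is a rough sketch of such a signal receiver. Everything in it is an assumption rather than tested production code: SITE_ROOT stands for the domain your pages are cached under, the model is expected to have the usual get_absolute_url() method, and the http_get parameter only exists so that the function can be tried without a running Nginx.

```python
INVALIDATE_URL = 'http://localhost/invalidate_cached_url/'
SITE_ROOT = 'http://example.com'  # assumption: domain the pages are cached under


def purge_url_for(instance):
    """Build the invalidation URL for a model instance."""
    # get_absolute_url() is Django's convention and usually returns only the path
    return '%s?url=%s%s' % (INVALIDATE_URL, SITE_ROOT, instance.get_absolute_url())


def invalidate_on_save(sender, instance, http_get=None, **kwargs):
    """post_save receiver: purge the saved object's page from the Nginx cache."""
    if http_get is None:
        import requests  # same library as in the code above
        http_get = requests.get
    call_url = purge_url_for(instance)
    try:
        http_get(call_url, timeout=5)
    except OSError:  # requests.RequestException derives from IOError/OSError
        print('Could not invalidate cache for url: %s' % call_url)
```

To wire it up you would connect the receiver to your model, for example `post_save.connect(invalidate_on_save, sender=Blog)` with `post_save` imported from `django.db.models.signals` (Blog being whatever your model is called). Keep in mind that the receiver runs synchronously inside save(), so the timeout bounds how long a save can be delayed.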
