Why I like Redis

2009-10-22T10:58:21+00:00

I've been getting a lot of useful work done with Redis recently.

Redis is typically categorised as yet another of those new-fangled NoSQL key/value stores, but if you look closer it actually has some pretty distinctive characteristics. It makes more sense to describe it as a "data structure server" - it provides a network service that exposes persistent storage and operations over dictionaries, lists, sets and string values. Think memcached, but with list and set operations and persistence to disk.

It's also incredibly easy to set up, ridiculously fast (30,000 reads or writes a second on my laptop with the default configuration) and has an interesting approach to persistence. Redis runs in memory, but syncs to disk every Y seconds or after every X operations. That sounds risky, but it supports replication out of the box, so if you're worried about losing data should a server fail you can always ensure you have a replicated copy to hand. I wouldn't trust my only copy of critical data to it, but there are plenty of other cases for which it is really well suited.

I'm currently not using it for data storage at all - instead, I use it as a tool for processing data using the interactive Python interpreter.

I'm a huge fan of REPLs. When programming Python, I spend most of my time in an IPython prompt. With JavaScript, I use the Firebug console. I experiment with APIs, get something working and paste it over into a text editor. For some one-off data transformation problems I never save any code at all - I run a couple of list comprehensions, dump the results out as JSON or CSV and leave it at that.

Redis is an excellent complement to this kind of programming. I can run a long-running batch job in one Python interpreter (say, loading a few million lines of CSV into a Redis key/value lookup table) and run another interpreter to play with the data that's already been collected, even as the first process is streaming data in. I can quit and restart my interpreters without losing any data. And because Redis semantics map closely to Python native data types, I don't have to think for more than a few seconds about how I'm going to represent my data.

Here's a 30 second guide to getting started with Redis:

$ wget http://redis.googlecode.com/files/redis-1.01.tar.gz
$ tar -xzf redis-1.01.tar.gz
$ cd redis-1.01
$ make
$ ./redis-server

And that's it - you now have a Redis server running on port 6379. No need even for a ./configure or make install. You can run ./redis-benchmark in that directory to exercise it a bit.

Let's try it out from Python. In a separate terminal:

$ cd redis-1.01/client-libraries/python/
$ python
>>> import redis
>>> r = redis.Redis()
>>> r.info()
{u'total_connections_received': 1, ... }
>>> r.keys('*') # Show all keys in the database
[]
>>> r.set('key-1', 'Value 1')
'OK'
>>> r.keys('*')
[u'key-1']
>>> r.get('key-1')
u'Value 1'

Now let's try something a bit more interesting:

>>> r.push('log', 'Log message 1', tail=True)
>>> r.push('log', 'Log message 2', tail=True)
>>> r.push('log', 'Log message 3', tail=True)
>>> r.lrange('log', 0, 100)
[u'Log message 3', u'Log message 2', u'Log message 1']
>>> r.push('log', 'Log message 4', tail=True)
>>> r.push('log', 'Log message 5', tail=True)
>>> r.push('log', 'Log message 6', tail=True)
>>> r.ltrim('log', 0, 2)
>>> r.lrange('log', 0, 100)
[u'Log message 6', u'Log message 5', u'Log message 4']

That's a simple capped log implementation (similar to a MongoDB capped collection) - push items onto a 'log' key with tail=True and use ltrim to retain only the most recent X items (here X is 3, because ltrim('log', 0, 2) keeps indexes 0 through 2). You could use this to keep track of what a system is doing right now without having to worry about storing ever-increasing amounts of logging information.
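
The capped log pattern can be wrapped in a small helper. This is only a sketch under stated assumptions, not the bundled client's API: it assumes a client object with `lpush`, `ltrim` and `lrange` methods (the names used by the current redis-py library, rather than the `push(..., tail=True)` call in the session above), and `InMemoryLists` is a hypothetical stand-in so the example runs without a server. The helper name `log_capped` is also made up for illustration.

```python
class InMemoryLists:
    """Hypothetical stand-in for a Redis client, so this runs without a server.

    It mimics only the three list commands used below, and only for
    non-negative indexes.
    """

    def __init__(self):
        self._lists = {}

    def lpush(self, key, value):
        # LPUSH: insert at the head of the list and return the new length.
        self._lists.setdefault(key, []).insert(0, value)
        return len(self._lists[key])

    def ltrim(self, key, start, end):
        # LTRIM: keep only items start..end (end is inclusive, as in Redis).
        self._lists[key] = self._lists.get(key, [])[start:end + 1]
        return True

    def lrange(self, key, start, end):
        # LRANGE: return items start..end (end is inclusive).
        return self._lists.get(key, [])[start:end + 1]


def log_capped(client, key, message, max_items):
    """Push a message, then trim so only the newest max_items remain."""
    client.lpush(key, message)
    client.ltrim(key, 0, max_items - 1)


client = InMemoryLists()  # assumption: swap in a real client such as redis.Redis()
for n in range(1, 7):
    log_capped(client, "log", "Log message %d" % n, max_items=3)

recent = client.lrange("log", 0, 100)
print(recent)
```

Trimming on every push keeps the list from ever growing past `max_items`, at the cost of one extra command per log line.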

See the documentation for a full list of Redis commands. I'm particularly excited about the RANDOMKEY and new SRANDMEMBER commands (git trunk only at the moment), which help address the common challenge of picking a random item without ORDER BY RAND() clobbering your relational database. In a beautiful example of open source support in action, I requested SRANDMEMBER on Twitter yesterday and antirez committed it just 12 hours later.
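
To make the random-item idea concrete, here is a rough sketch of what picking a random set member looks like from Python. It assumes a client with `sadd` and `srandmember` methods (the names the current redis-py library uses for those commands); `InMemorySets` and the sample data are hypothetical, included only so the example runs without a server.

```python
import random


class InMemorySets:
    """Hypothetical stand-in for a Redis client exposing sadd() and srandmember()."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, member):
        # SADD: add to the set; return 1 if the member was new, otherwise 0.
        members = self._sets.setdefault(key, set())
        is_new = member not in members
        members.add(member)
        return int(is_new)

    def srandmember(self, key):
        # SRANDMEMBER: return one random member without removing it.
        members = self._sets.get(key)
        return random.choice(list(members)) if members else None


client = InMemorySets()  # assumption: a real client such as redis.Redis() in practice
for name in ["apple", "banana", "cherry"]:
    client.sadd("fruit", name)

pick = client.srandmember("fruit")
print(pick)
```

The point is that the random choice happens inside the set itself, so there is no need to sort or scan a whole table just to pull out one row.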

I used Redis this week to help create heat maps of the BNP's membership list for the Guardian. I had the leaked spreadsheet of the BNP member details and a (licensed) CSV file mapping 1.6 million postcodes to their corresponding parliamentary constituencies. I loaded the CSV file into Redis, then looped through the 12,000 postcodes from the membership list and looked them up in turn, accumulating counts for each constituency. It took a couple of minutes to load the constituency data and a few seconds to run and accumulate the postcode counts. In the end, it probably involved less than 20 lines of actual Python code.
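
A lookup-and-count job of this kind might look roughly like the following. It is a reconstruction under stated assumptions, not the code actually used: the CSV layout (postcode, constituency), the made-up sample rows, the `pc:` key prefix, the `normalise` helper and the `InMemoryKV` stand-in are all hypothetical, and `set`/`get` are assumed to behave like the Redis commands of the same name.

```python
import csv
import io
from collections import Counter


class InMemoryKV:
    """Hypothetical stand-in for a Redis client exposing only set() and get()."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        # Like Redis GET, a missing key yields None rather than an error.
        return self._data.get(key)


def normalise(postcode):
    """Strip spaces and upper-case so 'ab1 2cd' and 'AB12CD' match."""
    return postcode.replace(" ", "").upper()


def load_lookup(client, csv_file):
    """Store each postcode -> constituency row as one key/value pair."""
    for postcode, constituency in csv.reader(csv_file):
        client.set("pc:" + normalise(postcode), constituency)


def count_by_constituency(client, postcodes):
    """Look up each postcode and tally how many fall in each constituency."""
    counts = Counter()
    for postcode in postcodes:
        constituency = client.get("pc:" + normalise(postcode))
        if constituency is not None:  # skip postcodes missing from the lookup
            counts[constituency] += 1
    return counts


# Made-up sample data standing in for the real files.
lookup_csv = io.StringIO("AB1 2CD,Northtown\nAB1 3EF,Northtown\nXY9 8ZZ,Southville\n")
members = ["ab1 2cd", "AB1 3EF", "XY9 8ZZ", "ZZ0 0ZZ"]

client = InMemoryKV()  # assumption: a real client such as redis.Redis() in practice
load_lookup(client, lookup_csv)
counts = count_by_constituency(client, members)
print(dict(counts))
```

Because the lookup table lives in the server rather than in the interpreter, the slow loading step only has to run once, and the counting step can be rerun or tweaked from a fresh prompt.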

A much more interesting example of an application built on Redis is Hurl, a tool for debugging HTTP requests built in 48 hours by Leah Culver and Chris Wanstrath. The code is now open source, and Chris talks a bit more about the implementation (in particular their use of sort in Redis) on his blog. Redis also gets a mention in Tom Preston-Werner's epic writeup of the new scalable architecture behind GitHub.

Tags: chris-wanstrath, github, guardian, hurl, interactivedevelopment, ipython, leah-culver, open-source, performance, python, redis

Simon Willison's Weblog: hurl
