<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: hurl</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/hurl.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2009-10-22T10:58:21+00:00</updated><author><name>Simon Willison</name></author><entry><title>Why I like Redis</title><link href="https://simonwillison.net/2009/Oct/22/redis/#atom-tag" rel="alternate"/><published>2009-10-22T10:58:21+00:00</published><updated>2009-10-22T10:58:21+00:00</updated><id>https://simonwillison.net/2009/Oct/22/redis/#atom-tag</id><summary type="html">
    &lt;p&gt;I've been getting a lot of useful work done with &lt;a href="http://code.google.com/p/redis/"&gt;Redis&lt;/a&gt; recently.&lt;/p&gt;

&lt;p&gt;Redis is typically categorised as yet another of those new-fangled NoSQL key/value stores, but if you look closer it actually has some pretty unique characteristics. It makes more sense to describe it as a "data structure server" - it provides a network service that exposes persistent storage and operations over dictionaries, lists, sets and string values. Think memcached but with list and set operations and persistence-to-disk.&lt;/p&gt;

&lt;p&gt;It's also incredibly easy to set up, &lt;a href="http://code.google.com/p/redis/wiki/Benchmarks" title="How Fast is Redis?"&gt;ridiculously fast&lt;/a&gt; (30,000 reads or writes a second on my laptop with the default configuration) and has an interesting approach to persistence. Redis runs in memory, but syncs to disk every Y seconds or after every X operations. Sounds risky, but it supports replication out of the box, so if you're worried about losing data should a server fail, you can always ensure you have a replicated copy to hand. I wouldn't trust my only copy of critical data to it, but there are plenty of other cases for which it is really well suited.&lt;/p&gt;

&lt;p&gt;I'm currently not using it for data storage at all - instead, I use it as a tool for processing data using the interactive Python interpreter.&lt;/p&gt;

&lt;p&gt;I'm a huge fan of REPLs. When programming Python, I spend most of my time in an &lt;a href="http://ipython.scipy.org/"&gt;IPython&lt;/a&gt; prompt. With JavaScript, I use the &lt;a href="http://getfirebug.com/cl.html"&gt;Firebug console&lt;/a&gt;. I experiment with APIs, get something working and paste it over into a text editor. For some one-off data transformation problems I never save any code at all - I run a couple of list comprehensions, dump the results out as JSON or CSV and leave it at that.&lt;/p&gt;

&lt;p&gt;Redis is an excellent complement to this kind of programming. I can run a long-running batch job in one Python interpreter (say, loading a few million lines of CSV into a Redis key/value lookup table) and run another interpreter to play with the data that's already been collected, even as the first process is streaming data in. I can quit and restart my interpreters without losing any data. And because Redis semantics map closely to Python native data types, I don't have to think for more than a few seconds about how I'm going to represent my data.&lt;/p&gt;

&lt;p&gt;Here's a 30 second guide to getting started with Redis:&lt;/p&gt;

&lt;pre&gt;&lt;code class="shell"&gt;$ wget http://redis.googlecode.com/files/redis-1.01.tar.gz
$ tar -xzf redis-1.01.tar.gz
$ cd redis-1.01
$ make
$ ./redis-server&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And that's it - you now have a Redis server running on port 6379. No need even for a &lt;samp&gt;./configure&lt;/samp&gt; or &lt;samp&gt;make install&lt;/samp&gt;. You can run &lt;samp&gt;./redis-benchmark&lt;/samp&gt; in that directory to exercise it a bit.&lt;/p&gt;

&lt;p&gt;Let's try it out from Python. In a separate terminal:&lt;/p&gt;

&lt;pre&gt;&lt;code class="shell"&gt;$ cd redis-1.01/client-libraries/python/
$ python
&amp;gt;&amp;gt;&amp;gt; import redis
&amp;gt;&amp;gt;&amp;gt; r = redis.Redis()
&amp;gt;&amp;gt;&amp;gt; r.info()
{u'total_connections_received': 1, ... }
&amp;gt;&amp;gt;&amp;gt; r.keys('*') # Show all keys in the database
[]
&amp;gt;&amp;gt;&amp;gt; r.set('key-1', 'Value 1')
'OK'
&amp;gt;&amp;gt;&amp;gt; r.keys('*')
[u'key-1']
&amp;gt;&amp;gt;&amp;gt; r.get('key-1')
u'Value 1'&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now let's try something a bit more interesting:&lt;/p&gt;

&lt;pre&gt;&lt;code class="shell"&gt;&amp;gt;&amp;gt;&amp;gt; r.push('log', 'Log message 1', tail=True)
&amp;gt;&amp;gt;&amp;gt; r.push('log', 'Log message 2', tail=True)
&amp;gt;&amp;gt;&amp;gt; r.push('log', 'Log message 3', tail=True)
&amp;gt;&amp;gt;&amp;gt; r.lrange('log', 0, 100)
[u'Log message 3', u'Log message 2', u'Log message 1']
&amp;gt;&amp;gt;&amp;gt; r.push('log', 'Log message 4', tail=True)
&amp;gt;&amp;gt;&amp;gt; r.push('log', 'Log message 5', tail=True)
&amp;gt;&amp;gt;&amp;gt; r.push('log', 'Log message 6', tail=True)
&amp;gt;&amp;gt;&amp;gt; r.ltrim('log', 0, 2)
&amp;gt;&amp;gt;&amp;gt; r.lrange('log', 0, 100)
[u'Log message 6', u'Log message 5', u'Log message 4']&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That's a simple capped log implementation (similar to a &lt;a href="http://www.mongodb.org/display/DOCS/Capped+Collections"&gt;MongoDB capped collection&lt;/a&gt;) - &lt;samp&gt;push&lt;/samp&gt; items on to the tail of a 'log' key and use &lt;samp&gt;ltrim&lt;/samp&gt; to only retain the last X items. You could use this to keep track of what a system is doing right now without having to worry about storing ever increasing amounts of logging information.&lt;/p&gt;
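
&lt;p&gt;The capping behaviour can also be sketched without a server. This is only an illustration, assuming a &lt;samp&gt;collections.deque&lt;/samp&gt; with &lt;samp&gt;maxlen&lt;/samp&gt; as an in-process stand-in for the Redis list (the names &lt;samp&gt;CAP&lt;/samp&gt;, &lt;samp&gt;log&lt;/samp&gt; and &lt;samp&gt;recent&lt;/samp&gt; are hypothetical); it mirrors the session above, where the newest entry appears first and only three entries survive:&lt;/p&gt;

```python
from collections import deque

# Hypothetical in-process stand-in for the Redis 'log' list: the newest
# entry sits at index 0 and anything beyond the cap falls off the far end.
CAP = 3
log = deque(maxlen=CAP)

for n in range(1, 7):
    # Adding at the front of a full deque discards the oldest entry,
    # which plays the role of the push followed by ltrim('log', 0, 2).
    log.appendleft('Log message %d' % n)

recent = list(log)
print(recent)
```

&lt;p&gt;Unlike the Redis version, this log lives in a single process and is lost when the interpreter exits.&lt;/p&gt;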

&lt;p&gt;See the documentation for a &lt;a href="http://code.google.com/p/redis/wiki/CommandReference"&gt;full list of Redis commands&lt;/a&gt;. I'm particularly excited about the &lt;samp&gt;RANDOMKEY&lt;/samp&gt; and new &lt;samp&gt;SRANDMEMBER&lt;/samp&gt; commands (&lt;a href="http://github.com/antirez/redis/commit/2abb95a9a849453eeb864e919ea0b8d6495a6a2a"&gt;git trunk only&lt;/a&gt; at the moment), which help address the common challenge of picking a random item without &lt;code class="sql"&gt;ORDER BY RAND()&lt;/code&gt; clobbering your relational database. In a beautiful example of open source support in action, I &lt;a href="http://twitter.com/simonw/status/5027987857"&gt;requested SRANDMEMBER on Twitter&lt;/a&gt; yesterday and &lt;a href="http://twitter.com/antirez" title="Salvatore Sanfilippo"&gt;antirez&lt;/a&gt; committed it just 12 hours later.&lt;/p&gt;

&lt;p&gt;I used Redis this week to help create &lt;a href="http://www.guardian.co.uk/news/datablog/2009/oct/19/bnp-membership-list-constituency" title="BNP membership where you live"&gt;heat maps of the BNP's membership list&lt;/a&gt; for the Guardian. I had the leaked spreadsheet of the BNP member details and a (licensed) CSV file mapping 1.6 million postcodes to their corresponding parliamentary constituencies. I loaded the CSV file into Redis, then looped through the 12,000 postcodes from the membership and looked them up in turn, accumulating counts for each constituency. It took a couple of minutes to load the constituency data and a few seconds to run and accumulate the postcode counts. In the end, it probably involved fewer than 20 lines of actual Python code.&lt;/p&gt;
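
&lt;p&gt;The shape of that lookup-and-count loop is roughly as follows. This is a sketch, not the code that was actually run: a plain dict stands in for the Redis key/value table, and the postcodes, constituency names and variable names are all made-up placeholders.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical sample data: the real table mapped 1.6 million postcodes,
# and these placeholder postcodes and constituency names are made up.
postcode_to_constituency = {
    'ZZ1 1AA': 'Constituency A',
    'ZZ1 1AB': 'Constituency A',
    'ZZ2 2BB': 'Constituency B',
}
member_postcodes = ['ZZ1 1AA', 'ZZ2 2BB', 'ZZ1 1AB', 'ZZ9 9ZZ']

counts = Counter()
for postcode in member_postcodes:
    # With Redis, this lookup would be a GET against the postcode key.
    constituency = postcode_to_constituency.get(postcode)
    if constituency is not None:  # skip postcodes missing from the table
        counts[constituency] += 1

print(dict(counts))
```

&lt;p&gt;Swapping the dict for Redis means the lookup table survives interpreter restarts and can be queried from a second session while it is still loading.&lt;/p&gt;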

&lt;p&gt;A much more interesting example of an application built on Redis is &lt;a href="http://hurl.it/"&gt;Hurl&lt;/a&gt;, a tool for debugging HTTP requests built in 48 hours by Leah Culver and Chris Wanstrath. The &lt;a href="http://github.com/defunkt/hurl"&gt;code is now open source&lt;/a&gt;, and Chris talks a bit more about the implementation (in particular their use of sort in Redis) &lt;a href="http://ozmm.org/posts/sort_in_redis.html"&gt;on his blog&lt;/a&gt;. Redis also gets a mention in Tom Preston-Werner's &lt;a href="http://github.com/blog/530-how-we-made-github-fast"&gt;epic writeup&lt;/a&gt; of the new scalable architecture behind GitHub.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/chris-wanstrath"&gt;chris-wanstrath&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/guardian"&gt;guardian&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hurl"&gt;hurl&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/interactivedevelopment"&gt;interactivedevelopment&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ipython"&gt;ipython&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/leah-culver"&gt;leah-culver&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redis"&gt;redis&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="chris-wanstrath"/><category term="github"/><category term="guardian"/><category term="hurl"/><category term="interactivedevelopment"/><category term="ipython"/><category term="leah-culver"/><category term="open-source"/><category term="performance"/><category term="python"/><category term="redis"/></entry></feed>