<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: nosql</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/nosql.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-02-19T22:07:32+00:00</updated><author><name>Simon Willison</name></author><entry><title>Using S3 triggers to maintain a list of files in DynamoDB</title><link href="https://simonwillison.net/2025/Feb/19/s3-triggers/#atom-tag" rel="alternate"/><published>2025-02-19T22:07:32+00:00</published><updated>2025-02-19T22:07:32+00:00</updated><id>https://simonwillison.net/2025/Feb/19/s3-triggers/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://til.simonwillison.net/aws/s3-triggers-dynamodb"&gt;Using S3 triggers to maintain a list of files in DynamoDB&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I built an experimental prototype this morning of a system for efficiently tracking files that have been added to a large S3 bucket by maintaining a parallel DynamoDB table using S3 triggers and AWS Lambda.&lt;/p&gt;
&lt;p&gt;I got 80% of the way there with this single prompt (complete with typos) to my &lt;a href="https://simonwillison.net/2024/Dec/19/one-shot-python-tools/#writing-these-with-the-help-of-a-claude-project"&gt;custom Claude Project&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Python CLI app using boto3 with commands for creating a new S3 bucket which it also configures to have S3 lambada event triggers which moantian a dynamodb table containing metadata about all of the files in that bucket. Include these commands&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;create_bucket - create a bucket and sets up the associated triggers and dynamo tables&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;list_files - shows me a list of files based purely on querying dynamo&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;ChatGPT then took me to the 95% point. The code Claude produced included an obvious bug, so I pasted the code into o3-mini-high on the basis that "reasoning" is often a great way to fix those kinds of errors:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Identify, explain and then fix any bugs in this code:&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;code from Claude pasted here&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;... and aside from adding a couple of &lt;code&gt;time.sleep()&lt;/code&gt; calls to work around timing errors with IAM policy distribution, &lt;a href="https://til.simonwillison.net/aws/s3-triggers-dynamodb#user-content-trying-it-out"&gt;everything worked&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;Getting from a rough idea to a working proof of concept of something like this with less than 15 minutes of prompting is extraordinarily valuable.&lt;/p&gt;
&lt;p&gt;This is exactly the kind of project I've avoided in the past because of my almost irrational intolerance of the frustration involved in figuring out the individual details of each call to S3, IAM, AWS Lambda and DynamoDB.&lt;/p&gt;
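&lt;p&gt;For a sense of the moving parts, here's a minimal sketch of the Lambda handler side of the pattern - this is not the code the models produced, and the table attribute names are illustrative. S3 invokes the function with &lt;code&gt;ObjectCreated&lt;/code&gt;/&lt;code&gt;ObjectRemoved&lt;/code&gt; event records, and the handler mirrors each change into the DynamoDB table:&lt;/p&gt;

```python
# Hypothetical sketch of the Lambda handler that keeps DynamoDB in sync
# with an S3 bucket. The table is injected so the logic can be exercised
# without AWS credentials; in a real deployment it would be a boto3
# DynamoDB Table resource.

def make_handler(table):
    """table: anything with put_item(Item=...) / delete_item(Key=...)."""
    def handler(event, context=None):
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            obj = record["s3"]["object"]
            if record["eventName"].startswith("ObjectCreated"):
                # New or overwritten object: record its metadata
                table.put_item(Item={
                    "bucket": bucket,
                    "key": obj["key"],
                    "size": obj.get("size", 0),
                })
            elif record["eventName"].startswith("ObjectRemoved"):
                # Deleted object: drop its row
                table.delete_item(Key={"bucket": bucket, "key": obj["key"]})
    return handler


# Quick check with an in-memory stub standing in for the DynamoDB table:
class StubTable:
    def __init__(self):
        self.items = {}
    def put_item(self, Item):
        self.items[Item["key"]] = Item
    def delete_item(self, Key):
        self.items.pop(Key["key"], None)

table = StubTable()
handler = make_handler(table)
handler({"Records": [{
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "my-bucket"},
           "object": {"key": "report.csv", "size": 1024}},
}]})
```

&lt;p&gt;&lt;code&gt;list_files&lt;/code&gt; then becomes a plain DynamoDB query against that table, never touching S3 at all.&lt;/p&gt;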
&lt;p&gt;(Update: I just found out about &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/01/amazon-s3-metadata-generally-available/"&gt;the new S3 Metadata system&lt;/a&gt; which launched a few weeks ago and might solve this exact problem!)&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/aws"&gt;aws&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lambda"&gt;lambda&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prototyping"&gt;prototyping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/s3"&gt;s3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/o3"&gt;o3&lt;/a&gt;&lt;/p&gt;



</summary><category term="aws"/><category term="lambda"/><category term="nosql"/><category term="prototyping"/><category term="s3"/><category term="ai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="ai-assisted-programming"/><category term="claude"/><category term="uv"/><category term="o3"/></entry><entry><title>How Discord Stores Trillions of Messages</title><link href="https://simonwillison.net/2023/Mar/8/how-discord-stores-trillions-of-messages/#atom-tag" rel="alternate"/><published>2023-03-08T19:07:08+00:00</published><updated>2023-03-08T19:07:08+00:00</updated><id>https://simonwillison.net/2023/Mar/8/how-discord-stores-trillions-of-messages/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://discord.com/blog/how-discord-stores-trillions-of-messages"&gt;How Discord Stores Trillions of Messages&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
This is a really interesting case-study. Discord migrated from MongoDB to Cassandra &lt;a href="https://simonwillison.net/2021/Aug/24/how-discord-stores-billions-of-messages/"&gt;back in 2016&lt;/a&gt; to handle billions of messages. Today they're handling trillions, and they completed a migration from Cassandra to Scylla, a Cassandra-like data store written in C++ (as opposed to Cassandra's Java) to help avoid problems like GC pauses. In addition to being a really good scaling war story this has some interesting details about their increased usage of Rust. As a fan of request coalescing (which I've previously referred to as dogpile prevention) I particularly liked this bit:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Our data services sit between the API and our ScyllaDB clusters. They contain roughly one gRPC endpoint per database query and intentionally contain no business logic. The big feature our data services provide is request coalescing. If multiple users are requesting the same row at the same time, we’ll only query the database once. The first user that makes a request causes a worker task to spin up in the service. Subsequent requests will check for the existence of that task and subscribe to it. That worker task will query the database and return the row to all subscribers.&lt;/p&gt;
&lt;/blockquote&gt;
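&lt;p&gt;The worker-task pattern described in that quote can be sketched in a few lines of Python asyncio - a hypothetical illustration of the idea, not Discord's Rust implementation:&lt;/p&gt;

```python
import asyncio

class Coalescer:
    """Coalesce concurrent requests for the same key into one worker task."""
    def __init__(self, fetch):
        self._fetch = fetch   # the underlying (expensive) database query
        self._inflight = {}   # key -> asyncio.Task currently querying

    async def get(self, key):
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key spins up the worker task
            task = asyncio.ensure_future(self._fetch(key))
            self._inflight[key] = task
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        # Subsequent callers simply subscribe to the existing task
        return await task

calls = 0
async def slow_query(key):
    global calls
    calls += 1
    await asyncio.sleep(0.01)  # stands in for the real database round-trip
    return f"row-{key}"

async def main():
    c = Coalescer(slow_query)
    # Five "users" request the same row concurrently...
    return await asyncio.gather(*(c.get("42") for _ in range(5)))

results = asyncio.run(main())
```

&lt;p&gt;All five callers get the row back, but &lt;code&gt;slow_query&lt;/code&gt; only ran once - exactly the "subscribe to the existing worker task" behaviour the quote describes.&lt;/p&gt;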

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/sdttgb/how_discord_stores_trillions_messages"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cassandra"&gt;cassandra&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/dogpile"&gt;dogpile&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scaling"&gt;scaling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/discord"&gt;discord&lt;/a&gt;&lt;/p&gt;



</summary><category term="cassandra"/><category term="dogpile"/><category term="nosql"/><category term="scaling"/><category term="rust"/><category term="discord"/></entry><entry><title>How Discord Stores Billions of Messages</title><link href="https://simonwillison.net/2021/Aug/24/how-discord-stores-billions-of-messages/#atom-tag" rel="alternate"/><published>2021-08-24T21:31:36+00:00</published><updated>2021-08-24T21:31:36+00:00</updated><id>https://simonwillison.net/2021/Aug/24/how-discord-stores-billions-of-messages/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.discord.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7"&gt;How Discord Stores Billions of Messages&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Fascinating article from 2017 describing how Discord migrated their primary message store to Cassandra (from MongoDB, but I could easily see them making the same decision if they had started with PostgreSQL or MySQL).&lt;/p&gt;
&lt;p&gt;The trick with scalable NoSQL databases like Cassandra is that you need to have a very deep understanding of the kinds of queries you will need to answer - and Discord had exactly that.&lt;/p&gt;
&lt;p&gt;In the article they talk about their desire to eventually migrate to Scylla (a compatible Cassandra alternative written in C++) - in the Hacker News comments &lt;a href="https://news.ycombinator.com/item?id=28292369#28293844"&gt;they confirm&lt;/a&gt; that in 2021 they are using Scylla for a few things but they still have their core messages in Cassandra.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=28292369"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cassandra"&gt;cassandra&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scaling"&gt;scaling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/discord"&gt;discord&lt;/a&gt;&lt;/p&gt;



</summary><category term="cassandra"/><category term="nosql"/><category term="scaling"/><category term="discord"/></entry><entry><title>NoSQL: What is the "best" solution for storing high volumes of structured data?</title><link href="https://simonwillison.net/2013/Nov/1/nosql-what-is-the/#atom-tag" rel="alternate"/><published>2013-11-01T18:45:00+00:00</published><updated>2013-11-01T18:45:00+00:00</updated><id>https://simonwillison.net/2013/Nov/1/nosql-what-is-the/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/NoSQL-What-is-the-best-solution-for-storing-high-volumes-of-structured-data/answer/Simon-Willison"&gt;NoSQL: What is the &amp;quot;best&amp;quot; solution for storing high volumes of structured data?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;On the right setup, PostgreSQL can handle petabytes. There are also commercial vendors such as Greenplum that offer data warehouse solutions built on a modified version of PostgreSQL.&lt;/p&gt;

&lt;p&gt;You should also take a look at Hadoop and Hive. Hive lets you use a language based heavily on SQL to construct map reduce queries which can then run against a Hadoop cluster.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="databases"/><category term="nosql"/><category term="postgresql"/><category term="quora"/></entry><entry><title>How was FriendFeed's schema less db faster than pure MySQL?</title><link href="https://simonwillison.net/2013/Oct/30/how-was-friendfeeds-schema/#atom-tag" rel="alternate"/><published>2013-10-30T16:27:00+00:00</published><updated>2013-10-30T16:27:00+00:00</updated><id>https://simonwillison.net/2013/Oct/30/how-was-friendfeeds-schema/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/How-was-FriendFeeds-schema-less-db-faster-than-pure-MySQL?no_redirect=1"&gt;How was FriendFeed&amp;#39;s schema less db faster than pure MySQL?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The principal reason they switched to a schemaless DB was to work around the challenge of making schema changes in MySQL, which can lock the table and take hours or even days to complete on large tables.&lt;/p&gt;

&lt;p&gt;The performance improvement shown in the graph is almost certainly because they almost entirely eliminated joins and complex queries when they switched to the new mechanism. This meant that all of their database traffic was now simple queries, which have much more predictable performance characteristics. MySQL - in fact, nearly every database - is extremely fast at primary key lookups and index scans.&lt;/p&gt;
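&lt;p&gt;The FriendFeed pattern boils down to storing each entity as an opaque blob fetched by primary key. Here's a hypothetical SQLite sketch of the idea (FriendFeed actually used MySQL, and the table layout here is purely illustrative):&lt;/p&gt;

```python
import json
import sqlite3

# Schemaless storage on a relational engine: entities are opaque JSON
# blobs looked up by primary key - no joins, no per-field schema, so
# adding a field never requires an ALTER TABLE.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")

def put(entity_id, data):
    db.execute("INSERT OR REPLACE INTO entities VALUES (?, ?)",
               (entity_id, json.dumps(data)))

def get(entity_id):
    row = db.execute("SELECT body FROM entities WHERE id = ?",
                     (entity_id,)).fetchone()
    return json.loads(row[0]) if row else None

put("e1", {"user": "alice", "text": "hello"})
```

&lt;p&gt;Every read is a primary key lookup, which is exactly the class of query relational databases handle with very predictable performance.&lt;/p&gt;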
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/friendfeed"&gt;friendfeed&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mysql"&gt;mysql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rdbms"&gt;rdbms&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="databases"/><category term="friendfeed"/><category term="mysql"/><category term="nosql"/><category term="quora"/><category term="rdbms"/></entry><entry><title>How could we using couchbase with binary document as value?</title><link href="https://simonwillison.net/2013/Oct/20/how-could-we-using/#atom-tag" rel="alternate"/><published>2013-10-20T15:17:00+00:00</published><updated>2013-10-20T15:17:00+00:00</updated><id>https://simonwillison.net/2013/Oct/20/how-could-we-using/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/How-could-we-using-couchbase-with-binary-document-as-value/answer/Simon-Willison"&gt;How could we using couchbase with binary document as value?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There's a system called cbfs that acts as a distributed blobstore on top of Couchbase server - &lt;span&gt;&lt;a href="https://github.com/couchbaselabs/cbfs"&gt;https://github.com/couchbaselabs...&lt;/a&gt;&lt;/span&gt; - it looks like it is currently under active development.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/couchbase"&gt;couchbase&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="nosql"/><category term="quora"/><category term="couchbase"/></entry><entry><title>Any source available to download sample data (in 10+ GB) for testing?</title><link href="https://simonwillison.net/2012/Oct/15/any-source-available-to/#atom-tag" rel="alternate"/><published>2012-10-15T13:21:00+00:00</published><updated>2012-10-15T13:21:00+00:00</updated><id>https://simonwillison.net/2012/Oct/15/any-source-available-to/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Any-source-available-to-download-sample-data-in-10+-GB-for-testing/answer/Simon-Willison"&gt;Any source available to download sample data (in 10+ GB) for testing?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Wikipedia has some pretty interesting dumps, in both XML and SQL format: &lt;span&gt;&lt;a href="http://meta.wikimedia.org/wiki/Importing_a_Wikipedia_database_dump_into_MediaWiki"&gt;http://meta.wikimedia.org/wiki/I...&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;It's pretty easy to generate 10GB of random data for testing though, which may be a better option as you could better approximate the kind of data your application will be dealing with. There's a neat Ruby module for doing this called Faker (itself a port of the Perl module of the same name): &lt;span&gt;&lt;a href="http://faker.rubyforge.org/"&gt;http://faker.rubyforge.org/&lt;/a&gt;&lt;/span&gt; - and here's a Python port of the Ruby one: &lt;span&gt;&lt;a href="https://github.com/threadsafelabs/python-faker"&gt;https://github.com/threadsafelab...&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
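&lt;p&gt;You don't even need a library for the basic version of this - here's a stdlib-only sketch (the names and fields are made up) that generates seeded, reproducible fake rows:&lt;/p&gt;

```python
import random

# Tiny stdlib-only sketch of what Faker automates: generate
# plausible-looking random rows shaped like your application's data.
FIRST = ["Alice", "Bob", "Carol", "Dave"]
LAST = ["Smith", "Jones", "Nguyen", "Garcia"]

def fake_user(rng):
    first, last = rng.choice(FIRST), rng.choice(LAST)
    return {
        "name": f"{first} {last}",
        "email": f"{first.lower()}.{last.lower()}@example.com",
        "age": rng.randint(18, 90),
    }

rng = random.Random(42)  # seeded, so test runs are reproducible
rows = [fake_user(rng) for _ in range(1000)]
```

&lt;p&gt;Scale the row count up (and write the output to disk) until you hit the volume you want to test against.&lt;/p&gt;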
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/mysql"&gt;mysql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/programming"&gt;programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/web-development"&gt;web-development&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rdbms"&gt;rdbms&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="mysql"/><category term="nosql"/><category term="programming"/><category term="web-development"/><category term="quora"/><category term="rdbms"/></entry><entry><title>NoSQL: Whats the simplest on disk key-value storage?</title><link href="https://simonwillison.net/2012/Oct/4/nosql-whats-the-simplest/#atom-tag" rel="alternate"/><published>2012-10-04T15:15:00+00:00</published><updated>2012-10-04T15:15:00+00:00</updated><id>https://simonwillison.net/2012/Oct/4/nosql-whats-the-simplest/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/NoSQL-Whats-the-simplest-on-disk-key-value-storage/answer/Simon-Willison"&gt;NoSQL: Whats the simplest on disk key-value storage?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Surprisingly there doesn't seem to be an obvious answer to this. Here are a few options:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;MemcacheDB provides a Berkeley DB storage layer with a memcache protocol compatible interface. &lt;span&gt;&lt;a href="http://memcachedb.org/"&gt;http://memcachedb.org/&lt;/a&gt;&lt;/span&gt; - it hasn't been updated since 2008 though.&lt;/li&gt;&lt;li&gt;Tokyo Cabinet used to be a contender here, but by its own admission has now been superseded by Kyoto Cabinet. I don't know how widely used or mature Kyoto Cabinet is: &lt;span&gt;&lt;a href="http://fallabs.com/kyotocabinet/"&gt;http://fallabs.com/kyotocabinet/&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Google's leveldb library is an extremely fast, stable key-value storage library (it's used by Riak)... but on its own it doesn't come with a server. &lt;span&gt;&lt;a href="http://code.google.com/p/leveldb/"&gt;http://code.google.com/p/leveldb/&lt;/a&gt;&lt;/span&gt; - there is a leveldb-server project on github that adds the server layer but it doesn't look like it's particularly mature or actively maintained: &lt;span&gt;&lt;a href="https://github.com/srinikom/leveldb-server"&gt;https://github.com/srinikom/leve...&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;It might be worth looking at Riak - it may be overkill for what you need, but it's definitely actively maintained and has an excellent reputation: &lt;span&gt;&lt;a href="http://wiki.basho.com/"&gt;http://wiki.basho.com/&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="nosql"/><category term="quora"/></entry><entry><title>What is the best NoSQL database to store unstructured data?</title><link href="https://simonwillison.net/2012/Feb/11/what-is-the-best/#atom-tag" rel="alternate"/><published>2012-02-11T15:06:00+00:00</published><updated>2012-02-11T15:06:00+00:00</updated><id>https://simonwillison.net/2012/Feb/11/what-is-the-best/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/What-is-the-best-NoSQL-database-to-store-unstructured-data/answer/Simon-Willison"&gt;What is the best NoSQL database to store unstructured data?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Any of the document stores are worth a look - I'd suggest investigating MongoDB, Riak and CouchDB.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="nosql"/><category term="quora"/></entry><entry><title>NoSQL: On a shared server, what are the alternatives to using SQL?</title><link href="https://simonwillison.net/2012/Feb/4/nosql-on-a-shared/#atom-tag" rel="alternate"/><published>2012-02-04T18:57:00+00:00</published><updated>2012-02-04T18:57:00+00:00</updated><id>https://simonwillison.net/2012/Feb/4/nosql-on-a-shared/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/NoSQL-On-a-shared-server-what-are-the-alternatives-to-using-SQL/answer/Simon-Willison"&gt;NoSQL: On a shared server, what are the alternatives to using SQL?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You could probably run Redis on a shared server - it doesn't need to be installed as root, but it does require a process to run all the time which shared hosts may not allow.&lt;/p&gt;

&lt;p&gt;There are a bunch of companies out there that offer hosted NoSQL as a service - &lt;span&gt;&lt;a href="https://mongohq.com/"&gt;https://mongohq.com/&lt;/a&gt;&lt;/span&gt; for example. You could talk to those from shared hosting.&lt;/p&gt;

&lt;p&gt;Your best bet though would be to upgrade from shared hosting to something more flexible. The bar for building an interesting web app has gone up, and you can get a basic VPS for $10/month - or use a platform like Heroku, &lt;span&gt;&lt;a href="http://ep.io"&gt;ep.io&lt;/a&gt;&lt;/span&gt; or dotCloud if you don't want to have to manage your own server.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="nosql"/><category term="quora"/></entry><entry><title>Benchmarks for scalability in NoSQL systems?</title><link href="https://simonwillison.net/2012/Jan/11/benchmarks-for-scalability-in/#atom-tag" rel="alternate"/><published>2012-01-11T10:28:00+00:00</published><updated>2012-01-11T10:28:00+00:00</updated><id>https://simonwillison.net/2012/Jan/11/benchmarks-for-scalability-in/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Benchmarks-for-scalability-in-NoSQL-systems/answer/Simon-Willison"&gt;Benchmarks for scalability in NoSQL systems?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;NoSQL systems are enormously varied which makes it hard (and not particularly constructive) to benchmark them against each other. How would you compare the performance of Redis, an in-memory data structure server, with Cassandra, a distributed redundant column store?&lt;/p&gt;

&lt;p&gt;By far the most useful benchmarks are the ones you run yourself against your own test data designed to reflect the specific workload of your own application.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="databases"/><category term="nosql"/><category term="quora"/></entry><entry><title>What are the best blogs about NoSQL?</title><link href="https://simonwillison.net/2011/Jan/7/what-are-the-best/#atom-tag" rel="alternate"/><published>2011-01-07T10:57:00+00:00</published><updated>2011-01-07T10:57:00+00:00</updated><id>https://simonwillison.net/2011/Jan/7/what-are-the-best/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/What-are-the-best-blogs-about-NoSQL/answer/Simon-Willison"&gt;What are the best blogs about NoSQL?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;myNoSQL is excellent: &lt;span&gt;&lt;a href="http://nosql.mypopescu.com/"&gt;http://nosql.mypopescu.com/&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="nosql"/><category term="quora"/></entry><entry><title>What are the pros and cons of switching from MySQL to one of the NoSQL databases?</title><link href="https://simonwillison.net/2011/Jan/6/what-are-the-pros/#atom-tag" rel="alternate"/><published>2011-01-06T16:48:00+00:00</published><updated>2011-01-06T16:48:00+00:00</updated><id>https://simonwillison.net/2011/Jan/6/what-are-the-pros/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/What-are-the-pros-and-cons-of-switching-from-MySQL-to-one-of-the-NoSQL-databases/answer/Simon-Willison"&gt;What are the pros and cons of switching from MySQL to one of the NoSQL databases?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Pro: If your own benchmarks tell you you need to switch to a specific NoSQL solution, you'll know exactly what the pro is.&lt;/p&gt;

&lt;p&gt;Pro: If you're doing something that's hard to model in a regular schema you might find it easier to use a document database such as CouchDB or MongoDB.&lt;/p&gt;

&lt;p&gt;Pro: Depending on how you approach the problem, you may find NoSQL makes schema modifications a LOT less painful than using a relational database.&lt;/p&gt;

&lt;p&gt;Con: For many projects, losing out on the relational model is a big disadvantage. Most NoSQL solutions require you to design your data storage with your queries in mind. When you are building a product you often don't know what kind of queries you are going to run. This has bitten me with AppEngine projects in the past. See also &lt;span&gt;&lt;a href="https://www.quora.com/What-did-Marissa-Mayer-mean-when-she-said-that-Orkut-failed-because-of-infrastructure-issues/answer/Edmond-Lau"&gt;Edmond Lau's answer to What did Marissa Mayer mean when she said that Orkut failed because of "infrastructure issues"?&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;For my money, the smart way of taking advantage of NoSQL is in conjunction with a relational engine. Use a regular database for your core data, but take advantage of Redis or MongoDB for things like counters, smart caches, rolling log storage etc. Polyglot persistence is the way to go.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/mysql"&gt;mysql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="mysql"/><category term="nosql"/><category term="quora"/></entry><entry><title>What are the advantages and disadvantages of using MongoDB vs CouchDB vs Cassandra vs Redis?</title><link href="https://simonwillison.net/2010/Dec/1/what-are-the-advantages/#atom-tag" rel="alternate"/><published>2010-12-01T12:54:00+00:00</published><updated>2010-12-01T12:54:00+00:00</updated><id>https://simonwillison.net/2010/Dec/1/what-are-the-advantages/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/What-are-the-advantages-and-disadvantages-of-using-MongoDB-vs-CouchDB-vs-Cassandra-vs-Redis/answer/Simon-Willison"&gt;What are the advantages and disadvantages of using MongoDB vs CouchDB vs Cassandra vs Redis?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I see Redis as a different category from the other three - kind of like you wouldn't say "what are the advantages of MySQL vs. Memcached". Redis makes an excellent complement to pretty much any other persistent storage mechanism. I expanded on this here: &lt;span&gt;&lt;a href="http://simonwillison.net/2009/Oct/22/redis/"&gt;http://simonwillison.net/2009/Oc...&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cassandra"&gt;cassandra&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/couchdb"&gt;couchdb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mongodb"&gt;mongodb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redis"&gt;redis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="cassandra"/><category term="couchdb"/><category term="mongodb"/><category term="nosql"/><category term="redis"/><category term="quora"/></entry><entry><title>Using MySQL as a NoSQL - A story for exceeding 750,000 qps on a commodity server</title><link href="https://simonwillison.net/2010/Oct/27/yoshinori/#atom-tag" rel="alternate"/><published>2010-10-27T23:10:00+00:00</published><updated>2010-10-27T23:10:00+00:00</updated><id>https://simonwillison.net/2010/Oct/27/yoshinori/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html"&gt;Using MySQL as a NoSQL - A story for exceeding 750,000 qps on a commodity server&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Very interesting approach: much of the speed difference between MySQL/InnoDB and memcached is due to the overhead involved in parsing and processing SQL, so the team at DeNA wrote their own MySQL plugin, HandlerSocket, which exposes a NoSQL-style network protocol for directly calling the low level MySQL storage engine APIs—resulting in a 7.5x performance increase.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/mysql"&gt;mysql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scaling"&gt;scaling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;&lt;/p&gt;



</summary><category term="mysql"/><category term="nosql"/><category term="scaling"/><category term="recovered"/></entry><entry><title>Will Redis support per-database persistence configuration?</title><link href="https://simonwillison.net/2010/Sep/27/will-redis-support-per-database/#atom-tag" rel="alternate"/><published>2010-09-27T10:37:00+00:00</published><updated>2010-09-27T10:37:00+00:00</updated><id>https://simonwillison.net/2010/Sep/27/will-redis-support-per-database/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Will-Redis-support-per-database-persistence-configuration/answer/Simon-Willison"&gt;Will Redis support per-database persistence configuration?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I don't know if that's on the roadmap (you'd need to ask antirez on the mailing list or Twitter), but it should be easy enough to run multiple Redis instances with different settings - especially on a multi core machine.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/programming"&gt;programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redis"&gt;redis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/software-engineering"&gt;software-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="nosql"/><category term="programming"/><category term="redis"/><category term="software-engineering"/><category term="quora"/></entry><entry><title>What is the largest production deployment of CouchDB for online use?</title><link href="https://simonwillison.net/2010/Aug/25/what-is-the-largest/#atom-tag" rel="alternate"/><published>2010-08-25T09:23:00+00:00</published><updated>2010-08-25T09:23:00+00:00</updated><id>https://simonwillison.net/2010/Aug/25/what-is-the-largest/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/What-is-the-largest-production-deployment-of-CouchDB-for-online-use/answer/Simon-Willison"&gt;What is the largest production deployment of CouchDB for online use?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The BBC have a pretty big CouchDB cluster, which they use mostly as a replicated key-value store. It's used by their new identity platform which includes customisation features for iPlayer.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/couchdb"&gt;couchdb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scaling"&gt;scaling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="couchdb"/><category term="databases"/><category term="nosql"/><category term="scaling"/><category term="quora"/></entry><entry><title>reddit's May 2010 "State of the Servers" report</title><link href="https://simonwillison.net/2010/May/18/cassandra/#atom-tag" rel="alternate"/><published>2010-05-18T18:37:00+00:00</published><updated>2010-05-18T18:37:00+00:00</updated><id>https://simonwillison.net/2010/May/18/cassandra/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://blog.reddit.com/2010/05/reddits-may-2010-state-of-servers.html"&gt;reddit&amp;#x27;s May 2010 &amp;quot;State of the Servers&amp;quot; report&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
An interesting Cassandra war story: Cassandra scales up, but it doesn’t scale down very well: running with just three nodes can make recovery from problems a lot more tricky.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cassandra"&gt;cassandra&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/reddit"&gt;reddit&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;&lt;/p&gt;



</summary><category term="cassandra"/><category term="nosql"/><category term="reddit"/><category term="recovered"/></entry><entry><title>Comprehensive notes from my three hour Redis tutorial</title><link href="https://simonwillison.net/2010/Apr/25/redis/#atom-tag" rel="alternate"/><published>2010-04-25T22:36:16+00:00</published><updated>2010-04-25T22:36:16+00:00</updated><id>https://simonwillison.net/2010/Apr/25/redis/#atom-tag</id><summary type="html">
    &lt;p&gt;Last week I presented two talks at the inaugural &lt;a href="http://nosqleu.com/"&gt;NoSQL Europe&lt;/a&gt; conference in London. The first was presented with Matthew Wall and covered the ways in which we have been exploring &lt;a href="http://www.slideshare.net/matwall/nosql-presentation"&gt;NoSQL at the Guardian&lt;/a&gt;. The second was a three hour workshop on Redis, my favourite piece of software to have the NoSQL label applied to it.&lt;/p&gt;

&lt;p&gt;I've &lt;a href="http://simonwillison.net/2009/Oct/22/redis/"&gt;written about Redis&lt;/a&gt; here before, and it has since earned a place next to MySQL/PostgreSQL and memcached as part of my default web application stack. Redis makes write-heavy features such as real-time statistics feasible for small applications, while effortlessly scaling up to handle larger projects as well. If you haven't tried it out yet, you're sorely missing out.&lt;/p&gt;

&lt;p&gt;For the workshop, I tried to give an overview of each individual Redis feature along with detailed examples of real-world problems that the feature can help solve. I spent the past day annotating each slide with detailed notes, and I think the result makes a pretty good stand-alone tutorial. Here's the end result:&lt;/p&gt;

&lt;p&gt;&lt;a href="/static/2010/redis-tutorial/"&gt;Redis tutorial slides and notes&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In unrelated news, Nat and I both completed the first ever &lt;a href="http://brightonmarathon.co.uk/"&gt;Brighton Marathon&lt;/a&gt; last weekend, in my case &lt;a href="http://www.sportsystems.co.uk/ss/results.htm?entId=WILLI-DJVYM-SIMNV"&gt;taking 4 hours, 55 minutes and 17 seconds&lt;/a&gt;. Sincere thanks to everyone who came out to support us - until the race I had never appreciated how important the support of the spectators is to keep going to the end. We &lt;a href="http://www.justgiving.com/natalie-simon-marathon-2010"&gt;raised £757&lt;/a&gt; for the &lt;a href="http://www.heart.co.uk/have-a-heart/"&gt;Have a Heart&lt;/a&gt; children's charity. Thanks in particular to &lt;a href="http://clearleft.com/"&gt;Clearleft&lt;/a&gt; who kindly offered to match every donation.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/brightonmarathon"&gt;brightonmarathon&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/guardian"&gt;guardian&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/marathon"&gt;marathon&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redis"&gt;redis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/running"&gt;running&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/my-talks"&gt;my-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/highlights"&gt;highlights&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-talks"&gt;annotated-talks&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="brightonmarathon"/><category term="guardian"/><category term="marathon"/><category term="nosql"/><category term="redis"/><category term="running"/><category term="my-talks"/><category term="highlights"/><category term="annotated-talks"/></entry><entry><title>Redis weekly update #3 - Pub/Sub and more</title><link href="https://simonwillison.net/2010/Mar/30/redis/#atom-tag" rel="alternate"/><published>2010-03-30T15:15:39+00:00</published><updated>2010-03-30T15:15:39+00:00</updated><id>https://simonwillison.net/2010/Mar/30/redis/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://antirez.com/post/redis-weekly-update-3-publish-submit.html"&gt;Redis weekly update #3 - Pub/Sub and more&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Redis is now a publish/subscribe server—and it ended up only taking 150 lines of C code since Redis internals were already based on that paradigm.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/c"&gt;c&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pubsub"&gt;pubsub&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redis"&gt;redis&lt;/a&gt;&lt;/p&gt;



</summary><category term="c"/><category term="nosql"/><category term="pubsub"/><category term="redis"/></entry><entry><title>VMware: the new Redis home</title><link href="https://simonwillison.net/2010/Mar/16/vmware/#atom-tag" rel="alternate"/><published>2010-03-16T11:26:22+00:00</published><updated>2010-03-16T11:26:22+00:00</updated><id>https://simonwillison.net/2010/Mar/16/vmware/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://antirez.com/post/vmware-the-new-redis-home.html"&gt;VMware: the new Redis home&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Redis creator Salvatore Sanfilippo is joining VMWare to work on Redis full time. Sounds like a good match.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/keyvaluestores"&gt;keyvaluestores&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redis"&gt;redis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/salvatore-sanfilippo"&gt;salvatore-sanfilippo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vmware"&gt;vmware&lt;/a&gt;&lt;/p&gt;



</summary><category term="keyvaluestores"/><category term="nosql"/><category term="redis"/><category term="salvatore-sanfilippo"/><category term="vmware"/></entry><entry><title>Redis weekly update #1 - Hashes and... many more!</title><link href="https://simonwillison.net/2010/Mar/13/redis/#atom-tag" rel="alternate"/><published>2010-03-13T00:06:22+00:00</published><updated>2010-03-13T00:06:22+00:00</updated><id>https://simonwillison.net/2010/Mar/13/redis/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://antirez.com/post/redis-weekly-update-1.html"&gt;Redis weekly update #1 - Hashes and... many more!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Hashes were the big missing data type in Redis—support is only partial at the moment (no ability to list all keys in a hash or delete a specific key) but at the rate Redis is developed I expect that to be fixed within a week or two.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hashes"&gt;hashes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/keyvaluestores"&gt;keyvaluestores&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redis"&gt;redis&lt;/a&gt;&lt;/p&gt;



</summary><category term="hashes"/><category term="keyvaluestores"/><category term="nosql"/><category term="redis"/></entry><entry><title>A Collection Of Redis Use Cases</title><link href="https://simonwillison.net/2010/Feb/16/paperplanes/#atom-tag" rel="alternate"/><published>2010-02-16T15:04:19+00:00</published><updated>2010-02-16T15:04:19+00:00</updated><id>https://simonwillison.net/2010/Feb/16/paperplanes/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.paperplanes.de/2010/2/16/a_collection_of_redis_use_cases.html"&gt;A Collection Of Redis Use Cases&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Lots of interesting case studies here, collated by Mathias Meyer. Redis clearly shines for anything involving statistics or high volumes of small writes.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/mathiasmeyer"&gt;mathiasmeyer&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redis"&gt;redis&lt;/a&gt;&lt;/p&gt;



</summary><category term="mathiasmeyer"/><category term="nosql"/><category term="redis"/></entry><entry><title>FleetDB</title><link href="https://simonwillison.net/2010/Jan/5/fleetdb/#atom-tag" rel="alternate"/><published>2010-01-05T11:21:35+00:00</published><updated>2010-01-05T11:21:35+00:00</updated><id>https://simonwillison.net/2010/Jan/5/fleetdb/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://fleetdb.org/"&gt;FleetDB&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Yet Another Key-Value Store: Schema-free, JSON protocol, everything cached in RAM, append-only log for durability, multi-record transactions... but what’s really interesting about this one is that it’s written in Clojure and takes full advantage of that language’s concurrency primitives. The prefix operators used by the select API hint at its Lisp heritage.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://news.ycombinator.com/item?id=1031540"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/clojure"&gt;clojure&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fleetdb"&gt;fleetdb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/keyvaluestore"&gt;keyvaluestore&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lisp"&gt;lisp&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;&lt;/p&gt;



</summary><category term="clojure"/><category term="databases"/><category term="fleetdb"/><category term="keyvaluestore"/><category term="lisp"/><category term="nosql"/></entry><entry><title>New Redis ZINCRBY command</title><link href="https://simonwillison.net/2009/Dec/22/zincrbycommand/#atom-tag" rel="alternate"/><published>2009-12-22T20:38:25+00:00</published><updated>2009-12-22T20:38:25+00:00</updated><id>https://simonwillison.net/2009/Dec/22/zincrbycommand/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://code.google.com/p/redis/wiki/ZincrbyCommand"&gt;New Redis ZINCRBY command&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Just added to Redis, a command which increments the “score” for an item in a sorted set and reorders the set to reflect the new scores. Looks ideally suited to real time stats, and I’m sure there are plenty of other exciting uses for it.
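&lt;p&gt;To make the semantics concrete, here's a minimal pure-Python sketch of what ZINCRBY does (a dict standing in for the sorted set - this is not the Redis client API):&lt;/p&gt;

```python
def zincrby(zset, increment, member):
    # ZINCRBY adds increment to the member's score, creating it at 0 if absent
    zset[member] = zset.get(member, 0) + increment
    return zset[member]

# real-time stats: count page hits, then read them back highest-first
hits = {}
for page in ["/", "/about", "/", "/"]:
    zincrby(hits, 1, page)

leaderboard = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
print(leaderboard)  # [('/', 3), ('/about', 1)]
```

&lt;p&gt;Redis does the reordering for you server-side, which is what makes it suitable for hot counters.&lt;/p&gt;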

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://twitter.com/antirez/status/6939773925"&gt;antirez&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redis"&gt;redis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/salvatore-sanfilippo"&gt;salvatore-sanfilippo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sortedsets"&gt;sortedsets&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/zincrby"&gt;zincrby&lt;/a&gt;&lt;/p&gt;



</summary><category term="nosql"/><category term="redis"/><category term="salvatore-sanfilippo"/><category term="sortedsets"/><category term="zincrby"/></entry><entry><title>Crowdsourced document analysis and MP expenses</title><link href="https://simonwillison.net/2009/Dec/20/crowdsourcing/#atom-tag" rel="alternate"/><published>2009-12-20T12:07:53+00:00</published><updated>2009-12-20T12:07:53+00:00</updated><id>https://simonwillison.net/2009/Dec/20/crowdsourcing/#atom-tag</id><summary type="html">
    &lt;p&gt;As &lt;a href="https://web.archive.org/web/20091204154825/https://www.guardian.co.uk/politics/mps-expenses"&gt;you may have heard&lt;/a&gt;, the UK government released a fresh batch of MP expenses documents a week ago on Thursday. I spent that week working with a small team at Guardian HQ to prepare for the release. Here's what we built:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://web.archive.org/web/20091213164102/http://mps-expenses2.guardian.co.uk/"&gt;http://mps-expenses2.guardian.co.uk/&lt;/a&gt; &lt;em&gt;Updated March 2021: all links now go to the Internet Archive&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2009/mp-expenses-2-cropped.png" alt="Screenshot of the homepage from December 2009" style="max-width: 100%" /&gt;&lt;/p&gt;

&lt;p&gt;It's a crowdsourcing application that asks the public to help us dig through and categorise the enormous stack of documents - around 30,000 pages of claim forms, scanned receipts and hand-written letters, all scanned and published as PDFs.&lt;/p&gt;

&lt;p&gt;This is the second time we've tried this - the first was back in June, and can be seen at &lt;a href="https://web.archive.org/web/20090802094829/http://mps-expenses.guardian.co.uk/"&gt;mps-expenses.guardian.co.uk&lt;/a&gt;. Last week's attempt was an opportunity to apply the lessons we learnt the first time round.&lt;/p&gt;

&lt;p&gt;Writing crowdsourcing applications in a newspaper environment is a fascinating challenge. Projects have very little notice - I heard about the new document release the Thursday before, giving us less than a week to put everything together. In addition to the fast turnaround for the application itself, the 48 hours following the release are crucial. The news cycle moves fast, so if the application launches but we don't manage to get useful data out of it quickly, the story will move on before we can impact it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://web.archive.org/web/20091124150940/http://www.scalecamp.org.uk/"&gt;ScaleCamp&lt;/a&gt; on the Friday meant that development work didn't properly kick off until Monday morning. The bulk of the work was performed by two server-side developers, one client-side developer, one designer and one QA on Monday, Tuesday and Wednesday. The Guardian operations team deftly handled our EC2 configuration and deployment, and we had some extra help on the day from other members of the technology department. After launch we also had a number of journalists helping highlight discoveries and dig through submissions.&lt;/p&gt;

&lt;p&gt;The system was written using Django, MySQL (InnoDB), Redis and memcached.&lt;/p&gt;

&lt;h4 id="asking-the-right-question"&gt;Asking the right question&lt;/h4&gt;

&lt;p&gt;The biggest mistake we made the first time round was that we asked the wrong question. We tried to get our audience to categorise documents as either "claims" or "receipts" and to rank them as "not interesting", "a bit interesting", "interesting but already known" and "someone should investigate this". We also asked users to optionally enter any numbers they saw on the page as categorised "line items", with the intention of adding these up later.&lt;/p&gt;

&lt;p&gt;The line items, with hindsight, were a mistake. 400,000 documents make for a huge amount of data entry, and for the figures to be useful we would need to confirm their accuracy. This would mean yet more rounds of crowdsourcing, and the job was so large that the chance of getting even one person to enter line items for each page rapidly diminished as the news story grew less prominent.&lt;/p&gt;

&lt;p&gt;The categorisations worked reasonably well but weren't particularly interesting - knowing if a document is a claim or receipt is useful only if you're going to collect line items. The "investigate this" button worked very well though.&lt;/p&gt;

&lt;p&gt;We completely changed our approach for the new system. We dropped the line item task and instead asked our users to categorise each page by applying one or more tags, from a small set that our editors could control. This gave us a lot more flexibility - we changed the tags shortly before launch based on the characteristics of the documents - and had the potential to be a lot more fun as well. I'm particularly fond of the "hand-written" tag, which has highlighted some &lt;a href="https://web.archive.org/web/20091223091650/http://mps-expenses2.guardian.co.uk/page/1062/"&gt;lovely examples&lt;/a&gt; of correspondence between MPs and the expenses office.&lt;/p&gt;

&lt;p&gt;Sticking to an editorially assigned set of tags provided a powerful tool for directing people's investigations, and also ensured our users didn't start creating potentially libellous tags of their own.&lt;/p&gt;

&lt;h4 id="breaking-up-assignments"&gt;Breaking it up in to assignments&lt;/h4&gt;

&lt;p&gt;For the first project, everyone worked together on the same task to review all of the documents. This worked fine while the document set was small, but once we had loaded in 400,000+ pages the progress bar became quite depressing.&lt;/p&gt;

&lt;p&gt;This time round, we added a new concept of "&lt;a href="https://web.archive.org/web/20091215224727/http://mps-expenses2.guardian.co.uk/assignment/"&gt;assignments&lt;/a&gt;". Each assignment consisted of the set of pages belonging to a specified list of MPs, documents or political parties. Assignments had a threshold, so we could specify that a page must be reviewed by at least X people before it was considered reviewed. An editorial tool let us feature one "main" assignment and several alternative assignments right on the homepage.&lt;/p&gt;
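&lt;p&gt;The threshold logic is simple to sketch in Python (hypothetical data structures, not the actual Guardian code):&lt;/p&gt;

```python
def assignment_progress(review_counts, threshold):
    # a page counts as reviewed once at least `threshold` people have seen it
    done = sum(1 for count in review_counts.values() if count >= threshold)
    return done, len(review_counts)

# review_counts maps page ID -> number of independent reviews so far
counts = {"page-1": 3, "page-2": 1, "page-3": 0}
done, total = assignment_progress(counts, threshold=2)
print(f"{done}/{total} pages reviewed")  # 1/3 pages reviewed
```

&lt;p&gt;Requiring multiple independent reviews per page is what let us trust the aggregate tagging.&lt;/p&gt;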

&lt;p&gt;Clicking "start reviewing" on an assignment sets a cookie for that assignment, and adds the assignment's progress bar to the top of the review interface. New pages are selected at random from the set of unreviewed pages in that assignment.&lt;/p&gt;

&lt;p&gt;The assignments system proved extremely effective. We could use it to direct people to the highest value documents (our top hit list of interesting MPs, or members of the shadow cabinet) while still allowing people with specific interests to pick an alternative task.&lt;/p&gt;

&lt;h4 id="get-the-button-right"&gt;Get the button right!&lt;/h4&gt;

&lt;p&gt;Having run two crowdsourcing projects I can tell you this: the single most important piece of code you will write is the code that gives someone something new to review. Both of our projects had big "start reviewing" buttons. Both were broken in different ways.&lt;/p&gt;

&lt;p&gt;The first time round, the mistakes were around scalability. I used a SQL "ORDER BY RAND()" statement to return the next page to review. I knew this was an inefficient operation, but I assumed that it wouldn't matter since the button would only be clicked occasionally.&lt;/p&gt;

&lt;p&gt;Something like 90% of our database load turned out to be caused by that one SQL statement, and it only got worse as we loaded more pages into the system. This caused multiple site slowdowns and crashes until we threw together a cron job that pushed 1,000 unreviewed page IDs into memcached and made the button pick one of those at random.&lt;/p&gt;

&lt;p&gt;This solved the performance problem, but meant that our user activity wasn't nearly as well targeted. For optimum efficiency you really want everyone to be looking at a different page - and a random distribution is almost certainly the easiest way to achieve that.&lt;/p&gt;
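&lt;p&gt;The workaround amounts to this (sketched in Python, with a plain dict standing in for memcached and invented names throughout):&lt;/p&gt;

```python
import random

cache = {}

def refresh_unreviewed_ids(fetch_unreviewed):
    # cron job: cache up to 1,000 unreviewed page IDs so the hot path
    # never runs ORDER BY RAND() against the database
    cache["unreviewed_ids"] = fetch_unreviewed(limit=1000)

def next_page_id():
    # hot path: a cheap random pick from the cached candidates
    return random.choice(cache["unreviewed_ids"])

# simulate the cron job with a fake database query
refresh_unreviewed_ids(lambda limit: list(range(limit)))
print(next_page_id() in cache["unreviewed_ids"])  # True
```

&lt;p&gt;The trade-off described above falls directly out of this: everyone picks from the same cached 1,000 IDs, so the distribution across the full document set suffers.&lt;/p&gt;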

&lt;p&gt;The second time round I turned to my new favourite in-memory data structure server, &lt;a href="http://code.google.com/p/redis/"&gt;redis&lt;/a&gt;, and its &lt;a href="http://code.google.com/p/redis/wiki/SrandmemberCommand"&gt;SRANDMEMBER&lt;/a&gt; command (a feature I &lt;a href="http://twitter.com/simonw/status/5027987857"&gt;requested&lt;/a&gt; a while ago with this exact kind of project in mind). The system maintains a redis set of all IDs that need to be reviewed for an assignment to be complete, and a separate set of IDs of all pages that have been reviewed. It then uses redis set difference (the &lt;a href="http://code.google.com/p/redis/wiki/SdiffstoreCommand"&gt;SDIFFSTORE&lt;/a&gt; command) to create a set of unreviewed pages for the current assignment and then SRANDMEMBER to pick one of those pages.&lt;/p&gt;
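&lt;p&gt;The same logic can be sketched with plain Python sets - SDIFFSTORE and SRANDMEMBER map directly onto set difference and a random pick:&lt;/p&gt;

```python
import random

assignment_pages = {"p1", "p2", "p3", "p4"}  # pages in the assignment
reviewed_pages = {"p2", "p4"}                # pages already reviewed

def next_page(assignment, reviewed):
    # SDIFFSTORE: the set difference is exactly the unreviewed pages
    unreviewed = assignment - reviewed
    if not unreviewed:
        return None  # assignment complete
    # SRANDMEMBER: pick a random member of that set
    return random.choice(sorted(unreviewed))

page = next_page(assignment_pages, reviewed_pages)
print(page in {"p1", "p3"})  # True
```

&lt;p&gt;Doing this inside Redis means the random pick stays cheap no matter how large the sets grow.&lt;/p&gt;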

&lt;p&gt;This is where the bug crept in. Redis was just being used as an optimisation - the single point of truth for whether a page had been reviewed or not stayed as MySQL. I wrote a couple of Django management commands to repopulate the denormalised Redis sets should we need to manually modify the database. Unfortunately I missed some - the sets that tracked what pages were available in each document. The assignment generation code used an intersection of these sets to create the overall set of documents for that assignment. When we deleted some pages that had accidentally been imported twice I failed to update those sets.&lt;/p&gt;

&lt;p&gt;This meant the "next page" button would occasionally turn up a page that didn't exist. I had some very poorly considered fallback logic for that - if the random page didn't exist, the system would return the first page in that assignment instead. Unfortunately, this meant that when the assignment was down to the last four non-existent pages every single user was directed to the same page - which subsequently attracted well over a thousand individual reviews.&lt;/p&gt;

&lt;p&gt;Next time, I'm going to try and make the "next" button completely bulletproof! I'm also going to maintain a "denormalisation dictionary" documenting every denormalisation in the system in detail - such a thing would have saved me several hours of confused debugging.&lt;/p&gt;

&lt;h4 id="exposing-the-results"&gt;Exposing the results&lt;/h4&gt;

&lt;p&gt;The biggest mistake I made last time was not getting the data back out again fast enough for our reporters to effectively use it. It took 24 hours from the launch of the application to the moment the first reporting feature was added - mainly because we spent much of the intervening time figuring out the scaling issues.&lt;/p&gt;

&lt;p&gt;This time we handled this a lot better. We provided private pages exposing all recent activity on the site. We also provided public pages for each of the tags, as well as combination pages for party + tag, MP + tag, document + tag, assignment + tag and user + tag. Most of these pages were ordered by most-tagged, with the hope that the most interesting pages would quickly bubble to the top.&lt;/p&gt;

&lt;p&gt;This worked pretty well, but we made one key mistake. The way we were ordering pages meant that it was almost impossible to paginate through them and be sure that you had seen everything under a specific tag. If you're trying to keep track of everything going on in the site, reliable pagination is essential. The only way to get reliable pagination on a fast moving site is to order by the date something was first added to a set in ascending order. That way you can work through all of the pages, wait a bit, hit "refresh" and be able to continue paginating where you left off. Any other order results in the content of each page changing as new content comes in.&lt;/p&gt;
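&lt;p&gt;That ordering rule can be sketched in a few lines of Python (hypothetical shapes; "added" is the timestamp at which a page entered the tag set):&lt;/p&gt;

```python
def page_after(items, last_added, limit=2):
    # items are (added_timestamp, page_id) pairs ordered by added ascending;
    # resuming from the last timestamp seen means new arrivals never
    # shuffle earlier pages out from under the reader
    return [item for item in items if item[0] > last_added][:limit]

tagged = [(1, "p1"), (2, "p2"), (3, "p3")]
first = page_after(tagged, last_added=0)
tagged.append((4, "p4"))  # new content arrives mid-pagination
second = page_after(tagged, last_added=first[-1][0])
print(first, second)  # [(1, 'p1'), (2, 'p2')] [(3, 'p3'), (4, 'p4')]
```

&lt;p&gt;With most-tagged-first ordering, that append would instead have reshuffled every page boundary.&lt;/p&gt;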

&lt;p&gt;We eventually added an undocumented /in-order/ URL prefix to address this issue. Next time I'll pay a lot more attention to getting the pagination options right from the start.&lt;/p&gt;

&lt;h4 id="rewarding-our-contributors"&gt;Rewarding our contributors&lt;/h4&gt;

&lt;p&gt;The reviewing experience the first time round was actually quite lonely. We deliberately avoided showing people how others had marked each page because we didn't want to bias the results. Unfortunately this meant the site felt like a bit of a ghost town, even when hundreds of other people were actively reviewing things at the same time.&lt;/p&gt;

&lt;p&gt;For the new version, we tried to provide a much better feeling of activity around the site. We added "top reviewer" tables to every assignment, MP and political party as well as a "most active reviewers in the past 48 hours" table on the homepage (this feature was added to the first project several days too late). User profile pages got a lot more attention, with more of a feel that users were collecting their favourite pages in to tag buckets within their profile.&lt;/p&gt;

&lt;p&gt;Most importantly, we added a concept of &lt;a href="https://web.archive.org/web/20091223091046/http://mps-expenses2.guardian.co.uk/discoveries/"&gt;discoveries&lt;/a&gt; - editorially highlighted pages that were shown on the homepage and credited to the user that had first highlighted them. These discoveries also added valuable editorial interest to the site, showing up on the homepage and also the index pages for &lt;a href="https://web.archive.org/web/20091215191906/http://mps-expenses2.guardian.co.uk/labour/"&gt;political parties&lt;/a&gt; and &lt;a href="https://web.archive.org/web/20091215050919/http://mps-expenses2.guardian.co.uk/conservative/gerald-howarth/"&gt;individual MPs&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id="light-weight-registration"&gt;Light-weight registration&lt;/h4&gt;

&lt;p&gt;For both projects, we implemented an extremely light-weight form of registration. Users can start reviewing pages without going through any signup mechanism, and instead are assigned a cookie and an anon-454 style username the first time they review a document. They are then encouraged to assign themselves a proper username and password so they can log in later and take credit for their discoveries.&lt;/p&gt;
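&lt;p&gt;The mechanism is roughly this (a hypothetical Python sketch - the real version persists the username server-side against the cookie):&lt;/p&gt;

```python
import itertools

_counter = itertools.count(454)  # hypothetical; any unique suffix works

def ensure_username(cookies):
    # first review: silently assign an "anon-454" style name via cookie,
    # which the user can later upgrade to a real account
    if "username" not in cookies:
        cookies["username"] = f"anon-{next(_counter)}"
    return cookies["username"]

session = {}
print(ensure_username(session))  # anon-454
print(ensure_username(session))  # anon-454 (stable on later requests)
```

&lt;p&gt;The point is that no signup form ever stands between a visitor and their first review.&lt;/p&gt;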

&lt;p&gt;It's difficult to tell how effective this approach really is. I have a strong hunch that it dramatically increases the number of people who review at least one document, but without a formal A/B test it's hard to tell how true that is. The UI for this process in the first project was quite confusing - we gave it a solid makeover the second time round, which seems to have resulted in a higher number of conversions.&lt;/p&gt;

&lt;h4 id="overall-lessons"&gt;Overall lessons&lt;/h4&gt;

&lt;p&gt;News-based crowdsourcing projects of this nature are both challenging and an enormous amount of fun. For the best chances of success, be sure to ask the right question, ensure user contributions are rewarded, expose as much data as possible and make the "next thing to review" behaviour rock solid. I'm looking forward to the next opportunity to apply these lessons, although at this point I &lt;em&gt;really&lt;/em&gt; hope it involves something other than MPs' expenses.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/crowdsourcing"&gt;crowdsourcing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/guardian"&gt;guardian&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/innodb"&gt;innodb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/memcached"&gt;memcached&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mpsexpenses"&gt;mpsexpenses&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mysql"&gt;mysql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redis"&gt;redis&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="crowdsourcing"/><category term="django"/><category term="guardian"/><category term="innodb"/><category term="memcached"/><category term="mpsexpenses"/><category term="mysql"/><category term="nosql"/><category term="politics"/><category term="projects"/><category term="python"/><category term="redis"/></entry><entry><title>Node.js is genuinely exciting</title><link href="https://simonwillison.net/2009/Nov/23/node/#atom-tag" rel="alternate"/><published>2009-11-23T12:50:22+00:00</published><updated>2009-11-23T12:50:22+00:00</updated><id>https://simonwillison.net/2009/Nov/23/node/#atom-tag</id><summary type="html">
    &lt;p&gt;I gave a talk on Friday at &lt;a href="http://2009.full-frontal.org/"&gt;Full Frontal&lt;/a&gt;, a new one day JavaScript conference in my home town of Brighton. I ended up throwing away my intended topic (JSONP, APIs and cross-domain security) three days before the event in favour of a technology which first crossed my radar &lt;a href="http://simonwillison.net/2009/Nov/9/node/"&gt;less than two weeks ago&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That technology is Ryan Dahl's &lt;a href="http://nodejs.org/"&gt;Node&lt;/a&gt;. It's the most exciting new project I've come across in quite a while.&lt;/p&gt;

&lt;p&gt;At first glance, Node looks like yet another take on the idea of server-side JavaScript, but it's a lot more interesting than that. It builds on JavaScript's excellent support for event-based programming and uses it to create something that truly plays to the strengths of the language.&lt;/p&gt;

&lt;p&gt;Node describes itself as "evented I/O for V8 javascript". It's a toolkit for writing extremely high performance non-blocking event driven network servers in JavaScript. Think similar to &lt;a href="http://twistedmatrix.com/"&gt;Twisted&lt;/a&gt; or &lt;a href="http://rubyeventmachine.com/"&gt;EventMachine&lt;/a&gt; but for JavaScript instead of Python or Ruby.&lt;/p&gt;

&lt;h4&gt;Evented I/O?&lt;/h4&gt;

&lt;p&gt;As I discussed in my talk, event driven servers are a powerful alternative to the threading / blocking mechanism used by most popular server-side programming frameworks. Typical frameworks can only handle a small number of requests simultaneously, dictated by the number of server threads or processes available. Long-running operations can tie up one of those threads - enough long running operations at once and the server runs out of available threads and becomes unresponsive. For large amounts of traffic, each request must be handled as quickly as possible to free the thread up to deal with the next in line.&lt;/p&gt;

&lt;p&gt;This makes certain functionality extremely difficult to support. Examples include handling large file uploads, combining resources from multiple backend web APIs (which themselves can take an unpredictable amount of time to respond) or providing comet functionality by holding open the connection until a new event becomes available.&lt;/p&gt;

&lt;p&gt;Event driven programming takes advantage of the fact that network servers spend most of their time waiting for I/O operations to complete. Operations against in-memory data are incredibly fast, but anything that involves talking to the filesystem or over a network inevitably involves waiting around for a response.&lt;/p&gt;

&lt;p&gt;With Twisted, EventMachine and Node, the solution lies in specifying I/O operations in conjunction with callbacks. A single event loop rapidly switches between a list of tasks, firing off I/O operations and then moving on to service the next request. When the I/O returns, execution of that particular request is picked up again.&lt;/p&gt;
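&lt;p&gt;Python's asyncio (a stdlib descendant of the ideas in Twisted, used here purely as an illustration) makes the overlap concrete - two simulated backend calls wait concurrently on one event loop:&lt;/p&gt;

```python
import asyncio
import time

async def backend_call(name, delay):
    # stands in for a slow network request
    await asyncio.sleep(delay)
    return name

async def handle_request():
    start = time.monotonic()
    # the event loop fires off both waits, then services whichever
    # completes; total time is roughly max(delays), not their sum
    results = await asyncio.gather(backend_call("a", 0.1),
                                   backend_call("b", 0.1))
    return results, time.monotonic() - start

results, elapsed = asyncio.run(handle_request())
print(results)  # ['a', 'b']
```

&lt;p&gt;A thread-per-request server would have needed two blocked threads for the same work; the event loop needs none.&lt;/p&gt;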

&lt;p&gt;(In the talk, I attempted to illustrate this with a questionable metaphor involving &lt;a href="http://www.slideshare.net/simon/evented-io-based-web-servers-explained-using-bunnies"&gt;hamsters, bunnies and a hyperactive squid&lt;/a&gt;).&lt;/p&gt;

&lt;iframe src="https://www.slideshare.net/slideshow/embed_code/key/B8ICSKJbZ2cBHw?startSlide=1" width="597" height="486" frameborder="0"   marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px;   margin-bottom:5px;max-width: 100%;" allowfullscreen="allowfullscreen"&gt;
&lt;/iframe&gt;

&lt;h4&gt;What makes Node exciting?&lt;/h4&gt;

&lt;p&gt;If systems like this already exist, what's so exciting about Node? Quite a few things:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;JavaScript is extremely well suited to programming with callbacks&lt;/strong&gt;. Its anonymous function syntax and closure support are perfect for defining inline callbacks, and client-side development in general uses event-based programming as a matter of course: run this function when the user clicks here / when the Ajax response returns / when the page loads. JavaScript programmers already understand how to build software in this way.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Node represents a clean slate&lt;/strong&gt;. Twisted and EventMachine are hampered by the existence of a large number of blocking libraries for their respective languages. Part of the difficulty in learning those technologies is understanding which Python or Ruby libraries you can use and which ones you have to avoid. Node creator Ryan Dahl has a stated aim for Node to never provide a blocking API - even filesystem access and DNS lookups are catered for with non-blocking callback based APIs. This makes it much, much harder to screw things up.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Node is small&lt;/strong&gt;. I read through the &lt;a href="http://nodejs.org/api.html"&gt;API documentation&lt;/a&gt; in around half an hour and felt like I had a pretty comprehensive idea of what Node does and how I would achieve things with it.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Node is fast&lt;/strong&gt;. V8 is fast and keeps getting faster. Node's event loop uses Marc Lehmann's highly regarded &lt;a href="http://software.schmorp.de/pkg/libev.html"&gt;libev&lt;/a&gt; and &lt;a href="http://software.schmorp.de/pkg/libeio.html"&gt;libeio&lt;/a&gt; libraries. Ryan Dahl is himself something of a speed demon - he just replaced Node's HTTP parser implementation (already pretty speedy due to its Ragel / Mongrel heritage) with a &lt;a href="http://four.livejournal.com/1033160.html"&gt;hand-tuned C implementation&lt;/a&gt; with some impressive characteristics.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Easy to get started&lt;/strong&gt;. Node ships with all of its dependencies, and compiles cleanly on Snow Leopard out of the box.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With both my JavaScript and server-side hats on, Node just feels right. The APIs make sense, it fits a clear niche and despite its youth (the project started in February) everything feels solid and well constructed. The rapidly growing community is further indication that Ryan is on to something great here.&lt;/p&gt;

&lt;h4&gt;What does Node look like?&lt;/h4&gt;

&lt;p&gt;Here's how to get Hello World running in Node in 7 easy steps:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;samp&gt;git clone git://github.com/ry/node.git&lt;/samp&gt; (or download and extract &lt;a href="http://github.com/ry/node/archives/master" title="Download ry/node from GitHub"&gt;a tarball&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;samp&gt;./configure&lt;/samp&gt;&lt;/li&gt;
  &lt;li&gt;&lt;samp&gt;make&lt;/samp&gt; (takes a while, it needs to compile V8 as well)&lt;/li&gt;
  &lt;li&gt;&lt;samp&gt;sudo make install&lt;/samp&gt;&lt;/li&gt;
  &lt;li&gt;Save the below code as &lt;samp&gt;helloworld.js&lt;/samp&gt;&lt;/li&gt;
  &lt;li&gt;&lt;samp&gt;node helloworld.js&lt;/samp&gt;&lt;/li&gt;
  &lt;li&gt;Visit &lt;samp&gt;http://localhost:8080/&lt;/samp&gt; in your browser&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's helloworld.js:&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;var sys = require('sys'), 
  http = require('http');

http.createServer(function(req, res) {
  res.sendHeader(200, {'Content-Type': 'text/html'});
  res.sendBody('&amp;lt;h1&amp;gt;Hello World&amp;lt;/h1&amp;gt;');
  res.finish();
}).listen(8080);

sys.puts('Server running at http://127.0.0.1:8080/');
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you have Apache Bench installed, try running &lt;samp&gt;ab -n 1000 -c 100 'http://127.0.0.1:8080/'&lt;/samp&gt; to test it with 1000 requests using 100 concurrent connections. On my MacBook Pro I get 3374 requests a second.&lt;/p&gt;

&lt;p&gt;So Node is fast - but where it really shines is concurrency with long running requests. Alter the helloworld.js server definition to look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;http.createServer(function(req, res) {
  setTimeout(function() {
    res.sendHeader(200, {'Content-Type': 'text/html'});
    res.sendBody('&amp;lt;h1&amp;gt;Hello World&amp;lt;/h1&amp;gt;');
    res.finish();
  }, 2000);
}).listen(8080);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We're using &lt;samp&gt;setTimeout&lt;/samp&gt; to introduce an artificial two second delay to each request. Run the benchmark again - I get 49.68 requests a second, with every single request taking between 2012 and 2022 ms. With a two second delay, the best possible performance for 1000 requests 100 at a time is &lt;em&gt;1000 requests / ((1000 / 100) * 2 seconds) = 50 requests a second&lt;/em&gt;. Node hits it pretty much bang on the nose.&lt;/p&gt;
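&lt;p&gt;That back-of-envelope calculation generalises. A tiny helper (hypothetical, just to make the arithmetic explicit) computes the theoretical ceiling for any fixed per-request delay:&lt;/p&gt;

```javascript
// Theoretical best throughput for a server whose only per-request
// cost is a fixed delay: requests complete in batches of size
// `concurrency`, and each batch takes `delaySeconds`.
function bestPossibleRps(totalRequests, concurrency, delaySeconds) {
  var batches = totalRequests / concurrency;  // 1000 / 100 = 10 batches
  var totalSeconds = batches * delaySeconds;  // 10 * 2 = 20 seconds
  return totalRequests / totalSeconds;        // 1000 / 20 = 50 req/s
}

var ceiling = bestPossibleRps(1000, 100, 2);  // 50 requests a second
```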

&lt;p&gt;The most important line in the above examples is &lt;code&gt;res.finish()&lt;/code&gt;. This is the mechanism Node provides for explicitly signalling that a request has been fully processed and should be returned to the browser. By making it explicit, Node makes it easy to implement comet patterns like long polling and streaming responses - stuff that is decidedly non-trivial in most server-side frameworks.&lt;/p&gt;

&lt;h4&gt;djangode&lt;/h4&gt;

&lt;p&gt;Node's core APIs are pretty low level - it has HTTP client and server libraries, DNS handling, asynchronous file I/O etc, but it doesn't give you much in the way of high level web framework APIs. Unsurprisingly, this has led to a Cambrian explosion of lightweight web frameworks built on top of Node - the &lt;a href="http://wiki.github.com/ry/node"&gt;projects using node page&lt;/a&gt; lists a bunch of them. Rolling a framework is a great way of learning a low-level API, so I've thrown together my own - &lt;a href="http://github.com/simonw/djangode"&gt;djangode&lt;/a&gt; - which brings Django's regex-based URL handling to Node along with a few handy utility functions. Here's a simple djangode application:&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;var dj = require('./djangode');

var app = dj.makeApp([
  ['^/$', function(req, res) {
    dj.respond(res, 'Homepage');
  }],
  ['^/other$', function(req, res) {
    dj.respond(res, 'Other page');
  }],
  ['^/page/(\\d+)$', function(req, res, page) {
    dj.respond(res, 'Page ' + page);
  }]
]);
dj.serve(app, 8008);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;djangode is currently a throwaway prototype, but I'll probably be extending it with extra functionality as I explore more Node related ideas.&lt;/p&gt;

&lt;h4&gt;nodecast&lt;/h4&gt;

&lt;p&gt;My main demo in the Full Frontal talk was nodecast, an extremely simple broadcast-oriented comet application. Broadcast is my favourite "hello world" example for comet because it's both simpler than chat and more realistic - I've been involved in plenty of projects that could benefit from being able to broadcast events to their audience, but few that needed an interactive chat room.&lt;/p&gt;

&lt;p&gt;The source code for the version I demoed can be found on GitHub in &lt;a href="http://github.com/simonw/nodecast/tree/no-redis"&gt;the no-redis branch&lt;/a&gt;. It's a very simple application - the client-side JavaScript simply uses jQuery's getJSON method to perform long-polling against a simple URL endpoint:&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;function fetchLatest() {
  $.getJSON('/wait?id=' + last_seen, function(d) {
    $.each(d, function() {
      last_seen = parseInt(this.id, 10) + 1;
      ul.prepend($('&amp;lt;li&amp;gt;&amp;lt;/li&amp;gt;').text(this.text));
    });
    fetchLatest();
  });
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Doing this recursively is probably a bad idea since it will eventually blow the browser's JavaScript stack, but it works OK for the demo.&lt;/p&gt;
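&lt;p&gt;The id bookkeeping in that callback deserves a note: each batch advances &lt;samp&gt;last_seen&lt;/samp&gt; to one past the highest message id received, so the next poll only asks for genuinely new messages. Pulled out as a small pure function (a hypothetical refactoring, not code from nodecast):&lt;/p&gt;

```javascript
// Given a batch of messages (ids arrive as strings, in order) and
// the current cursor, return the id to use for the next /wait?id=
// poll: one past the last message seen.
function advanceCursor(messages, lastSeen) {
  for (var i = 0; i !== messages.length; i++) {
    lastSeen = parseInt(messages[i].id, 10) + 1;
  }
  return lastSeen;
}

var next = advanceCursor([{ id: '3' }, { id: '7' }], 0);  // 8
```

&lt;p&gt;An empty batch (such as a timed-out poll) leaves the cursor unchanged, so the client simply asks for the same range again.&lt;/p&gt;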

&lt;p&gt;The more interesting part is the server-side &lt;samp&gt;/wait&lt;/samp&gt; URL which is being polled. Here's the relevant Node/djangode code:&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;var message_queue = new process.EventEmitter();

var app = dj.makeApp([
  // ...
  ['^/wait$', function(req, res) {
    var id = req.uri.params.id || 0;
    var messages = getMessagesSince(id);
    if (messages.length) {
      dj.respond(res, JSON.stringify(messages), 'text/plain');
    } else {
      // Wait for the next message
      var listener = message_queue.addListener('message', function() {
        dj.respond(res, 
          JSON.stringify(getMessagesSince(id)), 'text/plain'
        );
        message_queue.removeListener('message', listener);
        clearTimeout(timeout);
      });
      var timeout = setTimeout(function() {
        message_queue.removeListener('message', listener);
        dj.respond(res, JSON.stringify([]), 'text/plain');
      }, 10000);
    }
  }]
  // ...
]);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The wait endpoint checks for new messages and, if any exist, returns immediately. If there are no new messages it does two things: it hooks up a listener on the &lt;samp&gt;message_queue&lt;/samp&gt; EventEmitter (Node's equivalent of jQuery/YUI/Prototype's custom events) which will respond and end the request when a new message becomes available, and also sets a timeout that will cancel the listener and end the request after 10 seconds. This ensures that long polls don't go on too long and potentially cause problems - as far as the browser is concerned it's just talking to a JSON resource which takes up to ten seconds to load.&lt;/p&gt;

&lt;p&gt;When a message does become available, calling &lt;samp&gt;message_queue.emit('message')&lt;/samp&gt; will cause all waiting requests to respond with the latest set of messages.&lt;/p&gt;

&lt;h4&gt;Talking to databases&lt;/h4&gt;

&lt;p&gt;nodecast keeps track of messages using an in-memory JavaScript array, which works fine until you restart the server and lose everything. How do you implement persistent storage?&lt;/p&gt;

&lt;p&gt;For the moment, the easiest answer lies with the NoSQL ecosystem. Node's focus on non-blocking I/O makes it hard (but not impossible) to hook it up to regular database client libraries. Instead, it strongly favours databases that speak simple protocols over a TCP/IP socket - or even better, databases that communicate over HTTP. So far I've tried using CouchDB (with &lt;a href="http://github.com/sixtus/node-couch"&gt;node-couch&lt;/a&gt;) and redis (with &lt;a href="http://github.com/fictorial/redis-node-client"&gt;redis-node-client&lt;/a&gt;), and both worked extremely well. nodecast &lt;a href="http://github.com/simonw/nodecast"&gt;trunk&lt;/a&gt; now uses redis to store the message queue, and provides a nice example of working with a callback-based non-blocking database interface:&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;var db = redis.create_client();
var REDIS_KEY = 'nodecast-queue';

function addMessage(msg, callback) {
  db.llen(REDIS_KEY, function(i) {
    msg.id = i; // ID is set to the queue length
    db.rpush(REDIS_KEY, JSON.stringify(msg), function() {
      message_queue.emit('message', msg);
      callback(msg);
    });
  });
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Relational databases are coming to Node. Ryan has a &lt;a href="http://github.com/ry/node_postgres"&gt;PostgreSQL adapter&lt;/a&gt; in the works, thanks to that database already featuring a mature non-blocking client library. MySQL will be a bit tougher - Node will need to grow a separate thread pool to integrate with the official client libs - but you can talk to MySQL right now by dropping in &lt;a href="https://open.nytimes.com/introducing-dbslayer-64d7168a143f"&gt;DBSlayer&lt;/a&gt; from the NY Times which provides an HTTP interface to a pool of MySQL servers.&lt;/p&gt;

&lt;h4&gt;Mixed environments&lt;/h4&gt;

&lt;p&gt;I don't see myself switching all of my server-side development over to JavaScript, but Node has definitely earned a place in my toolbox. It shouldn't be at all hard to mix Node in to an existing server-side environment - either by running both behind a single HTTP proxy (being event-based itself, &lt;a href="http://nginx.net/"&gt;nginx&lt;/a&gt; would be an obvious fit) or by putting Node applications on a separate subdomain. Node is a tempting option for anything involving comet, file uploads or even just mashing together potentially slow loading web APIs. Expect to hear a lot more about it in the future.&lt;/p&gt;

&lt;h4&gt;Further reading&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="http://s3.amazonaws.com/four.livejournal/20091117/jsconf.pdf"&gt;Ryan's JSConf.eu presentation&lt;/a&gt; is the best discussion I've seen anywhere of the design philosophy behind Node.&lt;/li&gt;
  &lt;li&gt;&lt;a href="http://nodejs.org/api.html"&gt;Node's API documentation&lt;/a&gt; is essential reading.&lt;/li&gt;
  &lt;li&gt;&lt;a href="http://debuggable.com/posts/streaming-file-uploads-with-node-js:4ac094b2-b6c8-4a7f-bd07-28accbdd56cb"&gt;Streaming file uploads with node.js&lt;/a&gt; illustrates how well suited Node is to accepting large file uploads.&lt;/li&gt;
  &lt;li&gt;&lt;a href="http://groups.google.com/group/nodejs"&gt;The nodejs Google Group&lt;/a&gt; is the hub of the Node community.&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/async"&gt;async&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/comet"&gt;comet&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/couchdb"&gt;couchdb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/eventio"&gt;eventio&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http"&gt;http&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nodejs"&gt;nodejs&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redis"&gt;redis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ryan-dahl"&gt;ryan-dahl&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/my-talks"&gt;my-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tornado"&gt;tornado&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/twisted"&gt;twisted&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/v8"&gt;v8&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/highlights"&gt;highlights&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-talks"&gt;annotated-talks&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="async"/><category term="comet"/><category term="couchdb"/><category term="eventio"/><category term="http"/><category term="javascript"/><category term="nodejs"/><category term="nosql"/><category term="redis"/><category term="ryan-dahl"/><category term="my-talks"/><category term="tornado"/><category term="twisted"/><category term="v8"/><category term="highlights"/><category term="annotated-talks"/></entry><entry><title>Quoting Matt Brubeck</title><link href="https://simonwillison.net/2009/Oct/4/sql/#atom-tag" rel="alternate"/><published>2009-10-04T09:50:33+00:00</published><updated>2009-10-04T09:50:33+00:00</updated><id>https://simonwillison.net/2009/Oct/4/sql/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://news.ycombinator.com/item?id=859617"&gt;&lt;p&gt;When I worked at Amazon.com we had a deeply-ingrained hatred for all of the SQL databases in our systems. Now, we knew perfectly well how to scale them through partitioning and other means. But making them highly available was another matter. Replication and failover give you basic reliability, but it's very limited and inflexible compared to a real distributed datastore with master-master replication, partition tolerance, consensus and/or eventual consistency, or other availability-oriented features.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://news.ycombinator.com/item?id=859617"&gt;Matt Brubeck&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/amazon"&gt;amazon&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/matt-brubeck"&gt;matt-brubeck&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/reliability"&gt;reliability&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/replication"&gt;replication&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scaling"&gt;scaling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;&lt;/p&gt;



</summary><category term="amazon"/><category term="matt-brubeck"/><category term="nosql"/><category term="reliability"/><category term="replication"/><category term="scaling"/><category term="sql"/></entry><entry><title>Looking to the future with Cassandra</title><link href="https://simonwillison.net/2009/Sep/9/cassandra/#atom-tag" rel="alternate"/><published>2009-09-09T21:26:52+00:00</published><updated>2009-09-09T21:26:52+00:00</updated><id>https://simonwillison.net/2009/Sep/9/cassandra/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://blog.digg.com/?p=966"&gt;Looking to the future with Cassandra&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Digg are now using Cassandra for their “green badge” (one of your friends have dugg this story) feature—the resulting denormalised dataset weighs in at 3 TB and 76 billion columns.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cassandra"&gt;cassandra&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/denormalisation"&gt;denormalisation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/digg"&gt;digg&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;&lt;/p&gt;



</summary><category term="cassandra"/><category term="denormalisation"/><category term="digg"/><category term="nosql"/></entry></feed>