Simon Willison's Weblog: rdbms

How was FriendFeed's schema less db faster than pure MySQL?

2013-10-30T16:27:00+00:00

My answer to How was FriendFeed's schema less db faster than pure MySQL? on Quora

The principal reason they switched to a schemaless DB was to work around the challenges of making schema changes in MySQL, which can lock the table and take hours, if not days, to complete on large tables.

The performance improvement shown in the graph is almost certainly because they eliminated nearly all joins and complex queries when they switched to the new mechanism. This meant that all of their database traffic was now simple queries, which have much more predictable performance characteristics. MySQL (and in fact every database) is extremely fast at primary key lookups and index scans.

Tags: databases, friendfeed, mysql, nosql, quora, rdbms

What tools and techniques are used for relational database version control (structure and data)?

2012-12-24T12:29:00+00:00

My answer to What tools and techniques are used for relational database version control (structure and data)? on Quora

The term you are looking for is database migrations (sometimes called database change scripts).

The basic concept is pretty straightforward: you set up a table in the database that records which change scripts have already been applied. When you need to make a change (adding a column, adding a table, denormalising some data for performance reasons, adding an index, etc.) you write a change script that applies the change - in raw SQL or in another programming language, depending on how your migration system is set up.

These change scripts (let's call them migrations from here) are numbered so they can be applied in the correct order. Then you run a command which checks for scripts that have not yet been applied and runs them in the correct order - then records in the relevant database table that they have been run.
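
Here's a minimal sketch of that loop in Python, using the standard library's sqlite3 module so it runs anywhere. The table name applied_migrations, the MIGRATIONS list and the apply_migrations function are all made up for illustration; a real migration tool would usually load numbered files from disk and handle failures more carefully.

```python
import sqlite3

# Hypothetical migrations as (number, SQL) pairs. In a real project these
# would usually be numbered files on disk.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
    (3, "CREATE INDEX idx_users_email ON users (email)"),
]


def apply_migrations(conn, migrations):
    """Run migrations that have not been recorded yet; return their numbers."""
    # The bookkeeping table that records which scripts have been applied.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_migrations "
        "(number INTEGER PRIMARY KEY)"
    )
    applied = {
        row[0] for row in conn.execute("SELECT number FROM applied_migrations")
    }
    newly_applied = []
    # Sorting by number guarantees the scripts run in the intended order.
    for number, sql in sorted(migrations):
        if number in applied:
            continue
        conn.execute(sql)
        conn.execute(
            "INSERT INTO applied_migrations (number) VALUES (?)", (number,)
        )
        conn.commit()
        newly_applied.append(number)
    return newly_applied
```

Calling apply_migrations(conn, MIGRATIONS) against a fresh database runs all three scripts; calling it a second time does nothing, because every number is already in the bookkeeping table.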

The setup I've described above is a pretty good start. Some systems let you have reversible migrations: each migration includes instructions for reversing its effect (removing the index that was added, moving data back to its old location) which lets you run a command to revert to a previous database state. In practice this is a nice-to-have but not essential: many migrations are by their nature irreversible, but it can make development faster if you can easily try out and then revert a database structure change within your development environment.
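
A reversible setup might look something like the following sketch, again using sqlite3 from the Python standard library. Each migration carries a forward and a reverse statement, and a hypothetical migrate_to function moves the database to whichever migration number you ask for. The names here (MIGRATIONS, migrate_to, applied_migrations) are assumptions for the example, not the API of any particular tool.

```python
import sqlite3

# Hypothetical reversible migrations: number -> (forward SQL, reverse SQL).
MIGRATIONS = {
    1: ("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)",
        "DROP TABLE users"),
    2: ("CREATE INDEX idx_users_email ON users (email)",
        "DROP INDEX idx_users_email"),
}


def migrate_to(conn, target):
    """Move the database forwards or backwards to migration number `target`.

    Returns the sorted list of migration numbers applied afterwards.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_migrations "
        "(number INTEGER PRIMARY KEY)"
    )
    applied = {
        row[0] for row in conn.execute("SELECT number FROM applied_migrations")
    }
    # Forwards: apply anything up to the target that has not run yet.
    for number in sorted(MIGRATIONS):
        if number <= target and number not in applied:
            conn.execute(MIGRATIONS[number][0])
            conn.execute(
                "INSERT INTO applied_migrations (number) VALUES (?)", (number,)
            )
    # Backwards: undo anything beyond the target, newest first.
    for number in sorted(applied, reverse=True):
        if number > target:
            conn.execute(MIGRATIONS[number][1])
            conn.execute(
                "DELETE FROM applied_migrations WHERE number = ?", (number,)
            )
    conn.commit()
    return sorted(
        row[0] for row in conn.execute("SELECT number FROM applied_migrations")
    )
```

In a development environment you could call migrate_to(conn, 2) to try out the new index and then migrate_to(conn, 1) to drop it again. A migration that destroys data has no sensible reverse statement, which is why this stays a nice-to-have.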

Really clever migration systems can even introspect your database, figure out what has changed and attempt to generate the migration scripts automatically! South, the most popular migration system for Django, does this with surprisingly good results for many cases.

If you're interested in learning more, it's worth reading through the South documentation: http://south.readthedocs.org/en/...

Tags: databases, mysql, oracle, postgresql, sql, quora, rdbms

Any source available to download sample data (in 10+ GB) for testing?

2012-10-15T13:21:00+00:00

My answer to Any source available to download sample data (in 10+ GB) for testing? on Quora

Wikipedia has some pretty interesting dumps, in both XML and SQL format: http://meta.wikimedia.org/wiki/I...

It's pretty easy to generate 10GB of random data for testing though, which may be a better option as you could better approximate the kind of data your application will be dealing with. There's a neat Ruby module for doing this called Faker (itself a port of the Perl module of the same name): http://faker.rubyforge.org/ - and here's a Python port of the Ruby one: https://github.com/threadsafelab...
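
If you don't want to install anything, a rough version of the same idea can be sketched with just the Python standard library. This is not Faker and the output is far less realistic; the column names, the write_fake_rows function and the example.com addresses are all invented for illustration. The point is only that you keep writing rows until you reach the size you need.

```python
import io
import random
import string


def random_row(rng):
    """Build one made-up CSV line: name, email, age."""
    name = "".join(rng.choices(string.ascii_lowercase, k=rng.randint(4, 10)))
    age = rng.randint(18, 90)
    # No field can contain a comma or quote, so plain joining is valid CSV.
    return f"{name.title()},{name}@example.com,{age}\n"


def write_fake_rows(out, target_chars, seed=0):
    """Write random rows to `out` until at least `target_chars` are written.

    Returns the number of data rows written (excluding the header).
    """
    rng = random.Random(seed)  # seeded so a test run can be reproduced
    header = "name,email,age\n"
    out.write(header)
    written = len(header)
    rows = 0
    while written < target_chars:
        line = random_row(rng)
        out.write(line)
        written += len(line)
        rows += 1
    return rows
```

To get a 10GB file you would open a real file for writing and pass something like 10 * 1024 ** 3 as target_chars (the data is plain ASCII, so characters and bytes are the same thing here). Swapping random_row for something that mimics your own schema is where the real value lies.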

Tags: mysql, nosql, programming, web-development, quora, rdbms

Is a relational database with many-to-many relationships difficult to develop into a web app?

2011-02-08T18:28:00+00:00

My answer to Is a relational database with many-to-many relationships difficult to develop into a web app? on Quora

Many-to-many tables can be a bit of a pain to deal with using regular SQL, but a good ORM can abstract away any potential complexity almost entirely. I find using the Django ORM means I'm much less likely to shy away from a design that involves a many-to-many relationship because I know it won't increase the complexity of the application. I imagine the Rails ORM has the same effect.
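
To give a feel for the "bit of a pain" part, here's a sketch of a many-to-many relationship handled with raw SQL from Python, using sqlite3 from the standard library. The articles, tags and article_tags tables and both helper functions are hypothetical names chosen for the example.

```python
import sqlite3

# Two entity tables plus a join table holding one row per (article, tag) pair.
SCHEMA = """
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE article_tags (
    article_id INTEGER REFERENCES articles(id),
    tag_id INTEGER REFERENCES tags(id),
    PRIMARY KEY (article_id, tag_id)
);
"""


def tag_article(conn, article_id, tag_name):
    """Attach a tag to an article, creating the tag row if it is missing."""
    row = conn.execute(
        "SELECT id FROM tags WHERE name = ?", (tag_name,)
    ).fetchone()
    if row is None:
        tag_id = conn.execute(
            "INSERT INTO tags (name) VALUES (?)", (tag_name,)
        ).lastrowid
    else:
        tag_id = row[0]
    # OR IGNORE makes tagging the same article twice a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO article_tags (article_id, tag_id) VALUES (?, ?)",
        (article_id, tag_id),
    )


def tags_for_article(conn, article_id):
    """Return the article's tag names, sorted alphabetically."""
    rows = conn.execute(
        "SELECT tags.name FROM tags "
        "JOIN article_tags ON article_tags.tag_id = tags.id "
        "WHERE article_tags.article_id = ? "
        "ORDER BY tags.name",
        (article_id,),
    )
    return [row[0] for row in rows]
```

With the Django ORM, declaring a ManyToManyField on the model gets you roughly the same behaviour from calls like article.tags.add(tag) and article.tags.all(), with the join table managed for you.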

Tags: databases, google, mysql, webapps, quora, rdbms