Why you think databases don't scale - In a few words: you’re doing it wrong!

wanderingmarker · on April 11, 2008

The author makes a decent argument, but I don't think he addresses the entirety of the issue.

Scaling a database has two components: reliability and performance under heavy load. Achieving both typically requires some form of redundancy, but few RDBMSes give developers and administrators tools for anything beyond dumb read-only replicas. Even Oracle RAC, a high-end commercial product, provides no data redundancy; it just lets multiple database servers share the same physical database. This means that with lots of money to throw at RAC backed by a SAN, not to mention a team of admins to run the whole thing, you can scale up and not worry about data loss. Hopefully.
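
To make "dumb read-only replicas" concrete, here is a minimal Python sketch of the read/write splitting such a setup typically leaves to the application. The class name and host names are hypothetical, and treating every statement that starts with SELECT as a read is a deliberate simplification.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the single primary and spread reads across read-only replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next_replica = itertools.cycle(self.replicas)

    def route(self, sql):
        # Simplification: anything that doesn't start with SELECT is treated as a write
        # and must go to the one writable primary.
        if sql.lstrip().upper().startswith("SELECT") and self.replicas:
            return next(self._next_replica)
        return self.primary


router = ReplicatedRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT * FROM users"))      # db-replica-1
print(router.route("UPDATE users SET x = 1"))   # db-primary
```

Adding replicas in this sketch only adds read capacity; every write, and the full dataset, still has to fit on the one primary.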

A team which doesn't have 1M USD a year to spend on storage software, hardware, and support, but still must store large datasets, must spend a lot of time partitioning the data and working around the problems inherent in querying it across multiple database servers. And frankly, that stinks. That's why I personally think RDBMSes don't scale: because, as a tool, they utterly fail to save me time and effort in achieving scalability.
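
As an illustration of the partitioning chore described above, here is a minimal Python sketch of application-level hash sharding. The shard names and the `shard_for` helper are hypothetical; a real setup would map each shard to a connection to a separate server.

```python
import hashlib

# Hypothetical shard names; in a real deployment each would map to a separate database server.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(key, shards=SHARDS):
    """Return the shard that owns `key`.

    Uses MD5 rather than Python's built-in hash(), whose output for strings
    changes between processes, so every app server agrees on the mapping.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Every read and write for a given user must go through the same routing step.
print(shard_for("user:42"))
```

The catch is growth: because the mapping is `hash % len(shards)`, adding a fifth server changes the owner of most keys, so data has to be moved by hand (or the application has to switch to a scheme such as consistent hashing). That is the kind of bookkeeping the database does not do for you here.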

I'm very impressed with how fragmented Mnesia tables work: you tell Mnesia how many replicas and fragments you want, add enough nodes to fulfill your requirements, and it takes care of pushing enough copies of your data to enough nodes for things to fly. That's how a database should work these days.
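
To illustrate what "tell it how many replicas and fragments you want" buys you, here is a toy Python model of the bookkeeping involved. In Mnesia itself this is declared when the table is created, via the frag_properties option (n_fragments, n_disc_copies, node_pool), and the database handles placement; the functions below are hypothetical and use plain round-robin, not Mnesia's real placement algorithm.

```python
def place_fragments(n_fragments, n_replicas, nodes):
    """Assign each fragment to n_replicas distinct nodes using simple round-robin.

    Toy stand-in for what Mnesia automates; not Mnesia's actual placement logic.
    """
    if n_replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas per fragment")
    placement = {}
    for frag in range(n_fragments):
        # Offset each fragment's starting node so copies spread over the whole pool.
        placement[frag] = [nodes[(frag + r) % len(nodes)] for r in range(n_replicas)]
    return placement


def survives_loss(placement, dead_node):
    """True if every fragment still has at least one copy on a node other than dead_node."""
    return all(any(node != dead_node for node in replicas) for replicas in placement.values())


layout = place_fragments(n_fragments=4, n_replicas=2, nodes=["a", "b", "c"])
print(layout)  # {0: ['a', 'b'], 1: ['b', 'c'], 2: ['c', 'a'], 3: ['a', 'b']}
print(all(survives_loss(layout, node) for node in ["a", "b", "c"]))  # True
```

With two copies per fragment on three nodes, any single node can fail without losing a fragment. In Mnesia, growing the cluster goes through mnesia:change_table_frag/2 (for example with add_node or add_frag) rather than hand-written migration code.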

I agree with the author that simplifying data access and reducing the number of ridiculous joins (I've worked on an app which normalized its data to the point that it needed an 8-way join before anything would work at all) is important. However, even once you do that, a setup where every piece of code that talks to the database has to traverse the entire cluster and aggregate the results at the application level is not good. That's one reason I find GAE (Google App Engine) quite attractive.
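
Here is a minimal Python sketch of the scatter-gather pattern described above: the same query goes to every server and the application merges the partial results. The shard names, the `query_shard` callback, and the data are all hypothetical; threads stand in for real database connections.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor


def scatter_gather_top_n(shards, query_shard, n):
    """Ask every shard for its local top n, then merge into a global top n in the app."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: query_shard(shard, n), shards))
    # No single server sees all rows, so the final ranking happens here, not in SQL.
    return heapq.nlargest(n, itertools.chain.from_iterable(partials), key=lambda row: row[1])


# Hypothetical per-shard data standing in for
# "SELECT name, total FROM orders_summary ORDER BY total DESC LIMIT n" on each server.
FAKE_SHARDS = {
    "db0": [("alice", 120), ("bob", 80)],
    "db1": [("carol", 200), ("dave", 50)],
    "db2": [("erin", 90)],
}


def fake_query(shard, n):
    return FAKE_SHARDS[shard][:n]  # rows are already sorted by total, descending


print(scatter_gather_top_n(list(FAKE_SHARDS), fake_query, 2))  # [('carol', 200), ('alice', 120)]
```

Top-N is one of the friendlier cases, because each shard's local top n is enough to compute the global one. A global AVG, a COUNT(DISTINCT ...), or a join between rows living on different servers needs more bookkeeping (per-shard sums and counts, or shipping rows between nodes), which is where this approach starts to hurt.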

Let's see if MySQL 5.1 brings any improvements to the table, and if they come with strings attached.

astrec · on April 11, 2008

To be fair, Oracle RAC is not a complete solution and doesn't really pretend to be. For HA you really need both RAC and Data Guard.

Btw, I'm quite the Mnesia fan too.

hendler · on April 11, 2008

Great topic, I think.

Scaling has a lot of non-technical components; often the problem is people doing it wrong. That's true, but sometimes I don't have time to do it right, which creates demand for an out-of-the-box solution.

Currently I'm experimenting with Vertica's VerticaDB (http://vertica.com). It's easy to install, uses regular SQL and other familiar RDBMS concepts, and has some newer optimization techniques out of the box (compression, joins, etc.). I can't speak to scalability yet, but since it's designed for exactly the issues I'm looking at, I have high hopes.

I also looked into Hadoop/HBase, and there's promise there. Other DHTs are known to have trouble with certain kinds of workloads.

Hadn't heard of Mnesia.