Sunday, January 17, 2010

MongoDB

Gloria Jacobs told me about MongoDB at PyOhio, but I was too busy conference-chairing to see her talk, and time has flown by. Her enthusiasm prompted me to see Mike Dirolf's MongoDB talk at CodeMash, though, and wow. Thanks, Gloria and Mike! I like it, I really like it!

My frustrating experience with BigTable had given me a "Bah, humbug!" attitude toward the NoSQL fad, but it really looks like MongoDB is the cure for that. It surrenders much less query capability than the other NoSQL contenders do. The simplest of those are useful only if you already have the key for your desired record in hand, and BigTable's limitations make it feel only moderately better to me. But MongoDB's query capabilities are really rich, good enough for many (though of course not all) real query needs.
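Here is a small sketch of the difference. It is a toy that runs in memory in plain Python, with no MongoDB server involved. A pure key-value store can only answer the first lookup below. MongoDB lets you describe the records you want with a query document instead; in PyMongo that looks roughly like `db.users.find({'age': {'$gt': 30}, 'tags': 'mongodb'})`. The `matches` and `find` helpers and the sample users are hypothetical. They cover only a tiny subset of MongoDB's operators and are not how MongoDB actually evaluates queries.

```python
# Toy, in-memory illustration only: NOT MongoDB's query engine.
# It contrasts a pure key-value lookup with a MongoDB-style query document.

users = {
    "u1": {"username": "alice", "age": 34, "tags": ["python", "mongodb"]},
    "u2": {"username": "bob", "age": 27, "tags": ["mongodb"]},
    "u3": {"username": "carol", "age": 41, "tags": ["sql"]},
}

# Key-value style: only useful if you already hold the key.
by_key = users["u2"]

# Hypothetical subset of query operators (comparison/$in on scalar fields only).
OPS = {
    "$gt": lambda value, arg: value is not None and value > arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$in": lambda value, arg: value in arg,
}

def matches(doc, query):
    """Return True if doc satisfies every condition in the query document."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            # Operator conditions, e.g. {"$gt": 30}; all of them must hold.
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif isinstance(value, list):
            # Equality against an array field means "array contains cond".
            if cond not in value:
                return False
        elif value != cond:
            return False
    return True

def find(collection, query):
    """Scan every document and return the ones matching the query."""
    return [doc for doc in collection.values() if matches(doc, query)]

# Field-based query: no key needed, just a description of what you want.
result = find(users, {"age": {"$gt": 30}, "tags": "mongodb"})
```

The point is the shape of the question rather than the implementation. A real MongoDB server would use indexes, not a full scan, to answer it.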

Now, don't get me wrong; there are a *whole lot* of tasks for which an RDBMS is still very much the answer. When you need transactions, or child items that aren't tucked neatly under single parents, or complex queries (and how often do you really *know* that you'll never need complex queries?), it's safer to use an RDBMS.
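This sketch illustrates the "child items" point, using plain Python dicts as stand-ins for documents. The collections, field names, and helper functions are all hypothetical. Order lines that belong to exactly one order embed naturally inside that order. A many-to-many relationship such as students and courses is different. There you end up storing references and writing the join yourself, and that join is the work an RDBMS handles for you with a join table and SQL.

```python
# Toy illustration: plain dicts standing in for documents (no database needed).

# Children tucked neatly under a single parent: embed them in the parent.
order = {
    "_id": 1,
    "customer": "alice",
    "lines": [  # each line belongs to exactly one order
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

# Many-to-many: a student takes many courses, a course has many students.
# Embedding either side in the other would duplicate data, so store references...
students = {
    "s1": {"name": "Ann", "course_ids": ["c1", "c2"]},
    "s2": {"name": "Ben", "course_ids": ["c2"]},
}
courses = {
    "c1": {"title": "Databases"},
    "c2": {"title": "Python"},
}

# ...and the "join" becomes application code rather than a SQL query.
def roster(course_id):
    """Names of all students enrolled in the given course."""
    return sorted(s["name"] for s in students.values() if course_id in s["course_ids"])

def schedule(student_id):
    """Titles of all courses the given student is enrolled in."""
    return [courses[cid]["title"] for cid in students[student_id]["course_ids"]]
```

For a single parent, one read returns everything. For many-to-many, every new way of slicing the data means another hand-written helper like `roster`.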

I think that, when the database is an enduring construct, important in itself - when multiple applications may be written against it, and new applications yet unforeseen may appear in the future - then a good RDBMS is the only way to go. In such cases, it's just impossible to safely predict what you'll need to do with the data one day, so you need database software that can do virtually anything.

But when the database will play a supporting role to a single, well-defined application and will not outlive that application, then a non-relational database could be very convenient, and MongoDB looks to me like a fantastic choice.

Let's call this Devlin's Doggy Directive of Databases:
If the application is the dog, and the database is the tail, consider a non-relational database.
If the database is the dog, and the application is the tail, stick with a relational database.
If you doubt that I'm qualified to go naming rules of thumb after myself, let me remind you that I have ten years of relational database experience, a sparse smattering of non-relational experience along the way, and that my parents owned a boarding kennel when I was young.

2 comments:

Unknown said...

Good post, and I really like your rule of thumb, although I might be a little more extreme:

If the application is the dog, and the data is the tail, consider a non-relational database.
If the data is the dog, and the application is the tail, stick with a relational database.

schmichael said...

Good post, but I use MongoDB because in my application the database is the tail.

RDBMSes are great at querying complexly related data, but all I want to store is user profiles and a few other small bits of data.

PyMongo is thread-safe, has its own connection pooling, and its API is incredibly intuitive and easy to use. The amount of code between me & the database is tiny compared to using an ORM on top of an RDBMS.

The amount of boilerplate code needed to use PyMongo is so small it barely needs wrapping:


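import pymongo
# 2010-era PyMongo used pymongo.Connection(); current releases use MongoClient
db = pymongo.MongoClient()[database_name]
user = db.users.find_one({'username': username})
# user is now a nice dict containing everything
# from complex permissions to first & last name
# (or None if no matching user exists)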


RDBMSes are great if you have highly relational data and complex or dynamic querying, but I hope the idea that they should be the default option for new applications dies off. It's like everyone driving an SUV even if all they need is a moped.