CouchDB and ORMs

Alex gave a good introductory talk on CouchDB at Scotland on Rails. Towards the end of the talk he gave an overview of the current Ruby plugins/gems available for interfacing with CouchDB, one of which was my own CouchFoo. Alex's opinion was that any ORM for CouchDB should be as thin as possible, just wrapping the Ruby to JSON object translation. I raised my opinion in the question section at the end by saying that I didn't agree and thought the ORM should match the level of functionality available in ActiveRecord. This sparked a debate, both in the talk and via Twitter, about the best approach for a CouchDB ORM to take. As a result I agreed to write this blog post to outline my views.

CouchDB is a document orientated database with an HTTP interface amongst other features. When I first started using it I played with the database a lot via simple interactions through curl. In the same way I feel it is important to know SQL before using any higher level API to store and retrieve objects in a relational database, I feel it is important to understand how CouchDB works before using a library to interact with it. As with most areas of computing you will find a range of opinions over the level at which you should interact with the database - there are the purists who like to hand-write the SQL for each database query performed and those who are willing to sacrifice a bit of performance (maybe not having the optimum query run each time) for the time efficiencies realized whilst developing. I align quite well with the Rails mantra on this one - I'm willing to sacrifice perfect SQL each time for the efficiency gains made whilst developing. Part of Alex's argument was that you should be as close to the database as possible because the Ruby to JSON conversion is much smaller than the Ruby to SQL conversion. Whilst I don't disagree that it's important to know how CouchDB works, I do disagree on the level at which any Ruby library should sit. I'm happy to pay a small price in terms of extra Ruby code executed because I want as clean a DSL as possible.
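To give a feel for how close to the wire the HTTP interface is, here is a minimal sketch of the kind of requests I mean, built with Ruby's standard Net::HTTP rather than curl. The host, the database name (blog) and the document id (post-1) are assumptions made up for illustration, and the requests are only constructed here, not sent, since sending needs a running server.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumed values: a local CouchDB on its default port (5984),
# a database called "blog" and a document id of "post-1".
doc_uri = URI('http://localhost:5984/blog/post-1')

# PUT /blog/post-1 stores a JSON document under that id.
put = Net::HTTP::Put.new(doc_uri)
put['Content-Type'] = 'application/json'
put.body = JSON.generate('title' => 'CouchDB and ORMs', 'tags' => %w[couchdb ruby])

# GET /blog/post-1 reads the same document back.
get = Net::HTTP::Get.new(doc_uri)

# Sending either request needs a running CouchDB, so it is left commented out:
# response = Net::HTTP.start(doc_uri.host, doc_uri.port) { |http| http.request(put) }
# puts response.body
```

That is more or less all a thin library has to wrap, which is why the argument for staying close to the database is an appealing one.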

Whilst developing with CouchDB I tried all the existing Ruby libraries and as I worked through them I ran into several issues. After using ActiveRecord's save and find methods it was particularly annoying to use a library that used different method names for the same conceptual operations (eg get instead of find). This wasn't a major issue of course; I just forked the library and made changes. But as time went on there were features that I missed from ActiveRecord. Validations, callbacks, finders and associations were the prime contenders. Then dynamic finders and named scopes got added to the list. In the end changing the existing libraries became so much work that I decided to start with ActiveRecord and work from there.

Of the features in ActiveRecord, associations are perhaps the most controversial when it comes to whether they should apply to document orientated databases or not. The argument goes that if you're trying to use associations you don't understand how CouchDB should be used. I disagree on this point - a simple counter argument is a document that allows comments. Those comments could be stored inline in the document itself or in separate documents that have a reference to their parent. This is an association whichever way you look at it. Which approach you decide to use will depend on your application and its characteristics. Incidentally Alex's gem did a great job of this, letting the user specify in the association whether they wanted the object stored inline or not. This has since been removed from his gem but is something that's definitely on the TODO list for CouchFoo.
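The two layouts can be sketched with plain Ruby hashes standing in for the JSON documents. The field names (type, post_id and so on) and the comments_for helper are my own illustrative choices, not anything CouchDB or a particular gem prescribes. In a real database the lookup in the second layout would typically be done by a view keyed on the parent id; here a simple select stands in for it.

```ruby
require 'json'

# Layout 1: comments stored inline in the parent document.
inline_post = {
  '_id' => 'post-1',
  'type' => 'post',
  'title' => 'CouchDB and ORMs',
  'comments' => [
    { 'author' => 'alice', 'body' => 'Nice write-up' },
    { 'author' => 'bob', 'body' => 'I disagree' }
  ]
}

# Layout 2: each comment is its own document with a reference to its parent.
post = { '_id' => 'post-1', 'type' => 'post', 'title' => 'CouchDB and ORMs' }
comment_docs = [
  { '_id' => 'comment-1', 'type' => 'comment', 'post_id' => 'post-1',
    'author' => 'alice', 'body' => 'Nice write-up' },
  { '_id' => 'comment-2', 'type' => 'comment', 'post_id' => 'post-1',
    'author' => 'bob', 'body' => 'I disagree' },
  { '_id' => 'comment-3', 'type' => 'comment', 'post_id' => 'post-2',
    'author' => 'carol', 'body' => 'Different post' }
]

# Hypothetical helper: resolves the association for the second layout.
def comments_for(post_id, docs)
  docs.select { |doc| doc['type'] == 'comment' && doc['post_id'] == post_id }
end

inline_authors = inline_post['comments'].map { |c| c['author'] }
linked_authors = comments_for(post['_id'], comment_docs).map { |c| c['author'] }
```

Either way the application ends up asking for "the comments of this post", which is exactly what an association expresses.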

For me CouchDB lends itself well to two distinct domains. Firstly domains where documents are used - that is, objects where the fields stored in the database vary from one object to the next. Secondly domains where you wish to take advantage of some of CouchDB's features not present (or poorly implemented) in relational databases - an HTTP interface, fantastic scaling ability due to bi-directional replication, and its schema-free nature (see this excellent article on FriendFeed's experience with MySQL) are just a few that spring to mind. People may use CouchDB for the second set of criteria even though their database design could be considered quite structured, and I fully expect this group of people to grow as CouchDB reaches 1.0. However that wasn't why I wrote CouchFoo; my project fell into the first domain. Whilst I provided a way to use ActiveRecord's higher level API I also provided access to a database object that allows simple storage and retrieval of documents by id. If that is all the functionality you require then I would expect CouchREST would be a better choice. However I believe in reality you will quickly find you need to add validations to a field, or maybe add an association or two. And as soon as you start on that slope I believe CouchFoo to be a better choice.

Ultimately I created CouchFoo because I missed the richness of the ActiveRecord API. Whilst I don't believe my library will be perfect for everyone, it has received a lot of good feedback. To paraphrase DHH, I didn't create the perfect framework for everyone else, I created it for me. I only hope that other people find it useful.

SXSW

Thanks to a lucky draw at dConstruct last year I bagged two free tickets to this year's SXSW. I decided to invite Jim along for no other reason than that he was likely to be the closest to the event. I'd never been to Texas before and despite hearing many bad reports, word was that Austin really wasn't quite as bad.

And what a surprise it was - a laid back city with fairly liberal attitudes. Once I got over the English-American language barrier (swap line for queue, register for till and give me for can I have) things seemed to go well. The line-up of talks was amazing - Gary Vaynerchuk was awesome although sadly I only caught the last 20 minutes (good video of him here at FOWA), Brian Brushwood did an excellent talk based on his Scam School series, James Powderly gave a fascinating talk on his graffiti art and getting detained in China, and there was an extremely useful panel on how to give good presentations. That's one of the parts I enjoyed the most - the sheer diversity of talks. In addition there were more informal talks where the presenter spoke for 10 minutes before opening up to the room - going freelance and becoming productive were two of my favourites in this format. Of course due to the sheer volume of talks I missed many good ones - Larry Lessig seems the prime candidate here. They're going to make all the talks available for download so I'm looking forward to catching what I missed.

The talks are only half of Southby though. The night life is great and there are loads of parties with free drink and beer flowing. These seemed quite hit and miss, with the Digg party being awful and the queue to get a signature from Kevin Rose a really quite distressing sight. But for every flop there were some good ones that provided entertainment as wide ranging as burlesque and live Photoshop drawing.

I met plenty of interesting new people, and bumped into quite a few from England, although I know of at least two out there who I didn't bump into all week. Other highlights included the weather, free wifi everywhere, a film called Burma VJ we randomly caught and England destroying France in the rugby. More random things included drive-through banks and a gig featuring a hip-hop group I strangely enjoyed. And the downers? Well I can't finish without having a dig at just how awful the all-American diet is (surprisingly I didn't want my meal in a sea of melted cheese but gee thanks). Overall though a great experience and well worth it if you've never made the trip.

Using objects in models (with CouchFoo)

ActiveRecord allows you to serialize objects into text columns through YAML. This seems useful but in my experience is under-used. One of the primary reasons for this is that it's not possible to use the data that the object encapsulates without the Ruby model. For example it's not possible to find on the contents of that object or, for that matter, to modify the object from languages that lack YAML support. With CouchDB all data is stored as JSON so this is not an issue.

The project I wrote CouchFoo for used complex ACLs and I wanted to encapsulate this all in an object rather than use several many-to-many relationships and construct an ACL object based on their contents. So how do you do this with CouchFoo? Simple, any object can be assigned as a property in a CouchFoo model as long as it has a .to_json method and a class .from_json method. The methods do what you'd expect, for example:

class DataObjectAttributeList

  attr_accessor :attributes

  # Constructs the object from JSON
  def self.from_json(json)
    DataObjectAttributeList.new(json)
  end

  # Converts the object to JSON
  def to_json
    @attributes
  end

  def initialize(initials = {}, *args)
    @attributes = initials
  end
end

This is just a simple example storing a hash, but the structure could be as complex as you'd like. In the future I plan to add inline associations to CouchFoo, so rather than have a one-to-many association where the many are accessed via a second database query you could have the objects stored as part of the parent's contents. Performance-wise, this is normally much more efficient (although not in all situations - eg heavy write and low read).
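To show the round trip such an object goes through, here is a self-contained sketch using only Ruby's json library. The AccessList class, its allows? method and the document layout are hypothetical names I've made up for illustration. It follows the same convention as the example above (to_json hands back plain data, from_json takes the parsed data), and the save and load lines are only my assumption of roughly what a persistence layer does with those two methods, not CouchFoo's actual internals.

```ruby
require 'json'

# Hypothetical value object; the name and fields are illustrative only.
class AccessList
  attr_reader :rules

  def initialize(rules = {})
    @rules = rules
  end

  # Rebuilds the object from already-parsed JSON data (a Hash).
  def self.from_json(data)
    new(data)
  end

  # Same convention as the example above: returns plain data that a
  # JSON encoder can serialise, rather than a String.
  def to_json
    @rules
  end

  # True if the given user has been granted the given action.
  def allows?(user, action)
    @rules.fetch(user, []).include?(action)
  end
end

acl = AccessList.new('alice' => %w[read write], 'bob' => %w[read])

# Assumption: roughly what happens on save (object -> JSON document)...
stored = JSON.generate('_id' => 'doc-1', 'acl' => acl.to_json)

# ...and on load (JSON document -> object).
loaded = AccessList.from_json(JSON.parse(stored)['acl'])
```

Because the stored form is ordinary JSON, the acl field stays readable and queryable from anything else that talks to the database, which is the point made earlier about YAML columns.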

Overall, this becomes a very addictive way of developing and in the same way you start to question whether you need a relational database, you start to question whether you should store associated objects inline or separately.