Friday, April 28, 2006

Convention, Configuration, and Communication Entropy

The increasingly popular web framework Ruby on Rails has at its core an interesting idiom: Convention over Configuration. The idea goes as follows: for the most part a data model will map directly to a database schema, and for the most part the display and editing of the data will map directly onto the model. Therefore, don't specify the mapping in configuration; assume it exists 1-1 and configure only the exceptions.
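As a rough sketch of the idiom (in Python rather than Ruby, and not how Rails itself is implemented), the mapping can be derived from the model name alone, with configuration supplied only where a project deviates. The function `conventional_mapping`, its `overrides` parameter, and the `LEGACY_PEOPLE` table name are hypothetical names chosen for illustration; the upper-case table and `.rhtml` view follow the PERSON / Person / person.rhtml example used in this post.

```python
def conventional_mapping(model_name, overrides=None):
    """Derive table and view names from a model name by convention.

    `overrides` holds only the exceptions; anything not listed there
    is assumed to follow the 1-1 mapping.
    """
    mapping = {
        "table": model_name.upper(),           # Person -> PERSON
        "view": model_name.lower() + ".rhtml", # Person -> person.rhtml
    }
    mapping.update(overrides or {})
    return mapping


# The common case needs no configuration at all.
print(conventional_mapping("Person"))

# Only the deviation has to be communicated.
print(conventional_mapping("Person", {"table": "LEGACY_PEOPLE"}))
```

The point of the design is that the second call is the only place where anything has to be written down.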

The long-established ideas of information theory, in the sense of Shannon, back up these design choices very well; in particular, the idea of Communication Entropy applies directly to this situation. The entropy associated with a given communication relates to the amount of information that one learns upon receiving it: the more predictable the message, the less it tells you. So, for instance, when one learns that a table PERSON has a data model equivalent Person and a view person.rhtml, one learns very little, because this kind of 1-1 mapping is very common.
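To put a number on "learns very little", information theory measures the information in an event of probability p as -log2(p) bits. The sketch below assumes, purely for illustration, that 95% of models follow the convention; that figure is made up, not measured.

```python
import math


def surprisal_bits(p):
    """Information, in bits, gained on learning that an event of probability p occurred."""
    return -math.log2(p)


p_conventional = 0.95  # assumed share of models that follow the 1-1 convention

# Being told that a model follows the convention: about 0.07 bits.
print(round(surprisal_bits(p_conventional), 2))

# Being told that a model is an exception: about 4.32 bits.
print(round(surprisal_bits(1 - p_conventional), 2))
```

Under that assumption, stating the conventional mapping conveys almost nothing, while stating an exception conveys a lot, which is exactly where configuration earns its keep.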

In communication theory one uses the entropy of a message to compress it. In English, for instance, the letter E crops up very often, so one learns very little from its presence in a message (you can test this by removing all the E's from this text, or indeed all the vowels: you will most likely find the text still readable). When we want to send an 'E' down a communication line we therefore give it one of the shortest encodings and let rarer letters take longer ones; as a toy illustration, send E as 0 and let every other letter's code begin with 1 (t = 101, a = 110, etc.). Without encoding, a letter costs around 5 bits (log2 26 is about 4.7). A code built from single-letter frequencies alone brings the average down to a little over 4 bits, and once the context between letters is also exploited, Shannon's estimates put English at roughly 1 to 1.5 bits per letter, well under half the unencoded size.
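The frequency-based encoding described above can be sketched with a Huffman code, a standard way of building such a variable-length code (the post itself does not name one). The sample sentence and the helper names `huffman_code_lengths` and `average_bits` are illustrative assumptions, and the result depends on the sample's own letter counts rather than on English at large.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: codeword length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def average_bits(text):
    """Return (Huffman average bits per letter, fixed-length bits per letter)."""
    letters = [c for c in text.lower() if c.isalpha()]
    freqs = Counter(letters)
    lengths = huffman_code_lengths(freqs)
    avg = sum(freqs[s] * lengths[s] for s in freqs) / len(letters)
    fixed = math.ceil(math.log2(len(freqs)))
    return avg, fixed


sample = ("the entropy associated with a given communication relates to "
          "the amount of information that one learns upon receiving it")
avg, fixed = average_bits(sample)
print(f"Huffman: {avg:.2f} bits/letter, fixed-length: {fixed} bits/letter")
```

Frequent letters end up with short codewords and rare ones with long codewords, so the average can never exceed the fixed-length cost for the same alphabet.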

Perhaps if the same ideas were applied to software we could also see a reduction in message size. In fact this idea underpins Domain Specific Languages (like those Rails uses): don't communicate that which you can take as a given.
