After Ross’s post, I thought I’d better follow up on my format registry thoughts and show you all my response to Adam’s challenge. Using my weapon of choice, I was able to create and populate a web site for collaboratively editing PRONOM data in just over one week’s worth of my spare time (six days FTE).
About half the time was spent configuring the web interface. No coding was required in order to do this – I just installed off-the-shelf Drupal modules and configured them. Perhaps the most critical module is the Content Construction Kit, as this allows custom content types to be built on top of the basic Drupal ‘node’ type. For example, I needed a ‘file extension’ field for each record, and this was implemented by creating a taxonomy field called ‘file_extensions’ and adding it to my Format content type. This makes it easy to discover known extensions and to group format records by extension.
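To make the grouping idea concrete, here is a minimal Python sketch of what a multi-valued ‘file_extensions’ field buys you once records are indexed by term. The records below are illustrative stand-ins rather than real PRONOM entries, and `group_by_extension` is a hypothetical helper; Drupal does this for you through taxonomy term listings, so this only shows the underlying data structure.

```python
from collections import defaultdict

# Illustrative format records (not real PRONOM data). Each has a
# multi-valued 'file_extensions' field, like a taxonomy field with several terms.
formats = [
    {"title": "Portable Network Graphics", "file_extensions": ["png"]},
    {"title": "JPEG File Interchange Format", "file_extensions": ["jpg", "jpeg"]},
    {"title": "Exif JPEG", "file_extensions": ["jpg", "jpeg"]},
    {"title": "Tagged Image File Format", "file_extensions": ["tif", "tiff"]},
]


def group_by_extension(records):
    """Index record titles by each extension term (hypothetical helper)."""
    index = defaultdict(list)
    for record in records:
        for ext in record["file_extensions"]:
            # Normalise case so 'JPG' and 'jpg' land under the same term.
            index[ext.lower()].append(record["title"])
    return dict(index)


by_ext = group_by_extension(formats)
print(sorted(by_ext))  # ['jpeg', 'jpg', 'png', 'tif', 'tiff']
print(by_ext["jpg"])   # ['JPEG File Interchange Format', 'Exif JPEG']
```

Because an extension is a shared term rather than free text on each record, listing “all formats claiming .jpg” is a lookup, not a search.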
The rest of the time was spent writing the script that uploads the data from the PRONOM files into the site over XML-RPC. It’s not that much code, but I’d never used XML-RPC before and don’t use Python all that often, so it took me a little while to make it work. The code is on GitHub if you want to have a look. It’s not production-ready, but I think that’s okay for a proof-of-concept prototype. As I say on the site, please use the register link if you want an account so you can have a look at the content editing interface. Of course, it may not be precisely the interface one might choose to design, but I think this is a point worth compromising on. In return, we get a lot of stuff for free:
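For anyone curious about the general shape of such an upload, the following is a minimal sketch, not the script on GitHub. It assumes a simplified, un-namespaced stand-in for a PRONOM record (the real export uses a richer schema), a hypothetical `format` content type with hypothetical field names, and an assumed remote method name `node.save`. It only builds and serialises the XML-RPC request with Python’s standard `xmlrpc.client`, so it runs without a server.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client

# Simplified, hypothetical stand-in for one PRONOM format record.
SAMPLE_RECORD = """
<FileFormat>
  <FormatName>Portable Network Graphics</FormatName>
  <FormatVersion>1.2</FormatVersion>
  <Extension>png</Extension>
</FileFormat>
"""


def record_to_node(xml_text):
    """Map a record onto a dict shaped like a site node (field names assumed)."""
    root = ET.fromstring(xml_text.strip())
    return {
        "type": "format",  # hypothetical content type machine name
        "title": root.findtext("FormatName", default="").strip(),
        "version": root.findtext("FormatVersion", default="").strip(),
        "file_extensions": [e.text.strip() for e in root.findall("Extension") if e.text],
    }


def build_save_call(node, method="node.save"):
    """Serialise an XML-RPC request; 'node.save' is an assumed method name."""
    return xmlrpc.client.dumps((node,), methodname=method)


node = record_to_node(SAMPLE_RECORD)
payload = build_save_call(node)
print(node["title"], node["file_extensions"])  # Portable Network Graphics ['png']

# Against a live endpoint (URL and method are hypothetical) this would be:
# server = xmlrpc.client.ServerProxy("http://example.org/services/xmlrpc")
# server.node.save(node)
```

Keeping the parsing and the transport separate like this makes it easy to check the mapping from source records to site fields before anything is sent to the site.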
- A browsing and editing interface with no coding required. Note that the data schema can be edited through the web interface almost as easily as the content itself.
- User sessions and account management, OpenID, authentication, authorisation, user roles etc.
- Content management and workflow tools (drafts, editorial control, notification, etc.).
- User comments (e.g. this one), user content rating.
- Easy access to the latest additions (RSS) and the latest edits (RSS).
- Faceted search and an index. I’m particularly pleased with the faceted search.
- Content with nice URLs, tagging, versioning and version comparison.
- Direct export as XML or as RDF.
- A programmatic service interface to the content and some site features (e.g. search).
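The feeds and the service interface mean other tools can follow the registry without scraping HTML. As a minimal sketch, assuming a standard RSS 2.0 feed (the sample below is made up, and the site’s actual feed URLs are not given here), Python’s standard library is enough to pull out the latest edited records:

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for the site's "latest edits" feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Latest edits</title>
    <item><title>Portable Network Graphics</title><link>http://example.org/format/png</link></item>
    <item><title>Tagged Image File Format</title><link>http://example.org/format/tiff</link></item>
  </channel>
</rss>"""


def latest_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in latest_items(SAMPLE_FEED):
    print(title, link)
```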
It’s not a complete, polished implementation, but I just wanted to show that web interfaces are not that much work if you use the right tool for the job. In my opinion, if you find yourself writing things like user session management or URL aliasing code, you are probably Doing It Wrong. This stuff has been coded thousands of times, and there are multiple openly licensed implementations on every platform that will help you get started. By relaxing some of our more superficial constraints, we can get a long way very quickly by standing on the shoulders of the giants of web content management.