A couple of people have asked me if my experiments with Pronom and Fido would have been easier if Pronom had been available as RDF or LinkedData. The short answer to this question is ‘no’. Let me explain why.
Parsing the Pronom XML is actually very easy. The schema is straightforward and easy to understand. Having an entire format specification in one XML document ensured that I didn’t miss any bits – and it made exploring and understanding the underlying conceptual model very easy. I didn’t need the parts; I only wanted the whole. In addition, the whole is quite small. At the moment, all of Pronom can be contained in about 730 XML documents – less than 700KB in a Zip file. This is smaller than the PDF report that documents the signature syntax and algorithm.
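To give a flavour of what "exploring" a document like this can look like, here is a minimal sketch using Python's standard library. It lists every distinct element path in one XML document, which is a quick way to see the shape of a schema. The sample document and its element names are invented for illustration; they are not the real Pronom schema.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def element_paths(xml_text):
    """Return the sorted list of distinct element paths in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(element, prefix):
        path = prefix + "/" + local_name(element.tag)
        paths.add(path)
        for child in element:
            walk(child, path)

    walk(root, "")
    return sorted(paths)


# Hypothetical sample: element names and namespace are made up for this sketch.
SAMPLE = """<Report xmlns="http://example.org/ns">
  <Format>
    <Name>Example Format</Name>
    <Signature><Bytes>CAFE</Bytes></Signature>
    <Signature><Bytes>BABE</Bytes></Signature>
  </Format>
</Report>"""

for path in element_paths(SAMPLE):
    print(path)
```

Running something like this over a handful of documents is usually enough to see which elements repeat and how they nest.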
There are two trivial changes that could be made to Pronom and Droid that would have made the exercise much easier.
- Provide access to the Droid Signature file via a simple HTTP request. Doing this would make it easier to fetch by hand or automatically. At present it is hidden behind a web services interface, which requires several extra layers of technology and offers no discernible benefit.
- Provide access to the Pronom XML documents via a single simple HTTP request, returning them as a Zip file. Alternatively, a single page listing all of the format identifiers or URLs would be almost as good.
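If the second suggestion were implemented, a client could be as small as the sketch below. The URL is purely hypothetical (no such endpoint is known to exist), and the function names are mine. The sketch assumes only that the archive is a Zip file whose `.xml` members are well-formed XML documents.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Hypothetical address: an assumption for illustration, not a real endpoint.
ARCHIVE_URL = "https://example.org/pronom/all-formats.zip"


def fetch_archive(url=ARCHIVE_URL):
    """Download the whole archive with a single plain HTTP GET."""
    with urlopen(url) as response:
        return response.read()


def parse_archive(zip_bytes):
    """Parse every .xml member of a Zip archive.

    Returns a dict mapping member name to its parsed root element.
    Members without an .xml extension are skipped.
    """
    documents = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                documents[name] = ET.fromstring(archive.read(name))
    return documents
```

With an endpoint like that in place, `documents = parse_archive(fetch_archive())` would be the entire retrieval step: one request, no web services stack, and the whole collection parsed in memory.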
If the Droid signature information or Pronom format information had been available as LinkedData, I would have had easier access than I had through the web services interface. That would have been a small improvement. It would have been harder, however, to retrieve all of the relevant parts and assemble them back together again. It would also have been much harder to understand how all of the parts worked together and, ironically, it would have been much harder to understand the underlying conceptual model.
The LinkedData approach would be a much better fit if the Pronom information was larger (thousands of formats), or changed more rapidly (daily or weekly). It might also be a reasonable fit for combining format information from multiple sources. I do love LinkedData, but not for every job.