In my previous post on formats, I ended up leaning towards a wait-and-see approach to format registry design. Unfortunately, I don’t really have that luxury. The SCAPE project needs to collect more format information to assist preservation planning and other processes. We even have some effort available to help build and/or fill a registry. But which registry should we try to fill? Or should we go it alone and make a new one?
I don’t really want to recommend that SCAPE makes a new format registry if we can contribute efficiently to an existing one instead. But which registry? There are so many to choose from…
- PRONOM & Linked-Data PRONOM
- GDFR & UDFR
- Library of Congress Format Registry
- The CASPAR/DCC Representation Information Repository
- My Drupal Prototype
- Portions of The Software Ontology
…and that’s just the ones developed by our own community! Some of the most well-used ones belong to the wider world, and contain almost exactly the same information…
- MIME Media Types
- The File command ‘magic registry’, a.k.a. libmagic
- W3C’s Ontology for Media Resources
While our registries do contain a reasonable amount of good data, we know we don’t have enough. Why are our own repositories not brimming with the information we need? Is it uncertainty about what the right information is? Are we unsure where best to invest our efforts, leading to a kind of well-meaning deadlock? Or are we simply failing to assign enough time and effort to this task?
The OPF’s proposed solution is to remove any publication bottlenecks via an entirely de-centralised ‘ecosystem’ approach. This implies that SCAPE should go ahead and do its own thing, but publish the information as linked data so it can be merged with the other sources. But given we already have an ecosystem of incompatible registries, I’m not convinced this is really the best way forward. Perhaps it would make more sense to try to bridge the gaps between them?
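To make the idea of 'bridging' a little more concrete, here is a minimal sketch of what merging two registries might look like. Everything in it is an assumption for illustration: the record layout, the identifiers (`a:0001`, `b:doc`), the made-up MIME types and the `bridge` function are all hypothetical, and none of it reflects the actual data model of PRONOM, UDFR or any other registry. It also assumes the MIME type is a good enough join key, which in practice it often is not, since one MIME type can cover many format versions.

```python
from collections import defaultdict

# Hypothetical records from two registries that describe overlapping formats.
# Neither layout reflects a real registry schema.
REGISTRY_A = [
    {"id": "a:0001", "name": "Example Document Format",
     "mime": "application/x-example", "extensions": ["exd"]},
    {"id": "a:0002", "name": "Example Image Format",
     "mime": "image/x-example", "extensions": ["exi"]},
]
REGISTRY_B = [
    {"id": "b:doc", "mime": "application/x-example", "magic": "45 58 44"},
]


def bridge(*registries):
    """Merge records from several registries, joining on MIME type.

    Each merged entry keeps the list of source identifiers (a crude
    'same as' link) plus the first value seen for every other field.
    """
    merged = defaultdict(lambda: {"same_as": [], "fields": {}})
    for registry in registries:
        for record in registry:
            entry = merged[record["mime"]]
            entry["same_as"].append(record["id"])
            for key, value in record.items():
                if key in ("id", "mime"):
                    continue
                # Earlier registries win on conflicts; later ones fill gaps.
                entry["fields"].setdefault(key, value)
    return dict(merged)


if __name__ == "__main__":
    for mime, entry in bridge(REGISTRY_A, REGISTRY_B).items():
        print(mime, entry["same_as"], sorted(entry["fields"]))
```

The point of the sketch is that the merge itself is trivial once the records agree on a shared key. The hard part, and the part the existing registries disagree on, is what that key should be and how fine-grained it needs to be.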