I was recently re-reading a Planets project report from 2008, “Representation Information Registries” by Adrian Brown (PC/3-D7 if that means anything to you). It has a nice summary of the purpose of representation information registries (RIRs) that I thought was worth sharing. The next three paragraphs are an extract from the report.
Efficiency of description: Representation information forms a substantial element of the technical metadata required to describe digital objects in a repository. It would be redundant and inefficient for a repository to duplicate the complete representation information network for every object in its custody: instead, it can simply store pointers to the appropriate records in an RIR. The Planets PUID scheme is an example of how this can be achieved in practice. A repository can simply store the PUID for the representation information which applies to an object in its local metadata, and use this to point to comprehensive technical information about that representation information in the Planets Characterisation Registry. An equivalent function is provided by the DCC’s concept of information labels, which attach to a digital object, and contain pointers to the appropriate information in a RIR.
Knowledge sharing: It is generally acknowledged that no single organisation has the resources or expertise to create and maintain information about every conceivable representation information network. RIRs have the potential to allow many organisations to collaborate in this activity. This possibility is being explored within a number of projects, most notably Planets and GDFR (see 5 and 4.8 respectively).
Sustainability and redundancy: The creation of dedicated repositories for representation information can enhance the sustainability of that knowledge base, by decoupling it from the content repositories which it supports. Distributed registry structures can also provide additional redundancy, and increase confidence within the user community that it is secure to rely upon external RIRs to support local preservation infrastructures.
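To make the 'pointer' idea from the first point of the extract concrete, here is a minimal sketch. It assumes a local in-memory dictionary standing in for an external registry; the record fields, the registry contents and the function name are all hypothetical, and the PUID value is for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectRecord:
    """Local repository metadata: only a pointer (the PUID) is stored."""
    object_id: str
    puid: str


# Hypothetical stand-in for an external registry, keyed by PUID.
# A real registry would be queried remotely and hold far more detail.
REGISTRY = {
    "fmt/18": {"name": "illustrative format", "tools": ["illustrative-viewer"]},
}


def representation_info(record: ObjectRecord, registry=REGISTRY) -> Optional[dict]:
    """Resolve the stored PUID to the registry entry, or None if unknown."""
    return registry.get(record.puid)


record = ObjectRecord(object_id="obj-1", puid="fmt/18")
info = representation_info(record)
```

The point of the design is that the repository record stays small and stable, while the detailed technical information can be corrected or extended in one place.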
Also, a recent discussion with Andy Jackson prompted me to try to write down how we at the National Archives of the Netherlands actually use the information from representation information registries – either now or in the near-ish future. Here’s a very rough first attempt:
1) Check the file format and other characteristics of incoming digital objects, to see if they match agreements with the data suppliers (ministries etc.). (We are already doing this.) For this we need to know about signatures for identification tools, plus validation and characterization tools for the relevant formats. We need to be able to process large numbers of objects, so it has to be automated and it has to be efficient.
2) Help with management of ‘technical environments’ and their relation to the digital objects – and hence with knowing whether we are maintaining the capability to access every object in our digital repository. (We are not using a format registry for this at the moment, because the range of formats in the repository is currently very small, so the problem is very simple – but it won’t stay that way.) For this we need to know which software applications can work with which formats, and what the dependencies of those applications are. In practice I think we will end up actively managing a manageably small set of technical environments. I like the idea of a documented ‘Institutional Technology Profile’ as used (or at least proposed: not sure of latest status) by the National Library of New Zealand.
3) If a new format of object is provided to us, decide what to do with it. This currently happens relatively rarely because the stream of digital records to the archive is just starting to flow, but each occurrence requires a lot of analysis work. If we can look up information on what others do with this format, that could save a lot of time. So sharing of the preferred rendering application/environment per format will be valuable.
Currently, the identification process is reasonably well supported by file format registries and the signatures they hold (we use PRONOM together with DROID at the moment), though identification of compound objects is definitely an area that needs more work.
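As a rough illustration of the first use case, the check against a supplier agreement can be sketched as follows. This is only a sketch under assumptions: it takes identification results as CSV text with columns named FILE_PATH and PUID (assumed here for illustration, and not necessarily what your identification tool exports), and the agreed set of PUIDs and the sample rows are made up.

```python
import csv
import io


def find_violations(csv_text, agreed_puids):
    """Return (path, puid) pairs for files outside the agreed formats.

    Files that could not be identified (empty PUID) are reported with None.
    """
    violations = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        puid = (row.get("PUID") or "").strip()
        if puid not in agreed_puids:
            violations.append((row["FILE_PATH"], puid or None))
    return violations


# Made-up identification results for one incoming transfer.
SAMPLE = """FILE_PATH,PUID
/in/a.pdf,fmt/95
/in/b.doc,fmt/40
/in/c.bin,
"""

# Hypothetical agreement: the supplier delivers only this one format.
violations = find_violations(SAMPLE, agreed_puids={"fmt/95"})
```

Because the check is a simple set lookup per file, it scales to large transfers; the expensive part remains the identification itself.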
There is some information in PRONOM linking software applications to file formats, but this is an area where a lot more work could usefully be done to collect and share information.
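To show why this software-to-format information matters for the second use case, here is a minimal sketch of an 'access capability' check. Everything in it is hypothetical: the mapping of PUIDs to applications, the environment names and the application names are invented, and a real check would also need to follow each application's own dependencies.

```python
def unsupported_formats(holdings_puids, apps_by_puid, env_apps):
    """Return the PUIDs in the holdings that no managed environment can open.

    apps_by_puid: PUID -> set of applications known to handle that format
    env_apps:     environment name -> set of applications installed in it
    """
    # Every application available in at least one managed environment.
    available = set().union(*env_apps.values())
    return sorted(
        puid
        for puid in set(holdings_puids)
        if not (set(apps_by_puid.get(puid, ())) & available)
    )


# Invented example data.
holdings = ["fmt/95", "fmt/40", "x-fmt/111", "fmt/95"]
apps_by_puid = {
    "fmt/95": {"pdf-viewer"},
    "fmt/40": {"word-processor"},
}
env_apps = {"desktop-env-1": {"pdf-viewer"}}

gaps = unsupported_formats(holdings, apps_by_puid, env_apps)
```

A format shows up as a gap either because none of its known applications is installed anywhere, or because the registry holds no application information for it at all, which is exactly where shared information would help.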
The third use case I’ve listed suggests that sharing of ‘policy’ information, such as an institution’s preferred technical environment per format, is going to be valuable. That matches some of the feedback I’ve had on our work so far on registries.
I’d welcome any comments on these main use cases based on what other organisations are doing.