Representation Information Registries: what are they for?


I was recently re-reading a Planets project report from 2008, “Representation Information Registries” by Adrian Brown (PC/3-D7 if that means anything to you). It has a nice summary of the purpose of representation information registries (RIRs) that I thought was worth sharing. The next three paragraphs are an extract from the report.

Efficiency of description: Representation information forms a substantial element of the technical metadata required to describe digital objects in a repository. It would be redundant and inefficient for a repository to duplicate the complete representation information network for every object in its custody: instead, it can simply store pointers to the appropriate records in an RIR. The Planets PUID scheme is an example of how this can be achieved in practice. A repository can simply store the PUID for the representation information which applies to an object in its local metadata, and use this to point to comprehensive technical information about that representation information in the Planets Characterisation Registry. An equivalent function is provided by the DCC’s concept of information labels, which attach to a digital object, and contain pointers to the appropriate information in a RIR.

Knowledge sharing: It is generally acknowledged that no single organisation has the resources or expertise to create and maintain information about every conceivable representation information network. RIRs have the potential to allow many organisations to collaborate in this activity. This possibility is being explored within a number of projects, most notably Planets and GDFR (see 5 and 4.8 respectively).

Sustainability and redundancy: The creation of dedicated repositories for representation information can enhance the sustainability of that knowledge base, by decoupling it from the content repositories which it supports. Distributed registry structures can also provide additional redundancy, and increase confidence within the user community that it is secure to rely upon external RIRs to support local preservation infrastructures.
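To make the “efficiency of description” point concrete, here is a minimal sketch in Python of a repository that keeps only a PUID per object and resolves it against a registry when the full description is needed. The registry is just an in-memory dictionary standing in for a real RIR. The class names, field names and sample entry are illustrative assumptions, not the Planets or PRONOM data model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FormatRecord:
    """Shared technical description, held once in the registry (hypothetical shape)."""
    puid: str
    name: str
    version: str


@dataclass(frozen=True)
class StoredObject:
    """Local repository metadata: only a pointer (the PUID) is kept per object."""
    object_id: str
    puid: str


# Stand-in for an external registry; the entry is illustrative only.
REGISTRY: Dict[str, FormatRecord] = {
    "fmt/18": FormatRecord("fmt/18", "Acrobat PDF", "1.4"),
}


def describe(obj: StoredObject, registry: Dict[str, FormatRecord]) -> Optional[FormatRecord]:
    """Resolve an object's PUID to the shared record, or None if the registry lacks it."""
    return registry.get(obj.puid)


objects = [StoredObject("rec-0001", "fmt/18"), StoredObject("rec-0002", "fmt/18")]
records = [describe(o, REGISTRY) for o in objects]
```

Both objects resolve to the same registry record, so the description is stored once however many objects share the format.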

Also, a recent discussion with Andy Jackson prompted me to try to write down how we at the National Archives of the Netherlands actually use the information from representation information registries, either now or in the near-ish future. Here’s a very rough first attempt:

1) Check the file format and other characteristics of incoming digital objects, to see if they match agreements with the data suppliers (ministries etc.). We are already doing this. For this we need signatures for identification tools, plus validation and characterization tools for the relevant formats. We need to process large numbers of objects, so the check has to be automated and efficient.

2) Help with management of ‘technical environments’ and their relation to the digital objects, and hence with knowing whether we are maintaining the capability to access every object in our digital repository. We are not using a format registry for this at the moment, because the range of formats in the repository is currently very small, so the problem is very simple, but it won’t stay that way. For this we need to know which software applications can work with which formats, and what the dependencies of those applications are. In practice I think we will end up actively managing a manageably small set of technical environments. I like the idea of a documented ‘Institutional Technology Profile’ as used (or at least proposed: not sure of latest status) by the National Library of New Zealand.

3) If a new format of object is provided to us, decide what to do with it. This currently happens relatively rarely, because the stream of digital records to the archive is just starting to flow, but each occurrence requires a lot of analysis work. If we can look up what others do with the format, that could save a lot of time. So sharing the preferred rendering application or environment per format will be valuable.

Currently, the identification process is reasonably well supported by file format registries and the signatures they hold (we use PRONOM together with DROID at the moment), though identification of compound objects is definitely an area where the tools need more work.
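As a rough illustration of the first use case, the sketch below identifies objects by a byte signature at the start of the file and reports anything that falls outside an agreed set of formats. It is deliberately much simpler than what DROID does with PRONOM signatures: the signature table, the format labels and the function names are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical, much-simplified signatures: (label, magic bytes expected at offset 0).
SIGNATURES: List[Tuple[str, bytes]] = [
    ("pdf", b"%PDF-"),
    ("png", b"\x89PNG\r\n\x1a\n"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the label of the first signature matching the start of data, else None."""
    for label, magic in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def check_against_agreement(
    objects: Dict[str, bytes], agreed_formats: Set[str]
) -> List[Tuple[str, Optional[str]]]:
    """Return (name, identified label) for every object outside the agreed formats."""
    problems = []
    for name, data in objects.items():
        label = identify(data)
        if label not in agreed_formats:
            problems.append((name, label))
    return problems


# Illustrative incoming batch; the supplier agreement here allows PDF only.
incoming = {
    "report.pdf": b"%PDF-1.4\n...",
    "scan.png": b"\x89PNG\r\n\x1a\n....",
    "mystery.bin": b"\x00\x01\x02",
}
problems = check_against_agreement(incoming, agreed_formats={"pdf"})
```

Unidentified objects are reported with a label of `None`, so they can be routed to manual analysis rather than silently accepted.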

There is some information in PRONOM linking software applications to file formats, but this is an area where a lot more work could usefully be done to collect and share information.
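For the second use case, the information needed is essentially two mappings: applications to the formats they can work with, and technical environments to the applications they contain. A minimal sketch, assuming made-up application, environment and format names rather than real registry data, shows how those mappings answer the question “which formats in our holdings can no managed environment open?”.

```python
from typing import Dict, Set

# All names below are hypothetical examples, not registry data.
APP_FORMATS: Dict[str, Set[str]] = {
    "pdf-viewer": {"pdf"},
    "office-suite": {"doc", "odt"},
}
ENVIRONMENTS: Dict[str, Set[str]] = {
    "env-A": {"pdf-viewer"},
    "env-B": {"office-suite", "pdf-viewer"},
}


def formats_supported(environment: str) -> Set[str]:
    """Union of the formats handled by the applications in one environment."""
    formats: Set[str] = set()
    for app in ENVIRONMENTS[environment]:
        formats |= APP_FORMATS.get(app, set())
    return formats


def uncovered_formats(holdings: Dict[str, int]) -> Set[str]:
    """Formats present in the holdings that no managed environment supports."""
    covered: Set[str] = set()
    for env in ENVIRONMENTS:
        covered |= formats_supported(env)
    return set(holdings) - covered


# Illustrative holdings: format label -> number of objects.
holdings = {"pdf": 1200, "odt": 40, "wpd": 3}
gaps = uncovered_formats(holdings)
```

Application dependencies (operating system, libraries and so on) would add a further mapping of the same kind; they are left out here to keep the sketch short.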

The third use case I’ve listed suggests that sharing of ‘policy’ information, such as an institution’s preferred technical environment per format, is going to be valuable. That matches some of the feedback I’ve had on our work so far on registries.
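To show what a lookup of shared policy information might feel like, here is a small sketch that pools preferred environments per format from several institutions and counts the choices. The institutions, formats and environment names are invented for illustration, and the record structure is an assumption, not an existing registry schema.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical shared policy records: (institution, format, preferred environment).
SHARED_POLICIES: List[Tuple[str, str, str]] = [
    ("Institution A", "wpd", "office-suite-x"),
    ("Institution B", "wpd", "office-suite-x"),
    ("Institution C", "wpd", "converter-y"),
    ("Institution A", "pdf", "pdf-viewer"),
]


def what_others_do(fmt: str) -> List[Tuple[str, int]]:
    """Count how many institutions prefer each environment for fmt, most common first."""
    counts = Counter(env for _, f, env in SHARED_POLICIES if f == fmt)
    return counts.most_common()


wpd_choices = what_others_do("wpd")
```

An empty result for a format is itself useful information: it means nobody has shared a policy yet and the analysis has to be done from scratch.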

I’d welcome any comments on these main use cases based on what other organisations are doing.
