Last year I blogged about my frustrations related to digital preservation tool registries. Rather than pooling all of our knowledge in one place and creating a valuable community resource, we've spread our knowledge about tools thinly across the web. Instead of seeing collaboration between organisations working in digital preservation, we're actually seeing competition! Virtually every organisation involved in the field promotes its own registry or tool list. This is a ridiculous state of affairs. As I observed at IPRES last year in my least eloquent but most frequently quoted moment, it's a big fail for our community.
Two weeks ago I presented a proposal for the creation of a community owned tool registry to the latest workshop on Aligning National Approaches to Digital Preservation, graciously hosted by the lovely people at IDCC. I'm pleased to say that the proposal was one of four key areas prioritised for further action, and I'm now leading some initial activities to take things forward with backing from ANADP (note that a full report from ANADP on the workshop outcomes will be available here shortly).
However, I'd like to get even broader support for this community proposal from everyone who has their own registry or tool list, whether it's a quick blog post or a full-on registry. If that applies to you or your organisation, then I'd like you to participate in the following way:
- Provide your requirements for a community tool registry (the call for requirements will appear shortly)
- Merge your own tool registry data with the new community registry
- Link to, expose (a view onto) and promote the community registry from your website
- Delete your own registry and agree not to set up any new project owned registries/lists
- Contribute any effort you have in adding new tools over time, to the community registry
Exactly where the new registry will be hosted and maintained is yet to be decided (quite possibly a "neutral" URL/location. Whatever meets our requirements!). This will require some practical work to establish but is certainly not insurmountable. The key issue is to get buy-in from the community. As I note in the proposal, we already have support in principle from the Library of Congress, the Digital Curation Centre and the Open Planets Foundation. This is a great start, but for this to be a success we need a lot more organisations to get involved.
Over the next couple of weeks I'll be putting together an outline and roadmap as an initial talking point for comment and requirements and sharing it via this blog. So this is my call to arms for COPTR: a Community Owned digital Preservation Tool Registry. Who would like to voice their support and commitment, create a valuable tool registry for us all, and kick off some vital community collaboration in the process?
paul
February 7, 2013 @ 2:29 pm CET
David and Jay,
Dclipsham
February 7, 2013 @ 1:16 pm CET
Hi Jay,
I think it is entirely fair and accurate to suggest that The National Archives’ current focus for PRONOM is very much geared towards it being a file format identification/description registry.
It is true that PRONOM was originally built with the concept in mind of being an all-encompassing registry, that would describe software tools in detail and would cover similar ground that Paul is anticipating that COPTR will cover. However over the years, we have come to realise that this is simply too big a task for an organisation like ourselves.
Our focus has therefore shifted firmly towards the file format space, because this sits naturally with our own internal priorities and practice.
We welcome other organisations joining the registry space and very much hope that COPTR will grow to fill many of the gaps that we cannot and we are prepared to help in any way we can.
I believe the best-case scenario would be to have a handful of registries that focus or specialise on different aspects of digital preservation needs, that can cross-reference and offer natural links between each other.
To achieve this goal I think that Linked Data/Semantic Web technology would be the optimum paradigm, and I would certainly recommend that COPTR considers this approach.
I also think that having distinct, geographically and institutionally-separated registries offers a greater degree of robustness than having a sole registry that ‘does everything’ and that everybody relies on.
I think some degree of overlap between registries is entirely inevitable, but I would say too much information is better than too little.
David
Jay Gattuso
February 6, 2013 @ 8:39 pm CET
Hi Both,
Paul, thanks for the clarifications – it's very clear that you see the 'wiki' / community contributions aspect of Coptr as being central to achieving this aim, and I really don't disagree with that view.
It would be very cool to have a constrained data model at the centre of the concept – picking up on David's comment about the chaining of things together, it would be hugely desirable to have a fixed record structure that could be linked to / exported out with a good degree of confidence in the long term consistency of the individual record structure.
It's probably very early in the development to be considering the long term 'locking down' of data structures, but I think it would be a good thing to have registered as a long term intention of Coptr to help aid sustainable buy in / use. Related, I would love to see more about the various user actors (people user classes & system based users) that you envisage as being the consumers of Coptr data, and what views into the data you expect to be providing (e.g. html, xml, sparql) and what sort of granular level you want to support (simple overview of each tool, full view of all tool data, classes of related tools, etc)
I'm really looking forward to seeing the concept demonstrator, I think this will be a really useful resource once it's up and running!
David, noted! Given Paul's reply I can see how there are a number of features that he wants that PRONOM could not support.
That said, there is a risk of some overlap between Coptr and the PRONOM software list – is there a plan to tie these things together in any way, or does TNA see Coptr as being a distinctly different beast? (I can see, for example, that a boundary between the two registries is that PRONOM is describing software that is solely used to render / 'perform on' files, and Coptr is looking at software that functions on sets of files in a descriptive capacity, which is somewhat different to a file performance.)
I'm very interested in your comments about chaining registry entities together to form a 'complete' registry… in my view, this would certainly have a hugely positive impact on the sector.
Dclipsham
February 5, 2013 @ 3:50 pm CET
Just to add, while I'd very much love PRONOM to be a complete all-in-one registry that suits everybody's needs, it isn't possible given current resource to push PRONOM down the route Jay suggests at this stage.
I would hope however, that a Linked Data PRONOM would be a step in the right direction, so we could focus on what we do, and other registry providers could focus on their interests and together we chain together something approaching a 'complete' registry.
David
Jay Gattuso
February 3, 2013 @ 10:58 pm CET
Hi Paul,
This looks a useful registry to keep up to date and centralised.
I have a couple of questions for you that will help me to understand what you are proposing.
(1) From a distance, it seems like the records for each tool could be held by the PRONOM Software list (http://www.nationalarchives.gov.uk/PRONOM/Software/proSoftwareSearch.aspx?status=listReport), or something very similar in construction. If the idea is to reduce replication and record set diversity, would it be viable to host the tool set in PRONOM rather than in another new location? There are probably a few tweaks that would need to be made to the base data model and of course there would need to be some buy-in from TNA, but these things are surmountable, and it does seem to me that PRONOM is the natural "home" for this kind of thing…
(2) Further to the first question, it would be useful to know what sort of data you are looking to hold about each tool. Are you interested in holding a descriptive listing of tools (e.g. name, capability, website) or are you looking to include binaries – either of the tool itself, or of any supporting documentation (e.g. FAQ, install notes, quick start guide etc). Is there also any desire to hold implementation examples and / or user experience information?
Best,
Jay