Harmony in the field of Digital Preservation
By Ross Spencer and Bernhard Hampel-Waffenthal
Cover Photo by @mischievous_penguins on Unsplash.
How many digital preservation developers does it take to change a lightbulb? Two. One to change the lightbulb, and the other to change the lightbulb using a different standard.
We’re in an interesting place in the field of digital preservation: we have a lot of capable people producing incredible systems and tooling. The UK Web Archive and Rockefeller are two standout organizations doing work in this area.
- British Library UK Web Archive Update 2021:
- Rockefeller Archives Project Electron Update 2018:
The lines between archivist and digital archivist or developer haven’t existed in a profound way for a long time:
Yet we are still quite siloed.
The UKWA and Rockefeller are just two projects with vastly different technical stacks – though both operate on a large scale, so perhaps this is to be expected.
At more granular levels, we also have different types of tooling.
A question was posed to the audience at iPRES2022: how do we make sure an Andy Jackson isn’t alone in standing up a digipres.org? It can be extended to a Tessa Walsh standing up METSFlask, to Johan van der Knijff and the first versions of Jpylyzer, or to any of the many other solo efforts out there. We’d like to add to that first question: how can those new to writing code be on-boarded to a standard methodology that provides a springboard for understanding a bigger ecosystem of projects?
In GitHub we have a framework that enables collaboration and seems to be popular, but for each project, governance isn’t automatic and roadmaps aren’t necessarily clear. Earlier still (not necessarily for any of the projects we’re talking about here), application design and interfaces are set before others have had the opportunity to help validate the approach. Design patterns and programming languages are selected that aren’t necessarily easy for others to engage with or extend.
Though wonderfully varied, massively useful, and incredibly important to the community at different stages of the archival workflow, projects feel, whether rightly or wrongly, hermetically individual. That is even more true of the lone script-writers doing their best for their institution, without the time or energy to engage on a broader community level.
Ross had one way of looking at it, and a lot of strategic compromises he was willing to make as an individual maintainer and experimenter. Bernhard, on the other hand, had fresh ideas that looked a lot like a tool such as Demystify but didn’t necessarily fit into Ross’ design goals.
Why, though, should it be so hard for Bernhard to take this approach and customise something useful from something that’s already out there?
This is a really difficult topic to pin down, and to do so without alienating anyone. What we are hoping to do is kick off a discussion about how to create a greater sense of harmony in the field when it comes to the genesis of projects and then growing them, while at the same time ensuring that individual expression is represented and innovation isn’t hindered.
But what else do we mean by harmony? Can we make this term more precise?
Primarily, we feel it comes down to this: how do we work together to a standard that generates economies of scale?
Economies of scale, when we talk about the development of digital preservation software and tooling, may look like contributions, in their tens or hundreds, from a diverse group of people to:
- Report issues.
Such contributions are among the things that can happen when something is consistent and easier to understand for a larger group of people.
Over a short time frame, maintenance keeps things going, and these activities all contribute to that. Over a longer time frame, projects may occasionally fall into disrepair or underuse. The standards we may begin to talk about here also affect the ability to resurrect a project, which is something we may not yet be able to do, at least not easily.
Back in 2013, Ross faced a conundrum at Archives New Zealand while trying to resurrect the tool c3po.
The tool looked good, and looked like it would begin to satisfy some of the pre-ingest requirements of our nascent digital transfer work.
The tool, however, was written as a Java Play project, which was going to be difficult to deploy at the organization through the different layers of IT bureaucracy. Ross found the framework out of his comfort zone and difficult to set up at home at the time, even without administrative restrictions on his PC, let alone trying to take it into work to share with colleagues and run.
He discusses this some more in his follow-up blog to his Maintainer’s talk in 2022: Information Maintainers talk: Something something twenty years open source…
In that same talk, Ross asks whether there is a perfect programming language or framework that can be adopted in digital preservation: something with the most buy-in from folks capable of writing code. And if so, what would it need to support?
- ease of installation,
- ease of translation,
- ease of writing tests,
- and ease of writing documentation.
More technically than that, are there common interfaces we can satisfy that enable tooling to be developed that is discrete and modular, and that can be integrated into systems such as those needed for digital preservation? Or is there an approach that makes tooling easy to use, with outputs that the individual or organisation can feed into different workflows and processes?
- One interesting approach, which uses microservices for integration, is that of the Rockefeller Archives:
- Another interesting approach, using web APIs, is the Preservation Action Registries (PAR) project:
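Neither of those projects is reproduced here, but the idea of a common interface can be made a little more concrete. What follows is a minimal sketch in Python, under our own assumptions: the names `PreservationTool`, `Checksum`, `FileSize` and `characterise` are hypothetical and are not taken from Project Electron, PAR, or any existing tool. It only illustrates how small, discrete tools that share one method signature and return plain dictionaries could be combined without knowing about each other.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Protocol


class PreservationTool(Protocol):
    """Hypothetical shared interface: a name, and one method returning a dict."""

    name: str

    def run(self, path: Path) -> dict:
        ...


class Checksum:
    """Example tool: report a SHA-256 digest for a file."""

    name = "checksum"

    def run(self, path: Path) -> dict:
        return {"sha256": hashlib.sha256(path.read_bytes()).hexdigest()}


class FileSize:
    """Example tool: report the size of a file in bytes."""

    name = "size"

    def run(self, path: Path) -> dict:
        return {"bytes": path.stat().st_size}


def characterise(path: Path, tools: list[PreservationTool]) -> dict:
    """Run each tool against one file and collect the results by tool name."""
    return {tool.name: tool.run(path) for tool in tools}


if __name__ == "__main__":
    # Create a throwaway file so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"
        sample.write_bytes(b"hello")
        print(characterise(sample, [Checksum(), FileSize()]))
```

Because each tool only has to provide `name` and `run`, a new contributor could add a tool without touching the others, and the combined dictionary could be serialised for whichever workflow needs it. Whether an interface this small would be enough in practice is exactly the kind of question we would like the community to weigh in on.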
Trying to create a collaborative framework conjures memories of PLANETS but, like a lot of projects in the past, such projects are not accessible to all (they are gatekept), are not self-organising, and are not necessarily all that socially harmonious.
And so, perhaps, to begin: socially, what would we need to do to arrive at something, whether that is a standard, a methodology, a new tool, or a set of tools?
That’s a difficult question, and it is harder still for a developer to maintain perspective on it, let alone a sensible one. But hopefully y’all have ideas?
Perhaps you have good examples of projects that are more successful at having more than one developer? We might be able to learn, for example, from Apache Tika or FFMProvisr.
- Apache Tika has 107 listed contributors today.
- FFMProvisr is one that comes to mind more easily as a standout collaborative project in the community with 22 listed contributors today.
We’d like to explore these questions further by running a survey to see where our commonalities and our differences may be. We’d like to find out what your interests are, and how you might want to contribute to future projects.
The survey consists of a maximum of 26 questions and the answers are anonymous. Before the results are published they will also be filtered for any identifying information.
The results will be presented in another blog in the new year.
Link to the survey
Google Forms: https://forms.gle/3FEpWeue5gDCBxta6
Sign-in is not required, and email addresses are not recorded.
A collaborative project
Over the years, the iPRES bake-off has proven an interesting event to observe. This year it presented a possible framework for looking at digital preservation tooling:
- The Starter Course includes all the actions that happen before a digital object is transferred to or ingested into an archive repository;
- The Main Course includes actions once a collection has been transferred into an archive repository;
- The Dessert Course concerns all aspects of access and user engagement with data that has been preserved.
Following the presentations this year, we noticed there may be room for other tooling (potential gaps not covered by existing tools). Perhaps you did too? We want to know what those gaps looked like to you, and we’d like to use this blog as an opportunity to ask for your thoughts on them.
We’d like to do that using the linked-to survey, with two potential outcomes:
- An opportunity to use the survey results to discuss synergies in the community, and where there are areas that we could bring closer together.
- An opportunity to start a community owned project, to allow contributors to work towards a common outcome, e.g. plugging one of the gaps in our tooling capabilities.
We are happy with one or both possibilities. Our first priority is understanding the community in 2022 a little better.
If we are able to garner enough interest in a collaborative project, we would like to use a webinar to discuss that further. If we get as far as a joint project, we hope that it will offer those involved an opportunity to gain experience in the project lifecycle and its different roles. We anticipate that the end result will demonstrate a methodology and set of tools for working together, and provide documentation of the many lessons learned along the way, with the added bonus of a piece of software that we can all use in our respective organizations.
We’d like to run this survey until the end of the year, so please share it far and wide. We’ll try to present the results by the end of January 2023.