OPF Reference Toolset
Our products address common issues facing many organisations. Together, they form a reference toolset for digital preservation that can be adapted for use in different organisational workflows. We have defined a common workflow for digital preservation and mapped out where our open source products can fit within it. The workflow includes standard preservation actions, but the details, order of processing, and policies may differ by organisation. We focus on the pre-ingest and ingest functions of the OAIS (Open Archival Information System) reference model because these are the areas our current products address.
A common workflow for digital preservation
Once a user has decided that a digital object should be preserved, the item needs to be catalogued and classified. This is as true of a digital object as of a physical one. Each digital object has associated metadata stored alongside it, which must also be catalogued, classified, and stored. Cataloguing and classification involve several stages, including identification, validation, and characterisation.
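The staged process above can be pictured as a small pipeline in which each stage adds its findings to the object's metadata, and each organisation chooses which stages to run and in what order. The following Python sketch is illustrative only: `DigitalObject`, the stage functions, and `run_workflow` are hypothetical names, and the stages are placeholders rather than calls to any OPF product.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DigitalObject:
    """Hypothetical record pairing a digital object with the metadata gathered about it."""
    path: str
    metadata: dict = field(default_factory=dict)


# A stage inspects an object and records its findings in the object's metadata.
Stage = Callable[[DigitalObject], None]


def identify(obj: DigitalObject) -> None:
    # Placeholder: a real stage would call an identification tool here.
    obj.metadata["identification"] = {"format": "unknown"}


def validate(obj: DigitalObject) -> None:
    # Placeholder: a real stage would run a format validator here.
    obj.metadata["validation"] = {"well_formed": None, "valid": None}


def characterise(obj: DigitalObject) -> None:
    # Placeholder: a real stage would extract significant properties here.
    obj.metadata["characterisation"] = {}


def run_workflow(obj: DigitalObject, stages: List[Stage]) -> DigitalObject:
    """Apply each stage in the order the organisation has chosen."""
    for stage in stages:
        stage(obj)
    return obj


# One possible ordering; another organisation might drop or reorder stages.
default_stages: List[Stage] = [identify, validate, characterise]
```

Because stages are plain functions in a list, an organisation could reorder them, omit one, or add its own policy checks without changing `run_workflow`.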
File format identification is a typical first step in a workflow. It answers the question: “I have a digital object; what format is it?”
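At its simplest, identification compares the first bytes of a file against known format signatures ("magic numbers"). The sketch below assumes a tiny, hypothetical signature table and the helper names `identify_bytes` and `identify_file`; production identification tools match against much richer registries such as PRONOM, whose signatures can include offsets, wildcards, and end-of-file patterns.

```python
from typing import List, Optional, Tuple

# Illustrative signature table: a handful of well-known leading byte sequences.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF 87a"),
    (b"GIF89a", "GIF 89a"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_bytes(header: bytes) -> Optional[str]:
    """Return the first format whose signature the header starts with, or None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def identify_file(path: str) -> Optional[str]:
    """Read just enough of the file to compare against the signature table."""
    with open(path, "rb") as f:
        return identify_bytes(f.read(16))
```

A `None` result means "not identified", which is itself worth recording, since an unidentified object may need manual review.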
The object may be a single file or a complex structure (such as a ZIP file) that contains other structured files and objects.
Identification provides information about the object. This may be as simple as a statement of the file format, variant, and version, or as complex as a complete map of a container's contents. All of this information can be recorded as metadata and added to the collection of metadata associated with the object.
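For a container such as a ZIP file, identification can produce a map of the container's contents. This Python sketch walks a ZIP archive with the standard `zipfile` module and recurses into nested ZIP members up to an assumed depth limit. The function names and the shape of the returned record are hypothetical, and a real tool would also identify the format of each member.

```python
import io
import zipfile
from typing import List

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # signature at the start of a typical ZIP file


def map_zip_container(data: bytes, depth: int = 0, max_depth: int = 3) -> List[dict]:
    """List a ZIP container's members, recursing into nested ZIP members."""
    entries = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            entry = {"name": info.filename, "size": info.file_size}
            if not info.is_dir() and depth < max_depth:
                # Reads the whole member into memory; a production tool
                # would stream large members instead.
                content = zf.read(info)
                if content.startswith(ZIP_LOCAL_HEADER):
                    entry["members"] = map_zip_container(content, depth + 1, max_depth)
            entries.append(entry)
    return entries


def identification_record(data: bytes) -> dict:
    """Hypothetical metadata record: a format statement plus the container map."""
    return {"format": "ZIP", "container_map": map_zip_container(data)}
```

The depth limit guards against deeply nested or maliciously constructed archives; the right limit is a policy choice.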
Format validation is the process of determining an object’s level of compliance with the relevant format specification. It answers the question: “I have an object purportedly of format X; is it?” Validation assesses conformance at three levels:
- Well-formedness – A digital object is well-formed if it meets the purely syntactic requirements for its format
- Validity – An object is valid if it is well-formed and it meets the higher-level semantic requirements for format validity
- Consistency – An object is consistent if it is valid and its internally extracted representation information is consistent with externally supplied representation information
At the end of the validation process, the metadata that accompanies the digital object will be updated to include details of the validation, such as the policy choices made and the results of the conformance checking.
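As a minimal sketch, assuming PNG as "format X", the three levels might be checked as follows: well-formedness covers the signature, chunk framing, CRCs, and a final IEND chunk; validity adds a few semantic rules from the PNG specification (a leading IHDR, non-zero dimensions, permitted bit-depth and colour-type combinations, at least one IDAT); consistency compares the extracted properties with externally supplied ones. Only a small subset of the specification's rules is checked, and `validate_png` and its result fields are hypothetical names.

```python
import struct
import zlib
from typing import List, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Permitted bit depths for each PNG colour type.
ALLOWED_BIT_DEPTHS = {0: {1, 2, 4, 8, 16}, 2: {8, 16}, 3: {1, 2, 4, 8}, 4: {8, 16}, 6: {8, 16}}


def parse_chunks(data: bytes) -> Optional[List[Tuple[bytes, bytes]]]:
    """Split a PNG into (type, payload) chunks; None if it is syntactically broken."""
    if not data.startswith(PNG_SIGNATURE):
        return None
    chunks, pos = [], len(PNG_SIGNATURE)
    while pos < len(data):
        if pos + 8 > len(data):
            return None  # truncated chunk header
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4
        if end > len(data):
            return None  # chunk runs past the end of the file
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[end - 4:end])
        if zlib.crc32(ctype + payload) != crc:
            return None  # CRC covers the chunk type and payload
        chunks.append((ctype, payload))
        pos = end
    if not chunks or chunks[-1][0] != b"IEND":
        return None
    return chunks


def validate_png(data: bytes, external: Optional[dict] = None) -> dict:
    result = {"well_formed": False, "valid": False, "consistent": None}
    chunks = parse_chunks(data)
    if chunks is None:
        return result
    result["well_formed"] = True

    # Validity: semantic rules layered on top of the syntax.
    if chunks[0][0] != b"IHDR" or len(chunks[0][1]) != 13:
        return result
    width, height, depth, colour, comp, filt, interlace = struct.unpack(">IIBBBBB", chunks[0][1])
    result["properties"] = {"width": width, "height": height}
    result["valid"] = (
        width > 0 and height > 0
        and depth in ALLOWED_BIT_DEPTHS.get(colour, set())
        and comp == 0 and filt == 0 and interlace in (0, 1)
        and any(ctype == b"IDAT" for ctype, _ in chunks)
    )

    # Consistency: extracted properties must match externally supplied ones.
    if result["valid"] and external is not None:
        result["consistent"] = all(result["properties"].get(k) == v for k, v in external.items())
    return result
```

The returned dictionary is the kind of result that would be written back into the object's metadata, alongside a note of which checks (that is, which policy) were applied.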
Format characterisation is the process of determining the format-specific significant properties of an object of a given format: “I have an object of format F; what are its salient properties?”
Characterisation also draws on information recorded within the digital object to build a richer context beyond its purely technical, structural details. For example, a word-processing document usually records the author, creation time, total editing time, word count, and more, while a JPEG image file may carry information about the camera used, the date and time the photograph was taken, and often even geolocation data. This information, together with the policy governing whether and how it is extracted and duplicated outside the file, is a key piece of metadata to store alongside the digital object.
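To make the word-processing example concrete, the sketch below assumes the document is in the OOXML `.docx` format, a ZIP package whose `docProps/core.xml` and `docProps/app.xml` parts hold descriptive properties such as the creator, creation time, total editing time (in minutes), and word count. It does not cover legacy `.doc` files or JPEG metadata. The function `characterise_docx` and its `keep` parameter are hypothetical; `keep` stands in for an organisational policy on which properties may be duplicated outside the file.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Optional, Set

# XML namespaces used by the OOXML core and extended properties parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def _find_text(root: ET.Element, path: str) -> Optional[str]:
    el = root.find(path, NS)
    return el.text if el is not None else None


def _to_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def characterise_docx(data: bytes, keep: Optional[Set[str]] = None) -> dict:
    """Extract a few descriptive properties from a .docx package.

    If `keep` is given, only the named properties are returned, mimicking a
    policy such as "do not copy the author's name outside the file".
    """
    props: dict = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(zf.read("docProps/core.xml"))
            props["author"] = _find_text(core, "dc:creator")
            props["created"] = _find_text(core, "dcterms:created")
        if "docProps/app.xml" in names:
            app = ET.fromstring(zf.read("docProps/app.xml"))
            props["editing_minutes"] = _to_int(_find_text(app, "ep:TotalTime"))
            props["word_count"] = _to_int(_find_text(app, "ep:Words"))
    if keep is not None:
        props = {k: v for k, v in props.items() if k in keep}
    return props
```

Keeping the policy as an explicit parameter means the choice of what to extract is visible and can itself be recorded in the metadata.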
Packaging, Cross-Check, Quality Assurance, Review
Finally, the digital object and the metadata gathered during identification, validation, and characterisation are prepared for storage. In this final stage of packaging, cross-checking, QA, and review, the digital object and its metadata are wrapped into an archival information package (AIP).
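One common way to support cross-checking is to record a fixity checksum for every file at packaging time and verify those checksums again later. The sketch below writes a directory layout loosely modelled on BagIt (RFC 8493), with a `data/` payload directory and a `manifest-sha256.txt` file, but it is not a complete, conformant bag. `build_package`, `cross_check`, and `metadata.json` are hypothetical names, and payload file names are assumed to be flat (no subdirectories).

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in fixed-size blocks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def build_package(package_dir: Path, files: Dict[str, bytes], metadata: dict) -> None:
    """Write payload files, a metadata record, and a checksum manifest."""
    data_dir = package_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        (data_dir / name).write_bytes(content)
    (package_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    lines = [f"{sha256_of(data_dir / name)}  data/{name}" for name in sorted(files)]
    (package_dir / "manifest-sha256.txt").write_text("\n".join(lines) + "\n")


def cross_check(package_dir: Path) -> List[str]:
    """Return manifest paths that are missing or whose checksum no longer matches."""
    failures = []
    for line in (package_dir / "manifest-sha256.txt").read_text().splitlines():
        digest, rel_path = line.split("  ", 1)
        target = package_dir / rel_path
        if not target.exists() or sha256_of(target) != digest:
            failures.append(rel_path)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "aip-0001"
        build_package(pkg, {"report.pdf": b"%PDF-1.7 ..."}, {"identification": {"format": "PDF"}})
        print(cross_check(pkg))  # []
        (pkg / "data" / "report.pdf").write_bytes(b"tampered")
        print(cross_check(pkg))  # ['data/report.pdf']
```

An empty list from `cross_check` is the QA signal that the packaged payload still matches what was recorded at packaging time.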
To help us manage a growing number of products and ensure they remain fit for purpose in the long term, we have developed a programme of work to enhance the OPF reference toolset. The goal is to enable users to adopt our products easily and incorporate them into their workflows.
We plan to reduce the number of separate components in our reference toolset and provide a single user interface for our products. Our approach is to create consistency across the toolset and provide a solid foundation on which to build new functionality. In the future, we will offer users a flexible choice of modules and make it more straightforward to integrate other tools.
By reducing the amount of redundant code and separating out old project infrastructure, we aim to minimise maintenance overheads while continuing to develop functional, sustainable products.
This work is overseen by our Product Board.