OPF Archive Interest Group meet in Copenhagen


Two weeks ago our Archives Interest Group (AIG) held its first face-to-face meeting. We were kindly hosted by the Danish National Archives in Copenhagen. Currently the AIG comprises participants from our three national archive members in Denmark (Rigsarkivet), the Netherlands (Nationaal Archief) and Estonia (Rahvusarhiiv); new members are welcome. The group has been running for around six months via teleconferences and email.

Archives Interest Group, Rigsarkivet Copenhagen, February 2017

The aim of the AIG is to discuss challenges and share knowledge on digital preservation from an archives perspective. To agree on goals and aims, the group made use of the Catalogue of Preservation Policy Elements. They discussed each of the sections and how it related to their work, then selected a shortlist of topics to focus on.

One of the first tasks they are working on is a joint report on accepted and preferred formats in their organisations, and the reasoning behind these. Each archive has a mandate to preserve data from government agencies for the long term but is subject to different laws and restrictions. In advance of the meeting, the AIG shared their preferred format documentation and then presented the reasoning behind their decisions at the meeting. The following discussion considered questions such as:

  • How well do we need to understand a format?
  • How do we ensure it conforms to the standard?
  • What national laws do we need to comply with?

Their approaches had several elements in common. They had reviewed national and international guidance about accepted formats for digital preservation (e.g. the Recommended Formats Statement by the Library of Congress) and they had defined format categories by type, e.g. document, image, audio and database.

The Dutch Nationaal Archief published their new list of accepted and preferred formats (in Dutch) in late 2016. The law requires that they accept formats that conform to open standards where possible (comply or explain). However, they also accept common proprietary formats such as the Microsoft Office formats. Other formats require a dialogue between the archives and the provider. They are currently undertaking impact analysis projects and meet with ministry representatives and an internal team of records managers, metadata experts, file format experts and technical staff to assess the process.

The Rigsarkivet, Denmark, has a stricter list of formats. They have adopted a migration strategy, and government agencies must migrate their files to these formats before submitting them to the archives. The Rigsarkivet provides guidance and advice to the agencies to support them. Their list of formats was created from a set of criteria which consider comprehensiveness, storage, migration, support and monitoring.

The Rahvusarhiiv, Estonia, has the same legal position as in Denmark, i.e. they provide a strict list of accepted formats to the agencies. If there is a situation where migration may cause damage to a file, they may agree to accept the original. They have a couple of AV specialists within the archives and so their requirements for film and audio formats are very detailed. They aim to increase their support for agencies to help them think about archiving and preservation so that it is embedded in their workflow.

Resources

The archives can only perform preservation in relation to the resources that are available. They operate at different scales, but they need to perform the same functions. The archives are under pressure to accept a wider range of formats, including some specialist formats such as x-rays with embedded medical information. Without a thorough understanding of a format, it is problematic for an archive to accept it and to preserve it in the long term.

Spreadsheets

Spreadsheets pose a big challenge for archives. Simple spreadsheets, e.g. a meeting agenda in a table format, or other documents specifically designed to be printed, can be migrated to TIFF or PDF/A (or other formats) without too many issues. However, many spreadsheets are not meant to be printed, and information is lost once they are migrated. Different viewers mean files render differently. Different versions and open vs proprietary formats also cause issues. The lack of interoperability between the open formats (ODF and OOXML) is one of the issues that makes the Danish Rigsarkivet and Estonian Rahvusarhiiv hesitant about accepting those formats for long-term preservation.
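As a rough illustration of what "information is lost" can mean in practice, the sketch below counts formula cells in an OOXML (.xlsx) workbook before any migration is attempted. It is not a tool used by any of the archives. The function name count_formulas is hypothetical, and the code assumes the Transitional SpreadsheetML namespace; Strict OOXML files use a different namespace and would report zero.

```python
import xml.etree.ElementTree as ET
import zipfile

# Assumption: Transitional OOXML namespace for worksheet parts.
SHEET_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def count_formulas(source):
    """Return {worksheet part name: number of formula cells}.

    `source` is a path or a binary file-like object holding an .xlsx file.
    A formula cell is a <c> element that contains an <f> child.
    """
    counts = {}
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            # Worksheet parts conventionally live under xl/worksheets/.
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                root = ET.fromstring(zf.read(name))
                counts[name] = sum(1 for _ in root.iter(f"{{{SHEET_NS}}}f"))
    return counts
```

A workbook where every sheet reports zero formulas is a more plausible candidate for migration to a static format such as PDF/A than one with hundreds of formula cells, although formulas are only one of several properties that could be lost.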

Significant properties

The AIG discussed significant properties. It is their mandate to archive digital documents, so they do not ask all the different government agencies and departments what aspects or functionality of the documents they want to preserve, or ask about significant properties. It would be impossible to meet everyone's requests. They need to decide what is most important: does a file need to render so that it looks exactly the same? Or is it enough that the content remains the same? Or are there any formulas that must not be lost?

Day 2

The second day of the Archives Interest Group began with a discussion about risk management and reliability.

The Rigsarkivet carry out a risk analysis once a year. They use a standard model to identify threats, estimate the probability of each threat and assess its consequences.
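The model used by the Rigsarkivet was not described in detail, so the following is only a generic sketch of how a probability and consequence assessment is often recorded. The 1 to 5 scales, the multiplication of the two values and the example threats are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    probability: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    consequence: int  # hypothetical scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Common risk-matrix convention: score = probability x consequence.
        return self.probability * self.consequence


def rank_threats(threats):
    """Return the threats ordered from highest to lowest risk score."""
    return sorted(threats, key=lambda t: t.score, reverse=True)


# Invented example entries, not taken from any archive's risk register.
register = [
    Threat("Key staff member retires without handover", 3, 4),
    Threat("Storage media failure", 2, 5),
    Threat("Format becomes unsupported by viewers", 2, 3),
]

for threat in rank_threats(register):
    print(f"{threat.score:>2}  {threat.name}")
```

Ranking by a single score is a simplification; some organisations prefer to keep probability and consequence separate so that rare but severe threats are not hidden by the multiplication.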

The group discussed how archives present themselves to the outside world in terms of reliability. Archives are responsible for personal information and government documents. From some perspectives a national archive has implicit trust. Do they really need a certificate to prove it?

There are several means of certification available. The AIG briefly discussed the processes and costs required for TRAC, the Data Seal of Approval and a few others.

It was agreed that there are benefits in certification. First, the tasks and documentation required to become certified may help to improve internal processes. Second, some archives carry out external work, and certification, particularly against an ISO standard, shows that they are reliable and can be trusted with private records.

It was noted that documenting processes is extremely valuable, especially if the knowledge lies with only one or two people. It can be a big risk if they move jobs or retire. However, asking people to write documentation is expensive, and the day-to-day work still needs doing.

The Nationaal Archief is working towards the Data Seal of Approval. It is not as time/resource heavy as TRAC. They have formed a working group and have asked different departments for information and will then write up the submission. They see this as a first step to improving their internal processes. In Denmark, the Danish Data Archive has recently become integrated in the Danish National Archives. They had started the Data Seal of Approval before the move and the process will continue, possibly with a broader scope including not only research data.

Technology watch

The Rigsarkivet have developed an in-house solution based on SIARD and the principle of distributed digital preservation. Copies of data are stored both off-line and in the “Bitrepository”, developed together with the Royal Library. The Nationaal Archief and Rahvusarhiiv both use a vendor solution. All three make use of open source tools, including outputs from the E-ARK project and those wrapped in vendor solutions.

JHOVE error messages are difficult to understand. The OPF Document Interest Group has been documenting error messages with the aim of creating a wiki that explains what they actually mean, so that organisations can assess the potential preservation impact. Together with community contributors they took some big steps towards achieving this in the first online hack day. A second hack day is being planned for the end of March.

The AIG plans to monitor available tools and discuss how they use them. The PREFORMA open source validators are of interest to the group, as are the E-ARK solutions. Natural language processing tools are important for access.

Open source software means there is more chance of continued support. Open standards and the logic behind them are important to the archives.

Next steps

The AIG will produce a shared status report about their preferred and accepted formats, with a special focus on spreadsheets, and significant properties. Following this they plan to draft a report on reliability and risk management.

To find out more about becoming an OPF member and participating in the archive interest group see: https://openpreservation.org/about/join/
