Shoulder to Shoulder: Studying significant properties of spreadsheets with the OPF Archives Interest Group

It really feels good to win an award! Especially in a situation where you initially started the [award-winning] work just because you faced this particular challenge in your everyday job, having no public acknowledgement in mind.

The OPF Archives Interest Group (AIG) won the Best Poster Audience Award at iPRES 2019 for reporting on the ongoing investigation of significant properties of spreadsheets. It all started three years ago when three OPF member organisations, the national archives of Denmark, the Netherlands and Estonia, decided to approach some of our everyday challenges in a cooperative manner. As national archives, we all had spreadsheets in our born-digital collections, and the question of how best to preserve them was common to all of us. Furthermore, some of us were still struggling with archival format regulations that required migrating something as multidimensional and dynamic as a spreadsheet into a static PDF or TIFF file.

We decided to take one topic at a time, starting with spreadsheets, and agreed on monthly online calls to report on progress and set tasks for the following month. Becky set up the calls and Remco chaired them, keeping us on track in applying the “InSPECT methodology”. At some point Preservica joined the group. The AIG topics are driven by the archives, but all OPF members are welcome to join our work. Since the AIG is not bound to any funded project and has no other form of external stimulus – only inner ones! – the work proceeded slowly and steadily, free of any negative stress. This recipe for good results was proven at the last iPRES poster session.

Has there been any practical benefit of the AIG work? Sure, it’s been enormous, and I have seen it manifest itself on three levels: for me, for my organisation and for the digital preservation community in general. Firstly, for me, the obvious outcome was that by delving into the wide world of spreadsheet properties, trying out various property extraction tools and analysing the spreadsheets in our collection, I developed a much deeper understanding of the born-digital tabular records in my custody.

Secondly, some extremely important developments were initiated in my organisation, not all of which I had in mind at the beginning. In the National Archives of Estonia, we critically reviewed our preservation policy as well as our archival format requirements (the latter listed in an Appendix to the Archival Rules), which resulted in loosening the policy and taking up research to draft a new, longer list of archival formats. And this was not enough. Since the Appendix was to be amended, the temptation was to review the Archival Rules in their entirety! Soon a motion to amend the Rules will be proposed to the government. Lesson learned – be careful what you embark upon: the little snowball that starts as a “What are our common problems?” discussion in rainy Copenhagen might end up on the table of the Estonian Prime Minister one day in (potentially also rainy) Tallinn.

Thirdly, and I love this: the digital preservation community now has a new tool – the Spreadsheet Complexity Analyser (SCA), published under a CC0 licence. SCA is a command line tool that measures spreadsheet-specific properties in Excel files, such as the number of worksheets, cells used, formulas, comments, hyperlinks, revision history, possible macros etc. After I ran it on the XLS and XLSX files in my collection, the “nature” and characteristics of these files were quickly presented to me. Know what you have – one of the rules in digital preservation.
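To give a feel for what measuring such properties can involve, here is a minimal sketch in Python. It is not SCA and does not reflect how SCA works internally; the function name `summarise_xlsx` and the choice of counters are my own assumptions for illustration. It only handles XLSX packages (not binary XLS), and it assumes the conventional part names that Excel writes (`xl/worksheets/`, `xl/comments*.xml`, `xl/vbaProject.bin`) rather than resolving the package relationships properly.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used inside worksheet parts of an XLSX package.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def summarise_xlsx(source):
    """Count a few complexity indicators in an XLSX file (hypothetical helper).

    `source` is a path or a binary file-like object. Part names are assumed
    to follow the layout Excel conventionally writes.
    """
    counts = {
        "worksheets": 0,
        "cell_elements": 0,
        "formulas": 0,
        "hyperlinks": 0,
        "has_comments": False,
        "has_vba_project": False,
    }
    with zipfile.ZipFile(source) as package:
        for name in package.namelist():
            if name == "xl/vbaProject.bin":
                # Macro-enabled workbooks usually carry their VBA project here.
                counts["has_vba_project"] = True
            elif name.startswith("xl/comments") and name.endswith(".xml"):
                counts["has_comments"] = True
            elif name.startswith("xl/worksheets/") and name.endswith(".xml"):
                counts["worksheets"] += 1
                root = ET.fromstring(package.read(name))
                # <c> is a stored cell, <f> a formula, <hyperlink> a link.
                counts["cell_elements"] += sum(1 for _ in root.iter(NS + "c"))
                counts["formulas"] += sum(1 for _ in root.iter(NS + "f"))
                counts["hyperlinks"] += sum(1 for _ in root.iter(NS + "hyperlink"))
    return counts
```

Calling `summarise_xlsx("some-file.xlsx")` returns a dictionary of counts that can be tabulated across a collection. Note that a `<c>` element may also be stored for an empty but formatted cell, so the cell count is an upper bound on cells that actually hold values.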

In the course of listing behaviours that are expected of a spreadsheet, it became crystal clear that significance really is in the eye of the stakeholder. National archives represent only one type of stakeholder. In order to understand the perspectives of the record creators and – if at all possible – of (near-)future users, it is beneficial to compile stakeholder analyses. This is the next step of the AIG work, and that is why we call the work ongoing. What next after we have completed the Significant Properties of Spreadsheets Report? We will pick up the next topic. We made a longlist and a shortlist three years ago, but we will have a brainstorming session to see if there is anything new to add, or if our priorities have changed.

Without international cooperation, digital preservation might sometimes feel like lonely work. There was one more outcome of the AIG that cannot be measured. We share all kinds of news and information during the monthly calls and turn to each other for advice. Often we get the advice, and even when we do not – it is oh so valuable to feel shoulder to shoulder with virtual colleagues.


1 Comment

  1. BeMattmann
    October 22, 2019 @ 8:38 am CEST

    Great idea! At the moment we are looking for such a tool for our Ingest. We have the idea that we can semi-automatically decide whether a spreadsheet should be converted to PDF/A or retained in its original format (or at least in one of the ODF or OOXML family formats) – without having to accept too much loss of information and functionality.
