OPF Diary: June 2017 Part One


This blog diary captures what the OPF team have been up to. We aim to post updates at least once a fortnight, and more often when there's a lot going on.


It’s been a busy month, starting off with our Annual General Meeting on 7-8 June. This year it was kindly hosted by the Bibliothèque nationale de France in Paris. I’ve been busy writing the minutes, collecting the slides and editing the recordings from all the presentations. We plan to publish them as member content on our website next week. We were delighted to welcome two new members at the AGM: Ex Libris and Arcsys Software. Find out more about the benefits of becoming a member and how to join.

We left it a little late in the day for a group photo, but here are some of our members outside the BnF in sunny Paris.

Our monthly Archive Interest Group call was held on 12 June. Our archive members are currently working on a collaborative report investigating significant properties for spreadsheets.

June’s webinar, ‘PRONOM in practice’, was fully booked with a waiting list, and all 50 places were filled on the day. We were joined by David Clipsham from the National Archives of the UK, Jenny Mitcham from the University of York and Justin Simpson from Artefactual Systems. The recording and slides are available online to Foundation members (login required).

The Document Interest Group (DIG) has been assigning IDs to JHOVE error messages and writing explanations of what they mean, alongside sample files. Peter May from the British Library recently published a blog post calling on digital archivists, writers and coders to support this community effort to improve JHOVE format validation. It also includes a short survey to help us understand how people are using JHOVE. Take a look at the DIG’s workplan here.


During the OPF AGM I learned that the next meeting of the Rosetta Advisory Group was in Sheffield on 13th June. After a single post-AGM day in the office, I caught an early train to Sheffield on Tuesday. The meeting was held at The Edge, a University of Sheffield facility, and there was a selection of interesting presentations. Adi Alter of Ex Libris used live polling of participants to gather instant feedback on prospective Rosetta developments. When canvassing opinion on new tool integrations, both veraPDF and FIDO ranked pretty highly on the list, but hats off to Siegfried, which proved the most popular choice.

Michelle Lindlar presented the really good work she and her colleague Yvonne Tunnat have been doing to test JHOVE. In short, they’ve handcrafted PDF documents designed to test JHOVE’s validation of selected PDF entities, such as the document header and trailer. Needless to say, they’ve found a few problems, quite enough to top up the JHOVE issue tracker after the Hack Day left it looking a little bare.

On Wednesday I picked up the work I’ve been doing with the Wikidata team based at Yale University. Their idea is to use Wikidata as the basis for a community registry of file format and digital preservation metadata; there are more details in this blog post by Euan Cochrane. I’ve been helping their developer create a portal that helps domain experts, i.e. digital preservation staff, contribute without having to wrestle too hard with Wikidata. The first prototype should be available by mid-July if all goes to plan.

Thursday was OPF Tech Clinic day. I had a morning session working with the veraPDF policy checker, helping a new user create a policy Schematron document to detect encrypted PDFs and documents containing a digital signature. Internet issues meant the session took a little longer than it should have, but after 90 minutes we had produced a suitable schema.

I spent Thursday afternoon’s session with a developer working for the German government on ZUGFeRD, an open standard for electronic invoices. He’s working with XML invoices embedded as attachments in PDF/A-3 files. We explored the sample plugin that deals with attached files, and within a couple of hours we were able to find and extract the embedded XML invoices, ready for further processing. There’s a nice synergy between the projects, and it would be good to work a little more closely with the team there.
