johan's Blog

Digital Preservation Researcher at KB / National Library of the Netherlands

This is the second and final instalment of a 2-part blog on the use of PDF/A validators for identifying preservation risks in PDF. You can read the first part here. In Part 1 I showed how PDF/A validators can be used to identify preservation risks in a PDF. I illustrated this with an example that […]

By johan, posted in johan's Blog

8th Jul 2015  1:32 PM  4407 Reads  No comments

This is the first instalment of a 2-part blog. It was prompted by the upcoming Digital Preservation Coalition briefing When is a PDF not a PDF?, for which I was asked to prepare a presentation. My initial idea was to give an overview of the work we did on PDF preservation risk assessment using a […]

By johan, posted in johan's Blog

7th Jul 2015  12:45 PM  4840 Reads  No comments

While browsing ArchiveTeam's File Formats Wiki earlier this week, I came across some entries I created there on Quattro Pro spreadsheets two years ago. At the time I had also contributed some old Quattro Pro for DOS spreadsheets (here and here) from my personal archives to the OPF format corpus. Seeing those files again, I […]

By johan, posted in johan's Blog

29th Oct 2014  2:59 PM  25069 Reads  2 Comments

Earlier this week I had a discussion with some colleagues about the archiving of mobile phone and tablet apps (iPhone/Android), and, equally important, ways to provide long-term access. The immediate incentive for this was an announcement by a Dutch publisher, who recently published a children's book that is accompanied by its own app. Also, there […]

By johan, posted in johan's Blog

23rd Oct 2014  11:33 AM  13579 Reads  2 Comments

Some time ago Will Palmer, Peter May and Peter Cliff of the British Library published a really interesting paper that investigated three different JPEG 2000 codecs, and their effects on image quality in response to lossy compression. Most remarkably, their analysis revealed differences not only in the way these codecs encode (compress) an image, but […]

By johan, posted in johan's Blog

26th Sep 2014  1:06 PM  17958 Reads  3 Comments

It is well known that PDF documents can contain features that pose preservation risks (e.g. see here and here). Migration of existing PDFs to PDF/A is sometimes advocated as a strategy for mitigating these risks. However, the benefits of this approach are often questionable, and the migration process can itself be quite risky. As […]

By johan, posted in johan's Blog

27th Aug 2014  3:47 PM  19136 Reads  9 Comments

One of my first blogs here covered an evaluation of a number of format identification tools. One of the more surprising results of that work was that, of the five tools tested, no fewer than four (FITS, DROID, Fido and JHOVE2) failed to even run when executed with their associated […]

By johan, posted in johan's Blog

31st Jan 2014  12:58 PM  1681839 Reads  6 Comments

This blog follows up on three earlier posts about detecting preservation risks in PDF files. In part 1 I explored to what extent the Preflight component of the Apache PDFBox library can be used to detect specific preservation risks in PDF documents. This was followed up by some work during the SPRUCE Hackathon in Leeds, […]

By johan, posted in johan's Blog

27th Jan 2014  3:08 PM  22984 Reads  7 Comments

My previous blog Assessing file format risks: searching for Bigfoot? resulted in some interesting feedback from a number of people. There was a particularly elaborate response from Ross Spencer, and I originally wanted to reply to that directly using the comment fields. However, my reply turned out to be rather longer than I […]

By johan, posted in johan's Blog

8th Oct 2013  4:24 PM  16077 Reads  4 Comments

Last week someone drew my attention to a recent iPres paper by Roman Graf and Sergiu Gordea titled "A Risk Analysis of File Formats for Preservation Planning". The authors propose a methodology for assessing the preservation risks of file formats using publicly available information sources. In short, their approach involves two stages: Collect and […]

By johan, posted in johan's Blog

30th Sep 2013  3:49 PM  25645 Reads  9 Comments