johan's Blog

Digital Preservation Researcher at KB / National Library of the Netherlands

Like many other organisations that use JPEG 2000, the KB produces two representations of most of its digitised content (newspapers, books, periodicals): a high-quality, losslessly compressed JP2 that serves as the archival master, and a lower-quality, lossily compressed JP2 that serves as an access image (used, for example, on our newspapers website). The majority […]

By johan, posted in johan's Blog

19th Aug 2013  11:22 AM  14791 Reads  No comments

Last winter I started a first attempt at identifying preservation risks in PDF files using the Apache Preflight PDF/A validator. This work was later followed up by others in two SPRUCE hackathons in Leeds (see this blog post by Peter Cliff) and London (described here). Much of this later work tacitly assumes that Apache Preflight […]

By johan, posted in johan's Blog

25th Jul 2013  12:57 PM  23964 Reads  12 Comments

It’s been more than two years now since I wrote my D-Lib paper JPEG 2000 for Long-term Preservation: JP2 as a Preservation Format. From time to time people ask me about the status of the issues that are mentioned in that paper, so here’s a long overdue update. Issues addressed in the 2011 paper The […]

By johan, posted in johan's Blog

1st Jul 2013  4:44 PM  19580 Reads  2 Comments

Last year (2012) the KB released a report on the suitability of the EPUB format for archival preservation. A substantial number of EPUB-related developments have happened since then, and as a result some of the report's findings and conclusions have become outdated. This applies in particular to the observations on EPUB 3, and the support […]

By johan, posted in johan's Blog

23rd May 2013  2:23 PM  17475 Reads  No comments

About a year ago, work started on packaging SCAPE tools. Jpylyzer was the first SCAPE tool that was turned into a Debian package. Some time later, the OPF set up a couple of machine images at Amazon Web Services, which can be used to create packages repeatedly using a virtual machine. Even though I’ve used […]

By johan, posted in johan's Blog

23rd Apr 2013  10:53 AM  14420 Reads  No comments

The most important new feature of the recently released PDF/A-3 standard is that, unlike PDF/A-2 and PDF/A-1, it allows you to embed any file you like. Whether this is a good thing or not is the subject of some heated on-line discussions. But what do we actually mean by embedded files? As it turns out, […]

By johan, posted in johan's Blog

9th Jan 2013  1:42 PM  128836 Reads  16 Comments

The PDF format contains various features that may make it difficult to access content that is stored in this format in the long term. Examples include (but are not limited to): Encryption features, which may either restrict some functionality (copying, printing) or make files inaccessible altogether. Multimedia features (embedded multimedia objects may be subject to […]

By johan, posted in johan's Blog

19th Dec 2012  3:15 PM  16410 Reads  1 Comment

I've already written a number of blog posts on format validation of JP2 files. Format validation is only one aspect of a quality assessment workflow. Digitisation guidelines typically impose various constraints on the technical characteristics of preservation and access images. For example, they may state that a preservation master must be losslessly compressed, and […]

By johan, posted in johan's Blog

4th Sep 2012  11:04 AM  16565 Reads  2 Comments

The purpose of this post is to give a brief introduction to creating, editing and submitting format signatures (or ‘magic’ entries) for the well-known File tool. The occasion for this was some work I did last week on improving File’s identification of the JPEG 2000 formats. I had some difficulty finding any easy-to-follow documentation that […]

By johan, posted in johan's Blog

9th Aug 2012  11:53 AM  32756 Reads  1 Comment

In this blog post I'll be dusting off some old stuff for a change. The occasion for this is the following question, posted by Paul Wheatley on the Libraries and Information Science Stack Exchange website a few days ago: What preservation risks are associated with the PDF file format? This reminded me of a report […]

By johan, posted in johan's Blog

26th Jul 2012  9:48 AM  17876 Reads  3 Comments