Paper on JPEG 2000 for preservation


The JPEG 2000 compression standard is steadily gaining popularity in the archival community. Several large (national) libraries now use the JP2 format (which corresponds to Part 1 of the standard) as the master format in mass digitisation projects. However, some aspects of the JP2 file format are defined in ways that are open to multiple interpretations. This applies in particular to the embedding of ICC profiles (which are used to define colour space information) and to the definition of grid resolution. This situation has led to a number of interoperability issues that are potential risks for long-term preservation.

I recently addressed this in a paper that has just been published in D-Lib Magazine. An earlier version of the paper was used as a ‘defect report’ by the JPEG committee. The paper gives a detailed description of the problems, and shows to what extent the most widely-used JPEG 2000 encoders are affected by these issues.

The paper also suggests some possible solutions. Importantly, none of the problems identified require any changes to the actual file format; rather, some features should simply be defined slightly differently. In the case of the ICC profile issue, this boils down to allowing a widely used class of ICC profiles that is currently prohibited in JPEG 2000. The resolution issue could be fixed by a more specific definition of the existing resolution fields.
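To make the two issues a little more concrete: in a JP2 file, colour information sits in the colour specification box ('colr') inside the JP2 header box ('jp2h'), and grid resolution sits in the capture ('resc') and default display ('resd') resolution boxes inside a resolution superbox ('res '). Each resolution value is stored as a numerator, a denominator and a base-10 exponent, giving grid points per metre. The Python sketch below is a minimal, hypothetical checker (the function names are made up for illustration and do not come from any existing tool) that walks these boxes, reports the device class of an embedded ICC profile, and converts the resolution fields to points per inch. The post does not name the prohibited ICC profile class, so the sketch simply reports whatever class it finds. This makes it possible to spot, for example, display device ('mntr') profiles, the class in which RGB working spaces such as Adobe RGB 1998 are commonly distributed. It does not validate malformed files.

```python
import struct

# Hypothetical helper names throughout; a sketch, not a validated JP2 parser.
# Superboxes whose payload is itself a sequence of boxes.
SUPERBOXES = {b"jp2h", b"res "}

# ICC profile/device class signatures (bytes 12-15 of the ICC profile header).
ICC_CLASSES = {
    b"scnr": "input device",
    b"mntr": "display device",
    b"prtr": "output device",
    b"link": "device link",
    b"spac": "colour space conversion",
    b"abst": "abstract",
    b"nmcl": "named colour",
}


def iter_boxes(data):
    """Yield (box_type, payload) for each box in a byte string."""
    offset = 0
    while offset + 8 <= len(data):
        length, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if length == 1:  # 64-bit extended length (XLBox) follows the type
            length = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif length == 0:  # box runs to the end of the enclosing data
            length = len(data) - offset
        if length < header:  # malformed length; stop rather than loop
            break
        yield box_type, data[offset + header:offset + length]
        offset += length


def walk(data):
    """Yield every box, descending into the superboxes listed above."""
    for box_type, payload in iter_boxes(data):
        yield box_type, payload
        if box_type in SUPERBOXES:
            yield from walk(payload)


def describe_colr(payload):
    """Summarise a colour specification ('colr') box payload."""
    method = payload[0]  # followed by the PREC and APPROX bytes
    if method == 1:  # enumerated colour space (e.g. 16 = sRGB, 17 = greyscale)
        enum_cs = struct.unpack(">I", payload[3:7])[0]
        return f"enumerated colour space {enum_cs}"
    if method == 2:  # restricted ICC profile embedded after the 3 header bytes
        device_class = payload[3 + 12:3 + 16]
        name = ICC_CLASSES.get(device_class, repr(device_class))
        return f"restricted ICC profile, class {name}"
    return f"method {method}"


def grid_resolution(payload):
    """Decode a 'resc' or 'resd' payload into (vertical, horizontal) points per inch."""
    vn, vd, hn, hd, ve, he = struct.unpack(">HHHHbb", payload[:10])
    metres_per_inch = 0.0254  # the fields give grid points per metre
    vertical = vn / vd * 10 ** ve * metres_per_inch
    horizontal = hn / hd * 10 ** he * metres_per_inch
    return vertical, horizontal


def report(data):
    """List colour and resolution findings for a JP2 file's bytes."""
    findings = []
    for box_type, payload in walk(data):
        if box_type == b"colr":
            findings.append("colr: " + describe_colr(payload))
        elif box_type in (b"resc", b"resd"):
            v, h = grid_resolution(payload)
            findings.append(f"{box_type.decode()}: {v:.1f} x {h:.1f} ppi (vertical x horizontal)")
    return findings
```

Calling `report()` on the bytes of a JP2 master returns one line per colour specification and resolution box found. Reading the whole file into memory keeps the sketch short; a real tool would seek past the (large) codestream box instead.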

Both issues will be addressed in an amendment to the standard. Rob Buckley provides more details on this (along with some interesting background information on colour space support in JP2) in a recent blog entry on the Wellcome Library’s JPEG 2000 blog. As Rob puts it:

“The final outcome of all this will be a JP2 file format standard that aligns with current practice; supports RGB spaces such as Adobe RGB 1998, ProPhoto RGB and eci RGB v2; and provides a smooth migration path from TIFF masters as JP2 increasingly becomes used as an image preservation format.”

So, some relatively small adjustments to the standard could result in a significant improvement of the suitability of JP2 for preservation purposes.

Since various institutions are already using JPEG 2000, the paper also provides some practical recommendations that may help mitigate the risks for existing collections.


Link to paper: JPEG 2000 for Long-term Preservation: JP2 as a Preservation Format


Johan van der Knijff

KB / National Library of the Netherlands


  1. ecochrane
    February 13, 2012 @ 3:37 am CET

    One more note:

    The benefit of the representative environment approach is that creating/capturing the environments would be a one-off process done at point of ingest (and would be a real action we can take now to safeguard our digital objects). In the future, all that would be needed on an ongoing basis would be to migrate the emulation/virtual machine environment to new architectures. The relative complexity & cost of that migration (compared to the number of files it would apply to) may well be (I suggest: would likely be) much less than multiple file migrations for multiple formats, with validation of each combination of source and result file/format. Each emulator/VM tool would provide preservation functionality for potentially millions or billions of files which would otherwise each have to be migrated every x years.


    There are also drawbacks, of course. One is that information can become trapped in environments and difficult to use. However, there are ways to solve that if the will is there (you can already easily copy text and files out of emulated/virtualised machines and print from them).

  2. ecochrane
    February 13, 2012 @ 3:25 am CET

    I agree that not all languages are equally ambiguous. However, I’d add that they all have at least some minimum level of ambiguity related to how words mean things (theories of meaning try to address this).

    By “representative environment” I did not mean to say that a single representative environment would be enough for all objects from an era, rather that every object might be able to be associated with a representative environment from that era. We may need many representative environments from each era/architecture type.

  3. andy jackson
    February 8, 2012 @ 10:09 pm CET

    To come back to this old thread, I always meant to say that I don’t think it is fair to imply that all languages are as ambiguous as each other. I do not believe it is reasonable to lump boolean logic, regular languages, Turing-complete languages, the entirety of mathematics, and all natural languages together into one big ambiguous void.

    Normal prose buries its ambiguity in almost every word; even a simple word like ‘blue’ will invoke a slightly different shade in every mind that reads it.

    The whole point with formal languages, including software itself, is that they push the ambiguity to the edges. A computer program will execute precisely the same thing each time it is run, as long as the technical environment can be constructed correctly. The ambiguity is only in the context, and the challenge lies in understanding how much context we really need to maintain.

    I hope a ‘representative environment’ can be found. However, I fear that the combinatoric possibilities of computer installations (which OS and version with which language packs and which Office version and which fonts and which JVM and which DB connections and and and…) mean that no single environment will cover a majority of formats. Nevertheless, it will be interesting to find out!
