Part of my work on the SCAPE testbeds involves producing a workflow for the large-scale migration of TIFF files to JP2, with validation. The tests I have run all involve the lossy compression of files.
Two tools that could be used for the validation of image payload, and therefore success of a migration, are Matchbox, developed for SCAPE by AIT, and ImageMagick’s “compare” tool. One of Matchbox’s tests gives a result of SSIM, a value between 0 and 1. The metric I chose to use from “compare” was PSNR, a value in decibels.
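As a rough illustration of what the PSNR figure measures, here is a minimal sketch in Python. It is not ImageMagick's implementation: the function name `psnr`, the use of flat lists of sample values, and the default peak value of 255 (i.e. 8-bit samples) are all assumptions made for the example.

```python
import math


def psnr(reference, candidate, max_value=255):
    """Return the peak signal-to-noise ratio, in dB, between two images.

    Both images are given as flat sequences of sample values of equal
    length (an assumption for this sketch; real tools read image files).
    """
    if len(reference) != len(candidate):
        raise ValueError("images must contain the same number of samples")
    if len(reference) == 0:
        raise ValueError("images must not be empty")

    # Mean squared error across all samples.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)

    if mse == 0:
        # Identical images: no error, so PSNR is unbounded.
        return math.inf

    # PSNR = 10 * log10(peak^2 / MSE)
    return 10 * math.log10(max_value ** 2 / mse)
```

The higher the value, the closer the two images are; identical images give an infinite PSNR because the mean squared error is zero.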
I ran some tests using thirty master TIFF files (approximately 28 megapixels each) to see how effective Matchbox and “compare” were at measuring the sameness of a TIFF and an altered version of that TIFF. The alterations included added noise, blur, pixellation and horizontally shifted pixels.
Baseline figures for a high-quality lossy compression of a TIFF to JP2, using what is essentially the BL newspaper profile, give a PSNR of 52 dB (good) and a Matchbox SSIM of 0.996 (good).
The tests showed that Matchbox successfully identified that the files were similar, despite the alterations: mean SSIM results were greater than 0.995, indicating a good match. For the same comparisons using “compare”, the mean PSNR was between 29 and 39 dB, well below the 52 dB baseline. This indicates that “compare” was better able to identify noise within the files (i.e. corruption, in this use case) and to show that they were not identical. Runtime was also a major difference between the tools: Matchbox took about five times as long as ImageMagick for its comparison.
Using Matchbox as a means of ascertaining whether the images are exactly the same is not quite what the tool is designed for. It is designed to identify whether the image content within files is the same, for example, in two scans of the same document: “near duplicates”. There is a presentation about its impressive abilities. ImageMagick’s “compare”, on the other hand, cannot compare two files that have different dimensions. These tests are an attempt to identify the success of an image migration (sameness), and the altered files produced clearly lower PSNR scores while their SSIM scores stayed high. PSNR may therefore be the better metric here, as it is more sensitive to smaller differences between images.
If a migration did not involve the lossy compression of files, there are other tests that could be used, such as direct comparison of pixel values.
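A direct pixel comparison for the lossless case could look something like the sketch below. It assumes both images have already been decoded into flat sequences of sample values; the function name `compare_pixels` and its return shape are hypothetical, chosen only for illustration.

```python
def compare_pixels(reference, candidate):
    """Compare two decoded images sample by sample.

    Returns a tuple (mismatch_count, first_mismatch_index), where
    first_mismatch_index is None if every sample matches.
    """
    if len(reference) != len(candidate):
        raise ValueError("images must contain the same number of samples")

    # Collect the positions at which the two images differ.
    mismatches = [
        index
        for index, (r, c) in enumerate(zip(reference, candidate))
        if r != c
    ]

    first = mismatches[0] if mismatches else None
    return len(mismatches), first


def is_lossless_match(reference, candidate):
    """True only if every sample value is identical."""
    count, _ = compare_pixels(reference, candidate)
    return count == 0
```

For a lossless migration, anything other than zero mismatches would mean the payload has changed, so no quality threshold is needed.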