OPF Hack Week Summary

It’s now over two weeks since the OPF JHOVE Hack Week finished, though following the Easter break it doesn’t feel that long ago. The OPF are currently reviewing and merging the contributions we received. The review seems a good opportunity to write a quick blog post to give an overview of the Hack Week contributions and detail an upcoming fix to the JHOVE TIFF module.

The OPF would like to thank all of the organisations and individuals who contributed. As the pull requests are merged we’re closing the associated issues. We estimate that the Hack Week contributions amount to between one and two months’ effort. The tasks undertaken were generally long-standing technical debt issues that hadn’t been prioritised for releases. Getting on top of these consolidates the code quality improvements released in v1.22. While many of the changes don’t add new features to JHOVE, they improve reliability, which in turn makes the development of new features easier.

The issues addressed during Hack Week were:
– unnecessary instantiation of Boolean types [424] fixed by nvanderperren from PACKED;
– error-prone checking of string literals [415] fixed by nvanderperren from PACKED;
– externalisation of error messages in the PNG module [134] fixed by samalloing from The National Library of the Netherlands and jacobtakema from The National Archives of the Netherlands;
– use of local variables as opposed to class members [420] fixed by rgfeldman from Quotient;
– enhancements to JPEG2000 MIX output [106] provided by tledoux of Bibliothèque nationale de France;
– unhelpful offsets in PDF validation [268] & repetition in PDF Module error reporting [165] provided by the Rosetta Development Group from ExLibris; and
– a selection of JavaDoc and code maintenance fixes [450] provided by david-russo of The British Library.
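
For readers unfamiliar with this kind of clean-up, the first two items in the list can be illustrated with a minimal sketch. It is not taken from the JHOVE pull requests: the class and method names are hypothetical, and it assumes that "error-prone checking of string literals" refers to calling equals() on a variable that may be null rather than on the literal.

```java
// Illustrative sketch only; none of these names come from the JHOVE code base.
final class CleanupSketch {

    // Before: new Boolean(flag) allocates a fresh object on every call
    // (and the constructor has been deprecated since Java 9).
    // After: Boolean.valueOf reuses the shared Boolean.TRUE / Boolean.FALSE.
    static Boolean box(boolean flag) {
        return Boolean.valueOf(flag);
    }

    // Before: value.equals("PNG") throws NullPointerException when value is null.
    // After: putting the literal first makes the check null-safe.
    static boolean isPng(String value) {
        return "PNG".equals(value);
    }

    public static void main(String[] args) {
        System.out.println(box(true) == Boolean.TRUE); // same shared instance
        System.out.println(isPng(null));               // no exception for null input
        System.out.println(isPng("PNG"));
    }
}
```

Changes like these leave behaviour the same for valid input, which is why they tend to sit in the backlog until an event such as a hack week gives them some attention.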

There are a few minor merge conflicts that require careful review, which is progressing steadily. We anticipate that this will be finished by the week ending 10th May 2019.

On a less positive note, we’re aware of an issue in the TIFF Module v1.9.1 released in JHOVE v1.22. It’s a side effect of TIFF information reporting changes combined with alterations to the error message structure. The issue, reported here and here, occasionally causes TIFF validation to fail with a stack trace resembling:

Apr 29, 2019 3:31:54 PM edu.harvard.hul.ois.jhove.JhoveBase process
SEVERE: Validation ended prematurely due to an unhandled exception.
java.lang.IllegalArgumentException: Cannot format given Object as a Number
at java.text.DecimalFormat.format(Unknown Source)
at java.text.Format.format(Unknown Source)
at java.text.MessageFormat.subformat(Unknown Source)
at java.text.MessageFormat.format(Unknown Source)
at java.text.Format.format(Unknown Source)
at java.text.MessageFormat.format(Unknown Source)
at edu.harvard.hul.ois.jhove.module.tiff.IFD.addIntegerProperty(IFD.java:417)
at edu.harvard.hul.ois.jhove.module.tiff.ExifIFD.getProperty(ExifIFD.java:513)
at edu.harvard.hul.ois.jhove.module.jpeg.JpegExif.readExifData(JpegExif.java:164)
at edu.harvard.hul.ois.jhove.module.JpegModule.readAPP1(JpegModule.java:1112)
at edu.harvard.hul.ois.jhove.module.JpegModule.parse(JpegModule.java:650)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(JhoveBase.java:788)
at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:560)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:432)
at edu.harvard.hul.ois.jhove.viewer.JhoveWindow.openAndParse(JhoveWindow.java:618)
at edu.harvard.hul.ois.jhove.viewer.JhoveWindow.pickAndAnalyzeFile1(JhoveWindow.java:391)
at edu.harvard.hul.ois.jhove.viewer.JhoveWindow$ParseThread.run(JhoveWindow.java:877)
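
The exception at the top of the trace is standard java.text.MessageFormat behaviour: a pattern element declared as a number is handed an argument that is not a Number. The sketch below reproduces that behaviour in isolation and shows one possible guard. It is not the JHOVE fix; the pattern, class, and method names are hypothetical and chosen only for illustration.

```java
import java.text.MessageFormat;

// Illustrative sketch only; the pattern and names are assumptions, not JHOVE code.
final class NumberPatternSketch {

    // "{1,number,integer}" asks MessageFormat to render argument 1 via a NumberFormat.
    static final String PATTERN = "{0}: {1,number,integer}";

    // Throws IllegalArgumentException when value is not a Number.
    static String formatUnsafe(String name, Object value) {
        return MessageFormat.format(PATTERN, name, value);
    }

    // One possible guard: only use the numeric pattern for numeric values.
    static String formatSafe(String name, Object value) {
        if (value instanceof Number) {
            return MessageFormat.format(PATTERN, name, value);
        }
        return name + ": " + value;
    }

    public static void main(String[] args) {
        System.out.println(formatSafe("ImageWidth", 640));
        System.out.println(formatSafe("ExposureTime", "1/60"));
        try {
            formatUnsafe("ExposureTime", "1/60");
        } catch (IllegalArgumentException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

Checking the argument type before formatting avoids relying on exception handling for a case that is known in advance.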

We’re currently testing a fix for the TIFF module. This will be released as a stand-alone TIFF module update and will be available by 2nd May. Following the decoupling of module and core code build and versioning in JHOVE v1.22, this will be the first module fix that doesn’t require a full JHOVE release for distribution. The decoupling makes it much easier and quicker for the OPF to release module fixes, and JHOVE users should benefit from this in the future.

Thank you again to everyone who got involved in the Hack Week – it was a great demonstration of open source digital preservation collaboration!
