Putting JHOVE to the acid test: A PDF test-set for well-formedness validation in JHOVE

Nov 21, 10:00 AM - 11:00 AM

In digital preservation, we rely on automation and tools for some of our most crucial tasks, such as format identification and validation. One of the most widely used tools for format validation is JHOVE. Because no other validation tool checks the well-formedness and validity of plain PDF files, the quality and reliability of JHOVE’s PDF module are especially important. Unfortunately, for the same reason, JHOVE’s PDF capabilities cannot be checked by benchmarking it against other tools.

To date, there is no ground-truth data set that can be used to understand and test PDF validation at the structural level. In this webinar, we present a corpus of lightweight files designed to test the well-formedness validation criteria of JHOVE’s PDF module. Based on the results of running JHOVE against this data set, we give an overview of how reliable JHOVE is, what works well, and where inconsistencies remain.

Session leads

Yvonne Tunnat, Deutsche Zentralbibliothek für Wirtschaftswissenschaften
Michelle Lindlar, Technische Informationsbibliothek


Registration is open to OPF members.


Open Preservation Foundation