On June 8th and 9th, I attended the PDF Days Europe 2015 Conference organised by the PDF Association in Cologne. It was a chance for me to meet people from the PDF industry who’ve shown particular interest in the veraPDF project. The veraPDF project also gave a 40-minute presentation on the first day.

Monday was Education Day, aimed at raising awareness of industry developments and issues. The day began with a large audience, at a guesstimate around 350, for the welcome session. Olaf Drümmer and Thomas Zellmann talked a little about the PDF Association’s history and mentioned veraPDF as a presentation of interest. The presentations that made up the rest of the day were split across three concurrent tracks. Tracks 1 & 2 were German-speaking so I chose track 3, helpfully titled “In English”.

The first session was a personal but informative history of the PDF format from the perspective of a user and industry insider, Matt Kuznicki of Datalogics. Matt’s message was that the PDF format has been at its best when the industry has concentrated on its core concerns, i.e. accurate, device-independent presentation, document security and encryption. He argued that some of PDF’s niche features were introduced as Adobe leveraged their reader to compete with other applications, e.g. browsers and media players.

After lunch, Bernard Wild talked about the possible effects of the European Commission’s electronic identification and trust services (eIDAS) regulation on PDF. My own observation is that producing digital signature systems that are easy to use, work across devices (particularly mobile), and are secure and trustworthy is currently a focus for many businesses in the PDF industry.

Next, Boris Dubrov of Dual Labs gave an introduction to veraPDF, the PDF/A validation project led by the OPF and the PDF Association. Boris described:

  • the project’s history and structure, highlighting that it’s a consortium funded by the PREFORMA project that’s committed to producing open source software;
  • the technical approach taken to validation:
    • the use of a domain-specific language to describe a PDF validation model;
    • atomic XML validation profiles with JavaScript evaluations; and
    • atomic test corpora of validation cases;
  • the minimum viable product release of a validation GUI; and
  • the project’s aims and schedules.
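
To make the idea of atomic rules and atomic test cases a little more concrete, here’s a minimal sketch in Python. It is not veraPDF’s actual profile format or code: the project uses XML profiles with JavaScript evaluations, whereas here plain Python functions stand in for those evaluations. The names `Rule`, `Failure`, `validate`, the rule IDs and the dictionary-based objects are all made up for illustration. The point is only that each rule tests one thing about one object type, and each test case is built to trip exactly one rule.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """One atomic check: a single test applied to a single object type."""
    rule_id: str       # hypothetical identifier, not a real veraPDF rule ID
    object_type: str   # type name in this made-up validation model
    description: str
    test: Callable[[Dict[str, Any]], bool]  # stands in for a JavaScript evaluation


@dataclass(frozen=True)
class Failure:
    rule_id: str
    description: str


def validate(objects: List[Dict[str, Any]], profile: List[Rule]) -> List[Failure]:
    """Apply each rule to every object of its type and collect the failures."""
    failures = []
    for obj in objects:
        for rule in profile:
            if obj.get("type") == rule.object_type and not rule.test(obj):
                failures.append(Failure(rule.rule_id, rule.description))
    return failures


# Illustrative profile: two independent rules, each checking one property.
PROFILE = [
    Rule("demo-1", "Document", "The document must not be encrypted",
         lambda obj: not obj.get("encrypted", False)),
    Rule("demo-2", "Font", "Every font must be embedded",
         lambda obj: obj.get("embedded", False)),
]

# Atomic test corpus: each case targets one rule and lists the rule IDs
# that are expected to fail for it.
CORPUS = [
    ([{"type": "Document", "encrypted": False}], []),
    ([{"type": "Document", "encrypted": True}], ["demo-1"]),
    ([{"type": "Font", "embedded": False}], ["demo-2"]),
]


def run_corpus() -> List[bool]:
    """Return, per corpus case, whether the observed failures match expectations."""
    return [
        [f.rule_id for f in validate(objects, PROFILE)] == expected
        for objects, expected in CORPUS
    ]
```

Calling `run_corpus()` checks every case against the profile. Because the rules and cases are this small, a failing case points directly at the one rule (or the one test file) that needs attention.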

The day’s final presentation was given by Maruan Sahyoun of the Apache PDFBox project. PDFBox is an open source PDF toolkit written in Java that offers content and metadata extraction as well as programmatic creation and manipulation of PDF documents. We’re currently using the unreleased version 2 of the PDFBox parser to extract document information in the veraPDF project. Maruan and I briefly discussed cross-project collaboration between PDFBox and veraPDF, as we’ve introduced some validation-specific parsing functionality we’d like to incorporate into PDFBox itself. These changes will be offered back to the PDFBox project as pull requests.

I’ll cover the second day’s presentations alongside a summary of the upcoming “Preserving documents forever: PDF Day”, which I’ll be attending for veraPDF on 15th July.

By Carl Wilson, posted in Carl Wilson's Blog

