While JHOVE does the job a lot of the time, it is no secret that there are tools out there that pick up on PDF validation errors that JHOVE doesn’t. One of my personal go-to PDF tools is pdfcpu – a powerful and no-fuss PDF processor with fairly solid validation capabilities.
On pdfcpu’s GitHub repo, I stumbled upon an issue related to the page attribute Tabs. In the comments to that issue you can see that Peter Wyatt of the PDF Association pinpointed the error to a bug in LibreOffice which will be fixed in LibreOffice 7.5.4. Since it’s a bug in a somewhat widely used software suite, it piqued my interest and I asked myself three questions:
- What are /Tabs?
- Should I care?
- Does JHOVE pick up on the error?
What are Tabs?
A quick look in the PDF spec tells us that Tabs is an entry in a PDF’s page object and “a name specifying the tab order that shall be used for annotations on the page”. Say what?
Let’s start from the beginning. Annotations are things that can be added to a page to help clarify or augment the content. Examples of annotations are the sticky notes you can add to PDFs, text highlights, or stamps. A good introduction to annotations can be found in this blog post by Kieran France. In addition, interactive forms use so-called widget annotations to define the visual appearance and user interactions of fields.
While the Tabs key is not required for annotations (the exception being PDF/UA, we will get to that in a minute), it does serve a very specific and important purpose. It controls the order in which the user can navigate to the annotations – as the name suggests – using the Tab key (or through screen readers).
There are 5 different ordering possibilities, represented by the following values for the Tabs key:
|Value|Name|Description|
|---|---|---|
|R|Row|Navigation is row by row. The direction within the row is determined by the Direction entry in the viewer preferences (left to right or right to left).|
|C|Column|Navigation is column by column, with each column visited from top to bottom. The order of the columns is determined by the Direction entry in the viewer preferences (left to right or right to left).|
|S|Structure|Navigation is based on the order in which the different annotation objects appear in the logical structure of the PDF.|
|A|Annotation Array Order|Navigation is based on the order in which the annotations are listed within the Annots array.|
|W|Widget Order|Navigation is based on order within the Annots array, where all widget annotations are visited first based on their order within the array, followed by all other types of annotations in their order within the array.|
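To make the difference between the A and W values more tangible, here is a minimal Python sketch. It is not how any particular viewer implements tab order; the annotations are modelled as plain dictionaries with hypothetical `name` and `subtype` keys (the latter mirroring the Subtype entry of an annotation), and only the two array-based orders are covered because R, C and S depend on page geometry or the structure tree.

```python
from typing import Dict, List

# Hypothetical, simplified stand-in for the entries of a page's Annots array.
Annotation = Dict[str, str]


def tab_order(annots: List[Annotation], tabs: str) -> List[str]:
    """Return annotation names in visiting order for the Tabs values A and W."""
    if tabs == "A":
        # Annotation array order: visit entries exactly as listed in Annots.
        ordered = list(annots)
    elif tabs == "W":
        # Widget order: all widgets first (in array order), then everything else.
        widgets = [a for a in annots if a["subtype"] == "Widget"]
        others = [a for a in annots if a["subtype"] != "Widget"]
        ordered = widgets + others
    else:
        raise ValueError(f"this sketch only models A and W, not {tabs!r}")
    return [a["name"] for a in ordered]


# Made-up example page: two form fields mixed with two non-widget annotations.
annots = [
    {"name": "sticky note", "subtype": "Text"},
    {"name": "Textfeld 3", "subtype": "Widget"},
    {"name": "highlight", "subtype": "Highlight"},
    {"name": "Textfeld 1", "subtype": "Widget"},
]

print(tab_order(annots, "A"))
print(tab_order(annots, "W"))
```

With A, the sticky note is visited first because it comes first in the array; with W, both text fields move to the front while keeping their relative order.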
To understand and better illustrate this, I created a sample file with 3 form fields. The native structure I created is shown below in “native structure order /S”. The Annots array contains obj 50 (Textfeld 3), obj 59 (Textfeld 1) and obj 61 (Textfeld 2). After testing and documenting the tab order, I changed the Tabs value to row order /R and subsequently to column order /C, re-testing and documenting navigation behavior in both cases. Across all 3 files, the only thing that changed was this one value: the file size remains identical, yet the navigation behavior differs.
As an additional test, I used Adobe Acrobat Pro’s function to set the tab order manually (Prepare Form -> Edit -> Order Tabs Manually). While the previous order was Textfeld 3 – Textfeld 1 – Textfeld 2, I changed it to ascending order Textfeld 1 – Textfeld 2 – Textfeld 3. After saving the PDF in Adobe, the navigation had changed as expected. Inspecting the file with iText RUPS I could see that Adobe had changed the Tabs value to W. The order of the objects in the Annots array, however, had not changed – it was [50 0 R 59 0 R 61 0 R] in both cases. What had unexpectedly changed was the assignment of the form fields to object numbers. While it was previously obj 50 (Textfeld 3), obj 59 (Textfeld 1) and obj 61 (Textfeld 2), it was now obj 50 (Textfeld 1), obj 59 (Textfeld 2) and obj 61 (Textfeld 3). In other words: Adobe Acrobat appears to have re-assigned the widget annotations to different object ids.
The test files I created are currently available here, if you would like to play around with the navigation yourself.
As a magical mystery side note – and hopefully someone with a lot more PDF knowledge than me will read this and have an explanation for this: changing the W value in the Adobe-modified file (Tabs_W_after_Adobe_mod.pdf) back to S brings the original navigation structure (Textfeld 3 – Textfeld 1 – Textfeld 2, which is now obj 61 – obj 59 – obj 50) back to life. My expectation was that S would navigate on a tree-structure basis by object ids, that is, from Textfeld 1 -> Textfeld 2 -> Textfeld 3 and not 3 -> 1 -> 2.
Long story short: What are /Tabs? A way to steer how the user can navigate annotations.
Should I care?
In case you are not already convinced that functioning navigation is important, let me add one thing: while the /Tabs key entry is optional for plain PDF, it is mandatory for PDF/UA. Whenever an annotation is present on a page, that page’s dictionary has to have a Tabs key set to S. The logical structure of the tagged PDF/UA is the prerequisite for universal accessibility.
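The PDF/UA requirement boils down to a simple per-page rule, which can be sketched in Python as follows. This is only an illustration under stated assumptions: the pages are modelled as plain dictionaries with hypothetical `Annots` and `Tabs` keys rather than parsed from a real file, and the function name is made up for this example.

```python
from typing import Any, Dict, List


def pages_violating_tabs_rule(pages: List[Dict[str, Any]]) -> List[int]:
    """Return 1-based numbers of pages that have annotations but not Tabs = S."""
    violations = []
    for number, page in enumerate(pages, start=1):
        has_annots = bool(page.get("Annots"))
        # Pages without annotations are not affected by the rule.
        if has_annots and page.get("Tabs") != "S":
            violations.append(number)
    return violations


# Made-up document model: the dictionaries stand in for page dictionaries.
pages = [
    {"Annots": ["50 0 R"], "Tabs": "S"},  # fine
    {"Annots": ["59 0 R"], "Tabs": "W"},  # Tabs present, but not S
    {"Annots": ["61 0 R"]},               # Tabs missing
    {},                                   # no annotations, rule does not apply
]

print(pages_violating_tabs_rule(pages))
```

A real check would have to read the page dictionaries with a PDF library; the sketch only captures the logic of the rule itself.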
Does JHOVE pick up on the error?
In order to answer that, let’s go back to the original problem which led us down this rabbit hole: the pdfcpu GitHub issue. The LibreOffice test file attached to the issue throws the pdfcpu error:
validation error (obj#:10): pdfcpu: validateNameEntry: dict=pagesDict entry=Tabs invalid type types.StringLiteral
And looking at the test file’s internals (or reading Peter Wyatt’s LibreOffice bug report) we can see that the problem is a simple one: instead of the expected name object, /Tabs/S (or /Tabs /S … the space is optional), the LibreOffice-generated PDF contains a string literal, /Tabs(S). That’s all there is to it.
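For a quick spot check of your own files, a naive byte-level scan can reveal this particular pattern. The Python sketch below is an assumption-laden shortcut, not a validator: it does not parse the PDF, so it will miss page dictionaries stored in compressed object streams and does not handle hex strings or indirect references as Tabs values. The function name and regular expression are made up for this example.

```python
import re
from typing import List

# "/Tabs", optional whitespace, then either a name (/S) or a literal string ((S)).
TABS_ENTRY = re.compile(rb"/Tabs\s*(/[A-Za-z]+|\([^)]*\))")

# The five name values listed in the table above.
VALID_VALUES = {b"/R", b"/C", b"/S", b"/A", b"/W"}


def find_bad_tabs(data: bytes) -> List[bytes]:
    """Return every Tabs value in the raw bytes that is not one of the valid names."""
    bad = []
    for match in TABS_ENTRY.finditer(data):
        value = match.group(1)
        if value not in VALID_VALUES:
            bad.append(value)
    return bad


# Hand-written fragments imitating uncompressed page dictionaries.
print(find_bad_tabs(b"<</Type/Page/Tabs(S)>>"))       # string literal instead of a name
print(find_bad_tabs(b"<< /Type /Page /Tabs /S >>"))   # correct name object
```

To try it on a file, read it in binary mode and pass the bytes to `find_bad_tabs`, for example `find_bad_tabs(open("test.pdf", "rb").read())`, where `test.pdf` is a placeholder path.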
So does JHOVE pick up on the error?
The answer is: no. The file is declared well-formed and valid.
That brings us back to the “Should I care?” question. As per ISO 32000, Tabs is not a mandatory key in the page dictionary. So the error shouldn’t result in a “Not well-formed” result, since JHOVE is checking against “plain PDF” and not PDF/UA. But if a key is present, it should be used correctly. As this isn’t the case with “/Tabs(S)”, I would expect the outcome “Well-formed, but not valid”.
And thus, we’ve come full circle, with one GitHub issue leading to another ;-) The request to add a sanity check for present Tabs keys has been opened as JHOVE Issue 854.
I’m curious to hear from others how they would rate the problem and am thankful for any pointers to things I have interpreted wrongly or missed!