FMT 7,8,9,10

PDF Eh? – Another Hackathon Tale

I wonder if many people have had a chance to dig around in the latest signature release for DROID?

There is an interesting format change that I think warrants some discussion.

As I understand it, TIFF has historically been a problem for DROID-based identification. PRONOM has records for TIFF versions 3, 4, 5, and 6, but their signatures all match the same hex strings, meaning that the best DROID can do is return a multiple match against the corresponding PUIDs.
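To make the multiple-match problem concrete, here is a minimal sketch in Python. It is not how DROID is implemented; the function name and the table are hypothetical, and the pairing of fmt/7 to fmt/10 with TIFF versions 3 to 6 is assumed from the order in which they are listed above. The two byte sequences are the standard little-endian and big-endian TIFF headers.

```python
from typing import List

# The standard TIFF headers: "II*\0" (little-endian) and "MM\0*" (big-endian).
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")

# Assumed pairing of PUIDs to versions, following the order given in the text.
TIFF_VERSION_PUIDS = {
    "fmt/7": "TIFF 3",
    "fmt/8": "TIFF 4",
    "fmt/9": "TIFF 5",
    "fmt/10": "TIFF 6",
}


def identify(header: bytes) -> List[str]:
    """Hypothetical matcher: return every PUID whose signature fits the header."""
    if header.startswith(TIFF_MAGIC):
        # Every version record shares this signature, so every record matches.
        return list(TIFF_VERSION_PUIDS)
    return []


matches = identify(b"II*\x00\x08\x00\x00\x00")
print(matches)  # four PUIDs, and nothing in the bytes checked to choose between them
```

Because the signature is the only evidence consulted, no amount of re-running the matcher will narrow the result to one version.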

There has been some discussion over the past 12 months (probably more) about how to derive a single PUID.

It looks like the answer has been to deprecate fmt 7/8/9/10 in favour of a single TIFF PUID that reuses the one signature previously shared across fmt 7/8/9/10.

I can see the logic, and actually the approach is something we have been doing in New Zealand for a while, resolving the multiple PUIDs into a single identifier.

But I think it was a mistake to deprecate fmt 7/8/9/10, and hopefully these graphics will explain why.

Prior to the change, we can visualise the TIFF PRONOM descriptors like this: http://imgur.com/799FQ.jpg

There is a parent class with no PRONOM label, and hierarchical links progressing through the format versions (via the [Related file formats] tags in PRONOM).

The move to deprecate fmt 7/8/9/10 means we can visualise the TIFF PRONOM descriptors like this: http://imgur.com/eRqmG.jpg

 

 

There is now a single class, with no lower-level descriptor for TIFF version.

 

I propose that the change should have been made along the lines of: http://imgur.com/9wafA.jpg

 

 

The change I propose is simple. We ‘re-preciate’ fmt 7/8/9/10, but keep the new higher class of general TIFF descriptor, fmt/353.
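As a rough sketch of what that structure buys us, the proposal amounts to a simple parent lookup. The names below are hypothetical and this is not a PRONOM data model; it only assumes, as in the proposal, that fmt/353 sits above fmt 7/8/9/10.

```python
GENERAL_TIFF = "fmt/353"

# Proposed hierarchy: each version-level PUID points up to the general class.
PARENT = {
    "fmt/7": GENERAL_TIFF,
    "fmt/8": GENERAL_TIFF,
    "fmt/9": GENERAL_TIFF,
    "fmt/10": GENERAL_TIFF,
}


def general_class(puid: str) -> str:
    """Hypothetical helper: resolve a PUID to its general class, if it has one."""
    return PARENT.get(puid, puid)


# A version-level description is still expressible, and still rolls up cleanly.
print(general_class("fmt/10"))   # fmt/353
print(general_class("fmt/353"))  # fmt/353
```

Identification can report fmt/353 when the signature is all it has to go on, while anyone who knows the version by other means can still record it with a live PUID.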

 

I justify this proposal by noting that the change to PRONOM seems to be driven more by the need to reliably identify a discrete PUID held as a record in PRONOM than by the need to describe formats.

I would argue that we still need the ability to describe TIFF objects at a version level, but are not able to do so anymore via the ‘language’ of PRONOM.

This is a loss of information, and I hope it is something that is revisited. There are some large implications for the many of us who use both TIFF files and PRONOM as the language of format description.

There is much work to be done with the TIFF format, not least to understand how we describe ‘side data’ sufficiently, and where we draw a boundary around format variation. I feel that simply moving to a single descriptor for a whole class of format that we all use is troublesome, and moves us further away from an adequate level of format description.

I really don’t want this to appear as a rant against PRONOM. I have a huge amount of respect and appreciation for the efforts that go into making this data available to us. However, I do think we should be having a slightly broader conversation about the persistence of PRONOM records…

 

 
