Help Wanted! Digital archivists, writers and coders needed to improve format validation

PDF Eh? – Another Hackathon Tale

April’s JHOVE hack day was another great success, covering a range of development and non-development tasks: issues and pull requests were closed, sample files were found, user documentation was reviewed, and our knowledge of JHOVE errors was expanded. We don’t want the great work achieved that day to stop there, though! There’s still plenty that needs to be done, and can be done, before our next hack day.

But we need your help! So if you’re a digital archivist with digital files to test, a developer in need of a challenge, someone struggling to use JHOVE, or just someone with a keen eye for punctuation, we want to hear from you.

We have several aims we’re working to achieve. Firstly, we want to make sure JHOVE is robust and its validation results are accurate through improved testing. We also want to make sure that the validation results are clear and understandable, and that we know which errors matter for long-term preservation. Finally, we want to make JHOVE friendlier for first-time users through enhanced documentation.

So how can you help move things along? Carl’s post alluded to a few ways in which you could contribute:

Whether you’re a developer or not, if you can spare some time to work on these (perhaps finding sample files or reviewing documentation), then we want to hear from you. Don’t worry if you’ve never been involved in the hack days before; just drop me or Carl a line and we’ll get you started.

Finally, if you want to support JHOVE improvements but cannot commit to any of the above, you can always show your support through a financial donation.

Together we can make JHOVE reliable, easy to understand and transparent for everyone. Together we can make JHOVE awesome!

3 Comments

  1. [email protected]
    June 28, 2017 @ 2:06 pm CEST

    Hi,

    I am about to contribute a test file but am not sure how to do this. JHOVE identifies the format of this file (https://www.dropbox.com/s/iooxgtrml1t87us/HtmFile634075222388527922.htm?dl=0) as UTF-8, well-formed and valid. DROID identifies it as Hypertext Markup Language, text/html, fmt/96. Where should I upload the file, and where should I explain why I think JHOVE misbehaves when analyzing it?

    PS. The file is shareable – no access rights.

  2. Peter May
    June 27, 2017 @ 1:31 pm CEST

    Hi Jay,

    Real-world files are perhaps better; however, synthetic files are useful too!

    We (at BL, at least) have created some synthetic files for some of the errors (which I believe are linked from the Google spreadsheet linked to above).

  3. jaygattuso
    June 25, 2017 @ 8:14 pm CEST

    Re sample creation: do you want real-world examples only, or do you want some synthetically broken examples that might trigger the error cases? I’m thinking it might be viable to put together a variable-changing file creation tool for some of the formats, resulting in possibly invalid but useful files.
