Blogs: Characterisation

Blog posts filtered by the Characterisation subject tag.

Course Overview: If you have a digital preservation strategy that involves digital files, you’ll know how important it is to understand the file formats in which your data is encoded. Doing this comprehensively involves at least three main operations: identifying the format, characterising the format, and validating the format. To put it another way, […]

By Becky, posted in Becky's Blog

18th Jul 2017  12:00 AM  0 Reads  No comments

Earlier this year I blogged about Isolyzer, a tool designed to help detect broken ISO images. Today I released a shiny new beta version that adds a significant amount of new functionality. Below is an overview of the main changes, followed by some warnings and caveats. Support of more file systems: Where previous […]

By johan, posted in johan's Blog

12th Jul 2017  3:06 PM  373 Reads  No comments

In my previous blog post I addressed the detection of broken audio files in an automated workflow for ripping audio CDs. For (data) CD-ROMs and DVDs that are imaged to an ISO image, a similar problem exists: how can we be reasonably sure that the created image is complete? In this blog post I will […]

By johan, posted in johan's Blog

13th Jan 2017  3:30 PM  5468 Reads  5 Comments

At the KB we have a large collection of offline optical media. Most of these are CD-ROMs, but we also have a sizeable proportion of audio CDs. We’re currently in the process of designing a workflow for stabilising the contents of these materials using disk imaging. For audio CDs this involves ‘ripping’ the tracks to […]

By johan, posted in johan's Blog

4th Jan 2017  2:38 PM  1504 Reads  3 Comments

On 11th October we held our first JHOVE online hack day. Our aim was to catalogue error messages produced by JHOVE to get a better understanding of their meaning and potential preservation impact. Background: organising an online hack day. We have been considering running online hackathons because attending face-to-face events has become more difficult as […]

By Becky, posted in Becky's Blog

19th Oct 2016  10:06 AM  1353 Reads  No comments

For anyone dealing with a relatively small number of records, compared to, say, an internet or data archive, a reasonable process for ingest of material into your digital preservation system might be:
1. Process files with a file format identification tool
2. Per 1. process files with a file format validation tool
3. Per 1. […]

By ross-spencer, posted in ross-spencer's Blog

13th Mar 2016  5:27 AM  1988 Reads  No comments

Hi, this is my first blog post, in which I want to introduce the project I am currently working on: Flint. History: Flint (File/Format Lint) has developed out of DRMLint, a lightweight piece of Java software that makes use of different third-party tools (Preflight, iText, Calibre, JHOVE) to detect DRM in PDF files and EPUBs. […]

By alecs, posted in alecs's Blog

2nd Jul 2014  12:53 PM  11108 Reads  No comments

I have been working on some code to ensure the accurate and consistent output of any file format analysis based on the DROID CSV export (example here). One way of looking at it is an executive summary of a DROID analysis, except I don't think executives, as such, will be its primary user-base. The reason for pushing […]

By ross-spencer, posted in ross-spencer's Blog

3rd Jun 2014  7:20 AM  11380 Reads  1 Comment

Well over a year ago I wrote the “A Year of FITS” (http://www.openpreservation.org/blogs/2013-01-09-year-fits) blog post describing how we, during the course of 15 months, characterised 400 million harvested web documents using the File Information Tool Set (FITS) from Harvard University. I presented the technique and the technical metadata and basically concluded that FITS didn’t fit […]

By Per Møldrup-Dalum, posted in Per Møldrup-Dalum's Blog

28th May 2014  9:30 PM  13896 Reads  1 Comment

This post covers two related topics: characterising web content with Nanite, and my methods for successfully integrating the Tika parsers with Nanite. Introducing Nanite: Nanite is a Java project led by Andy Jackson from the UK Web Archive, formed of two main subprojects: Nanite-Core, an API for DROID, and Nanite-Hadoop, a MapReduce […]

By willp-bl, posted in willp-bl's Blog

21st Mar 2014  1:58 PM  13820 Reads  No comments