Call for a Test Set of Files

Memory institutions planning to implement a digital preservation strategy and set up suitable systems face the problem of missing evaluation components. A number of tools for object characterisation, migration or rendering in emulated original environments are available or currently under development, but evaluating or comparing them requires a proper set of sample objects. Those objects could be taken from each organisation's own holdings, but this strategy has some shortcomings:

  1. The objects might be classified and barred from leaving the organisation's premises, and not all staff might be allowed to access them.
  2. Significant backlogs in transferring deprecated media to direct storage, or in the delivery of objects, might give a biased picture of the relevant object types or lead to some being overlooked completely.

Additionally, it might be useful to hand out sample sets to developers or contractors. This could save time, offload tasks and steer development in the desired direction. Those sets should be free from any restrictions and privacy concerns, as they are to be made publicly available to everyone.

A public sample set containing a number of objects of different file types is relevant for several reasons:

  1. Checking and extending existing file type detection libraries or creating new tools (as suggested in Bill's blog),
  2. Defining a core test set for migration tools and emulated original environments,
  3. Defining a minimum set for a software archive of creating and rendering applications.
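
To illustrate the first point, here is a minimal sketch of how a labelled sample set could be used to check a file type detector. Everything in it is an assumption made for illustration: the tiny signature table, the `detect` and `evaluate` helpers, and the in-memory stand-in samples are hypothetical and do not represent any existing library or corpus.

```python
from collections import Counter

# Hypothetical, deliberately tiny signature table: (leading bytes, label).
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),
]


def detect(data: bytes) -> str:
    """Return the label of the first signature matching the start of data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def evaluate(samples):
    """Compare detected labels with expected ones.

    samples is an iterable of (expected_label, data) pairs.
    Returns a Counter of correct/wrong results and a list of
    (expected, detected) pairs for every mismatch.
    """
    outcome = Counter()
    misses = []
    for expected, data in samples:
        found = detect(data)
        if found == expected:
            outcome["correct"] += 1
        else:
            outcome["wrong"] += 1
            misses.append((expected, found))
    return outcome, misses


# Stand-in samples; a real test set would read objects from disk.
SAMPLES = [
    ("pdf", b"%PDF-1.4\n"),
    ("png", b"\x89PNG\r\n\x1a\n\x00\x00"),
    ("docx", b"PK\x03\x04payload"),  # ZIP-based container, only seen as zip
    ("wpd", b"\xffWPCpayload"),      # no matching entry in the table above
]

if __name__ == "__main__":
    outcome, misses = evaluate(SAMPLES)
    print(dict(outcome))
    for expected, found in misses:
        print(f"expected {expected}, detected {found}")
```

The list of mismatches is the useful part: it shows which formats a detector misses or identifies only at the container level, and that depends on how varied the sample set is.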

Creating those files with the original applications might not serve the purpose, as such artificial objects might lack the complexity or features found in the original material.

The kind of content sought after could be hosted/curated by the OPF …

1 Comment

  1. Maurice van den Dobbelsteen
    February 17, 2011 @ 12:04 pm CET

    Excellent idea, Dirk. Such a thing exists in the form of the PLANETS corpus, which can now be accessed through the Testbed. We could make an effort to make it more accessible and to expand upon it.
