Digital Forensics and Emulation for Preservationists

Many of the tools and practices developed for digital forensics can be integrated into digital preservation techniques. This is particularly true of:

  • processes for securing, analysing, and appraising material prior to ingest into a repository or digital archive.
  • donated personal digital archives, where physical hardware and media are acquired, rather than digital content.

Digital forensics best practices deal with:

  • Acquisition: activities to secure and preserve the state of physical and digital evidence. These include disk imaging, metadata creation, and producing authentic copies for examination. These techniques can also be used to “secure and preserve the state” of physical media and digital content.
  • Examination: a rigorous, systematic examination of data to locate information of interest to the investigation. Methods here include duplicate detection, identifying files that belong to operating systems and programs, detecting encryption, detecting personal data, and timeline analysis. These techniques can complement existing characterisation methods in the digital preservation field.
  • Analysis: an often manual analysis of extracted data, evaluating it for relevance to the investigation. A digital preservation practitioner’s activities might include assessing the relevance of digital material to the collection, finding or removing personal information, or creating a content profile.
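As an illustration of the duplicate detection mentioned under Examination, the following is a minimal sketch in Python rather than a forensic tool. It assumes the content of a disk image has already been extracted or mounted read-only under a directory, and it treats files as duplicates when their SHA-256 digests match. The function names `sha256_of` and `find_duplicates` are hypothetical and chosen for this example only.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by digest; keep groups with two or more files."""
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    # Only digests shared by more than one file indicate duplicates.
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_duplicates("extracted/")` (a hypothetical directory) returns a dictionary keyed by digest, which could feed an appraisal report. Hashing every file is simple but slow on large images; comparing file sizes first is a common optimisation that this sketch leaves out.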

These forensic tools and methods, combined with established digital preservation tools and techniques, can provide a pre-ingest workflow that:

  • secures data at the point of acquisition.
  • allows tools to be run on imaged copies, protecting the source media and data.
  • provides the metadata required to make informed decisions regarding the content.
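One way to picture the "secure at acquisition, then work on copies" idea is a fixity record: a digest of the disk image taken when it is created, which any working copy can later be checked against. The sketch below is an assumption about how that might look in Python, not a description of any particular forensic suite. The names `image_digest`, `record_acquisition`, and `verify_image`, and the fields of the record, are hypothetical.

```python
import hashlib
import os
from datetime import datetime, timezone


def image_digest(image_path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a disk image file."""
    digest = hashlib.sha256()
    with open(image_path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def record_acquisition(image_path):
    """Build a small metadata record describing the image at acquisition."""
    return {
        "image": os.path.basename(image_path),
        "size_bytes": os.path.getsize(image_path),
        "sha256": image_digest(image_path),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_image(image_path, record):
    """Return True if the file still matches the recorded size and digest."""
    return (
        os.path.getsize(image_path) == record["size_bytes"]
        and image_digest(image_path) == record["sha256"]
    )
```

The record is a plain dictionary, so it can be written out with `json.dumps` and stored alongside the image. Running `verify_image` on a working copy before and after analysis gives some assurance that the tools did not alter it.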

What those decisions are in practice depends on variable factors, including the type of access to be provided, permissions granted by the rights holder, and institutional policy.

One scenario might be that a subset of content is extracted from the image and ingested into a repository. Access to this material could be provided through an emulated environment, with the choice of environment and rendering software informed by metadata gathered during the pre-ingest process.

When requirements and permissions allow, however, a potentially exciting emulation opportunity may present itself. It is possible to virtualise some of the disk images created from the original media. The image must contain a supported, working operating system, and the process isn’t certain to be successful. When this approach works, access to the material can be provided through an emulated version of the original machine, which may have belonged to an author or researcher. An article in The Atlantic and another in The New York Times describe a nice example of the use of these or similar techniques to preserve the personal digital archive of Salman Rushdie.

The OPF is hosting two hackathons focussing on these themes:


