Automation in Preservation Action and Object Access

Now that emulators are conceptually established in digital preservation and have been shown to actually run a range of original environments, the next logical step is to prepare them for mainstream use. Deploying emulation in, e.g., Migrate workflows is somewhat more complex than wrapping a simple converter like ps2pdf, but it might cover a much wider range of object types than a collection of stand-alone converter tools. The same is true for emulation-based Create View.

As emulation matures as a strategy in digital preservation, its integration into actual preservation frameworks is the next big challenge (Seven Steps for Reliable Emulation Strategies …). Emulators reproduce original software and/or hardware environments on different layers. At least on the hardware and operating system level, they help to run the original dynamic object, or the original applications with which objects were created or are usually viewed. This works very well in the traditional user interaction mode, as most applications and emulators were meant to be operated directly by humans. This old paradigm generates a new set of challenges in digital preservation:

  • Today’s users are less and less acquainted and comfortable with user interfaces that are 20+ years old.
  • When emulation is used for migration on a large scale, all relevant workflows have to be automated for the approach to be feasible.
  • The same is true for “mass views” (producing hundreds of Create Views in a row), e.g., for manual comparisons: an archivist in charge of visually checking the rendering results of original and migrated versions of a number of artefacts should not have to load every object by hand.

Solving these issues will also require solutions to a number of related challenges:

  • Find ways to automate interactive user sessions.
  • Define proper interfaces an emulator has to provide to allow framework integration (A Future Emulation and …).
  • Archive secondary software objects to be able to reproduce original environments (see this blog or the OR2010 paper).

The PLANETS services defined two major workflows for which emulation could be relevant: Migration and Create View.

Approaches for Automation

The need to automate repeated workflows is not new to applications and operating systems. It can be tackled on different levels and with different approaches:

  1. Built-in OS Methods: Use the built-in functionality of the operating system. The autoexec.bat (in DOS) or system.ini (in early Windows) are examples of older methods to start a certain application after the operating system has initialized. If the filename(s) of the artefact are additionally passed to the application, the desired objects are opened automatically. Both methods imply modifying the system images of the emulators or virtualization tools before starting the original environment. This can be done either by modifying the files in question directly on the virtual hard disk, or by referencing another file on the medium that transports the artefact into the original environment, e.g. by running “a:\run.bat” from the autoexec.bat on the C: drive. This approach is being researched at the moment in Freiburg in the context of an ongoing Bachelor thesis.
  2. Assisted Emulation for Legacy Executables: Kam Woods and Geoffrey Brown suggest adding a helper application to the original environments: “The approach we advocate is to preserve necessary contextual information through scripts designed to control the legacy environment, and created during the preservation workflow. We describe software designed to minimize dependence on this knowledge by offering automated configuration and execution of emulated environments. We demonstrate that even simple scripts can reduce impediments to casual use of the digital objects being preserved. We describe tools to automate the remote use of preserved objects on local emulation environments.”
  3. VNC Interface: Both methods presented above imply the modification of the original environments and depend on the actual operating systems and applications running. A much more abstract approach is to look only at the hardware user interface: what is typed on the keyboard, what is input via mouse actions, and what is put out to the screen? This is abstracted by the VNC interface, which is available for a wide range of very different operating systems and hardware platforms (ICDL 2010, ercim news). Some emulators, like QEMU and VirtualBox, already implement a VNC interface. The capability was added to Dioscuri by the author of a (now completed) bachelor thesis in Freiburg. Further thesis topics are open for adding VNC to other emulators, such as MESS. VNC has some shortcomings, too: in Create View, an integrated audio channel might be useful for some object types. Alternative remote desktop/screen protocols even include the transport of objects into the virtual machine, but they are not as open and widespread as VNC.
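
The first approach can be illustrated with a small sketch. It generates the text of a “run.bat” to be placed on the transport medium next to the artefact, plus the single line that would be added once to the autoexec.bat of the system image. The function name, the application path, and the 8.3 file name check are assumptions made for illustration; they are not taken from the thesis work mentioned above.

```python
import re

# DOS 8.3 file name: up to 8 characters, optional extension of up to 3.
_DOS_NAME = re.compile(r"^[A-Z0-9_~-]{1,8}(\.[A-Z0-9_~-]{1,3})?$")

# Line added once to AUTOEXEC.BAT on the C: drive of the system image.
# It hands control to the batch file on the transport medium.
AUTOEXEC_HOOK = "CALL A:\\RUN.BAT"


def build_run_bat(application: str, artefact: str, drive: str = "A:") -> str:
    """Return the text of a batch file that opens `artefact` with `application`.

    `application` is the (hypothetical) path of the original application
    inside the emulated environment; `artefact` is the file name of the
    object as stored on the transport medium.
    """
    name = artefact.upper()
    if not _DOS_NAME.match(name):
        raise ValueError(f"not a DOS 8.3 file name: {artefact!r}")
    lines = ["@ECHO OFF", f"{application} {drive}\\{name}"]
    # DOS batch files use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```

The resulting text would be written to the floppy image together with the artefact, so the system image itself only has to be modified once, not for every object.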

The described methods are not exclusively for emulation but could be deployed for any migration workflow using interactive tools running in a controlled environment, like a virtual machine.
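
For the VNC approach, one conceivable way to automate an interactive session is to describe it as a sequence of abstract input events and to replay that sequence against the emulator. The following is a minimal sketch under that assumption. The event classes and the `replay` function are hypothetical, and the `send` callable stands in for whatever VNC client would actually deliver the events; no real VNC library is used here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Union


@dataclass(frozen=True)
class Key:
    name: str  # symbolic key name, e.g. "enter" or "a"


@dataclass(frozen=True)
class Click:
    x: int
    y: int
    button: int = 1  # 1 = left mouse button


@dataclass(frozen=True)
class Wait:
    seconds: float  # pause to let the original environment react


Event = Union[Key, Click, Wait]


def replay(
    events: Iterable[Event],
    send: Callable[[Event], None],
    sleep: Callable[[float], None],
) -> int:
    """Forward input events in order and return how many were sent.

    Wait events are not forwarded; they only pause the replay.
    """
    sent = 0
    for event in events:
        if isinstance(event, Wait):
            sleep(event.seconds)
        else:
            send(event)
            sent += 1
    return sent
```

Passing `time.sleep` as `sleep` would give real pauses. Fixed waiting times are fragile, so a more reliable variant would probably compare the screen output with an expected state before sending the next event.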

Assisted Create View and Migration-by-Emulation

A first prototypical approach to (assisted) Create View was undertaken during PLANETS with the GRATE tool. It wrapped a number of different emulators by running them on a Linux platform and exporting the user interface via an X session made accessible through VNC. This had the advantage of not requiring a VNC interface in every emulator included, but it made the system considerably more complex to run. It was available during PLANETS as a simple Create View service from the testbed. It implemented just some fixed view paths, which depended on the original environments deployed to GRATE. To allow reliable automation of interactive sessions, a modified approach was taken with GRATE-R (demonstration services running in Freiburg; access upon request, ref. Automation of Flexible Migration Workflows). Both tools focus on remote emulation, which ought to be discussed in more detail in another post; the same goes for local emulation access systems (aka “emulation workstation” or “reading room system”).

Assisted Create View is currently being explored in two bachelor theses, using the built-in operating system and application capabilities and the high-level VNC approach described above. Migration-by-Emulation is being researched in more depth in the context of a master thesis on “Emulation workflows in Digital Preservation”. The goal of this thesis is to examine the requirements and prove the feasibility of emulation approaches in (large-scale) preservation workflows like format migration. The existing early PLANETS case studies (paper, IJDC) for such workflows are being extended and integrated into the Open Planets Foundation framework and testbed. The aim is an atomic, flexible migration service which accepts a wide range of input formats and can produce various output formats, depending on the original environments provided. The results are to be verified, and the performance of different workflows is to be measured.
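
How such a service might decide which migrations it can offer can be sketched as a lookup over the environments deployed to it. This is a simplified illustration, not the design of the service under development. The `Environment` class, the function names, and the format identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Environment:
    """An original environment (system image plus application)."""
    name: str
    reads: frozenset   # formats the application can open
    writes: frozenset  # formats the application can save or export


def select_environment(
    envs: Iterable[Environment], source: str, target: str
) -> Optional[Environment]:
    """Return the first environment able to migrate source to target."""
    for env in envs:
        if source in env.reads and target in env.writes:
            return env
    return None


def supported_migrations(envs: Iterable[Environment]) -> Set[Tuple[str, str]]:
    """List every (source, target) pair offered by the given environments."""
    return {
        (s, t)
        for env in envs
        for s in env.reads
        for t in env.writes
        if s != t
    }
```

With this structure, adding a new original environment extends the set of supported input and output formats without changing the service interface.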

At the moment we face two major challenges. One is the missing tool registry, as discussed here at the OPF blog in several posts. The other is the unclear licensing state, which is why we have not announced our test services (GRATE and GRATE-R) more publicly. IPR issues are discussed as a risk for the digital heritage by Susan Corbett, Victoria University (“Digital Heritage: Legal Barriers …”, 2007) and by the KEEP project (Legal Aspects of Emulation, iPRES 2010). Besides this, software archiving remains an important issue as well.

