“Format” Identification/Characterization for Hard Drive Disk Images

Emulation requires disk images to serve as the main hard drive of emulated computers. These disk images can be captured from old hard drives as discussed here. To automate aspects of running these disk images in an emulator, a tool is needed that tells us which emulators are compatible with an image and how to configure them. In this post I identify some of the requirements of such a tool and seek feedback from the community about them and about the concept in general.

A tool for identifying and configuring emulators that are compatible with captured hard drive images would need to do the following:

  1. identify hard drive image formats (in order to know how to read their contents to identify other important information)
  2. identify if an operating system was installed on them (in order to know whether they can be directly executed in an emulator or would need to be attached to an emulator running from another disk image)
  3. identify what operating system was installed on them (including versions) (in order to know what emulators and emulated hardware should be compatible with the images)
  4. identify the hardware the system has been running on previously (in order to identify the hardware requirements of the emulated hardware)

It would also be useful to identify other software installed in the operating system environment stored on the disk image, in order to determine whether it relied on hardware that was not identified in step 4 but could be supplied by emulated hardware.

This information could then be mapped to emulators and the hardware available in them. This would enable the automatic configuration and execution of the appropriate emulator to load the disk image on emulated hardware.
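As a rough illustration of steps 2 and 3 in the list above, the sketch below reads the partition table from the first sector of a raw disk image that uses the classic MBR layout. The function names and the small table of partition type codes are my own choices for illustration, and a bootable flag or a file system type is only a hint: it does not prove that an operating system is installed, nor which one.

```python
import struct

PARTITION_TABLE_OFFSET = 446   # the MBR partition table starts at byte 446
ENTRY_SIZE = 16                # four entries of 16 bytes each
BOOT_SIGNATURE = b"\x55\xaa"   # bytes 510-511 of a valid MBR sector

# A few well-known MBR partition type codes (deliberately incomplete).
PARTITION_TYPES = {
    0x01: "FAT12",
    0x06: "FAT16",
    0x07: "NTFS/HPFS/exFAT",
    0x0B: "FAT32 (CHS)",
    0x0C: "FAT32 (LBA)",
    0x82: "Linux swap",
    0x83: "Linux",
}


def read_partitions(sector0):
    """Return the non-empty MBR partition entries found in the first sector."""
    if len(sector0) < 512 or sector0[510:512] != BOOT_SIGNATURE:
        return []  # not an MBR-style disk (or not a raw image)
    partitions = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        entry = sector0[start:start + ENTRY_SIZE]
        # status byte, 3 CHS bytes (skipped), type byte, 3 CHS bytes (skipped),
        # first sector (LBA), number of sectors; all little-endian
        status, ptype, first_lba, sector_count = struct.unpack("<B3xB3xII", entry)
        if ptype == 0x00:
            continue  # unused slot
        partitions.append({
            "index": index,
            "bootable": status == 0x80,
            "type": PARTITION_TYPES.get(ptype, "unknown (0x%02x)" % ptype),
            "first_lba": first_lba,
            "sector_count": sector_count,
        })
    return partitions


def may_contain_bootable_os(sector0):
    """Heuristic only: true if any partition is flagged as active/bootable."""
    return any(p["bootable"] for p in read_partitions(sector0))
```

Reading the first 512 bytes of an image file and passing them to `read_partitions` gives a starting point; identifying the actual operating system and its version would still require inspecting the file system contents.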

Disk images could theoretically be assigned identifiers to be mapped to emulators and emulated hardware configurations/profiles based on either:

  1. The combination of software they have on them
  2. The Operating System (OS) they have on them
  3. The hardware configuration they were previously running on that the operating system has drivers configured for
  4. Generic emulated hardware profiles that the disk images are compatible with

There is huge potential complexity in this approach. Every variant of installed hardware could require a new “format”, every variant of installed software profile could require a new “format”, and every combination of installed software profile and hardware/driver configuration could require a new “format”.

In order to simplify the process, it may be possible in many cases to just identify which OS is installed on the disk image. This may often be enough to configure an emulated environment to successfully execute the software installed on the disk image. This could be achieved by using a generic configuration that is known to work with that OS.
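A minimal sketch of this simplified approach might look like the following. The OS identifiers, the `EmulatorProfile` type and the memory and graphics values are hypothetical placeholders rather than tested configurations; the QEMU options used (`-m`, `-vga`, `-drive`) are standard, but any real profile would need to be verified against the OS in question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmulatorProfile:
    emulator: str    # emulator binary to run
    memory_mb: int   # amount of emulated RAM
    vga: str         # emulated graphics adapter


# Hypothetical generic profiles keyed by an OS identifier (placeholder values).
GENERIC_PROFILES = {
    "windows-95": EmulatorProfile("qemu-system-i386", 64, "cirrus"),
    "windows-xp": EmulatorProfile("qemu-system-i386", 512, "std"),
}


def build_command(os_id, image_path, image_format="raw"):
    """Return an emulator command line for the identified OS, or None."""
    profile = GENERIC_PROFILES.get(os_id)
    if profile is None:
        return None  # no generic profile known; more information is needed
    return [
        profile.emulator,
        "-m", str(profile.memory_mb),
        "-vga", profile.vga,
        "-drive", "file=%s,format=%s" % (image_path, image_format),
    ]
```

For example, `build_command("windows-95", "disk.img")` returns a list that could be passed to `subprocess.run`, while an OS with no known profile returns `None` and would fall through to the more detailed hardware matching.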

In other cases the environment may require a specialised hardware configuration that would require more extensive information to identify. Identifying generic emulated hardware profiles that are compatible with the hardware configured on the disk image will require identifying that previously configured hardware (i.e. item 3 above). The approach would be to first identify the hardware components previously used to run the software installed on the disk image. These components could then be compared against the set of available emulated hardware, and each one matched with a compatible emulated component.
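The matching step could be sketched roughly as below. This assumes that the previously used components have already been identified by some other means, and that a catalogue exists recording which original components each emulated device is believed to be compatible with. Both data structures, and all the device names in the example, are hypothetical.

```python
def match_components(identified, catalogue):
    """Match identified original components to compatible emulated devices.

    identified: dict mapping a category (e.g. "sound") to the name of the
                original component found on the disk image.
    catalogue:  dict mapping a category to a dict of
                {emulated device name: set of compatible original components}.

    Returns (matched, unmatched): matched maps category -> emulated device,
    unmatched lists categories for which no compatible device was found.
    """
    matched = {}
    unmatched = []
    for category, component in identified.items():
        candidates = catalogue.get(category, {})
        # Iterate in sorted order so the result is deterministic.
        for emulated, compatible_with in sorted(candidates.items()):
            if component in compatible_with:
                matched[category] = emulated
                break
        else:
            unmatched.append(category)
    return matched, unmatched


# Example with placeholder names only.
identified = {"sound": "SoundCard X", "network": "NetCard Y"}
catalogue = {
    "sound": {"emulated-sound-a": {"SoundCard X", "SoundCard Z"}},
    "network": {"emulated-nic-b": {"NetCard Q"}},
}
matched, unmatched = match_components(identified, catalogue)
```

Any category left in `unmatched` would mark a disk image that cannot be run on a generic profile without installing additional drivers or finding another emulator.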

Developing a tool to identify the installed operating system (item 2 above) seems like a quick win. Developing a tool to identify generic emulated hardware profiles that the disk images are compatible with (item 4 above) seems much harder and may require a lot of documentation, but would potentially be much more useful. The work being carried out by the Totem Project to develop a schema for documenting hardware and software environments (and to document them) may help realise such a tool.

There are analogies between this approach and the approaches used by file format identification tools. It is interesting to consider hard drive images as files that require applications (i.e. emulators) to execute them. As such there is potential to repurpose format identification or characterisation tools for the purposes outlined above.
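To make the analogy concrete, the sketch below identifies a few common disk image container formats by their signature bytes, in the same spirit as signature-based file format identification. The signatures used are the documented ones for qcow/qcow2, VMware sparse VMDK and Microsoft VHD, plus the MBR boot signature as a fallback hint for raw images; the function name and the returned labels are my own, and the list is far from complete.

```python
from pathlib import Path

QCOW_MAGIC = b"QFI\xfb"       # start of a qcow/qcow2 header
VMDK_SPARSE_MAGIC = b"KDMV"   # start of a VMware hosted sparse extent
VHD_COOKIE = b"conectix"      # start of a VHD footer (last 512 bytes of file)
MBR_SIGNATURE = b"\x55\xaa"   # bytes 510-511 of an MBR boot sector


def identify_image_format(path):
    """Return a short label for the disk image format, or "unknown"."""
    image = Path(path)
    size = image.stat().st_size
    with image.open("rb") as f:
        head = f.read(512)
        footer = b""
        if size >= 512:
            f.seek(size - 512)
            footer = f.read(512)
    if head.startswith(QCOW_MAGIC):
        return "qcow"
    if head.startswith(VMDK_SPARSE_MAGIC):
        return "vmdk-sparse"
    # Dynamic VHDs also keep a copy of the footer at the start of the file.
    # Check VHD before raw, because a fixed VHD is raw data plus a footer.
    if footer.startswith(VHD_COOKIE) or head.startswith(VHD_COOKIE):
        return "vhd"
    if len(head) == 512 and head[510:512] == MBR_SIGNATURE:
        return "raw-mbr"
    return "unknown"
```

A result of "unknown" does not mean the file is not a disk image: raw images of disks without an MBR, and formats not listed here, would need further signatures or deeper inspection.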

The purpose for this post was threefold:

  1. To raise the idea of such a tool within the OPF community
  2. To seek feedback on the value of such a tool: is such a tool/approach worth pursuing?
  3. To seek advice on whether there are any tools out there currently that perform this role or could be repurposed to perform such a role (both JHOVE and DROID have potential here).

Any comments or feedback would be greatly appreciated.




  1. ecochrane
    February 13, 2012 @ 3:08 am CET

    Your post has led me to conclude that this could form the basis of a student project. The project could involve:

    1. Producing a Linux distro to use for automated disk imaging for digital preservation.

    2. Producing a workflow management tool/GUI that would enable:

    a) Automatically identifying the disk image format using available tools.

    b) Automatically converting the image to emulator/virtual machine compatible formats (e.g. using qemu-img convert).

    c) Automatically adding drivers for emulated/virtualised hardware.

    It does seem that most or many of the functional parts of such a project are already available; they just need to be combined and/or tweaked for these purposes and given a nice user-friendly interface.

  2. ecochrane
    February 13, 2012 @ 3:00 am CET

    Thanks for the suggestion, I’ll have to try this out.
