Archives New Zealand and the University of Freiburg are cooperating on a data recovery project. The archive received a set of 5.25 inch floppy disks from the early 1990s containing records of a public organization dating back to the mid-1980s. The floppies could not be read on standard x86 machines with a 5.25 inch floppy drive attached. Little was known about the disks, and the organization wanted to retrieve any files that could be read from them and to extract as much information as possible from those files.
This is a very good use case for a national archive, which often receives objects long after they were created (20 years or more). To recover raw bit streams from obsolete floppies, the archive purchased a special hardware device, the KryoFlux, that can create images of floppy disks.
First Step: Bit Stream Recovery
The digital continuity team at Archives New Zealand saw a good opportunity to demonstrate the practical use of the KryoFlux device and to learn more about the work required to incorporate it into archival processes. The first step was to examine the disks visually for any available technical metadata. The labels identified the disks as DS QD 96 tpi: double sided, quad density, 96 tracks per inch. A 5.25 inch drive was attached to the KryoFlux, which in turn was connected to a modern Windows x86 machine via USB. The KryoFlux works by reading the state of the magnetic flux on the disk and writing that signal into a file on the host computer. Several output formats are possible; from the disks, a proprietary KryoFlux stream file, a RAW file, and an MFM sector (BTOS/CTOS) image file were all created.
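As a rough plausibility check on a sector image, the label information can be turned into an expected file size. The sketch below is only an illustration. The 80 tracks per side is the customary track count for 96 tpi 5.25 inch media. The sectors-per-track and bytes-per-sector values are hypothetical placeholders, because the disk labels do not state them.

```python
def expected_image_size(sides, tracks_per_side, sectors_per_track, bytes_per_sector):
    """Return the size in bytes of a complete sector image for the given geometry."""
    return sides * tracks_per_side * sectors_per_track * bytes_per_sector


# From the label: DS = double sided. 96 tpi media customarily hold 80 tracks per side.
SIDES = 2
TRACKS_PER_SIDE = 80

# ASSUMPTIONS for illustration only: the labels do not give these values.
ASSUMED_SECTORS_PER_TRACK = 9
ASSUMED_BYTES_PER_SECTOR = 512

if __name__ == "__main__":
    size = expected_image_size(
        SIDES, TRACKS_PER_SIDE, ASSUMED_SECTORS_PER_TRACK, ASSUMED_BYTES_PER_SECTOR
    )
    print(f"Expected image size under the assumed geometry: {size} bytes")
```

An image file that is much smaller than the value computed for the real geometry would suggest that tracks or sectors were lost during reading.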
Besides the hardware device, a major component is the interpretation software that translates the recorded signal into image files structured according to various floppy disk formatting standards. After a couple of disks had been recovered, it became clear that they did not follow any known filesystem standard supported by today’s operating systems. It was therefore impossible to mount them directly into the host filesystem and read the files from them. It was nevertheless possible to analyse them with a hex editor, which showed that the reading process was producing meaningful data. A couple of “words” such as sysImage.sys were repeated across all readable disk images and thus seemed to represent structural filesystem data. By searching the internet for this string and others, it was possible to deduce that the disks were likely created on a computer running the Burroughs Technologies Operating System (BTOS) or its successor, the Convergent Technologies Operating System (CTOS). Fortunately, more in-depth information describing the file system could still be found online. Further research led to the conclusion that no software is currently available to properly interpret disks or disk images formatted with this file system, aside from the original software and its (obsolete) successors. As there are no emulators available for this system either, an emulation approach was not a viable option. A bachelor thesis was therefore offered at the computer science department of the University of Freiburg to dig into the problem and create an application that interprets the file system on the disks using the information available on the internet.
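The hex editor inspection described above can be partly automated. The following sketch is not the tooling used in the project. It assumes the sector images sit in the current directory with an .img extension, lists the printable ASCII runs in each image, and reports the byte offsets of a marker string such as sysImage.sys. The function names and the minimum string length are illustrative choices.

```python
import re
from pathlib import Path


def find_marker(data: bytes, marker: bytes) -> list[int]:
    """Return every byte offset at which marker occurs in data."""
    offsets = []
    pos = data.find(marker)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(marker, pos + 1)
    return offsets


def printable_strings(data: bytes, min_len: int = 6) -> list[tuple[int, str]]:
    """Return (offset, text) for each run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


if __name__ == "__main__":
    # Assumed location and naming of the image files.
    for image_path in sorted(Path(".").glob("*.img")):
        data = image_path.read_bytes()
        hits = find_marker(data, b"sysImage.sys")
        print(f"{image_path.name}: marker found at offsets {hits}")
        for offset, text in printable_strings(data)[:20]:
            print(f"  0x{offset:06x}  {text}")
```

A marker that shows up at the same offsets in every readable image is a hint that it belongs to a fixed filesystem structure and not to user data.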
Second Step: The Interpreter
The preservation working group in Freiburg found a bachelor student to write an interpreter and file extractor for the image files. This is a nice challenge for a computer scientist, as knowledge of operating systems and filesystem concepts is required and can be applied in practice. As there is no demand for a full filesystem driver for any modern operating system, a bitstream interpreter is sufficient. Python was used to write a first prototype of the interpreter, as there were no performance requirements and the language is well suited to rapid development. By the end of the year a tool had been produced that can read the filesystem headers and produce directory listings from them. A partial output looks like:
------------------------Name------------------------------
Image: 1_10.img
------------------------Status----------------------------
VHB Checksum Error
File Header Block: OK
----------------------------------------------------------
Directory Filename      Size
sys fileHeaders.sys    49152 Bytes
sys mfd.sys              512 Bytes
sys log.sys                0 Bytes
sys sysImage.sys           0 Bytes
sys badBlk.sys           512 Bytes
sys crashDump.sys          0 Bytes
SYS DEC/APP.92/1       39424 Bytes
SYS 92/AP92.11/f424     3072 Bytes
SYS 92/452             33280 Bytes
SYS 92/92.62/f426       5632 Bytes
SYS 92/App.92/17&18    26112 Bytes
SYS 92/92.66/f454       9216 Bytes
...
SYS 92/INDEX           11264 Bytes
----------------------------------------------------------
Directories: 2
Files: 29
Total Size: 311808 Bytes
In this example the volume header block (VHB) fails its checksum, but because the File Header Block is correct, the simple directory structure is still readable. The listing appears to be correct, as it reproduces filenames such as sysImage.sys that were visible in the hex editor. With this listing, at least some information can be gleaned from the filenames themselves. The next stage will be a file extraction feature that can cut single files out of the image or dump all contained files into a folder on the host system. These files can then be inspected further to learn more about their original purpose.
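The planned extraction step could look roughly like the sketch below. It is not the thesis implementation and does not reproduce the BTOS/CTOS on-disk layout, which is not described here. It assumes that an earlier parsing step has already turned each file header into a file size and a list of (first_sector, sector_count) extents, and that sectors are 512 bytes long. SECTOR_SIZE, extract_file, safe_name and dump_file are hypothetical names chosen for illustration.

```python
from pathlib import Path

SECTOR_SIZE = 512  # ASSUMPTION: sector size in bytes, not confirmed by the article


def extract_file(image: bytes, extents, size: int) -> bytes:
    """Concatenate the sectors named by extents and trim the result to size bytes.

    extents is a list of (first_sector, sector_count) pairs, assumed to come
    from an already parsed file header.
    """
    chunks = []
    for first_sector, sector_count in extents:
        start = first_sector * SECTOR_SIZE
        end = start + sector_count * SECTOR_SIZE
        if end > len(image):
            raise ValueError(f"extent {first_sector}+{sector_count} runs past the image end")
        chunks.append(image[start:end])
    return b"".join(chunks)[:size]


def safe_name(name: str) -> str:
    """Make a recovered filename usable on the host (names may contain '/')."""
    return name.replace("/", "_")


def dump_file(image: bytes, name: str, extents, size: int, out_dir: Path) -> Path:
    """Write one extracted file into out_dir and return the path that was written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / safe_name(name)
    target.write_bytes(extract_file(image, extents, size))
    return target
```

Renaming on the host side is needed because filenames in the listing, such as 92/AP92.11/f424, contain characters that modern operating systems treat as path separators.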