At the beginning of this year we reported on the first results of a joint Archives New Zealand and University of Freiburg project to recover data from a set of 5.25 inch floppy disks from the early 1990s. After the raw bitstreams had been recovered from the floppy disks with a special hardware device, the resulting image files were sent over to Freiburg for further analysis. Having established the list of files contained on each floppy, it is now possible to extract individual files.
Of course it was possible to peek at the probable file contents before, by opening a floppy image file in a hex editor. But this makes it very complicated, especially for non-text files, to distinguish file boundaries: depending on the filesystem used, a file is not necessarily stored in consecutive blocks on the medium.
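The fragmentation problem can be illustrated with a toy example. The block size, the block chain and the "disk" below are entirely invented and have nothing to do with the real on-disk format; they only show why concatenating consecutive bytes from a hex dump does not reconstruct a file, while following the filesystem's allocation chain does.

```python
BLOCK_SIZE = 4  # tiny blocks, purely for illustration

# A toy "disk" of 8 blocks; one file's data is deliberately scattered.
disk = bytearray(8 * BLOCK_SIZE)
chain = [5, 2, 7]  # hypothetical allocation order of the file's blocks
for blk, data in zip(chain, [b"Hell", b"o Wo", b"rld!"]):
    disk[blk * BLOCK_SIZE:(blk + 1) * BLOCK_SIZE] = data

def read_file(disk, chain, size):
    """Reassemble a file by following its block chain, then trim to size."""
    out = b"".join(disk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE] for b in chain)
    return out[:size]

print(read_file(disk, chain, 12))  # the intact file: b'Hello World!'
# Reading the same amount of data as consecutive blocks yields garbage:
print(bytes(disk[2 * BLOCK_SIZE:5 * BLOCK_SIZE]))
```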
For the purposes of the Archive and of the public institution donating the data it is not necessary to re-implement the old platform's filesystem driver for a modern operating system, as most probably nobody wants to write files to floppy disks for this architecture again. Nevertheless, a thorough understanding of the old filesystem is required to write a tool that can perform at least basic filesystem operations, such as listing the contents of a directory and reading a specific file. For fast prototyping, and because processing speed and efficiency are not an issue here, the Python scripting language was chosen by the student tackling this task in his thesis. After the first implementation step, reading the directory contents, the second step, reading actual files, was achieved.
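The two prototype steps can be sketched in a few lines of Python. The directory offset, the entry layout and the assumption that files are stored contiguously are all invented here for illustration; the real on-disk structures of the old platform differ and have to be taken from its documentation.

```python
import struct

SECTOR = 512
DIR_OFFSET = 2 * SECTOR   # hypothetical: directory starts at sector 2
ENTRY_FMT = "<12sHI"      # hypothetical entry: name, start sector, length
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)

def list_directory(image, n_entries):
    """Step 1: yield (name, start_sector, length) per directory entry."""
    for i in range(n_entries):
        off = DIR_OFFSET + i * ENTRY_SIZE
        name, start, length = struct.unpack_from(ENTRY_FMT, image, off)
        yield name.rstrip(b"\x00").decode("ascii"), start, length

def read_file(image, start, length):
    """Step 2: naive extraction, assuming contiguous storage."""
    off = start * SECTOR
    return bytes(image[off:off + length])

# Build a toy image with one directory entry to exercise both steps.
image = bytearray(16 * SECTOR)
struct.pack_into(ENTRY_FMT, image, DIR_OFFSET, b"REPORT.TXT", 5, 11)
image[5 * SECTOR:5 * SECTOR + 11] = b"hello world"

for name, start, length in list_directory(image, 1):
    print(name, read_file(image, start, length))
```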
Fortunately the project was started early enough that all relevant information originating from one specific site on the net (www.ctosfaq.com) could be recovered in time. This site went down and left no relevant traces in either the Internet Archive or the publicly accessible caches of the search engines. This is a telling example of the challenges digital archaeologists face, and it suggests a recommendation for the future: store all relevant information on a past computer architecture within the memory institutions themselves, and do not rely on the net too much.
The recovery experiment was run on 62 disk images created by the team in New Zealand. In three of those 62 images the File Header Block was unreadable. Two of the failing images were only half the expected size, 320 KByte instead of 640 KByte. This made file information such as a file's address on the image and its length unavailable. For the third failing case it is still somewhat unclear why the File Header Block is unreadable. This leaves a total of 59 readable images containing 1332 identifiable files. The text content of the failing disk images was transferred to a single text file per image. The issues are currently being investigated together with the manufacturer of the reading device. It might be possible to tweak the reading process and extract more information that way, adding the missing pieces for the failing images. This might also lead to deeper insight into the procedure and to some best practice recommendations.
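A simple triage pass over the image set could look like the sketch below. The location of the File Header Block and the heuristic that an all-zero region means "unreadable" are assumptions made here for illustration, not the real on-disk rule.

```python
EXPECTED_SIZE = 640 * 1024          # full images are 640 KByte
FHB_OFFSET, FHB_SIZE = 0, 512       # hypothetical File Header Block location

def triage(name, image):
    """Flag short images and images with a blank File Header Block."""
    problems = []
    if len(image) < EXPECTED_SIZE:
        problems.append(f"short image: {len(image) // 1024} KByte")
    fhb = image[FHB_OFFSET:FHB_OFFSET + FHB_SIZE]
    if not any(fhb):
        problems.append("File Header Block unreadable (all zero)")
    return name, problems

good = bytearray(EXPECTED_SIZE)
good[0] = 0xE5                      # some non-zero FHB content
half = bytearray(320 * 1024)        # half-size image with a blank FHB
print(triage("disk01.img", good))
print(triage("disk02.img", half))
```

A pass like this over all 62 images would separate the 59 usable ones from the three failing cases before any extraction is attempted.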