With the student's final presentation of his CTOS filesystem recovery study in Freiburg, it is time to draw a conclusion and summarize the lessons learnt. The original problem was a set of CTOS floppies (a fact that had to be discovered first, see a previous post on this topic). It represents a microcosm of the archival challenges at the beginning of the information recovery chain.
The problem is not limited to this particular type of 5.25″ floppy disk, as technological change renders computer platforms and general technologies such as disk formats, peripheral buses or application programming interfaces obsolete. On an abstract level this phenomenon is not much different from what happens to traditional media like paper, audio records or film, even though the technology is completely different. As with traditional media, the contained information needs at least to be copied forward to remain accessible over time.
The recovery study was conducted in two phases. Originally, 62 disks received by Archives New Zealand were considered and analyzed (see the attached Python script ooxtract.py). Later on, the New Zealand and German team was joined by D. Schmidt from the U.S., who works for RetroFloppy and had learned of the ongoing forensic activities from the OPF blog. He pursued a slightly different approach to file recovery and added a few more images to the test corpus: a set of floppies from the US Coast Guard studied at RetroFloppy. Combined, the two corpora span a period from the mid-1980s to the mid-1990s. Unlike the NZ set, the second set contained many non-ASCII files. In the end, 1889 files were successfully extracted: 1789 files with an active file header and 100 “deleted” files. Along with the files, some metadata such as file size, creation date and date of last change was recorded. A special piece of metadata were the passwords, which are evaluated by the operating system only and did not hinder the interpretation of the file content at all. A funny side note: where passwords were set on files of the second set, the recovered ones were a couple of birthdays or other dates relevant to the person who “secured” the file, some wives' names, and some random strings.
D. Schmidt used a different approach regarding hardware and software. His extraction is based on the FC5025, a USB-attached circuit board that interfaces to a number of once-common 360 KByte and 1.2 MByte floppy drives. Similar to the Kryoflux device used in the NZ study, it reads flux transitions, but it exposes much less detail to the user. The open source driver of the FC5025 was extended to read CTOS-formatted floppies and to extract single files from them. He also created a second version of the xtract program that uses the Master File Directory (MFD) to circumvent some potential errors that could arise from a simple sequential search for file headers. The interesting observation was that the sequential search recovered some additional files. Deletion in this file system only removes the directory entry (as in other file systems such as DOS), so the file header and the data remain on the disk until they are overwritten.
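The difference between the two strategies can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of either extraction tool: the 512-byte sector size and the `FHDR` marker are hypothetical stand-ins, as the real CTOS file header layout is not described in this post, and the set of sectors referenced by the directory is simply passed in rather than parsed from an MFD.

```python
SECTOR_SIZE = 512        # assumed sector size, not taken from a CTOS specification
HEADER_MAGIC = b"FHDR"   # hypothetical marker; real CTOS file headers look different


def scan_for_headers(image):
    """Sequential search: return sector numbers that start with the marker."""
    hits = []
    for offset in range(0, len(image), SECTOR_SIZE):
        if image[offset:offset + len(HEADER_MAGIC)] == HEADER_MAGIC:
            hits.append(offset // SECTOR_SIZE)
    return hits


def deleted_candidates(scanned, listed):
    """Headers found by the scan that the directory no longer references."""
    return [sector for sector in scanned if sector not in listed]


# Tiny synthetic image: 4 sectors, with headers in sectors 1 and 3.
image = bytearray(4 * SECTOR_SIZE)
image[1 * SECTOR_SIZE:1 * SECTOR_SIZE + 4] = HEADER_MAGIC
image[3 * SECTOR_SIZE:3 * SECTOR_SIZE + 4] = HEADER_MAGIC

found = scan_for_headers(bytes(image))
# Pretend the directory only references the header in sector 1.
print(deleted_candidates(found, listed={1}))
```

A directory-driven extraction would only ever visit the sectors in `listed`, which is why it is more robust against false header matches but misses files whose directory entry has been removed.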
A brief attempt was made to identify the recovered files with the Linux file utility. Of course, other file type detection tools should have been checked as well, but this was not the primary focus of the experiment. The detection rate was weak: most of the files (1635) were reported as unidentified binary data, and a couple of identifications, such as dBASE3 or Microsoft icon source, were simply wrong. Of the 1635 files, 838 were identified by different means (the file extension) as word processor files. This highlights again the problem the digital continuity team at Archives New Zealand stumbled upon a year and a half ago: file type detection is especially tricky for older file types. Unfortunately, because of the proprietary and personal nature of the extracted files, they cannot easily be transferred to a test corpus. The same applies to the images themselves: the original media, the 5.25″ floppies, of course cannot be shipped around, and the dumped images cannot be published for the same reason as the extracted files. This is a bit unfortunate, as they are not available for future regression tests, e.g. of the developed extraction utilities. The American colleague pointed out an alternative source for CTOS images, unfortunately in a slightly different format (that of yet another imaging utility, ImageDisk; the converter is available here).
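The fallback from content-based detection to the file extension, as used for the word processor files, could look roughly like the following sketch. The extension table and the sample file names are made up for illustration, since the extensions actually found in the corpus are not listed here; the only thing taken from the file utility is that it reports unrecognized binary content as `data`.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical mapping; the extensions present in the real corpus are not known here.
EXTENSION_HINTS = {".doc": "word processor", ".wp": "word processor"}


def classify(name, magic_result):
    """Prefer the content-based result; fall back to the extension for plain 'data'."""
    if magic_result != "data":
        return magic_result
    return EXTENSION_HINTS.get(PurePath(name).suffix.lower(), "unidentified")


# Invented (file name, output of the file utility) pairs.
samples = [
    ("Letter.doc", "data"),
    ("Report.WP", "data"),
    ("Sys.run", "data"),
    ("Notes.txt", "ASCII text"),
]

counts = Counter(classify(name, result) for name, result in samples)
print(counts)
```

Note that such a fallback cannot correct the wrong positive identifications mentioned above, because those are not reported as `data` in the first place.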
Besides the hardware-level recovery, the unknown file types highlighted another problem: there is no longer any CTOS system available on recent hardware to re-run the applications, which usually require a particular operating system. Unfortunately, no emulators have emerged, so the information contained in the logically recovered files might be lost forever. The lack of CTOS hardware posed an additional challenge: both recovery approaches relied on newly developed hardware, the Kryoflux or the FC5025 adaptor, to pull the bits off the disks using PC 5.25″ floppy drives. Those drives were not available from regular stores, only from various online used-hardware markets. Someday this source will dry up too.