“#Migration: No one does it for the future; they do it (need to do it) for the now.” – https://twitter.com/beet_keeper/status/327968228276060160
Recently I was asked by a colleague to look at some files he’d been sent by Hutt City Council in New Zealand; an unknown format from a 1995 vintage IBM operating system – a format as yet unidentified by popular format identification tools.
As with most attempts to identify a format, we ran the files through DROID, ExifTool and the Unix file command. With none of them identifying the files, the search really began with a Google search of the files’ magic bytes:
2B 41 2B 56 2B 43 2B +A+V+C+
A single result at the time provided little to go on; it confirmed someone had once asked the same question on a computer graphics forum. A few clues in the bitstream e.g. a potential font size and title, ‘Roman Bold 26’, and a few more Google searches meant that we could say these files were potentially proprietary to an IBM system as opposed to a file with a more open specification. Confirmation with the content provider gave us the original environment as OS/2.
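For anyone wanting to repeat this first look at the bitstream, here is a minimal Python sketch. It assumes the seven magic bytes above always sit at offset zero, which is only what we observed in our own sample files, not something confirmed by a specification. The function names are my own invention for illustration.

```python
import re

# Magic bytes observed at the start of our sample files: ASCII "+A+V+C+".
AVC_MAGIC = bytes.fromhex("2B412B562B432B")


def has_avc_magic(data: bytes) -> bool:
    """Return True if the data starts with the observed magic bytes."""
    return data.startswith(AVC_MAGIC)


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long,
    much as the Unix `strings` utility does."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    # Synthetic bytes standing in for a real file; not actual file content.
    sample = AVC_MAGIC + b"\x00\x01\x02Roman Bold 26\x00\xff\x10"
    print(has_avc_magic(sample))
    print(printable_strings(sample))
```

Running `printable_strings` over a real file is how clues such as ‘Roman Bold 26’ surface without a renderer.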
But that was it. We were staring at an obsolete format; definition: “A format, which, within our limited resourced world view at the time, we could no longer use.”
Our final port of call was to see if we could put the format back into its original environment to observe it in its natural state.
From here, the process became much simpler. As it turned out, an OS/2 installation running on VirtualBox knew what to do with these files. It was able to render them natively in an application for handling IBM AVC (Audio Visual Connection) content. Even better than that, the context menu for these images gave us the option ‘Convert To’ with the following options available:
- BMP (OS/2 Bitmap)
- DIB (RIFF DIB Image)
- GIF (GIF Image Compressed)
- JPG (Baseline JPG)
- PCX (PCX Image Compressed)
- TGA (Truevision TARGA)
- TIF (Tag Image File Format)
- VID (IBM MMotion Still Video Image)
Variants existed under BMP, TGA and TIF, for example OS/2 1.3 and 2.0 BMP, and TIFF in Motorola or Intel byte order, compressed or uncompressed.
The context menu option also allowed for the bulk conversion of these images, so a single click gave us uncompressed TIFF images suitable for export.
Simple is of course a relative term, and although we had the images we wanted, there was a problem retrieving them from the emulated environment. Unable to successfully set up a shared drive to enable our Host OS to interact with VirtualBox, and unable to attach any form of writeable media, we were stuck.
The virtual machine was connected to the Internet, but Netscape was unable to interact with modern websites particularly well. We were also unable to use FTP successfully, at least within the self-imposed timeframe we were working to.
Our final option was email. SMTP saved the day. We zipped the images using the still-available Info-ZIP tool and emailed them from a Gmail account back to itself using the OS-provided Netscape Messenger email client. That let us retrieve the images, which immediately made them usable in a modern environment.
And that was it, job done!
But there is still more to this story.
It’s 1996. I boot up my OS/2 Warp 4.2 box. It’s being packed away today, ready for the new Pentium machines running Windows being rolled out by our IT department. Windows… *sigh* but my IT department wax lyrical about the improvements in performance and security. It’s just work, I’ve got a fishing trip at the weekend so I’ve other things to keep my mind off the IBM vs. Microsoft debate. Wait! I’d better make sure I’ve got all my files. Ah, those IM files I was looking at last year. Neat images; could come in handy again. Windows doesn’t support the format though. Hmm, right-click, convert. 300 files; IM to TIF – that’s going to take a few floppy disks! – Should be able to access them in a few applications though. Good!
What we did by grabbing hold of an OS/2 installation and VirtualBox was not to create a solution we want to take into the future. It was us stepping back into 1996 for one time only, to create a version of a file we could take into 1997, and beyond, on a different platform. It is 1996 again and we’ve now got 300 TIFF files. As things move forward in 2013 we might start thinking about converting them to a new standard, PNG maybe, to capitalize on the space savings provided by lossless compression and also to make use of them on the web. PNG being an open standard (like TIFF) might also help us avoid a situation similar to that of our IM files in future. Whatever the mechanism, it should be lossless and should give us the greatest potential for use moving forward.
Outside of the time travel context, with our images converted and the original provider of the materials happy with the work, we’re left with a success story, but an incomplete solution… an unsatisfying one.
An unsatisfying solution
At the end of this process we’re still left with a file format we don’t fully understand. I can’t migrate this format in a modern environment using modern tools. I can’t render it; I can’t really identify it with complete certainty. I can’t help matters and create a signature for it without really knowing more about where it came from and what its specification looks like. I do have enough examples from a single system to take apart some of the header and look for consistencies but is this precise enough for what we’re attempting to achieve in Digital Preservation? Maybe, for an experimental DROID signature file.
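Taking apart the header to look for consistencies can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name is hypothetical, the sixteen-byte window is arbitrary, and the `??` placeholder is simply my shorthand for "this byte varies between samples", not a claim about any identification tool's signature syntax.

```python
def common_header_pattern(samples: list[bytes], length: int = 16) -> str:
    """Compare the first `length` bytes of every sample and return a hex
    pattern in which '??' marks offsets where the samples disagree."""
    if not samples:
        raise ValueError("need at least one sample")
    # Never read past the end of the shortest sample.
    length = min(length, min(len(sample) for sample in samples))
    parts = []
    for offset in range(length):
        values = {sample[offset] for sample in samples}
        parts.append(f"{values.pop():02X}" if len(values) == 1 else "??")
    return " ".join(parts)


if __name__ == "__main__":
    # Synthetic headers standing in for real files.
    magic = bytes.fromhex("2B412B562B432B")
    print(common_header_pattern([magic + b"\x01\x10", magic + b"\x01\x20"], 9))
```

A pattern built this way is only as good as the sample set. With files from a single system, a byte that looks constant may simply never have varied in that one environment.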
As for the completed migration, with no validation tools available I can’t look at the internals of this format and guarantee I know what was lost in the conversion from IM to TIFF. I do know I’ve lost something though. What were those references to a font? They’re no longer in the TIFF output. And what other plain text did I spot in the bitstream that might mean something? One part of the bitstream is annotated ‘TEXT’, another ‘HEAD’; are these fields pertaining to the DB/2 conversion described by the provider?
- We can’t validate the success of the conversion beyond the rendered image
- We haven’t isolated a specification for this format
- We haven’t an ability to express a signature in current production identification systems
- We cannot render IM files in a modern environment
- The mechanism of transfer from the emulated environment to our Host OS was certainly not a preferred route
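One small step towards the missing analysis would be to record where the plain-text annotations mentioned earlier, ‘TEXT’ and ‘HEAD’, fall in each file. The sketch below is only that, a sketch. It assumes those two markers appear as plain ASCII, and the function name is hypothetical. What the markers actually delimit remains unknown without a specification.

```python
def marker_offsets(
    data: bytes, markers: tuple[bytes, ...] = (b"TEXT", b"HEAD")
) -> dict[str, list[int]]:
    """Return every byte offset at which each ASCII marker occurs."""
    found: dict[str, list[int]] = {}
    for marker in markers:
        offsets = []
        start = data.find(marker)
        while start != -1:
            offsets.append(start)
            # Resume one byte further on so that repeats are also found.
            start = data.find(marker, start + 1)
        found[marker.decode("ascii")] = offsets
    return found


if __name__ == "__main__":
    # Synthetic bytes standing in for a real file; not actual file content.
    sample = b"+A+V+C+\x00HEAD\x00\x00TEXTabcTEXT"
    print(marker_offsets(sample))
```

Comparing these offsets across the sample files would at least show whether the markers sit at fixed positions or move with the content.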
As many a school report might say: could do better. The end result of this process is that we have some images that can now be reused by the original content provider. We can also say with a little confidence that we know what format these images were originally: IBM AVC Still Video Image. I’ll leave it up to the comments section of this blog to suggest ways forward from here. The main message for me, however, is that for this to be considered a satisfactory result for digital preservation, one or more of these issues would need to have been solved as part of the process. A file format signature would be something, some idea about what the header says would be good, and a deeper analysis of the format structure even better. What would be really nice is an understanding of whether it might be possible to create a migration tool for this format in future, with some idea of what the original specification suggests about the feasibility of doing that.
Before I conclude, we did consider two other options which, with further investigation, might help us in the short term.
- eComStation is a modern operating system based on OS/2. In an emulated environment this might give us better methods of extracting the files, for example USB support, better access to file upload websites, and even the opportunity to set up a shared drive between it and the Host OS. We did try to convert the images using eComStation and found that it worked, and it even provided PNG as an export format. What had been lost in translation, however, was the bulk processing capability. This left us wondering whether we’d need to write an MS-DOS-style batch script to do this routine, or even use REXX, IBM’s own interpreted programming language native to the environment.
- Exporting the OS/2 executable for the native image viewer, or even the converter, into Windows may have worked, provided they had originally been written to be compatible with Windows and not just OS/2. Highly unlikely, but we did have success running an Aldus PhotoStyler executable found in the user directories sitting alongside the original image files.
Migration for the Now
This was an interesting use case. It was nice to have the time to look at a problem outside of the context of the government records we’re expected to look after at Archives New Zealand. There was no expectation of this result, just some files to play around with and see what we could do.
There were a number of lessons alluded to above – goals that we should strive for in digital preservation.
For me, despite this solution relying wholly on emulation, what I really learned was the value of migration. Stepping back into 1996 allowed me to migrate my files to a format I could still use in 2013. I believe the same of file formats now. For any file format I have doubts about, whether because it is proprietary, because of some measure (objective or otherwise) of over-complexity, or simply because it is not widely adopted, I should be thinking about migrating it. It might be the difference between a future Digital Preservation Analyst having to emulate my XP environment and find an obscure way to transfer files from it, and the alternative of simply being able to render them natively within their own modern OS.