The preservation of audio CDs differs slightly from the preservation of CDs that hold other kinds of data. Data on audio CDs cannot easily be cloned for preservation, because the music industry has lobbied the main operating system developers to curtail CD duplication as a way of cracking down on the mass production of pirated copies. While this is understandable from an intellectual property perspective, it is problematic from a preservation viewpoint.
I have scoured the published literature in this area, but found no comprehensive examples of best practice for preserving the data on audio CDs. There are guidebooks on preserving the physical CDs themselves, but next to nothing on preserving the data they carry. This area needs urgent attention because audio CDs may hold at-risk, decaying audio data on a fragile medium, and some types of audio CD are reaching the end of their life faster than others.
At the SPRUCE London Mashup in July 2013, I proposed creating a workflow model for the preservation of audio CDs. Working mainly with Peter May (British Library) and Carl Wilson (OPF), with input from other developers at the mashup, we established that the main problem to solve was the lack of an open source tool that could easily create a disk image or clone of the data on an audio CD.
While this may seem like a straightforward task, it took three experienced developers many hours of work before a practical solution, based on cdrdao, was proposed (see: an outline of the initial solution).
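The exact cdrdao invocation is not reproduced here. As a hedged sketch, assuming cdrdao and its companion utility toc2cue are installed and a Linux drive is available at /dev/sr0, the hypothetical helpers below build the command lines for reading a whole audio CD into a raw image plus a TOC file with cdrdao's `read-cd` mode, and then converting that TOC file into a cue sheet. This is not necessarily the command used by arcCD.

```python
# Hypothetical helpers: build cdrdao/toc2cue command lines for imaging an audio CD.
# Assumes cdrdao (and its toc2cue utility) are installed; the device path is an example.

def cdrdao_read_cmd(device: str, basename: str) -> list[str]:
    """Read every track of the disc into one raw image plus a TOC file."""
    return [
        "cdrdao", "read-cd",
        "--device", device,               # optical drive, e.g. /dev/sr0 on Linux
        "--datafile", f"{basename}.bin",  # raw audio data for the whole disc
        f"{basename}.toc",                # TOC: track layout and pregaps
    ]

def toc2cue_cmd(basename: str) -> list[str]:
    """Convert cdrdao's TOC file into a more widely supported cue sheet."""
    return ["toc2cue", f"{basename}.toc", f"{basename}.cue"]
```

Each list can be passed to `subprocess.run(..., check=True)`. Building argument lists rather than shell strings avoids quoting problems with file names; the device path and any further read options would need adjusting for the actual drive.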
Having solved the basic problem of creating a clone or disk image from an audio CD, the next step in this project was to explore how to catalogue the disk image and its contents, and how to normalise the audio files into the standard BWAV (Broadcast Wave) format. This work was supported by a SPRUCE award (funded by JISC) covering the period August to October 2013, and involved Carl Wilson and Toni Sant, with the participation of Darren Stephens from the University of Hull. Through further consultation with digital forensics experts at the British Library and elsewhere, together with systematic development, the project has addressed this issue directly.
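A BWAV file is an ordinary WAV file with an extra `bext` chunk carrying descriptive metadata, as defined in EBU Tech 3285. The following is a minimal sketch, not the project's code, assuming the ripped audio is already 16-bit little-endian stereo PCM at 44.1 kHz (the CD-DA format). It wraps that audio in a version-1 `bext` chunk; the function names are hypothetical, and the UMID and loudness fields are simply left zero-filled.

```python
import struct

# Red Book CD-DA audio: 44.1 kHz, 16-bit, stereo
CD_RATE, CD_CHANNELS, CD_BITS = 44100, 2, 16

def _chunk(tag: bytes, body: bytes) -> bytes:
    """Wrap body in a RIFF chunk header; odd-sized bodies get a pad byte."""
    pad = b"\x00" if len(body) % 2 else b""
    return tag + struct.pack("<I", len(body)) + body + pad

def bext_body(description: str, originator: str, date: str, time: str,
              coding_history: str = "") -> bytes:
    """Fixed 602-byte part of a version-1 'bext' chunk, then the coding history."""
    fixed = struct.pack(
        "<256s32s32s10s8sIIH64s190s",
        description.encode("ascii"),
        originator.encode("ascii"),
        b"",                   # OriginatorReference (left empty in this sketch)
        date.encode("ascii"),  # OriginationDate, yyyy-mm-dd
        time.encode("ascii"),  # OriginationTime, hh:mm:ss
        0, 0,                  # TimeReference low/high words (sample offset 0)
        1,                     # bext version 1
        b"",                   # UMID, zero-filled (not used here)
        b"",                   # reserved bytes, zero-filled
    )
    return fixed + coding_history.encode("ascii")

def bwav_bytes(pcm_le16: bytes, **bext_fields: str) -> bytes:
    """Assemble RIFF/WAVE bytes: bext, fmt (PCM) and data chunks."""
    block_align = CD_CHANNELS * CD_BITS // 8  # 4 bytes per stereo sample frame
    if len(pcm_le16) % block_align:
        raise ValueError("PCM length is not a whole number of sample frames")
    fmt = struct.pack("<HHIIHH", 1, CD_CHANNELS, CD_RATE,
                      CD_RATE * block_align, block_align, CD_BITS)
    chunks = (_chunk(b"bext", bext_body(**bext_fields))
              + _chunk(b"fmt ", fmt)
              + _chunk(b"data", pcm_le16))
    # The RIFF size counts the "WAVE" tag plus every chunk that follows it
    return b"RIFF" + struct.pack("<I", 4 + len(chunks)) + b"WAVE" + chunks

def write_bwav(path: str, pcm_le16: bytes, **bext_fields: str) -> None:
    """Write the assembled BWAV bytes to disk."""
    with open(path, "wb") as f:
        f.write(bwav_bytes(pcm_le16, **bext_fields))
```

Before calling `write_bwav`, it is worth checking the byte order of the ripped data: WAV requires little-endian samples, and some CD tools emit raw audio in big-endian order.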
Once the fundamental open solution was in hand, we could turn our attention to developing a four-step workflow model for the preservation of audio CDs. The four steps are as follows:
1. Disk Imaging (stabilising the data)
2. Cataloguing (through individual cue sheets)
3. Data Ripping (normalising the data)
4. Open access to the catalogue (outputting the metadata)
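Steps 2 and 3 meet in the cue sheet: its `TRACK` and `INDEX 01` entries say where each track starts inside the disk image, which is what a ripper needs in order to split the image into per-track files. The sketch below is a hedged illustration with hypothetical names, assuming a cue sheet with a single `FILE` entry whose `MM:SS:FF` timestamps (75 frames per second) are offsets into that file. Pregap audio between `INDEX 00` and `INDEX 01` is simply left attached to the preceding track, which is only one of several possible conventions.

```python
import re
from dataclasses import dataclass
from typing import Optional

SECTOR_BYTES = 2352     # one CD-DA sector: 588 stereo frames x 4 bytes
FRAMES_PER_SECOND = 75  # cue-sheet MM:SS:FF counts 75 sectors per second

@dataclass
class Track:
    number: int
    start_sector: int = -1       # position of INDEX 01, in sectors
    title: Optional[str] = None  # from a track-level TITLE line, if present

def msf_to_sectors(msf: str) -> int:
    """Convert a cue-sheet MM:SS:FF timestamp to an absolute sector count."""
    minutes, seconds, frames = (int(part) for part in msf.split(":"))
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames

def parse_cue(text: str) -> list[Track]:
    """Collect track numbers, INDEX 01 positions and titles from a cue sheet."""
    tracks: list[Track] = []
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "TRACK":
            tracks.append(Track(int(words[1])))
        elif words[0] == "INDEX" and words[1] == "01" and tracks:
            tracks[-1].start_sector = msf_to_sectors(words[2])
        elif words[0] == "TITLE" and tracks:  # disc-level TITLE is ignored
            match = re.search(r'"(.*)"', line)
            tracks[-1].title = match.group(1) if match else " ".join(words[1:])
    return tracks

def byte_ranges(tracks: list[Track], image_size: int) -> list[tuple[int, int, int]]:
    """(track number, start byte, end byte) for each track in the raw image."""
    ranges = []
    for i, track in enumerate(tracks):
        start = track.start_sector * SECTOR_BYTES
        end = (tracks[i + 1].start_sector * SECTOR_BYTES
               if i + 1 < len(tracks) else image_size)
        ranges.append((track.number, start, end))
    return ranges
```

Working in whole sectors keeps every split aligned to complete stereo sample frames, so each extracted range can be handed straight to a WAV or BWAV writer.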
Working with a specific dataset (see: an outline of the dataset), this project can now provide a practical workflow model that uses arcCD, a tool built on the solution proposed during the London SPRUCE mashup, for steps 1 and 3. An example of good practice has now been established in this under-explored area of preservation. All materials produced for this project are available on GitHub. As part of his PhD research project, entitled 'A Framework for Optimised Interaction Between Mediated Memory Repositories and Social Media Networks', Darren Stephens is also developing the metadata output further, integrating it into MediaWiki so that the catalogue can be easily accessed and edited.
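To illustrate what outputting the metadata for a wiki might involve, here is a small, hypothetical sketch that renders catalogue rows as MediaWiki table markup. It is not the MediaWiki integration described above; the function name and the choice of columns are assumptions. Durations are taken as seconds, which for CD audio can be derived from a track's byte length divided by 176,400 bytes per second.

```python
def catalogue_to_wikitext(album: str, rows: list[tuple[int, str, float]]) -> str:
    """Render (track number, title, duration in seconds) rows as a wikitable."""
    lines = [f"== {album} ==", '{| class="wikitable"',
             "! Track !! Title !! Duration"]
    for number, title, seconds in rows:
        minutes, secs = divmod(int(round(seconds)), 60)
        safe_title = title.replace("|", "&#124;")  # a bare | would split the cell
        lines += ["|-", f"| {number} || {safe_title} || {minutes}:{secs:02d}"]
    lines.append("|}")  # close the table
    return "\n".join(lines)
```

The resulting text can be pasted into a wiki page, where volunteers can then correct or enrich the catalogue entries by ordinary wiki editing.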
The initial dataset used to develop this project is managed by the Malta Music Memory Project (M3P), which seeks to provide an inclusive repository for memories of Maltese music and associated arts, preserving them for current and future generations. M3P is one of the projects within the Media and Memory Research Initiative (MaMRI) at the University of Hull, and it is facilitated by the M3P Foundation, a voluntary organisation registered in Malta.