Preserving audio CDs differs slightly from preserving CDs that hold other kinds of data. Data on audio CDs cannot be easily cloned for preservation, as the music industry has lobbied the main operating system developers to curtail the duplication of CDs in order to crack down on the mass production of pirate copies. While this is understandable from an intellectual property perspective, it is rather problematic from a preservation viewpoint.
I have scoured the published literature in this area, but found no comprehensive examples of best practice for preserving the data on audio CDs. There are guidebooks on preserving the CDs themselves, but next to nothing on preserving the data they carry. This area requires urgent attention because audio CDs may contain at-risk, decaying audio data on a fragile medium. Certain types of audio CDs are nearing their end of life faster than others.
At the SPRUCE London Mashup in July 2013 I proposed the creation of a workflow model for the preservation of audio CDs. Working mainly with Peter May (British Library) and Carl Wilson (OPF), with input from other developers at the mashup, we established that the main problem that needed to be resolved was the fact that there was no open source tool to easily create a disk image or clone of data on an audio CD.
While this may seem like a straightforward project, it took three experienced developers many hours of work before a practical solution, based on cdrdao, was proposed. (See: an outline of the initial solution)
Having resolved the basic need to create a clone or disk image from an audio CD, the next step in this project was to explore how to catalogue the disk image and its contents, as well as normalise the audio files into the standard BWAV format. This was supported by a SPRUCE award (funded by JISC) covering the period August-October 2013, involving Carl Wilson and Toni Sant, with the participation of Darren Stephens from the University of Hull. Through further consultation with digital forensics experts at the British Library and elsewhere, as well as systematic development, this project has addressed this issue directly.
Once the fundamental open solution was in hand, we could turn our attention to developing a four-step workflow model for the preservation of audio CDs. The four steps are as follows:
1. Disk Imaging (stabilizing the data)
2. Cataloguing (through individual Cue sheets)
3. Data Ripping (normalising the data)
4. Open access to the catalogue (outputting the metadata)
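As a rough illustration of how steps 1 and 2 might be driven from a script, the Python sketch below only builds the command lines for imaging a disc with cdrdao and converting the resulting TOC file into a cue sheet with toc2cue (a utility shipped with cdrdao). It is not the project's own code: the function name, the device path, the output directory and the file label are all hypothetical, and the exact cdrdao options used in the actual workflow are not stated in this post.

```python
from pathlib import Path


def imaging_commands(device, out_dir, label):
    """Build (but do not run) the commands for imaging one audio CD.

    `device`, `out_dir` and `label` are illustrative parameters, not
    names taken from the project itself.
    """
    out = Path(out_dir)
    toc_file = out / f"{label}.toc"   # table of contents written by cdrdao
    bin_file = out / f"{label}.bin"   # raw disc image
    cue_file = out / f"{label}.cue"   # cue sheet used for cataloguing

    return [
        # Step 1: read the whole disc into a single image plus a TOC file.
        ["cdrdao", "read-cd", "--device", device,
         "--datafile", str(bin_file), str(toc_file)],
        # Step 2 (preparation): convert the TOC file into a cue sheet.
        ["toc2cue", str(toc_file), str(cue_file)],
    ]


if __name__ == "__main__":
    for command in imaging_commands("/dev/sr0", "images", "disc001"):
        print(" ".join(command))
```

Each list could be passed to `subprocess.run(command, check=True)` on a machine with cdrdao installed. Building the commands separately from running them makes it easy to log exactly what was done to each disc.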
Working with a specific dataset (see: an outline of the dataset), this project now provides a practical workflow model that uses the solution proposed during the London SPRUCE mashup, packaged as a tool called arcCD, for steps 1 and 3. An example of good practice has now been established in this under-explored area of preservation. All materials produced for this project are available on GitHub. Darren Stephens is also integrating further development on outputting the metadata into MediaWiki for easy access and editing of the catalogue, as part of his PhD research project entitled 'A Framework for Optimised Interaction Between Mediated Memory Repositories and Social Media Networks.'
The initial dataset used for the development of this project is managed by the Malta Music Memory Project (M3P), which seeks to provide an inclusive repository for memories of Maltese music and associated arts, ensuring that these are kept for posterity, for current and future generations. M3P is one of the projects within the Media and Memory Research Initiative (MaMRI) of the University of Hull and it is facilitated by the M3P Foundation, a voluntary organization registered in Malta.
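To give a feel for the cataloguing and metadata steps, here is a minimal Python sketch that reads a cue sheet and turns it into simple track records that could later be exported to a catalogue. It is not the arcCD code: the function names and the sample cue sheet are hypothetical, and it handles only the common FILE, TRACK, TITLE, PERFORMER and INDEX lines.

```python
import shlex

FRAMES_PER_SECOND = 75  # cue sheet timestamps count 75 frames per second


def msf_to_seconds(msf):
    """Convert a cue sheet MM:SS:FF timestamp to seconds."""
    minutes, seconds, frames = (int(part) for part in msf.split(":"))
    return minutes * 60 + seconds + frames / FRAMES_PER_SECOND


def parse_cue(text):
    """Return disc-level fields plus a list of per-track records."""
    disc = {"tracks": []}
    current = None  # the track currently being filled in, if any
    for raw_line in text.splitlines():
        fields = shlex.split(raw_line)  # honours double-quoted values
        if not fields:
            continue
        key = fields[0].upper()
        if key == "FILE":
            disc["file"] = fields[1]
        elif key == "TRACK":
            current = {"number": int(fields[1]), "type": fields[2]}
            disc["tracks"].append(current)
        elif key in ("TITLE", "PERFORMER"):
            # Before the first TRACK line these describe the whole disc.
            target = current if current is not None else disc
            target[key.lower()] = fields[1]
        elif key == "INDEX" and current is not None and int(fields[1]) == 1:
            # INDEX 01 marks where the track's audio starts.
            current["start_seconds"] = msf_to_seconds(fields[2])
    return disc


SAMPLE_CUE = '''\
TITLE "Example Album"
FILE "disc001.bin" BINARY
  TRACK 01 AUDIO
    TITLE "First Song"
    PERFORMER "Example Artist"
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    TITLE "Second Song"
    INDEX 00 03:10:00
    INDEX 01 03:12:30
'''

if __name__ == "__main__":
    catalogue = parse_cue(SAMPLE_CUE)
    for track in catalogue["tracks"]:
        print(track["number"], track.get("title"), track["start_seconds"])
```

Records in this shape are easy to serialise (for example as JSON) before being pushed to whatever catalogue front end is used. Real-world cue sheets can contain further commands (REM, ISRC, PREGAP and so on) that this sketch simply ignores.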
November 21, 2013 @ 11:53 am CET
I think some copy-protection systems depend on the subchannel data (and even the lead-in track, in some cases). When I worked on a similar CD ingest process (summarised in draft form here), we found that although most of the disks were simple one-track data/ISO9660 disks, we needed tools better able to cope with the edge cases.
The lack of a standard open format for full disk images is somewhat alarming, but perhaps this has since been resolved (I'd not heard of some of those tools from the ArchiveTeam page until now).
June 26, 2014 @ 4:17 pm CEST
Bit of a delayed response, but possibly important if anyone's considering building a preservation workflow for audio based on the approach outlined here.
While looking up some information on optical media, I came across this excellent paper by Alexander Duryee (which was published several months after this blog post appeared):
It addresses the preservation of various types of optical media, including audio CDs. The following fragment from the Section Applications in Preservation Workflows caught my eye:
From the information in the blog post and the Wiki page it's not clear to me if the authors have taken into account any of the above issues. E.g. how does cdrdao compare to Exact Audio Copy or cdparanoia in terms of handling read errors? At first glance both of these tools look like more obvious candidates for this kind of work than cdrdao. Was audio extraction quality even a consideration in the tool selection process?
November 21, 2013 @ 10:11 am CET
One note on cdrdao: according to the ArchiveTeam page below cdrdao "does not properly rip CDs with nonstandard data":
I've never used it myself (and I don't know what exactly they mean by "nonstandard" here) but this might be a thing to watch out for.
Another resource that might be useful (although mainly geared towards game preservation) is this dumping guide on redump.org: