As part of the SCAPE project, we carried out a large-scale experiment and evaluation of audio migration, using the xcorrSound tool waveform-compare for content comparison as part of quality assurance.
I presented the results at the demonstration day at the State and University Library (Statsbiblioteket); see the blog post SCAPE Demo Day at Statsbiblioteket by Jette G. Junge.
And here is the screencast of the demonstration:
The brief summary is:
- New tool: using xcorrSound waveform-compare, we can automate audio file content comparison for quality assurance
- Scalability: using Hadoop, we can migrate our 20 TB radio broadcast MP3 collection to the WAV file format in a month (on the current SB Hadoop cluster setup) rather than in years 🙂
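The core idea behind the content comparison is cross-correlation: slide one waveform against the other and check how well the two line up at the best offset. The sketch below is a minimal pure-Python illustration of that idea, not xcorrSound's actual implementation. It works on in-memory lists of samples rather than on audio files, and the function names, the `max_lag` search window and the 0.98 similarity threshold are hypothetical choices made for illustration.

```python
import math
from typing import Sequence, Tuple


def xcorr_at_lag(a: Sequence[float], b: Sequence[float], lag: int) -> float:
    """Normalized cross-correlation of a[i] against b[i + lag] over their overlap."""
    start = max(0, -lag)
    stop = min(len(a), len(b) - lag)
    if stop <= start:
        return 0.0  # no overlap at this lag
    dot = sum(a[i] * b[i + lag] for i in range(start, stop))
    energy_a = sum(a[i] * a[i] for i in range(start, stop))
    energy_b = sum(b[i + lag] * b[i + lag] for i in range(start, stop))
    if energy_a == 0.0 or energy_b == 0.0:
        return 0.0  # silence on either side: treat as uncorrelated
    # By Cauchy-Schwarz the result lies in [-1, 1]; 1 means identical shape.
    return dot / math.sqrt(energy_a * energy_b)


def best_match(a: Sequence[float], b: Sequence[float], max_lag: int) -> Tuple[float, int]:
    """Return (best score, lag) over all lags in [-max_lag, max_lag]."""
    best_score, best_lag = float("-inf"), 0
    for lag in range(-max_lag, max_lag + 1):
        score = xcorr_at_lag(a, b, lag)
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score, best_lag


def is_similar(a: Sequence[float], b: Sequence[float],
               max_lag: int = 100, threshold: float = 0.98) -> bool:
    """Hypothetical pass/fail rule: the best-aligned correlation must reach the threshold."""
    score, _ = best_match(a, b, max_lag)
    return score >= threshold
```

Searching over a window of lags matters because a migrated file can be offset by a few samples from the original (MP3 encoder delay and padding are one source of such offsets). A real tool would also decode both files to PCM first and compare them block by block instead of in a single pass.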
And just a few notes:
- The large-scale experiment did not include property extraction and comparison, but based on an earlier experiment we are confident that we can do this effectively using FFprobe.
- The large-scale experiment also did not include file format validation. We decided early on not to use JHOVE2 for performance reasons. The open question is whether we are satisfied with the "pseudo validation" that comes from the FFprobe property extraction and the xcorrSound waveform-compare cross-correlation algorithm both being able to read the file…
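A property comparison with FFprobe could look roughly like the sketch below, assuming the `ffprobe` binary is on the PATH. It asks FFprobe for JSON output (`-print_format json -show_streams -show_format`), picks out a few audio properties and reports which of them differ between the original and the migrated file. The chosen properties, the helper names and the 0.1 second duration tolerance are assumptions for illustration, not the setup from our earlier experiment.

```python
import json
import subprocess
from typing import Any, Dict, List


def probe(path: str) -> Dict[str, Any]:
    """Run ffprobe on a file and return its parsed JSON description."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_streams", "-show_format", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def audio_properties(info: Dict[str, Any]) -> Dict[str, Any]:
    """Pick a few properties from the first audio stream plus the container duration."""
    stream = next((s for s in info["streams"] if s.get("codec_type") == "audio"), None)
    if stream is None:
        raise ValueError("no audio stream found")
    # ffprobe reports sample_rate and duration as strings in its JSON output.
    return {
        "sample_rate": int(stream["sample_rate"]),
        "channels": int(stream["channels"]),
        "duration": float(info["format"]["duration"]),
    }


def property_mismatches(src: Dict[str, Any], dst: Dict[str, Any],
                        duration_tolerance: float = 0.1) -> List[str]:
    """Return the names of properties that differ between original and migrated file."""
    mismatches = [key for key in ("sample_rate", "channels") if src[key] != dst[key]]
    # Allow a small duration difference, e.g. from MP3 encoder delay/padding.
    if abs(src["duration"] - dst["duration"]) > duration_tolerance:
        mismatches.append("duration")
    return mismatches
```

Used as `property_mismatches(audio_properties(probe("original.mp3")), audio_properties(probe("migrated.wav")))` (placeholder file names), an empty list means the compared properties agree. Note that a successful `probe` call is itself the kind of "pseudo validation" mentioned above: it only shows that FFprobe could parse the file.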
Oh, and the slides are also on SlideShare: Migration of audio files using Hadoop.