xcorrSound: waveform-compare New Audio Quality Assurance Tool


The xcorrSound tool package is being developed as part of the SCAPE Preservation Components: Quality Assurance work package. The tools compare sound waves and are used in different digital audio preservation scenarios. The xcorrSound tool package is available from https://github.com/openplanets/scape-xcorrsound. The package currently includes three tools:

  • The tool overlap-analysis (earlier xcorrSound) finds the overlap between two audio files.
  • sound-match is a tool to find all occurrences of a shorter wav within a larger wav.
  • waveform-compare (earlier migration-qa) is a tool that splits two audio files into equal sized blocks, computes the correlation for each block, and outputs whether the input files are 'similar' or 'not similar'.

The tools all make use of cross-correlation, which can be computed efficiently through the Fourier transform. The waveform-compare tool takes two wav audio files as input and determines whether their content is the same. We split both files into equal-sized blocks (5 seconds' worth of samples), compute the cross-correlation for each pair of blocks, and find its peak. We remember the peak position of the first block, and if the peak position of any later block differs from it by more than 500 samples, we conclude that the files are not similar; otherwise they are similar.
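The block-wise decision rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, the correlation is computed directly in the time domain over a limited lag range instead of through the Fourier transform, and the block size, lag range and tolerance are tiny stand-ins for the 5-second blocks and 500-sample limit described above.

```python
import math
import random

def best_lag(a, b, max_lag):
    """Return (lag, score) for the shift of b relative to a that gives the
    highest normalized cross-correlation within -max_lag..max_lag."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0:
        return 0, 0.0
    best = (0, -1.0)
    for lag in range(-max_lag, max_lag + 1):
        # Pair a[i] with b[i + lag] wherever both indices are valid.
        lo, hi = max(0, -lag), min(len(a), len(b) - lag)
        score = sum(a[i] * b[i + lag] for i in range(lo, hi)) / norm
        if score > best[1]:
            best = (lag, score)
    return best

def similar(a, b, block_size, max_lag, tolerance):
    """Compare block by block; report 'not similar' as soon as a block's
    peak position differs from the first block's by more than tolerance."""
    first = None
    usable = min(len(a), len(b))
    for start in range(0, usable - block_size + 1, block_size):
        lag, _ = best_lag(a[start:start + block_size],
                          b[start:start + block_size], max_lag)
        if first is None:
            first = lag
        elif abs(lag - first) > tolerance:
            return False
    return True

# Illustrative data: noise, a copy delayed by 3 samples, and a copy whose
# delay jumps from 3 to 20 samples halfway through.
rng = random.Random(0)
signal = [rng.uniform(-1, 1) for _ in range(400)]
delayed = [0.0] * 3 + signal[:-3]
drifting = delayed[:200] + [0.0] * 20 + signal[200:380]

print(similar(signal, delayed, block_size=100, max_lag=30, tolerance=5))
print(similar(signal, drifting, block_size=100, max_lag=30, tolerance=5))
```

A constant delay passes because every block peaks at the same position; the second pair fails because the peak position moves by more than the tolerance partway through.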

Why is it important to split the files into blocks? The intuition is that if we cross-correlate the two files as they are, their overall similarity may be quite high even if some small parts correlate very badly. This could happen if an error left a couple of seconds of pure noise somewhere in the wav file.
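A small experiment makes this intuition concrete. The sketch below uses synthetic noise as a stand-in for audio, and the sizes are arbitrary assumptions: 2% of the samples in a copy are overwritten, and the correlation is measured once over the whole signal and once per block.

```python
import math
import random

def correlation(a, b):
    """Normalized correlation of two equal-length sample lists at zero lag."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

rng = random.Random(1)
original = [rng.uniform(-1, 1) for _ in range(10_000)]

# Copy the signal and overwrite 200 samples (2%) with unrelated noise.
damaged = list(original)
damaged[5000:5200] = [rng.uniform(-1, 1) for _ in range(200)]

whole = correlation(original, damaged)

block_size = 200
per_block = [
    correlation(original[i:i + block_size], damaged[i:i + block_size])
    for i in range(0, len(original), block_size)
]
worst = min(range(len(per_block)), key=per_block.__getitem__)

print(f"whole-file correlation: {whole:.3f}")
print(f"worst block: #{worst} with correlation {per_block[worst]:.3f}")
```

The whole-file value stays close to 1, while the block covering the overwritten samples stands out clearly.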

The waveform-compare tool has been tested on a migration of radio broadcasts from MP3 to WAV, including automated Quality Assurance (QA) of the migrated files. Using the waveform-compare tool, we want to determine whether the two audio files (the original and the migrated one) have the same content. This is the SCAPE LS-DRT6 Migrate mp3 to wav scenario. There are several heuristics that can give some assurance that a migration went well, such as checking that the length is the same before and after the migration. But such 'trivial' measures do not catch the case where the migrated file contains nothing but white noise, which is an obvious flaw.

The 'Migrate mp3 to wav validate compare list to list' Taverna workflow available from myExperiment was used to migrate a test set. The migration is done using FFmpeg. The workflow includes validation of file format of the migrated file using JHove2, and comparison of basic properties of the original and migrated files. The basic properties are extracted using FFprobe (the FFmpeg multimedia stream analyzer) and compared using a Taverna Beanshell. The properties that are checked are sampling rate, number of channels, bit depth and bit rate.
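The property comparison itself is simple. The sketch below is not the Beanshell code from the workflow; it is a hypothetical Python equivalent that assumes the properties were extracted with FFprobe's JSON output (for example `ffprobe -v error -show_streams -of json FILE`). The two JSON strings are made-up sample values, and only sampling rate and channel count are compared by default, since the post does not describe how bit depth and bit rate are matched between mp3 and wav.

```python
import json

def audio_properties(ffprobe_json, keys):
    """Pick the given keys from the first audio stream in FFprobe JSON output."""
    streams = json.loads(ffprobe_json)["streams"]
    audio = next(s for s in streams if s.get("codec_type") == "audio")
    return {key: audio.get(key) for key in keys}

def mismatches(original, migrated):
    """Return {property: (original value, migrated value)} for every difference."""
    return {
        key: (original[key], migrated.get(key))
        for key in original
        if original[key] != migrated.get(key)
    }

# Made-up sample output, shaped like FFprobe's JSON for one audio stream.
ORIGINAL_MP3 = ('{"streams": [{"codec_type": "audio", "codec_name": "mp3", '
                '"sample_rate": "48000", "channels": 2}]}')
MIGRATED_WAV = ('{"streams": [{"codec_type": "audio", "codec_name": "pcm_s16le", '
                '"sample_rate": "48000", "channels": 2}]}')

KEYS = ("sample_rate", "channels")
diff = mismatches(audio_properties(ORIGINAL_MP3, KEYS),
                  audio_properties(MIGRATED_WAV, KEYS))
print("properties preserved" if not diff else f"mismatch: {diff}")
```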

The test data set contains 70 two-hour Danish radio broadcast files. The average file size of the original mp3 files is only 118 MB, while the migrated wav files are approximately 1.4 GB each. All the migrated files in the test set were reported to be valid, with the basic properties preserved.

In order to test the waveform-compare tool we needed a data set with errors. Three of the migrated files were therefore replaced by randomly generated files with a 'correct' wav header, so that the waveform-compare tool was able to process them. Five of the remaining 67 files were kept intact except for a few seconds in a few places within the file, which were replaced by randomly generated bytes. The other 62 files were kept as they were after migration through FFmpeg. This data set has an inherent problem: it is quite artificial. We imagine that it contains errors that might occur during a migration, but we have no basis for this, as we have never seen any erroneous migrations.
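A damaged test file of the second kind can be produced along the following lines. This is a sketch of the idea rather than the script we used: the helper names are hypothetical, it works in memory on a short mono file, and it relies on Python's standard `wave` module to keep the header valid while a stretch of samples is overwritten with random bytes.

```python
import io
import random
import wave

def write_wav(frames, rate=8000):
    """Return the bytes of a mono 16-bit WAV file holding raw PCM `frames`."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()

def corrupt(wav_bytes, start_s, length_s, seed=0):
    """Replace `length_s` seconds of audio, starting at `start_s`, with random
    bytes. The header and all other samples are left untouched."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        params = w.getparams()
        frames = bytearray(w.readframes(w.getnframes()))
    frame_bytes = params.sampwidth * params.nchannels
    lo = int(start_s * params.framerate) * frame_bytes
    hi = min(len(frames), lo + int(length_s * params.framerate) * frame_bytes)
    rng = random.Random(seed)
    frames[lo:hi] = bytes(rng.randrange(256) for _ in range(hi - lo))
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setparams(params)
        w.writeframes(bytes(frames))
    return out.getvalue()

# Three seconds of silence with the middle second replaced by random bytes.
clean = write_wav(bytes(2 * 8000 * 3))
damaged = corrupt(clean, start_s=1, length_s=1)
print(len(clean) == len(damaged), clean == damaged)
```

Replacing the whole sample range in the same way gives a file of the first kind: random content behind a valid header.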

The waveform-compare tool works on wav files, so we also need to 'play' or interpret the original mp3 files, just as a human needs to 'play' an mp3 file to hear the sound. We currently use MPG321 to decode the original mp3 files. MPG321 is an independent implementation of an mp3 decoder and is thus independent of FFmpeg, which was used to migrate the files. The migrated files are already in wav format and are used directly.
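The decoding step might be wrapped as shown below. This is a hedged sketch, not the bash script from our test: the function names and file names are hypothetical, and the only MPG321 option used is `-w`, which writes the decoded audio to a wav file instead of sending it to the audio device.

```python
import subprocess

def mpg321_command(mp3_path, wav_path):
    """Build the mpg321 call that decodes `mp3_path` into `wav_path`."""
    return ["mpg321", "-w", wav_path, mp3_path]

def decode(mp3_path, wav_path):
    """Run mpg321; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(mpg321_command(mp3_path, wav_path), check=True)

# Hypothetical file names, shown without running the decoder.
print(" ".join(mpg321_command("broadcast.mp3", "broadcast_mpg321.wav")))
```

The wav file decoded this way and the wav file migrated with FFmpeg are then the two inputs to waveform-compare.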

We tested the tool on the described 70 pairs of mp3 and migrated wav test files using a bash script. The 'waveform-compare including MPG321 workflow bash script' was run on a test machine with an Intel(R) Xeon(R) CPU X5660 @ 2.80GHz processor and 8 GB RAM.

The script ran for 4 hours and 45 minutes, which gives a performance of just over 4 minutes per file. The time is divided roughly equally between the MPG321 decoding and the waveform-compare comparison. In total there were 12 reported errors, which is 4 more than the 8 we expected. All the files that were supposed to be found during this QA check were found, so we are only left with some false positives (or false negatives, depending on your view). We investigated the additionally reported errors. The 'limit' of 500 samples difference from the first block may in fact be too low. For one pair of files the best offset was 1152 samples during the first 6850 seconds of the file (00:00:00-01:54:10), but for the remainder of the file the best offset changed to 3456 samples, with a cross-correlation match value of nearly 1 (0.999-1.0).
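The figures above can be checked with a few lines of arithmetic. The snippet only restates numbers from this post; the variable names are ours.

```python
from datetime import timedelta

files = 70
runtime = timedelta(hours=4, minutes=45)
minutes_per_file = runtime.total_seconds() / 60 / files   # 285 minutes / 70 files

expected_errors = 3 + 5      # random files + partially overwritten files
reported_errors = 12
unexpected = reported_errors - expected_errors

threshold = 500              # allowed difference in samples
first_offset, later_offset = 1152, 3456
drift = later_offset - first_offset
change_point = timedelta(seconds=6850)

print(f"{minutes_per_file:.2f} minutes per file")
print(f"{unexpected} unexpected reports")
print(f"drift of {drift} samples exceeds the limit: {drift > threshold}")
print(f"offset changes at {change_point}")
```

Both offsets are multiples of 1152 samples (3456 = 3 × 1152), which is the frame length of MPEG-1 Layer III audio. Whether the jump is related to frame handling in one of the decoders is only a guess at this point.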

The xcorrSound tool package is open source (GPLv2) and available from https://github.com/openplanets/scape-xcorrsound. The repository includes a README file with an installation and usage guide. We are also planning some large-scale experiments with the tool in the SCAPE project.
