Developing an Audio QA workflow using Hadoop: Part II

First things first: the GitHub repository with the Audio QA workflows is here: https://github.com/statsbiblioteket/scape-audio-qa. And version 1 is working. "Version" is really the wrong word here; I should call it Workflow 1, which is this one:

SlimAudioMigrateAndQATavernaWorkflowUsingHadoopJobs

To sum up, this workflow does migration, conversion and content comparison. The top left box (a nested workflow) migrates a list of mp3s to wav files in a Hadoop map-reduce job wrapping the command-line tool FFmpeg, and outputs a list of migrated wav files. The top right box converts the same list of mp3s to wav files in another Hadoop map-reduce job wrapping the command-line tool mpg321, and outputs a list of converted wav files. The Taverna workflow then zips the two lists of wav files together, so the bottom box receives a list of pairs of wav files to compare. The bottom box compares the content of the paired files in a Hadoop map-reduce job wrapping the xcorrSound waveform-compare command-line tool, and outputs the results of the comparisons.
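To give an idea of what such a job looks like on the mapper side, here is a minimal sketch assuming the job reads one mp3 path per input line. It is not the actual code from the scape-audio-qa repository; the class name, path handling and failure marker are my own inventions, and the real job also has to move data between HDFS and local storage for the command-line tool.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class FfmpegMigrationMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String mp3Path = value.toString().trim();               // one mp3 path per input line
            String wavPath = mp3Path.replaceAll("\\.mp3$", ".wav");

            // Shell out to the command-line tool; "-y" overwrites an existing output file.
            ProcessBuilder pb = new ProcessBuilder("ffmpeg", "-y", "-i", mp3Path, wavPath);
            pb.redirectErrorStream(true);
            Process process = pb.start();

            // Drain FFmpeg's output so the process does not block on a full pipe buffer.
            BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
            while (reader.readLine() != null) {
                // discard tool output (or log it for the QA report)
            }
            int exitCode = process.waitFor();

            // Emit the migrated wav path (or a failure marker) keyed by the input mp3 path.
            context.write(new Text(mp3Path), new Text(exitCode == 0 ? wavPath : "MIGRATION_FAILED"));
        }
    }

The mpg321 conversion job follows the same pattern with a different command line, and the comparison job shells out to waveform-compare on each pair of wav files in the same way.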

What we would like to do next is:

  • "Reduce" the output of the Hadoop map-reduce job using the waveform-compare commandline tool
  • Do an experiment on 1TB input mp3 files on the SB Hadoop cluster, and write an evaluation and a new blog post 😉
  • Extend the workflow with property comparison. The waveform-compare tool only compares sound waves; it does not look at the header information. This should be part of a quality assurance of a migration. The reason this is not top priority is that FFprobe property extraction and comparison is very fast, and will probably not affect performance much…
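For the first item on the list, the reduce step could be as simple as this sketch. Again the names are assumptions, not code from the repository: the map phase is assumed to emit the waveform-compare outcome as the key and the compared file pair as the value, and the reducer aggregates the outcomes into one small summary.

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CompareResultReducer extends Reducer<Text, Text, Text, IntWritable> {

        @Override
        protected void reduce(Text outcome, Iterable<Text> pairs, Context context)
                throws IOException, InterruptedException {
            // Assumed map output: (outcome, "ffmpegWavPath <tab> mpg321WavPath"),
            // where outcome is e.g. "success" or "failure" as reported by waveform-compare.
            int count = 0;
            for (Text ignored : pairs) {
                count++;
            }
            if ("failure".equals(outcome.toString())) {
                // Make failures visible in the job counters as well.
                context.getCounter("AudioQA", "failed-pairs").increment(count);
            }
            // One line per outcome, e.g. "success 9987" and "failure 13".
            context.write(outcome, new IntWritable(count));
        }
    }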
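For the last item, the FFprobe property comparison could start from something like this standalone sketch. The choice of properties and the exact ffprobe invocation are my own assumptions, not a decision from the project.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    public class FfprobePropertyCompare {

        /** Run ffprobe and return lines such as "sample_rate=44100" and "channels=2". */
        static List<String> audioProperties(String path) throws IOException, InterruptedException {
            ProcessBuilder pb = new ProcessBuilder(
                    "ffprobe", "-v", "error",
                    "-show_entries", "stream=sample_rate,channels",
                    "-of", "default=noprint_wrappers=1",
                    path);
            pb.redirectErrorStream(true);
            Process process = pb.start();
            List<String> lines = new ArrayList<String>();
            BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line.trim());
            }
            process.waitFor();
            return lines;
        }

        public static void main(String[] args) throws Exception {
            List<String> original = audioProperties(args[0]);   // e.g. the original mp3
            List<String> migrated = audioProperties(args[1]);   // e.g. the migrated wav
            // Sample rate and number of channels should survive the migration unchanged.
            if (original.equals(migrated)) {
                System.out.println("properties match: " + original);
            } else {
                System.out.println("property mismatch: " + original + " vs " + migrated);
            }
        }
    }

Wrapped in a mapper like the one above, this adds one more cheap tool invocation per file, which is why it should not affect performance much.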

By BoletteJurik, posted in BoletteJurik's Blog

3rd Feb 2014  11:35 AM
