marwoodls's Blog

Tika File Mime Type Identification and the Importance of Metadata

An evaluation was recently carried out to determine how well Apache Tika could identify the MIME types of a corpus of test files, described in the ‘Data Set’ section. The purpose of the evaluation was to determine: 1. if the performance* of Tika […]

By marwoodls, posted in marwoodls's Blog

20th May 2013  12:43 PM  18583 Reads  5 Comments

Background: In 2002 the UK government introduced regulation that required all UK local authorities to provide the British Library with a copy of the electoral register every year. However, the legislation did not require this data to be provided in any particular format and, as a result, the data is sent to the British Library […]

By marwoodls, posted in marwoodls's Blog

1st Mar 2013  2:38 PM  11841 Reads  1 Comment