Per Møldrup-Dalum's Blog

Well over a year ago I wrote the “A Year of FITS” (http://www.openpreservation.org/blogs/2013-01-09-year-fits) blog post, describing how we, over the course of 15 months, characterised 400 million harvested web documents using the File Information Tool Set (FITS) from Harvard University. I presented the technique and the technical metadata and basically concluded that FITS didn’t fit […]

By Per Møldrup-Dalum, posted in Per Møldrup-Dalum's Blog

28th May 2014  9:30 PM  15421 Reads  1 Comment

In December last year I attended a Hadoop Hackathon in Vienna, which other participants have already written about: Sven Schlarb's Impressions of the ‘Hadoop-driven digital preservation Hackathon’ in Vienna and Clemens and René's The Elephant Returns to the Library…with a Pig!. Like these other participants, I really came home from this […]

By Per Møldrup-Dalum, posted in Per Møldrup-Dalum's Blog

23rd Jan 2014  9:01 AM  12305 Reads  No comments

Introduction: We have in excess of 300 TB of essentially unknown data. At the State and University Library in Denmark we recently passed 300 TB of harvested web resources in our web archive. These web resources have been harvested by crawling the Danish part of the internet since 2005, i.e. from every publicly available URL […]

By Per Møldrup-Dalum, posted in Per Møldrup-Dalum's Blog

9th Jan 2013  2:23 PM  22464 Reads  4 Comments