Blogs: WARC

Blog posts filtered by the WARC subject tag.

In a previous blog post I showed how we resurrected NL-menu, the first Dutch web index. It explains how we recovered the site’s data from an old CD-ROM, and how we subsequently created a local copy of the site by serving the CD-ROM’s contents on the Apache web server. This follow-up post covers the final […]

By johan, posted in johan's Blog

11th Jul 2018  3:47 PM  362 Reads  No comments

In my last blog post about ARC to WARC migration, I compared the performance of two alternative approaches for migrating very large sets of ARC container files to the WARC format using Apache Hadoop. I also said that resolving contextual dependencies in order to create self-contained WARC files was the next point to investigate […]

By shsdev, posted in shsdev's Blog

24th Mar 2014  4:13 PM  15022 Reads  No comments