Blog posts filtered by the wget subject tag.
In a previous blog post I showed how we resurrected NL-menu, the first Dutch web index. That post explained how we recovered the site’s data from an old CD-ROM, and how we subsequently created a local copy of the site by serving the CD-ROM’s contents with the Apache web server. This follow-up post covers the final […]
By johan, posted in johan's Blog
Related to my work exploring hyperlinks in documentary heritage – something I feel we’ll be taking care of for a long time – I created a hyperlink extraction tool called tikalinkextract. Put simply, the tool takes your collection of files, extracts the intellectual content using Apache Tika, and then analyses that content for […]