Blogs: Tools

Blog posts filtered by the Tools subject tag.


The problem We have a large volume of content on floppy disks that we know is degrading, but whose value we don't know. Considerations We don't want to waste time/resources on low-value content. We don't know the value of the content. We want to be able to back up the content on the […]

By Euan Cochrane, posted in Euan Cochrane's Blog

26th Jun 2014  3:15 PM  14670 Reads  1 Comment

I have been working on some code to ensure the accurate and consistent output of any file format analysis based on the DROID CSV export, example here. One way of looking at it is as an executive summary of a DROID analysis, except I don't think executives, as such, will be its primary user base. The reason for pushing […]
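The kind of summary described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual code: the column names `PUID` and `FORMAT_NAME` are assumed from a typical DROID CSV export and may differ between DROID versions.

```python
import csv
from collections import Counter

def summarise_droid_export(csv_path):
    """Tally how often each identified format appears in a DROID CSV export.

    Assumes the export has PUID and FORMAT_NAME columns (typical of
    DROID's CSV output); adjust the names for your DROID version.
    Returns (puid, format_name) pairs sorted by frequency, descending.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            puid = row.get("PUID") or "unidentified"
            counts[(puid, row.get("FORMAT_NAME", ""))] += 1
    return counts.most_common()
```

Sorting by frequency gives the "executive summary" view: the most common formats in the collection surface first, regardless of how large the export is.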

By ross-spencer, posted in ross-spencer's Blog

3rd Jun 2014  7:20 AM  12916 Reads  1 Comment

Well over a year ago I wrote the "A Year of FITS" (http://www.openpreservation.org/blogs/2013-01-09-year-fits) blog post describing how, over the course of 15 months, we characterised 400 million harvested web documents using the File Information Tool Kit (FITS) from Harvard University. I presented the technique and the technical metadata and basically concluded that FITS didn't fit […]

By Per Møldrup-Dalum, posted in Per Møldrup-Dalum's Blog

28th May 2014  9:30 PM  15653 Reads  1 Comment

The third and final SCAPE developers workshop was held at the Royal Dutch Library in The Hague on 23-25 April. This workshop was the final opportunity to work together face to face in a large group, since we are getting closer to the end of the project in September. The workshop objectives were to identify […]

By MelanieImming, posted in MelanieImming's Blog

9th May 2014  11:45 AM  11674 Reads  No comments

For quite some time at The National Archives (UK) we've been working on a tool for validating CSV files against a user-defined schema. We're now at the point of making beta releases of the tool generally available (1.0-RC3 at the time of writing), along with the formal specification of the schema language. The tool and […]
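The general idea of schema-driven CSV validation can be sketched as below. To be clear, this is not the CSV Validator tool or its CSV Schema language; it is a hypothetical illustration in which a "schema" is simply a mapping from column name to a regular expression that every value in that column must match.

```python
import csv
import re

def validate_csv(path, rules):
    """Check each row of a CSV file against per-column regex rules.

    `rules` maps a header name to a compiled regular expression that
    every value in that column must fully match. Returns a list of
    (line_number, column, offending_value) violations; an empty list
    means the file passed.
    """
    errors = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        for lineno, row in enumerate(reader, start=2):  # header is line 1
            for column, pattern in rules.items():
                value = row.get(column, "")
                if not pattern.fullmatch(value):
                    errors.append((lineno, column, value))
    return errors
```

A real schema language, such as the one the post describes, goes well beyond per-column regexes (cross-column constraints, optionality, external lookups), but the validate-row-by-row loop is the common core.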

By David Underdown, posted in David Underdown's Blog

21st Mar 2014  2:51 PM  12742 Reads  No comments

This post covers two related topics: characterising web content with Nanite, and my methods for successfully integrating the Tika parsers with Nanite. Introducing Nanite Nanite is a Java project led by Andy Jackson from the UK Web Archive, formed of two main subprojects: Nanite-Core, an API for DROID; and Nanite-Hadoop, a MapReduce […]
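The MapReduce shape of Nanite-Hadoop's characterisation job can be sketched in plain Python. This is an illustration of the pattern only, not Nanite's code: `mimetypes.guess_type` (an extension-based guess from the standard library) stands in for a real Tika or DROID identification call, and the map/reduce phases run locally rather than on Hadoop.

```python
import mimetypes
from collections import defaultdict

def map_phase(paths):
    """Map: emit a (mime_type, 1) pair per file.

    mimetypes.guess_type is a cheap stand-in for the Tika/DROID
    identification a real characterisation job would perform.
    """
    for path in paths:
        mime, _ = mimetypes.guess_type(path)
        yield (mime or "application/octet-stream", 1)

def reduce_phase(pairs):
    """Reduce: sum the counts for each MIME type key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)
```

On Hadoop the same two functions would run distributed over the web-archive container files; the per-file identification cost is what makes the map phase the expensive half.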

By willp-bl, posted in willp-bl's Blog

21st Mar 2014  1:58 PM  15937 Reads  No comments

Fifteen days was the estimate I gave for completing an analysis of roughly 450,000 files we were holding at Archives New Zealand. Approximately three seconds per file for each round of analysis: 3 × 450,000 = 1,350,000 seconds; 1,350,000 seconds = 15.625 days. My bash script included calls to three Java applications, Apache Tika, 1.3 […]
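The back-of-the-envelope estimate above checks out exactly; as a quick sketch:

```python
FILES = 450_000
SECONDS_PER_FILE = 3            # per round of analysis
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

total_seconds = FILES * SECONDS_PER_FILE  # 1,350,000 seconds
days = total_seconds / SECONDS_PER_DAY    # 15.625 days
```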

By ross-spencer, posted in ross-spencer's Blog

24th Feb 2014  2:17 AM  19000 Reads  5 Comments

The Web is constantly evolving over time. Web content, such as text and images, is updated frequently. One of the major problems encountered by archiving systems is understanding what happened between two different versions of a web page. We want to underline that the aim is not to compare two web pages like this […]
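A crude baseline for "what happened between two versions" is a textual diff of the two captures. The sketch below uses the standard library's `difflib` and is only a starting point, not the approach the post goes on to describe; the excerpt itself notes that a naive page-to-page comparison is not the aim.

```python
import difflib

def page_diff(old_html, new_html):
    """Unified diff between two captured versions of a page's markup.

    A line-level textual comparison: it flags changed lines but says
    nothing about whether a change is visually or semantically
    significant, which is where the harder problem begins.
    """
    return list(difflib.unified_diff(
        old_html.splitlines(), new_html.splitlines(),
        fromfile="capture-1", tofile="capture-2", lineterm=""))
```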

By Zeynep, posted in Zeynep's Blog

7th Feb 2014  1:15 PM  12802 Reads  No comments

A while back I wrote a blog post, MIA: Metadata. I highlighted how difficult it was to capture certain metadata without a managed system – without an Electronic Document and Records Management System (EDRMS). I also questioned if we were doing enough with EDRMS by way of collecting data. Following that blog we sought out […]

By ross-spencer, posted in ross-spencer's Blog

4th Feb 2014  5:21 AM  13957 Reads  1 Comment

One of my first blogs here covered an evaluation of a number of format identification tools. One of the more surprising results of that work was that out of the five tools that were tested, no less than four of them (FITS, DROID, Fido and JHOVE2) failed to even run when executed with their associated […]

By johan, posted in johan's Blog

31st Jan 2014  12:58 PM  1682034 Reads  6 Comments