Blogs: Tools

Blog posts filtered by the Tools subject tag.

“Characterization” can mean many things (I’m particularly fond, especially in this context, of the OED’s “creation of a fictitious character or fictitious characters”). Back in October Paul Wheatley suggested that digital preservation practitioners needed “better characterisation” and defined this as enabling them to determine the condition, content and value of digital records prior to ingest […]

By pixelatedpete, posted in pixelatedpete's Blog

15th Mar 2013  12:23 PM  15715 Reads  1 Comment

Part of my work on the SCAPE testbeds involves producing a workflow for the large-scale migration of TIFF to JP2 files, with validation. The tests I have run all involve the lossy compression of files. Two tools that could be used for the validation of the image payload, and therefore the success of a migration, are […]

By willp-bl, posted in willp-bl's Blog

5th Mar 2013  10:04 AM  12834 Reads  No comments

Building a Debian package from a program written in Ruby is not a straightforward task. This post is intended as a step-by-step practical guide to packaging Ruby programs, based on the lessons we learned during the debianization process. In this guide we will use a sample program: Pagelyzer. This program is an […]

By jordi.creus, posted in jordi.creus's Blog

18th Feb 2013  2:25 PM  18615 Reads  No comments

As part of our work on test-beds for the SCAPE project we have been investigating the various ways in which a large scale file format migration workflow could be implemented.  The underlying technologies chosen for the platform are Hadoop and Taverna.  One of the aims of the SCAPE project is to allow the automatic generation […]

By willp-bl, posted in willp-bl's Blog

14th Feb 2013  1:48 PM  16188 Reads  No comments

Last week I had the honour of hosting the OPF Webinar "Digital Preservation at your command, part II". During the Webinar, attendees were shown the differences and similarities between the command line interfaces of MS DOS, Linux and Apple. Here is a short summary of the Webinar: * Comparison of command line interfaces (MS DOS, Linux, […]

By TechMaurice, posted in TechMaurice's Blog

4th Feb 2013  6:05 PM  12163 Reads  No comments

The most important new feature of the recently released PDF/A-3 standard is that, unlike PDF/A-2 and PDF/A-1, it allows you to embed any file you like. Whether this is a good thing or not is the subject of some heated on-line discussions. But what do we actually mean by embedded files? As it turns out, […]

By johan, posted in johan's Blog

9th Jan 2013  1:42 PM  136678 Reads  16 Comments

The PDF format contains various features that may make it difficult to access content that is stored in this format in the long term. Examples include (but are not limited to): Encryption features, which may either restrict some functionality (copying, printing) or make files inaccessible altogether. Multimedia features (embedded multimedia objects may be subject to […]

By johan, posted in johan's Blog

19th Dec 2012  3:15 PM  16988 Reads  1 Comment

In the middle of November 2012, the first OPF Hackathon on Emulation took place in Freiburg, Germany. It brought together practitioners from different national libraries, library information services as well as a couple of researchers in the domain. The aim of the three-day Hackathon was to work on practical use-cases and real-life challenges stemming from […]

By Dirk von Suchodoletz, posted in Dirk von Suchodoletz's Blog

4th Dec 2012  3:15 PM  12762 Reads  No comments

Several of us at The British Library took part in the CURATEcamp file id hackathon on Friday. We decided that one issue we could make a useful impact on was identification of various ebook formats. eBooks are an important content type for the British Library, especially with the expected implementation of non-print legal deposit legislation […]

By willp-bl, posted in willp-bl's Blog

19th Nov 2012  3:53 PM  16786 Reads  1 Comment

Over the last few months, I have been researching the problem of large-scale content profiling for preservation analysis. I do this for a number of reasons. For one, I support the opinion that formats are just another property. Undoubtedly, a very important one, but knowing which formats you have is not sufficient for good preservation planning […]

By peshkira, posted in peshkira's Blog

19th Nov 2012  11:03 AM  18576 Reads  No comments