Blogs: Corpora

Blog posts filtered by the Corpora subject tag.

Inspired by Jenny Mitcham’s blog post about developing her first file format signature, I thought it would be fun to take a crack at creating one myself. I previously dipped my toe into the world of contributing to PRONOM by looking at a few mis-identifications and multi-identifications, but I had yet to create a file […]

By Andrea Byrne, posted in Andrea Byrne's Blog

8th Sep 2016  9:14 AM  1098 Reads  4 Comments

Abstract() This blog discusses what we have available in our toolkit for contributing more signatures to PRONOM for the benefit of the digital preservation community. It also discusses the potential issues we need to work around in the short time we have between controlled PRONOM releases. The blog outlines an idea for a temporary, federated […]

By ross-spencer, posted in ross-spencer's Blog

11th Aug 2015  10:59 AM  2571 Reads  3 Comments

While browsing ArchiveTeam's File Formats Wiki earlier this week, I came across some entries I created there on Quattro Pro spreadsheets two years ago. At the time I had also contributed some old Quattro Pro for DOS spreadsheets (here and here) from my personal archives to the OPF format corpus. Seeing those files again, I […]

By johan, posted in johan's Blog

29th Oct 2014  2:59 PM  17459 Reads  2 Comments

While conducting some research into the chaining of digital preservation tools using a Linux shell script, I once again found it difficult to source a set of files that I could use as a stake in the ground, and that would allow my work to be replicated in some way by others wishing to confirm results and find […]

By ross-spencer, posted in ross-spencer's Blog

20th Feb 2014  6:57 AM  10098 Reads  No comments

This blog follows up on three earlier posts about detecting preservation risks in PDF files. In part 1 I explored to what extent the Preflight component of the Apache PDFBox library can be used to detect specific preservation risks in PDF documents. This was followed up by some work during the SPRUCE Hackathon in Leeds, […]

By johan, posted in johan's Blog

27th Jan 2014  3:08 PM  16091 Reads  7 Comments

My previous blog Assessing file format risks: searching for Bigfoot? resulted in some interesting feedback from a number of people. There was a particularly elaborate response from Ross Spencer, and I originally wanted to reply to that directly using the comment fields. However, my reply turned out to be a bit more lengthy than I […]

By johan, posted in johan's Blog

8th Oct 2013  4:24 PM  14523 Reads  4 Comments

Here's a little news bulletin about FIDO, the open source file format identification tool of OPF. It seems that the use of FIDO has grown over the last few months. I am getting responses by e-mail and through the GitHub issue tracker from all over the world, ranging from requests for help to suggestions for improvement and even […]

By TechMaurice, posted in TechMaurice's Blog

18th Sep 2013  10:27 AM  16373 Reads  No comments

Last winter I made a first attempt at identifying preservation risks in PDF files using the Apache Preflight PDF/A validator. This work was later followed up by others in two SPRUCE hackathons in Leeds (see this blog post by Peter Cliff) and London (described here). Much of this later work tacitly assumes that Apache Preflight […]

By johan, posted in johan's Blog

25th Jul 2013  12:57 PM  21341 Reads  12 Comments

Now that the subproject lead in PW is being transferred from me to Kresimir, it seems a good time to reflect a little on what we have achieved in PW since February 2011 and what is left to do! What did we set out to do? To accomplish effective digital preservation, environments with a preservation […]

By cbecker, posted in cbecker's Blog

23rd Jul 2013  9:20 AM  11626 Reads  No comments

Following the community response to our workshop last year, we want to invite you again to contribute your future preservation challenges! Digital preservation has emerged as a key challenge for information systems in almost any domain, from eCommerce and eGovernment to finance, health, and personal life. The field is increasingly recognized and has taken major […]

By cbecker, posted in cbecker's Blog

17th Jun 2013  5:24 PM  12469 Reads  2 Comments