Blogs: Identification

Blog posts filtered by the Identification subject tag.

Browse blogs by subject

All subjects Access Analysis Android apache tika ApacheTika AQuA ARC ARC to WARC archives archiving audiovisual Benchmark benchmarking best practice best practices Bit rot bitcurator board game British Library Characterisation Community compression Corpora curation Database Database Archiving Database Preservation Delivery Digital Forensics digital preservation digitisation Disk Images DROID E-ARK Project EaaS Education Emulation epub Experimentation extensible Fido File Formats FLAC Flashback floppy disk floppy disks floppy drive Format Identification Format Registry GitHub Hackathon Hardware obsolescence help httpreserve Identification IDPD17 IMPACT Internet Standards iPRES. community survey isolyzer jhove job JP2 JPEG2000 jpylyzer LZW magnetic media Matchbox MediaConch Members Metadata metadate Migration Monitoring Normalisation OCR open Open Planets Foundation Open Preservation Foundation Open source OPF diary Optimization Packaging PDF PDF/A Planets policy PREFORMA PREMIS preservation Preservation Actions preservation planning Preservation Risks Preservation Strategies Preservia Process Projects PRONOM Provenance pywb recordkeeping records Representation Information Research data research infrastructure Resources RFC Rogues Gallery Rosetta Roy SCAPE Server Siegfried Signature Development Software Software benchmarking SPARQL specification spreadsheets SPRUCE standards technical technical registry testing TIFF Tika Tools training validation veraPDF Virtual Machines w3c WARC Watch WAV WAVE Web Archiving Web Publications wget Wikidata Workflow Workflows Zip

For anyone dealing with a relatively small number of records, compared to, say, an internet or data archive, a reasonable process for ingest of material into your digital preservation system might be:
1. Process files with a file format identification tool.
2. Per 1., process files with a file format validation tool.
3. Per 1. […]

By ross-spencer, posted in ross-spencer's Blog

13th Mar 2016  5:27 AM  3808 Reads  No comments
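The identify-then-validate ordering in the ingest excerpt above can be sketched as a small dispatch table. This is a minimal sketch under stated assumptions, not the post's own code. The format labels, the validator chosen for each, and the `plan_ingest` helper are all hypothetical. A real workflow would key on the PRONOM PUID reported by the identification tool and would actually run each validator. The excerpt's third step is truncated, so the sketch stops at validation.

```python
# Hypothetical mapping from an identified format label to a validation tool.
# In a real workflow the key would be the PRONOM PUID reported by DROID,
# FIDO or Siegfried, not a human-readable label.
VALIDATORS = {
    "TIFF": "jhove",
    "JP2": "jpylyzer",
    "PDF/A": "veraPDF",
}

def plan_ingest(identified_format):
    """Return the ordered steps for one file, based on its identified format."""
    # Step 1: every file goes through format identification.
    steps = ["identify"]
    # Step 2: per the result of step 1, pick a validator if one is configured.
    validator = VALIDATORS.get(identified_format)
    if validator is not None:
        steps.append(f"validate:{validator}")
    else:
        # No validator for this format: record that rather than skip silently.
        steps.append("no-validator")
    return steps

if __name__ == "__main__":
    for fmt in ["TIFF", "JP2", "WAV", None]:
        print(fmt, plan_ingest(fmt))
```

Keeping the mapping as data rather than as branching logic means that supporting a new format is a one-line change to `VALIDATORS`.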

At my workplace, we write a lot of small scripts to encode preservation workflows. These scripts pipeline simple actions like munging metadata, moving files about, and calling other tools such as Tika and ImageMagick. Often these actions are conditional on the format of the file being processed: for example, we only want to run Tika over the formats for […]

By Richard, posted in Richard's Blog

18th Feb 2016  3:20 AM  3919 Reads  No comments

This is the second blog inspired by my visit to colleagues at the National Library of Australia last August. The first discusses a federated approach to better incorporating custom signatures into the PRONOM signature base without modifying PRONOM. The essence of the blog, however, still centers on how the community can create signatures for itself, and […]

By ross-spencer, posted in ross-spencer's Blog

7th Jan 2016  7:15 AM  4076 Reads  No comments

Presented here is a tool that will create a 'rogues gallery' out of any digital collection for which you have a DROID report (or, soon, a Siegfried report). The tool was presented at a recent OPF Webinar, Preservation in Practice: Archives New Zealand; slides here. It was created by myself and Andrea K. […]

By ross-spencer, posted in ross-spencer's Blog

25th Aug 2015  9:44 AM  3856 Reads  No comments
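As a rough illustration of the rogues-gallery idea, and not the tool described in the post, the sketch below reads a DROID-style CSV report and lists files worth a closer look. It flags files with no identification, files with more than one identification, and files whose extension does not match the identified format. It assumes the report has `FILE_PATH`, `TYPE`, `PUID`, `FORMAT_COUNT` and `EXTENSION_MISMATCH` columns, which is our assumption about DROID's CSV export; check the header of your own report. The `find_rogues` function and its reason labels are hypothetical.

```python
import csv
import io

def find_rogues(report):
    """Return (file_path, reason) pairs for files that deserve a closer look.

    `report` is any iterable of CSV lines, such as an open file or io.StringIO.
    Column names are assumed to follow DROID's CSV export; check your header.
    """
    rogues = []
    for row in csv.DictReader(report):
        # Assumption: folders appear in the report with TYPE "Folder" and no PUID.
        if row.get("TYPE") == "Folder":
            continue
        path = row.get("FILE_PATH") or ""
        count = int(row.get("FORMAT_COUNT") or 0)
        if count == 0 or not row.get("PUID"):
            rogues.append((path, "unidentified"))
        elif count > 1:
            rogues.append((path, "multiple identifications"))
        # Extension mismatches are reported independently of the checks above.
        if (row.get("EXTENSION_MISMATCH") or "").lower() == "true":
            rogues.append((path, "extension mismatch"))
    return rogues

if __name__ == "__main__":
    # Invented sample rows for illustration only.
    sample = io.StringIO(
        "FILE_PATH,TYPE,PUID,FORMAT_COUNT,EXTENSION_MISMATCH\n"
        "/coll/a.tif,File,fmt/353,1,false\n"
        "/coll/b.dat,File,,0,false\n"
        "/coll/c.doc,File,fmt/40,2,true\n"
        "/coll,Folder,,,false\n"
    )
    for path, reason in find_rogues(sample):
        print(path, reason)
```

Running it prints one line per flagged reason. The well-identified `/coll/a.tif` and the folder row produce nothing.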

Abstract() This blog discusses what we have available in our toolkit for contributing more signatures to PRONOM for the benefit of the digital preservation community. It also discusses the potential issues we need to work around in the short time we have between controlled PRONOM releases. The blog outlines an idea for a temporary, federated […]

By ross-spencer, posted in ross-spencer's Blog

11th Aug 2015  10:59 AM  4203 Reads  3 Comments

Ok. I know what you’re thinking. Do we really need another PRONOM-based file format identification tool? A year or so ago I might have said “no” myself. In DROID and FIDO, we are already blessed with two brilliant tools. In my workplace, we’re very happy users of DROID. We trust it as the reference implementation of […]

By Richard, posted in Richard's Blog

27th Sep 2014  7:52 AM  16480 Reads  8 Comments
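PRONOM-based tools such as DROID, FIDO and Siegfried identify files by matching byte sequences described in PRONOM signatures. The toy sketch below only checks a handful of fixed beginning-of-file magic numbers, which is far simpler than PRONOM's signature language (variable offsets, wildcards, end-of-file sequences, priorities). The `MAGIC` table, its labels and the `identify_bytes`/`identify_file` helpers are illustrative and are not taken from PRONOM or from any of these tools.

```python
# Illustrative beginning-of-file signatures (magic numbers), not PRONOM data.
MAGIC = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP-based"),
]

def identify_bytes(header):
    """Return every label whose signature matches the start of `header`.

    A list is returned because identification can yield zero, one or
    several candidate formats.
    """
    return [label for magic, label in MAGIC if header.startswith(magic)]

def identify_file(path, length=16):
    """Identify a file on disk by reading only its first few bytes."""
    # Every signature in MAGIC fits within the first 16 bytes.
    with open(path, "rb") as fh:
        return identify_bytes(fh.read(length))
```

A bare `PK\x03\x04` match is exactly the kind of ambiguity (plain ZIP, or one of the many ZIP-based container formats) that DROID and Siegfried resolve with container signatures.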

We’ve been doing legacy disk extracts at Archives New Zealand for a number of years, with much of the work that makes this possible done by my colleague Mick Crouch and former Archives New Zealand colleague Euan Cochrane. Earlier this year, we received some disks from New Zealand’s Department of Conservation (DoC), which we successfully imaged and […]

By ross-spencer, posted in ross-spencer's Blog

23rd Sep 2014  8:14 AM  14397 Reads  4 Comments

In this post I'll be taking a look at format identification of PDF files and highlighting a difference of opinion between format identification tools. Some of the details are a little dry, but I'll restrict myself to a single issue and be as light on technical details as possible. I hope I'll show that once […]

By Carl Wilson, posted in Carl Wilson's Blog

21st Aug 2014  10:40 AM  17595 Reads  13 Comments

I thought OPF members might be interested in this UK Web Archive blog post I wrote on format identification and validation of our historical web archives: How much of the UK's HTML is valid?

By Andy Jackson, posted in Andy Jackson's Blog

2nd Jul 2014  12:05 PM  9968 Reads  No comments

The problem: We have a large volume of content on floppy disks that we know are degrading but whose value we don't know.
Considerations: We don't want to waste time/resources on low-value content. We don't know the value of the content. We want to be able to back up the content on the […]

By Euan Cochrane, posted in Euan Cochrane's Blog

26th Jun 2014  3:15 PM  14179 Reads  1 Comment