Blogs: DROID

Blog posts filtered by the DROID subject tag.

Course Overview: If you have a digital preservation strategy that involves digital files, you’ll know how important it is to understand the file formats in which your data is encoded. To do this comprehensively involves at least three main operations: identifying the format, characterising the format, and validating the format. To put it another way, […]

By Becky, posted in Becky's Blog

18th Jul 2017  12:00 AM  0 Reads  No comments

BACKGROUND: Nearly two and a half years ago, I started an effort for Apache Tika™ to help improve its robustness via TIKA-1302. Apache Tika™ is an umbrella/wrapper project that “detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).” I documented some of the early work […]

By tallison, posted in tallison's Blog

4th Oct 2016  3:03 PM  6852 Reads  No comments

This is a relatively long post, so to summarise before delving into the details: We’re exploring Wikidata, the (relatively new) Wikipedia for data, as a knowledge base for digital preservation information and would appreciate feedback and involvement. At Yale University Library we are beginning a new programme of work (with funding from both CLIR and […]

By Euan Cochrane, posted in Euan Cochrane's Blog

30th Sep 2016  9:47 PM  9058 Reads  7 Comments

Inspired by Jenny Mitcham’s blog post about developing her first file format signature, I thought it would be fun to take a crack at creating one myself. I previously dipped my toe into the world of contributing to PRONOM by looking at a few mis-identifications and multi-identifications, but I had yet to create a file […]

By Andrea Byrne, posted in Andrea Byrne's Blog

8th Sep 2016  9:14 AM  2882 Reads  4 Comments

At Archives New Zealand we were finding that ‘WAVE’ files were becoming a bottleneck in one of our ingest processes. The result initially looked odd to me, as I had previously understood that file format identification would not take longer than computing a checksum. My rationale being that to identify a […]

By ross-spencer, posted in ross-spencer's Blog

22nd Aug 2016  8:15 AM  2357 Reads  2 Comments

Jenny Mitcham, Digital Archivist at the University of York, started a nice snowball rolling last week when she asked “Research data – what does it really look like?” Paul Young at the National Archives, UK, was one of those to respond, to show that perhaps the snowball had been generating momentum for a number of […]

By ross-spencer, posted in ross-spencer's Blog

14th Jun 2016  6:43 AM  2844 Reads  No comments

As promised yesterday, this is the follow-up blog to the refactor of my original DROID SQLite Analysis work. The new version now allows you to produce reports from the format identification tool Siegfried. In this blog I wanted to talk about a small number of other details that can be a bit harder to […]

By ross-spencer, posted in ross-spencer's Blog

24th May 2016  9:59 AM  2398 Reads  No comments

With the release of the latest Siegfried there was added motivation for me to provide an analysis output for the format identification tool. With ‘double the magic’ there was a lot more for us to explore as analysts, and fingers crossed this release (a refactor) of my SQLite based analysis tool will help with that exploration. Previous […]

By ross-spencer, posted in ross-spencer's Blog

23rd May 2016  6:56 AM  2202 Reads  No comments

This is the second blog inspired by my visit to colleagues at the National Library of Australia last August. The first discusses a federated approach to better incorporating custom signatures into the PRONOM signature base without modifying PRONOM. The essence of the blog, however, still centers around how the community can create signatures for itself, and […]

By ross-spencer, posted in ross-spencer's Blog

7th Jan 2016  7:15 AM  3570 Reads  No comments

Presented here is a tool that will create a 'rogues gallery' out of any digital collection for which you have a DROID report (or, soon, a Siegfried report). The tool was presented at a recent OPF Webinar, Preservation in Practice: Archives New Zealand (slides here), and was created by myself and Andrea K. […]

By ross-spencer, posted in ross-spencer's Blog

25th Aug 2015  9:44 AM  3590 Reads  No comments