Blogs: Tools

Blog posts filtered by the Tools subject tag.

Browse blogs by subject

All subjects Access Analysis Android apache tika ApacheTika AQuA ARC ARC to WARC archives archiving audiovisual Benchmark benchmarking best practice best practices Bit rot bitcurator board game British Library Characterisation Community compression Corpora CSV-Validator curation Database Database Archiving Database Preservation Delivery Digital Forensics digital preservation digitisation Disk Images DROID E-ARK Project EaaS Education Emulation epub Experimentation extensible Fido File Formats FLAC Flashback floppy disk floppy disks floppy drive Format Identification Format Registry GitHub Hackathon Hardware obsolescence help httpreserve Identification IDPD17 IMPACT Internet Standards iPRES. community survey isolyzer jhove job JP2 JPEG2000 jpylyzer LZW magnetic media Matchbox MediaConch Members Metadata metadate Migration Monitoring Normalisation OCR open Open Planets Foundation Open Preservation Foundation Open source OPF diary Optimization Packaging PDF PDF/A Planets policy PREFORMA PREMIS preservation Preservation Actions preservation planning Preservation Risks Preservation Strategies Preservia Process Projects PRONOM Provenance pywb recordkeeping records Representation Information Research data research infrastructure Resources RFC Rogues Gallery Rosetta Roy SCAPE Server Siegfried Signature Development Software Software benchmarking SPARQL specification spreadsheets SPRUCE standards technical technical registry testing TIFF Tika Tools training validation veraPDF Virtual Machines w3c WARC Watch WAV WAVE Web Archiving Web Publications wget Wikidata Workflow Workflows Zip

This blog follows up on three earlier posts about detecting preservation risks in PDF files. In part 1 I explored to what extent the Preflight component of the Apache PDFBox library can be used to detect specific preservation risks in PDF documents. This was followed up by some work during the SPRUCE Hackathon in Leeds, […]

By johan, posted in johan's Blog

27th Jan 2014  3:08 PM  23596 Reads  7 Comments

In December last year I attended a Hadoop Hackathon in Vienna. A hackathon that has been written about before by other participants: Sven Schlarb's Impressions of the ‘Hadoop-driven digital preservation Hackathon’ in Vienna and Clemens and René's The Elephant Returns to the Library…with a Pig!. Like these other participants I really came home from this […]

By Per Møldrup-Dalum, posted in Per Møldrup-Dalum's Blog

23rd Jan 2014  9:01 AM  12410 Reads  No comments

More than 20 developers visited the ‘Hadoop-driven digital preservation Hackathon’ in Vienna which took place in the baroque room called "Oratorium" of the Austrian National Library from 2nd to 4th of December 2013. It was really exciting to hear people vividly talking about Hadoop, Pig, Hive, HBase followed by silent phases of concentrated coding accompanied […]

By shsdev, posted in shsdev's Blog

6th Dec 2013  4:30 PM  11814 Reads  No comments

As previously blogged about by Carl we now have virtually all SCAPE and OPF projects in Continuous Integration; building and unit testing in both Travis CI and Jenkins.  Travis compiles the projects and executes unit tests whenever a new commit is pushed to Github, or when a pull request is submitted to the project.  Jenkins […]

By willp-bl, posted in willp-bl's Blog

1st Nov 2013  10:19 AM  11873 Reads  No comments

My previous blog Assessing file format risks: searching for Bigfoot? resulted in some interesting feedback from a number of people. There was a particularly elaborate response from Ross Spencer, and I originally wanted to reply to that directly using the comment fields. However, my reply turned out to be a bit more lengthy than I […]

By johan, posted in johan's Blog

8th Oct 2013  4:24 PM  16222 Reads  4 Comments

Here's a little newsbulletin about FIDO, the open source file format identification tool of OPF. It seems that the use of FIDO is growing the last few months. I am getting responses by e-mail and through the Github issuetracker from all over the world, ranging from requests for help, giving suggestions for improvement and even […]

By TechMaurice, posted in TechMaurice's Blog

18th Sep 2013  10:27 AM  19546 Reads  No comments

Like many other organisations that are using JPEG 2000, the KB produces two representations of most of its digitised content (newspapers, books, periodicals): a high-quality, losslessly compressed JP2 that is the archival master; a lesser-quality, lossily compressed JP2 that is used as an access image (this is used for e.g. our newspapers website). The majority […]

By johan, posted in johan's Blog

19th Aug 2013  11:22 AM  15131 Reads  No comments

The browser-shots tool is developed by Internet Memory in the context of SCAPE project, as part of the preservation and watch (PW) sub-project. The goal of this tool is to perform automatic visual comparisons, in order to detect rendering issues in the archived Web pages. From the tools developed in the scope of the project […]

By stanislav.barton, posted in stanislav.barton's Blog

26th Jul 2013  2:31 PM  15451 Reads  No comments

Last winter I started a first attempt at identifying preservation risks in PDF files using the Apache Preflight PDF/A validator. This work was later followed up by others in two SPRUCE hackathons in Leeds (see this blog post by Peter Cliff) and London (described here). Much of this later work tacitly assumes that Apache Preflight […]

By johan, posted in johan's Blog

25th Jul 2013  12:57 PM  24542 Reads  12 Comments

Now that the subproject lead in PW is being transferred from me to Kresimir, it seems a good time to reflect a little on what we have achieved in PW since February 2011 and what is left to do! What did we set out to do? To accomplish effective digital preservation, environments with a preservation […]

By cbecker, posted in cbecker's Blog

23rd Jul 2013  9:20 AM  13333 Reads  No comments