  • This event has passed.

Access to Digital Collections that Take Bitstreams Seriously: Hands-on with BitCurator, BitCurator Access and BitCurator NLP Software

7th Jun 2018 - 8th Jun 2018


In this workshop, participants will learn about and gain experience with products of the BitCurator, BitCurator Access and BitCurator NLP (Natural Language Processing) projects. The workshop includes substantial hands-on sessions in which participants apply the tools to real digital materials.

Libraries, archives and museums (LAMs) are increasingly called upon to move born-digital materials from their original locations into more sustainable preservation environments. Information professionals must be prepared to extract digital materials from removable media in ways that capture rich metadata and ensure the integrity of the materials. They must also support and mediate appropriate access: allowing users to make sense of materials and understand their context, while also preventing inadvertent disclosure of sensitive data.

There has been a significant shift in recent years toward the adoption of digital forensics tools and methods by LAMs, in order to meet the above goals. This process has been facilitated by the BitCurator project (2011-2014), which has packaged and disseminated an open-source software environment that allows users to create disk images; extract data and metadata from disks or directories; scan bitstreams for the presence of potentially sensitive data values; characterize the contents of disks; and perform other practical tasks, such as scanning for viruses, finding duplicate files, mounting forensically packaged disk images, generating cryptographic hashes, and viewing hexadecimal representations of bitstreams.

The BitCurator Access project (2014-2016) investigated mechanisms for providing access to forensically acquired data. A major product of the project has been BitCurator Access Webtools, which allows users to dynamically navigate the filesystems of disk images and to search the content of many common file types contained within the images. The project also created BitCurator Access Redaction Tools to redact strings and byte sequences identified in disk images.

BitCurator NLP (2016-2018) is developing and disseminating software for identifying, extracting and exposing contextual entities from the wide diversity of born-digital materials that LAMs already hold and continue to receive. This includes helping to identify and explore information based on specific entities (e.g. people, places, organizations, events) of interest to curators and researchers.

Who should attend?

This workshop will be of interest to information professionals and practitioners who are responsible for acquiring or transferring collections of digital materials. We also welcome individuals involved in digital preservation research, development and IT management, who will learn how data generated by the BitCurator tools can complement and potentially be integrated with data generated by other tools and systems.

Why attend?

  • Learn how digital forensics tools are applied and why this is important for the preservation of born-digital materials
  • Find out about tools and methods that can help support digital curation
  • Get hands-on experience using the BitCurator environment
  • Gain a practical understanding of how to apply these tools in your own organisation
  • Network with colleagues who are undertaking similar work


  • Cal Lee, University of North Carolina at Chapel Hill
  • Kam Woods, University of North Carolina at Chapel Hill
  • Carl Wilson, Open Preservation Foundation

Preliminary Agenda

Thursday 7 June

Time Session
10:00 – 10:30 Registration
10:30 – 10:45 Welcome & housekeeping
10:45 – 12:00 Getting started:

  • Lightning talks
  • Background to BitCurator projects
12:00 – 12:45 BitCurator Environment: Introduction and rationale
12:45 – 13:30 Lunch
13:30 – 14:45 Hands-on: getting started with the BitCurator environment
14:45 – 15:00 Break
15:00 – 16:30 BitCurator Access and NLP: Introduction and rationale
16:30 – 17:00 Wrap up, set up, questions
17:00 CLOSE
18:30 Event dinner at Carluccio’s, 1 Brunswick Centre, London, WC1N 1AF

Friday 8 June

Time Session
09:30 – 10:00 Tea and coffee
10:00 – 10:15 Recap and introduction to day 2
10:15 – 11:15 Hands-on: BitCurator Access and BitCurator NLP Tools
11:15 – 11:45 Break and getting set up
11:45 – 12:45 Hands-on: Using the tools on your own materials
12:45 – 13:45 Lunch
13:45 – 14:45 Discussion: Access to born digital materials
14:45 – 15:00 Break
15:00 – 16:15 Feedback and future
16:15 – 16:30 Wrap up
16:30 CLOSE

How do I prepare for the workshop?

Lightning talk

We ask all participants to prepare a short talk introducing themselves, any content they have brought with them or are working on, and what they hope to get out of the workshop.

Laptop requirements

To get the most out of the workshop, we ask you to bring a laptop that meets the following requirements:

  • Laptop running Windows 8/8.1/10, OS X 10.12 (or later), or a 64-bit Linux distribution
  • 4GB RAM (minimum)
  • 20GB free hard drive space (minimum)
  • Intel Virtualization (VT-x) extensions must be enabled in the BIOS (does not apply to Macs) in order to run 64-bit virtual machines. The process for enabling them varies from laptop to laptop. Dell, HP and Lenovo machines in particular may have been shipped with these extensions disabled. Ask your IT contact to enable them, or, if you have the necessary permissions, follow the manufacturer's instructions to enable them yourself.
  • Current release of VirtualBox installed (https://www.virtualbox.org) along with the VirtualBox extension pack (https://download.virtualbox.org/virtualbox/5.2.12/Oracle_VM_VirtualBox_Extension_Pack-5.2.12.vbox-extpack).
  • Current release of Vagrant installed (https://vagrantup.com)
  • Current release of BitCurator (https://wiki.bitcurator.net/) VM downloaded, uncompressed, and added to VirtualBox manager.

Content requirements

We will provide a selection of sample data, but we strongly encourage you to also bring your own test data.

  • We will not provide hardware to image or extract data from media, but if you have disk images that you have extracted from media, those would be ideal.
  • Note that the access and natural language processing (NLP) tools operate on text extracted from files. For a list of supported formats, see: https://textract.readthedocs.io/en/stable/


This workshop is now FULLY BOOKED. Registration is closed.

OPF charter members are invited to register up to five participants free of charge; affiliate members can register for one place free of charge. Please use the link and code distributed to the members mailing list to apply the discount.

Please note that, due to the practical nature of this workshop, registration is limited to 25 places.

We are offering an early bird rate of £195 until 30 April 2018.

The registration fee from 1 May is £250. 

Venue and accommodation information

The event takes place at the British Library, 96 Euston Rd, London NW1 2DB





Open Preservation Foundation

