Access to Digital Collections that Take Bitstreams Seriously: Hands-on with BitCurator, BitCurator Access and BitCurator NLP Software
Jun 7 - Jun 8
In this workshop, participants will learn about and gain experience with products of the BitCurator, BitCurator Access and BitCurator NLP (Natural Language Processing) projects. It will include substantial hands-on components in which participants apply the tools to real digital materials.
Libraries, archives and museums (LAMs) are increasingly called upon to move born-digital materials from their original locations into more sustainable preservation environments. Information professionals must be prepared to extract digital materials from removable media in ways that capture rich metadata and ensure the integrity of the materials. They must also support and mediate appropriate access: allowing users to make sense of materials and understand their context, while also preventing inadvertent disclosure of sensitive data.
In recent years, LAMs have increasingly adopted digital forensics tools and methods to meet these goals. This process has been facilitated by the BitCurator project (2011-2014), which has packaged and disseminated an open-source software environment that allows users to create disk images; extract data and metadata from disks or directories; scan bitstreams for the presence of potentially sensitive data values; characterize the contents of disks; and perform other practical tasks, such as scanning for viruses, finding duplicate files, mounting forensically packaged disk images, generating cryptographic hashes, and viewing hexadecimal representations of bitstreams.
The BitCurator Access project (2014-2016) investigated mechanisms for providing access to forensically acquired data. A major product of the project has been BitCurator Access Webtools, which allows users to dynamically navigate the filesystems of disk images, as well as search the content of many common file types contained within the images. The project also created BitCurator Access Redaction Tools to redact strings and byte sequences identified in disk images.
BitCurator NLP (2016-2018) is developing and disseminating software for identifying, extracting and exposing contextual entities from the wide diversity of born-digital materials that LAMs already hold and continue to receive. This includes helping to identify and explore information based on specific entities (e.g., people, places, organizations, events) of interest to curators and researchers.
Who should attend?
This workshop will be of interest to information professionals and practitioners who are responsible for acquiring or transferring collections of digital materials. We also welcome individuals involved in digital preservation research, development and IT management, who will learn how data generated by the BitCurator tools can complement and potentially be integrated with data generated by other tools and systems.
Learning outcomes
- Learn how digital forensics tools are applied and why this is important for the preservation of born-digital materials
- Find out about tools and methods that can help support digital curation
- Get hands-on experience using the BitCurator environment
- Gain a practical understanding of how to apply these tools in your own organisation
- Network with colleagues who are undertaking similar work
Trainers
- Cal Lee, University of North Carolina at Chapel Hill
- Kam Woods, University of North Carolina at Chapel Hill
- Carl Wilson, Open Preservation Foundation
Thursday 7 June
| Time | Session |
|---|---|
| 10:00 – 10:30 | Registration |
| 10:30 – 10:45 | Welcome & housekeeping |
| 10:45 – 12:00 | Getting started |
| 12:00 – 12:45 | BitCurator Environment: Introduction and rationale |
| 12:45 – 13:45 | Lunch |
| 13:45 – 15:00 | Hands-on: getting started with the BitCurator environment |
| 15:00 – 15:30 | Break |
| 15:30 – 17:30 | BitCurator Access and NLP: Introduction and rationale |
| 17:30 – 18:00 | Wrap up, set up, questions |
| 18:30 | Event dinner (location TBC) |
Friday 8 June
| Time | Session |
|---|---|
| 09:30 – 10:00 | Tea and coffee |
| 10:00 – 10:15 | Recap and introduction to day 2 |
| 10:15 – 11:15 | Hands-on: BitCurator Access and BitCurator NLP Tools |
| 11:15 – 11:45 | Break and getting set up |
| 11:45 – 12:45 | Hands-on: Using the tools on your own materials |
| 12:45 – 13:45 | Lunch |
| 13:45 – 14:45 | Discussion: Access to born-digital materials |
| 14:45 – 15:00 | Break |
| 15:00 – 16:15 | Feedback and future |
| 16:15 – 16:30 | Wrap up |
How do I prepare for the workshop?
We ask all participants to prepare a short talk introducing themselves, any content they have brought with them or are working on, and what they hope to get out of the workshop.
To get the most out of the workshop, we ask you to bring a laptop that meets the following requirements:
- A laptop running Windows 8, 8.1 or 10; macOS 10.12 or later; or a 64-bit Linux distribution
- 4GB RAM (minimum)
- 20GB of free hard drive space (minimum)
- Intel Virtualization Technology (VT-x) extensions must be enabled in the BIOS (this does not apply to Macs) in order to run 64-bit virtual machines. The process for this will vary from laptop to laptop. Dell, HP, and Lenovo machines in particular may have been shipped with these extensions disabled. Ask your IT contact to enable them, or, if you have the necessary permissions, follow the manufacturer's instructions to enable them yourself.
- The current release of VirtualBox (https://www.virtualbox.org) installed, along with the VirtualBox Extension Pack
- The current release of Vagrant (https://vagrantup.com) installed
- The current release of the BitCurator VM (https://wiki.bitcurator.net/) downloaded, uncompressed, and added to the VirtualBox Manager
We will provide a selection of sample data, but we strongly encourage you to also bring your own test data.
- We will not provide hardware to image or extract data from media, but if you have disk images that you have extracted from media, those would be ideal.
- Note that the access and natural language processing (NLP) tools operate on text extracted from files. For a list of supported file formats, see: https://textract.readthedocs.io/en/stable/
OPF charter members are invited to register up to five participants free of charge; affiliate members can register one participant free of charge. Please use the link and code distributed to the members' mailing list to apply the discount.
Please note that due to the practical nature of this workshop, registration is limited to 25 places.
Early bird registration is now open!
Sign up for the early bird rate of £195 at: https://www.picatic.com/opf-bitcurator2018.
This rate is valid until 30 April 2018. The full registration fee from 1 May is £250.
Registration closes on Thursday 24 May.
Venue and accommodation information
The event takes place at the British Library, 96 Euston Rd, London NW1 2DB.
A list of nearby hotels will be published soon.