BitCurator Access and BitCurator NLP: Processing and Enabling New Forms of Access to Born-Digital Materials
Feb 22 : 3:00 PM - 4:00 PM
The BitCurator Access project developed open-source software that supports access to disk images through three approaches: building tools to support web-based services, enabling the export of file systems and associated metadata, and using emulation environments. The project also developed a tool to redact files, file system metadata, and targeted bitstreams within disks or directories. BitCurator Access focused on approaches that simplify access to raw and forensically packaged disk images, allowing collecting institutions to provide access environments that reflect as closely as possible the original order and environmental context of these materials. The use of forensic technologies allows detailed metadata to be generated reflecting the provenance of the materials, the exact nature of the file-level items they contain, and the metadata associated with both file-level items and data not observed within the file system (but still accessible within the original materials). One of the primary motivations for using the BitCurator and BitCurator Access software is to capture and provide access to contextual information. For example, the original file system attributes associated with files (e.g., directory paths and timestamps) can be essential to understanding their provenance and original order. However, many other types of contextual information can be vital to making sense of, and making meaningful use of, digital objects.
The BitCurator NLP project has been developing software for collecting institutions to extract, analyze, and produce reports on features of interest in text extracted from born-digital materials. The project is adapting existing natural language processing (NLP) software to identify and report on items likely to be relevant to ongoing preservation, information organization, and access, including entities (e.g., persons, places, and organizations), potential relationships among entities, and topic models that provide insight into how concepts are naturally clustered within the documents. We will demonstrate the current versions of BCA Webtools and the BitCurator NLP tools. The webinar will conclude with a discussion of future directions.
The webinar takes place at 10:00 EST | 15:00 GMT | 16:00 CET and will last approximately one hour.
Cal Lee and Kam Woods, UNC School of Information and Library Science
This webinar is now FULLY BOOKED. Registration is closed.
There are 50 places available on a first-come, first-served basis. The recording and slides will be made available to OPF members who are unable to attend at this time.