As you may or may not know, C3PO is a content profiling tool for preservation analysis.
It reads in characterisation metadata and lets you aggregate and/or visualise it.
The first versions of C3PO generated quite a lot of interest within the digital preservation community, which was a clear sign to me that it has the potential to become a valuable part of the standard digital preservation toolbelt.
These first versions were essentially prototypes: they explored the problem and defined integration interfaces with other tools.
Thanks to the SPRUCE Project and the award I won, I had the chance to spend a month working on the codebase and to fix many issues. The goal was to create a stable version of the codebase with clearly defined interfaces and guides, as well as default implementations as examples. This should lower the entry barrier for third-party developers, so that the community can take the codebase and develop and extend it in any direction it likes.
Today, I am releasing version 0.4 of the C3PO core and command-line interface. The codebase is completely documented and waiting to be starred and forked. If you just want to download and use it, you can use the Bintray download link here.
As of today, C3PO is released under the Apache 2.0 license.
Although the only change you can see directly is the new logo, this version brings significant improvements to the core of the framework. I would like to give you a short overview of some of them and explain why they matter.
The most significant change is the abstraction of the persistence layer. C3PO uses MongoDB as its default persistence backend, and many developers in this community have expressed concerns about the dependency on MongoDB and the tangled code. Well, version 0.4 completely abstracts the persistence layer. If you want to use a different backend, you only have to implement a single class and plug it in, so if you are an HBase expert, please consider contributing :).
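To illustrate what "implement a single class and plug it in" means in general terms, here is a minimal Java sketch of an interface-based storage abstraction, where the core logic receives its backend through the constructor. The names `ElementStore`, `InMemoryElementStore` and `ProfileService` are hypothetical and are not C3PO's actual API; the dev guide describes the real persistence interface.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical storage contract (NOT C3PO's real interface). */
interface ElementStore {
    void insert(String id, Map<String, String> properties);

    Map<String, String> find(String id);

    int count();
}

/** Toy in-memory backend standing in for MongoDB, HBase, etc. */
class InMemoryElementStore implements ElementStore {
    private final Map<String, Map<String, String>> data = new HashMap<>();

    @Override
    public void insert(String id, Map<String, String> properties) {
        // Store a defensive copy so callers cannot mutate stored data.
        data.put(id, new HashMap<>(properties));
    }

    @Override
    public Map<String, String> find(String id) {
        return data.get(id); // null if the id is unknown
    }

    @Override
    public int count() {
        return data.size();
    }
}

/** Core logic that only depends on the interface, never on a concrete backend. */
class ProfileService {
    private final ElementStore store;

    ProfileService(ElementStore store) {
        this.store = store;
    }

    void ingest(String id, Map<String, String> properties) {
        store.insert(id, properties);
    }

    int size() {
        return store.count();
    }
}
```

Swapping the backend would then mean writing another class that implements the interface and passing it to the constructor; the core logic stays unchanged.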
The second improvement is the new filtering capabilities. With these enhancements, users will be able to create more flexible filters, which should let them discover even more interesting aspects of their data. Once the web application is updated to use these changes, its UI should become significantly more responsive, thanks to filter and result caching as well as a number of bug fixes.
The third major improvement is that C3PO now consolidates metadata coming from different sources. This means that if you have characterisation data from, e.g., FITS and Tika for the same digital objects (with the same identifier), C3PO will automatically merge it. This should reduce the sparsity we currently see across many different data sets.
The new release also includes numerous other improvements and bug fixes in the core.
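As a rough illustration of consolidation by identifier, the Java sketch below groups records from several tools by object id and merges their properties. The record shape and the class names `CharacterisationRecord` and `Consolidator` are hypothetical, and keeping every distinct value per property (so disagreements between tools stay visible) is an assumed conflict policy, not necessarily what C3PO does.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** One characterisation record produced by a single tool (hypothetical shape). */
class CharacterisationRecord {
    final String id;    // identifier of the digital object
    final String tool;  // e.g. "FITS" or "Tika"
    final Map<String, String> properties;

    CharacterisationRecord(String id, String tool, Map<String, String> properties) {
        this.id = id;
        this.tool = tool;
        this.properties = properties;
    }
}

class Consolidator {
    /**
     * Groups records by identifier and merges their properties. Each property
     * maps to the set of distinct values reported for it; a set with more than
     * one value signals a conflict between tools.
     */
    static Map<String, Map<String, Set<String>>> consolidate(List<CharacterisationRecord> records) {
        Map<String, Map<String, Set<String>>> merged = new LinkedHashMap<>();
        for (CharacterisationRecord r : records) {
            Map<String, Set<String>> element =
                    merged.computeIfAbsent(r.id, k -> new LinkedHashMap<>());
            for (Map.Entry<String, String> e : r.properties.entrySet()) {
                element.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>())
                        .add(e.getValue());
            }
        }
        return merged;
    }
}
```

With this approach, an object described by both FITS and Tika ends up as a single element carrying the union of both tools' properties.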
What does the future look like?
Well, I will continue to maintain the repository and, over the next few months, make sure the web application catches up with all the changes to the core framework.
My former colleagues from the Vienna University of Technology and partners from the SCAPE project (thank you guys, you all rock) will continue to develop and maintain the codebase to push past the next scalability limits.
A Roadmap for the foreseeable future can be found here. It will be updated in the coming weeks.
What can you do?
If you are a user and you think C3PO is or can be valuable to you or your institution, please try it out, give feedback (it is very important!), report issues and contribute to the ROADMAP.
If you are a developer, please star the repository on GitHub and give the tool a try.
Contributing is easy. You can start by reading the dev guide here, and you can report issues here. Take a look at the open issues or at the Roadmap and pick something you find useful.
For example, writing a new metadata adaptor is the easiest place to start, as it requires implementing only one method. If you have more time and know HBase, consider providing an HBase persistence layer.
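To give a feel for what a one-method adaptor can look like, here is a minimal Java sketch. The base class `MetadataAdaptor`, its `parse` method and the "key=value per line" input format are all hypothetical and chosen for illustration only; C3PO's real adaptor contract is described in the dev guide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical adaptor base class (NOT C3PO's real adaptor API). */
abstract class MetadataAdaptor {
    /** The single method a new adaptor has to implement. */
    abstract Map<String, String> parse(String rawMetadata);
}

/** Example adaptor for a made-up format with one "key=value" pair per line. */
class KeyValueAdaptor extends MetadataAdaptor {
    @Override
    Map<String, String> parse(String rawMetadata) {
        Map<String, String> properties = new LinkedHashMap<>();
        for (String line : rawMetadata.split("\n")) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // skip blank lines and lines without a key
            }
            properties.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return properties;
    }
}
```

Everything format-specific lives in `parse`, so supporting another characterisation tool would only mean writing another subclass.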
I hope that this will make C3PO more useful and that the community will not hesitate to take the tool, use it, tear it apart and shape it to its current needs. If you have any questions or feedback, please drop me a line at email@example.com.
Last but not least, I want to thank the SPRUCE project and especially Paul Wheatley and Carl Wilson for the opportunity and for their help and support!
P.S. On the 31st of May, the OPF will host a webinar on C3PO; if you are interested, please check it out. I will join at the end and try to answer your questions.