Behind the scenes at a SCAPE Project workshop


I’ve just spent three days in Portugal working with colleagues from the EU-funded SCAPE Project. We’re part of a large Integrated Project that is developing an array of new solutions to preservation challenges. At the atomic level, some of these solutions may be simple command line tools that implement tightly defined functions (e.g. identifying the format of a file). At the macro level, we’re coordinating what you might call these micro services using Taverna workflows. We are also integrating them with our data stores so we can apply these services to our data, running them on parallel computing clusters so they run quickly and at scale, and deciding when, where and how to do all this using automated preservation planning and watch services. Sounds complicated? Lots of interfaces and dependencies? Oh yes.


A whole variety of communication and coordination channels exist to help us turn these challenging goals into a set of integrated and useful results. The project is structured into sub-projects and work packages, each with an associated management and coordination role. A technical coordination committee ensures that the technical approaches we pursue are joined up and consistent. We have mailing lists and regular Skype calls, and we use a wiki to coordinate datasets, preservation issues and the solutions we develop. We’re also beginning to use the social-networking-style communication channels available in the Taverna workflow system (which I’m sure will become an important communication channel by the end of the project).

Blog time! Everyone at the workshop was tasked with blogging about their experiences.

Despite all these comms channels, sometimes you just need to get everyone in the same room and hack through the issues you’re facing. By working through some group exercises, it’s possible to break down some of the communication barriers and get our project collective (who on a normal working day are spread across organisations all over Europe) working together as a team. And that’s what we’ve been up to in Portugal over the last few days. Together with colleagues from my team at the British Library and from the OPF, we were responsible for organising and facilitating the first of a series of cross-project technical workshops. As well as the technical focus, we also brought in a handful of practitioners from our content-holding partners. This let us look at the business focus of our solution development and think about how we will evaluate our new developments by applying them to collections and real preservation problems in our institutions. As we approach the end of the first year of the project, we’re beginning to consider how we will show our funders what we’ve been working on. So we focused minds on how we could turn the great work of the project’s first year into punchy demos that show how we’re beginning to solve real digital preservation challenges.


As often happens with these kinds of events, our initial planning quickly went out of the window, and we made some rather agile changes to the schedule to tailor our activities to the audience and to encourage more of the great interactions and group work we saw on the first day.


So what did we do? We began with a quick demonstration from Carl Wilson of some work we’ve been doing in applying a new SCAPE solution (in this case a JPEG2000 characterisation tool) to a problematic collection at the British Library. The complete story, as told by Carl, was an embryonic version of how we might communicate SCAPE’s work in this area to the outside world. We then had one-minute lightning talks from all the participants, introducing themselves and describing what they’ve been working on. From this information we selected four fairly mature developments from the first year of SCAPE that we could turn into demos. We then split into groups and brainstormed how we could shape these demos. We thought about the specific examples we would use (datasets, issues and workflows), we considered the value or benefits of the work and how it will advance the state of the art, and we looked at what we would actually show in the formal demo for our funders (live demo, visualisation, case study, Taverna workflow, results, analysis (e.g. performance), statistics and so on). The four areas we explored were deep JPEG2000 validation, web archive characterisation, quality assurance of image collections, and tool packaging and installation.


After reporting back, selected individuals went off to work on the demos. The rest of us brainstormed the gaps we currently have in our Scenarios, as well as any other integration or coordination issues that have cropped up. We resolved quite a lot of these issues on the fly and in follow-up discussions, and captured the rest on an Etherpad page. We’ll be following up on these over the next few weeks. We then set up a couple more groups: one focused on preservation watch, and another refining some existing work from a previous event. The latter is developing a set of workflows that generate a variety of lossily compressed images from a test dataset, OCR the resulting images, and then collate the OCR accuracy rates. In other words, it provides a handy analysis tool for getting your compression levels right without compromising future use (such as re-OCRing with better OCR engines in the future).


At the end of the event, our groups ran through their demos, and we had a chance to discuss the results and look for any gaps in what we covered. We also got everyone blogging about the work they’ve done, and these posts should be appearing here on the OPF site over the next few days and weeks. I’m really excited by the great work the project has delivered in just its first year.


As well as coming up with some great demos and resolving various challenges we had with the project, we’ve hopefully got to know each other better, got better at working together as a team, and laid the foundation to keep this closer teamwork going once we get home from Portugal and are once again separated by geography and time zones. We’ll be running further workshop events over the remaining two and a half years of the project. We expect these to gradually move from development-focused internal events to outward-facing training events, where we will invite practitioners from outside the project and help them apply SCAPE solutions to their digital preservation challenges.
