Interview with a SCAPEr – Rui Castro

Rui Castro

Who are you?

I’m Rui Castro. I have worked at KEEP SOLUTIONS since 2010, where I am Director of Infrastructures, project manager and researcher. Before joining KEEP SOLUTIONS, I was part of the team that developed RODA, the digital preservation repository used by the Portuguese National Archives.

Tell us a bit about your role in SCAPE and what SCAPE work you are involved in right now?

My role in SCAPE is primarily focused on Preservation Action Components and Repository Integration.

In Action Components, I’ve worked on the identification, evaluation and selection of large-scale action tools & services to be adapted to the SCAPE platform. I’ve contributed to the definition of a preservation tool specification, whose purpose is to create a standard interface for all preservation tools and a simplified mechanism for packaging and redistributing those tools to the wider community of preservation practitioners. I have also contributed to the definition of a preservation component specification, whose purpose is to create standard preservation components that can be automatically searched for, composed into executable preservation plans and deployed on SCAPE-like execution platforms.

Currently my work is focused on repository integration, where I have the task of implementing the SCAPE repository interfaces in RODA, an open-source digital repository supported and maintained by KEEP SOLUTIONS. These interfaces, when implemented, will enable the repository to use the SCAPE preservation environment to perform preservation planning, watch and large-scale preservation actions.

Why is your organisation involved in SCAPE?

KEEP SOLUTIONS is a company that provides advanced services for managing and preserving digital information. One of the vectors that drive us is continuous innovation in the area of digital preservation. In the SCAPE project, KEEP SOLUTIONS is contributing expertise in digital preservation, especially migration technologies, and practical knowledge of the development of large-scale digital repository systems. KEEP SOLUTIONS is also acquiring new skills in digital preservation, especially in preservation planning, watch and service parallelisation. We are enhancing the digital preservation products and services we currently support, such as RODA, and strengthening relationships with world-leading digital preservation researchers and institutions. KEEP SOLUTIONS’ participation in the project will enhance our expertise in digital preservation, and that will result in better products and services for our current and future clients.

What are the biggest challenges in SCAPE as you see it?

SCAPE is a big project, from the number of people and institutions involved to the number of digital preservation aspects covered. I think the biggest challenge will be the integration of all parts into a single coherent system. From a technical point of view the integration between content repositories, automated planning & watch and the executable platform is a huge challenge.

What do you think will be the most valuable outcome of SCAPE?

I see two very interesting aspects emerging from SCAPE.

One is the integration of automated planning & watch into digital preservation repositories. Planning is an essential part of digital preservation and it involves human-level activities (like policy and decision making) and machine activities (like evaluation of alternative strategies, characterisation and migration of contents). Being able to bridge these two realms and provide content holders with the tools to take informed decisions about what to do with their data is a great achievement.

The other is the definition of a system architecture for large-scale processing, applied to the specific domain of digital preservation, that is capable of executing preservation actions like characterisation, migration and quality assurance over huge amounts of data in a “short” time.


Contact information:

Email: [email protected]



