In my previous post, I tried to give a quick overview of Planning and Watch in SCAPE. Here I will shortly outline our main streams of work and summarise some of the achievements so far.
Our key goal is scalability, but one title we sometimes use for this subproject is “Context-aware planning and watch”. It points to the obvious truth that planning should never operate in a vacuum, but instead operates in the context of evolving drivers, constraints, ends and means: Content, users, technologies, policies, legal obligations, format risks… everything can change over time. This means we need awareness of the organisation’s context and policies.
The work package “Policy Representation” will develop ways to represent such context so that planning and monitoring activities can reason about contextual factors. Examples that will directly drive planning and watch processes include access requirements, format risks, and significant property specifications.
The work package Automated Watch has the primary goal of developing a monitoring system that continuously gathers information from a variety of sources. It will allow other systems and persons to sign up for notifications that are sent when certain conditions are met. For example, it will collect content profiles and monitor how they evolve, and allow an organisation to receive a notification when it is the last known organisation holding content in a certain format. But it will also support considerably more sophisticated monitoring. If you are interested, please take a look at the report on Watch component design. The work package also develops a simulation environment that can be used to make predictions and simulate the effects of operations over time.
Finally, the work package Automated Planning is working towards increasing the automation available for preservation planning by integrating automated features and services and making the planning process policy-aware. As discussed, planning now is complex and effort-intensive; this work package will change this quite substantially.
Where are we now in these work streams? To know it all, you will have to read the reports and stay tuned… but I will try to make your choice of reading a bit easier here 😉 so here are some of our most interesting results so far:
1. Lessons learned in Preservation Planning
A number of serious preservation planning case studies have been conducted during and after the PLANETS project, with and without support from the core Plato team. In SCAPE, we took a close look and spent quite a while analysing the data of these studies and the experiences gained. Some of the key conclusions of this analysis are outlined in our JCDL article.
2. Decision factors in planning
In planning, you have to evaluate potential preservation actions against your requirements. That includes defining a number of specific decision criteria that drive the evaluation. We analysed hundreds of decision criteria from more than a dozen case studies, categorised them, mapped them to established quality models and discussed how to measure the different kinds of criteria so that evaluation and decision making are based on real measures instead of vague judgements or industry averages. Our article in JASIST discusses the measurement side of things and the distribution of criteria across different groups.
Now this is quite interesting in itself, but does not tell us how important individual decision criteria are: If I have created a plan using 35 criteria that I considered in my decision, which of these were actually the critical ones? What if I evaluated one of them incorrectly? What are the most critical aspects across multiple organisations? How much of a decision about image migration is typically driven by considerations about the target formats?
To answer questions like these, Markus built a tool that he calls the “Knowledge Browser”. It quantifies the impact of decision criteria and can show you what effect different measures have in the real world: It computes impact factors for criteria such as “Average relative filesize resulting from a migration” or “Compound impact of all format-related decision criteria in case studies on images”, all based on real-world case studies from Plato’s knowledge base. In his IPRES article and the SCAPE deliverable D14.1, he discusses the methods and techniques behind this and summarises results. The knowledge browser itself will be part of the new release of Plato, version 4, which we are currently developing. That means that soon you will be able to analyse the importance of decision criteria in real-time, based on (anonymised) decisions taken within the community.
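To make the idea of a criterion's impact a bit more concrete, here is a minimal sketch of a leave-one-out sensitivity check on a weighted-sum score. This is not the method used in the Knowledge Browser, which is described in the IPRES article and D14.1. The scoring model, the function names, the alternatives and all numbers are assumptions chosen purely for illustration.

```python
from typing import Dict, List

# alternative -> criterion -> utility in [0, 1] (all values are made up)
Scores = Dict[str, Dict[str, float]]


def weighted_score(utilities: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the utilities, normalised by the sum of the weights."""
    total = sum(weights.values())
    return sum(weights[c] * utilities[c] for c in weights) / total


def best_alternative(scores: Scores, weights: Dict[str, float]) -> str:
    """Return the alternative with the highest weighted score."""
    return max(scores, key=lambda alt: weighted_score(scores[alt], weights))


def decisive_criteria(scores: Scores, weights: Dict[str, float]) -> List[str]:
    """Criteria whose removal changes the top-ranked alternative."""
    baseline = best_alternative(scores, weights)
    decisive = []
    for criterion in weights:
        reduced = {c: w for c, w in weights.items() if c != criterion}
        if best_alternative(scores, reduced) != baseline:
            decisive.append(criterion)
    return decisive


# Hypothetical plan with two migration options and three criteria.
weights = {"relative_filesize": 0.2, "image_quality": 0.5, "format_openness": 0.3}
scores = {
    "option_a": {"relative_filesize": 0.8, "image_quality": 0.6, "format_openness": 0.5},
    "option_b": {"relative_filesize": 0.4, "image_quality": 0.8, "format_openness": 0.6},
}

print(best_alternative(scores, weights))
print(decisive_criteria(scores, weights))
```

In this toy plan only one criterion flips the recommendation when it is left out, which is the kind of question a real impact analysis answers at a much larger scale and with real case study data.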
3. A fully automated planning experiment
Trustworthy Planning means that we have to verify every aspect of preservation actions against dozens of criteria using controlled experimentation on sample content. This is not easy if done manually, so of course we want to automate the experimentation to achieve scalability. The preservation components subproject is developing key components for this purpose, in particular Quality Assurance components that verify the results of migration actions.
One preservation planning experiment consists of a number of components put together: For migration, for example, we need to analyse the properties of the sample content (Characterisation), carry out the preservation action we want to evaluate and measure its performance, analyse the output (Characterisation), compare the original and the output (Quality Assurance) and document the results. The resulting measures can then be fed back into planning for analysis and decisions.
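The sequence of steps in such a migration experiment can be sketched roughly as follows. This is only an illustration of the flow described above, not the actual Plato or Taverna implementation: `run_experiment`, the `toy_*` functions and the sample values are hypothetical stand-ins for real characterisation, migration and Quality Assurance components.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

Properties = Dict[str, float]


@dataclass
class ExperimentResult:
    original: Properties      # properties of the sample before migration
    output: Properties        # properties of the migrated object
    checks: Dict[str, bool]   # per-property Quality Assurance verdicts
    elapsed_seconds: float    # measured performance of the action


def run_experiment(
    sample: dict,
    characterise: Callable[[dict], Properties],
    migrate: Callable[[dict], dict],
    compare: Callable[[Properties, Properties], Dict[str, bool]],
) -> ExperimentResult:
    original = characterise(sample)          # Characterisation of the input
    start = time.perf_counter()
    migrated = migrate(sample)               # the preservation action under test
    elapsed = time.perf_counter() - start    # measure its performance
    output = characterise(migrated)          # Characterisation of the output
    checks = compare(original, output)       # Quality Assurance
    return ExperimentResult(original, output, checks, elapsed)


# Toy stand-ins so the sketch runs; real components would call actual tools.
def toy_characterise(obj: dict) -> Properties:
    return {name: obj[name] for name in ("width", "height", "size_bytes")}


def toy_migrate(obj: dict) -> dict:
    migrated = dict(obj)
    migrated["size_bytes"] = obj["size_bytes"] // 2  # pretend the output is smaller
    return migrated


def toy_compare(original: Properties, output: Properties) -> Dict[str, bool]:
    return {name: original[name] == output[name] for name in ("width", "height")}


sample = {"width": 4000, "height": 3000, "size_bytes": 1000}
result = run_experiment(sample, toy_characterise, toy_migrate, toy_compare)
relative_filesize = result.output["size_bytes"] / result.original["size_bytes"]
print(result.checks, relative_filesize)
```

The documented result (the properties, the checks and the timing) is what would be fed back into planning as measures for the decision criteria.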
From the side of planning, we put together an experiment where every important decision criterion that has to be validated in an experiment is in fact measured automatically. The experiment design focuses on born-digital raw photographs, which present quite a challenge: Each photograph combines raw sensor data with development settings that work much like an analog negative film that is chemically developed to yield a photographic print. A change in the development parameters produces completely different outputs. The main challenge is Quality Assurance for converting proprietary raw photograph formats to the standard Adobe Digital Negative (DNG). Our preliminary results are discussed in this paper published at ICADL 2011, and we have since then worked on combining all these elements using a Taverna workflow that can be called from Plato. More on this soon!
4. Decision making and governance
Scalable control requires standardised governance processes. In collaboration with the SHAMAN project, we have looked at decision making and governance processes and suggested a path to integrating preservation planning and operations into the leading IT Governance framework COBIT. We also developed a simple Capability Maturity Model for Preservation Planning and Operations that shows how operations, control and monitoring fit together and how an organisation can assess and improve its capabilities systematically. These are discussed in articles at ASIST-AM 2011 and IPRES 2011.
5. Automated Monitoring: The Preservation Watch component
Since many factors can change over time in a preservation environment, we need a mechanism to gather information, monitor changes, and react to them in time. This means we need internal and external monitoring capabilities: Internal Monitoring is the ability to monitor operations for certain properties of interest, which include operations specified by plans and operational attributes of the system. External Monitoring is the ability to monitor external influencers of interest. The key goals of the Watch component are:
1. Enable the planning component to automatically monitor entities and properties of interest
2. Enable human users and software components to pose questions about entities and properties of interest
3. Collect information from different sources through adaptors
4. Act as a central place for collecting relevant knowledge that could be used to preserve an object or a collection
5. Enable human users to add specific knowledge
6. Notify interested agents when an important event occurs
7. Act as an extensible component
This last item is particularly important: The Watch component is intended to function as a platform where additional information sources can be connected easily.
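To illustrate how adaptors, questions and notifications could fit together, here is a minimal sketch. It is not the design from the Watch component report: the `Watch` class, its method names, the adaptor and the format identifier are all hypothetical and only mirror the goals listed above.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A measurement is (entity, property, value), e.g. a format and its holder count.
Measurement = Tuple[str, str, object]
Knowledge = Dict[Tuple[str, str], object]


class Watch:
    def __init__(self) -> None:
        self.adaptors: List[Callable[[], Iterable[Measurement]]] = []
        self.knowledge: Knowledge = {}
        self.subscriptions: List[
            Tuple[str, Callable[[Knowledge], bool], Callable[[str], None]]
        ] = []

    def add_adaptor(self, adaptor: Callable[[], Iterable[Measurement]]) -> None:
        """Connect an additional information source (extensibility)."""
        self.adaptors.append(adaptor)

    def add_knowledge(self, entity: str, prop: str, value: object) -> None:
        """Let a human user add specific knowledge by hand."""
        self.knowledge[(entity, prop)] = value

    def ask(self, entity: str, prop: str) -> object:
        """Answer a question about an entity and a property of interest."""
        return self.knowledge.get((entity, prop))

    def subscribe(self, name, condition, callback) -> None:
        """Register an agent to be notified whenever `condition` holds."""
        self.subscriptions.append((name, condition, callback))

    def poll(self) -> None:
        """Collect from all adaptors, then notify on every condition that holds."""
        for adaptor in self.adaptors:
            for entity, prop, value in adaptor():
                self.knowledge[(entity, prop)] = value
        for name, condition, callback in self.subscriptions:
            if condition(self.knowledge):
                callback(name)


# Hypothetical source reporting how many organisations hold a given format.
holders = {"count": 3}


def registry_adaptor() -> Iterable[Measurement]:
    yield ("format:hypothetical-raw", "known_holders", holders["count"])


events: List[str] = []
watch = Watch()
watch.add_adaptor(registry_adaptor)
watch.subscribe(
    "last-holder",
    lambda k: k.get(("format:hypothetical-raw", "known_holders")) == 1,
    events.append,
)
watch.poll()           # three holders: condition not met, no notification
holders["count"] = 1
watch.poll()           # one holder left: the subscriber is notified
print(events)
```

Note that this sketch notifies on every poll for as long as the condition holds; a real system would more likely track state changes and notify only once.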
All these goals and concepts are described and illustrated in much more detail in the latest report on the Watch component architecture, together with a high-level design of the system as we are starting to develop it. The first release of the Watch component is scheduled for this year (2012).
It seems I failed one goal with this post: to “shortly” outline the results… apologies, there’s just too much to report! 😉
Happy planning,
Christoph