Experimentation, simulation and prediction in DP: Emerging challenges

As many of you may know, Cal Lee, Andi Rauber and I recently attempted to facilitate a broad discussion on emerging research challenges within the DP community at a workshop at iPRES 2012. We solicited – and received (thanks again to all contributors!) – wide-ranging contributions from Europe, North America, and New Zealand. The invitation was to think outside the box and to think ahead. Ten position papers were invited for presentation.

These presentations set the stage for day-long intense discussions at six tables in rotating groups, involving about 60 digital preservation experts from very diverse fields. The tables were chaired by us as well as Natasa Milic-Frayling (MSR Cambridge), Rainer Schmidt (AIT), and Brian Matthews (STFC). Many thanks again to the participants and moderators for the great discussions! A report from a participant’s perspective (with lots of photographs) can be found here.

The discussion topics were intentionally broad and inclusive. In particular, they were:

  1. Digital information models
  2. Value, utility, cost, risk and benefit
  3. Organizational aspects
  4. Experimentation, simulation, and prediction
  5. Changing paradigms, shift, evolution
  6. Future content and the long tail

We are (still) working on a full workshop report that summarises the key discussion points for each of these tables. We are also thinking about evolving this workshop into a series, and about feeding the collected challenges into the DP? Research challenges wiki.

For now, I will report on the table I was moderating for the day. Our topic was experimentation, simulation, and prediction, which has close ties to much of the ongoing work in SCAPE, as well as to a new nationally funded project called BenchmarkDP that I am starting in the coming weeks (more on this soon). I can only give a few hints at the greatly varied strands of discussion here and hope I was able to capture some coherent views – if you were at the workshop, feel free to fill in the gaps in the comments!

The discussion started from the observation of a participant that “digital preservation is striving to become a science, yet does in large parts not yet behave like a science”. (I fully subscribe to that statement.) Taking this as a motivation, we analysed the modes and goals of experimentation, model building, simulation, prediction, and hypothesis testing as some of the key aspects that characterise systematic scientific approaches.

As an interdisciplinary field, digital preservation will see experimentation take many forms. These range from psychological experiments on how individuals perceive difference – used to analyse notions of authenticity, content value, and cultural variation – to large-scale simulation of automated information processes for predictive purposes, as found in fields such as meteorology and physics. Clearly, the rigour found in the latter is a challenging aspiration and may in fact not be an appropriate goal to strive for… Quantitative research also always needs to be embedded in a context, with a clear understanding of the stakeholders and goals.

A lot of research and development in digital preservation is focused on improving the reproducibility of experiments – in fact, this is a key goal of SCAPE. However, to a large degree, experimentation in our field is still in an arts-and-crafts stage or, as our colleagues from New Zealand phrased it so nicely, in the “hobbyist” stage. Moving it to an “industrial”, more scientific stage requires fundamental advances. Well-founded and formally grounded verification and validation approaches are almost non-existent to date. If we aspire to “compare apples with apples”, with known variables, we need to develop a more systematic and coherent approach and address a number of key building blocks.

A clear understanding of the different levels and dimensions of experimentation is required, ranging from a micro-level of digital objects and their constituent “information elements” to the macro-level of content collections, organisations, technologies, and communities.

It is clear that the community sorely lacks proper frameworks for benchmarking on all of these levels. Going back to the dictionary, we find for “benchmark”: “1: usually bench mark: a mark on a permanent object indicating elevation and serving as a reference in topographic surveys and tidal observations. 2 a: a point of reference from which measurements may be made. b: something that serves as a standard by which others may be measured or judged. c: a standardized problem or test that serves as a basis for evaluation or comparison (as of computer system performance).”

Key building blocks for benchmarking hence need to include (at least)

  • A clear, unambiguous understanding of the processes that we want to measure,
  • a clear set of attributes and indicators for taking measures,
  • a well-defined value system for judging and assessing measures,
  • solid hypotheses that can be tested and falsified,
  • public, openly available data sets that can be shared and referenced,
  • ground truth that annotates these data sets with useful and trusted measures corresponding to the attributes above, where trusted at least means that we know how reliable they are (something which is almost entirely absent in the data sets currently available), and
  • a means for publication of benchmark results and all of the above elements.

(Please comment if you think we missed a building block in our discussion!)
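To make these building blocks slightly more tangible, here is a minimal sketch of how a benchmark specification could be captured as a data structure. This is purely illustrative: the class and field names (BenchmarkSpec, GroundTruthEntry, the example attributes and URIs) are hypothetical and do not correspond to any existing schema or tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class GroundTruthEntry:
    """Trusted measures for one data-set item, with an explicit reliability estimate."""
    item_id: str
    measures: Dict[str, float]   # attribute name -> measured value
    reliability: float           # how much we trust this annotation (0.0 - 1.0)
    source: str                  # who or what produced the annotation

@dataclass
class BenchmarkSpec:
    """A hypothetical container for the building blocks listed above."""
    process: str                 # the process under measurement, e.g. a migration
    attributes: List[str]        # attributes and indicators to be measured
    value_system: Dict[str, str] # attribute -> how values are judged (goal/threshold)
    hypothesis: str              # the testable, falsifiable claim
    dataset_uri: str             # public, openly available data set
    ground_truth: List[GroundTruthEntry] = field(default_factory=list)
    results_uri: str = ""        # where benchmark results are published

# Example: a made-up benchmark for format migration correctness.
spec = BenchmarkSpec(
    process="TIFF-to-JP2 migration",
    attributes=["pixel_equivalence", "metadata_retained", "runtime_seconds"],
    value_system={"pixel_equivalence": ">= 0.999", "metadata_retained": "== 1.0"},
    hypothesis="Tool X preserves pixel content for all baseline TIFFs in the data set.",
    dataset_uri="https://example.org/benchmark-dataset",  # placeholder URI
    ground_truth=[GroundTruthEntry("img-001", {"pixel_equivalence": 1.0}, 0.95, "manual QA")],
)
print(spec.process, "-", len(spec.ground_truth), "ground-truth entries")
```

Even such a simple structure makes explicit where the gaps currently are: for most available data sets, the reliability of the annotations is simply unknown.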

A key focus today, and hence a starting point for future experimentation, is the verification of the correctness of tools and processes such as migration, emulation, interpretation, and characterisation. This is perhaps the most active area of R&D in digital preservation today, but it is hindered by a combination of barriers that still prevent us from actually having these building blocks:

  • There are legal constraints on sharing existing data.
  • It is technically challenging to develop robust benchmark data.
  • There are economic resource constraints on data collection, annotation, sharing, and developing systematic and coherent approaches.
  • There is no central reference point or body to coordinate such benchmarking.

On a specific pragmatic note, as one participant put it in the workshop, “We want carefully variant crap”, i.e. data sets of objects that are carefully designed, and thoroughly annotated, to violate specific constraints and produce specific expected behaviour. These would represent meaningful test data sets that can be used to benchmark the robustness of existing preservation tools and processes and to uncover the causes of typical errors. The rationale behind such testing is that only through systematic validation can improvements be requested, delivered, evaluated, and approved.
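As a rough illustration of what such “carefully variant” test data could look like, the sketch below takes a well-formed file, produces deliberately broken copies, and records for each variant which constraint it violates and what behaviour we would expect a validation or characterisation tool to exhibit. The corruption strategies and annotation fields are invented for the purpose of the example, not taken from any existing benchmark.

```python
import json
from pathlib import Path

def make_variants(source: Path, out_dir: Path):
    """Create deliberately broken copies of a well-formed file, each annotated
    with the constraint it violates and the behaviour we expect a tool to show."""
    data = source.read_bytes()
    out_dir.mkdir(parents=True, exist_ok=True)
    annotations = []

    # Variant 1: truncate the file -> a tool should report it as incomplete/invalid.
    (out_dir / "truncated.bin").write_bytes(data[: len(data) // 2])
    annotations.append({"file": "truncated.bin",
                        "violated_constraint": "complete byte stream",
                        "expected_behaviour": "reported as not well-formed"})

    # Variant 2: flip a byte in the header -> a tool should flag an invalid signature.
    corrupted = bytearray(data)
    corrupted[0] ^= 0xFF
    (out_dir / "bad_header.bin").write_bytes(bytes(corrupted))
    annotations.append({"file": "bad_header.bin",
                        "violated_constraint": "valid format signature",
                        "expected_behaviour": "format not identified / flagged"})

    # The annotations are the ground truth that the benchmark would ship with.
    (out_dir / "ground_truth.json").write_text(json.dumps(annotations, indent=2))

# Usage (paths are placeholders):
# make_variants(Path("sample.tiff"), Path("variants/"))
```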

In the end, it is only with the above elements that we can start not just to build models, but also to systematically test and falsify them in order to enable better models. This is clearly an area of research that requires a long-term perspective, since our early models will inevitably be too crude to yield accurate and useful predictions. However, it is only through such systematic approaches that we can truly advance the state of the art, rather than merely pursuing incremental product development and testing for immediate problem solving.

Carefully annotated test data collections, however, are just one of the building blocks of experimentation. Systematic experimentation and simulation should include (as non-exhaustive suggestions):

  • Loss: Can we simulate loss on different levels? How do users recognise, locate, perceive and assess loss? (A toy sketch of bit-level loss simulation follows this list.)
  • Authenticity, significant properties, and equivalence: How should we conduct experiments on authenticity? Can we combine and correlate the equivalence of an “information artefact”, the meaning of an information artefact, and the representation? Can we model the cause-effect relationships between these to build predictive models?
  • Aging: Can we invent mechanisms for accelerated aging, mechanisms that simulate accelerated decay processes?
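To give a sense of what the simplest possible level could look like, the toy model below simulates bit-level loss: each bit flips independently with an assumed yearly probability, and an object counts as “lost” once any bit has flipped. The flip rate and the readability criterion are arbitrary assumptions made for this sketch, not empirical values.

```python
import random

def simulate_bit_rot(n_objects=1000, bits_per_object=8 * 1024,
                     flip_prob_per_year=1e-6, years=10, seed=42):
    """Toy simulation: each bit flips independently with a small yearly probability.
    An object counts as 'lost' here if any single bit has flipped (a deliberately
    crude readability criterion)."""
    rng = random.Random(seed)
    # Probability that a single bit survives `years` years without flipping.
    p_bit_ok = (1.0 - flip_prob_per_year) ** years
    # Probability that the whole object is intact; sampled once per object
    # instead of per bit to keep the toy model fast.
    p_object_ok = p_bit_ok ** bits_per_object
    return sum(1 for _ in range(n_objects) if rng.random() < p_object_ok)

print(simulate_bit_rot(), "of 1000 objects intact after 10 years (toy model)")
```

A real simulation would of course have to model loss at the level of storage media, formats, and institutions rather than individual bits, which is exactly where the open research questions lie.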

From the perspective of users, the predictions that would provide tremendous value include questions such as the following:

  • When is this object going to die?
  • What is the total cost of ownership (TCO) of keeping this object alive for X years?
  • What are the expected costs of preservation at time X?
  • What are the marginal costs of increasing life expectancy by X years?
  • At time X, will my collection be preserved successfully?
  • Can we simulate relationships between funding streams and project level results, enabling content holders to claim “If you cut this budget by X, object Y will be inaccessible in Z years“?

An interesting comparison came up between the life expectancy of digital objects and that of living beings. In actuarial science, actuarial (life) tables are used to calculate life expectancy based on a number of factors. The question is: what is the equivalent of this in preservation? What are the genetic, social, economic, cultural, and other factors contributing to life expectancy? And when can we say that a piece of information is “dead”?
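Purely to play with that analogy, a “life table” for digital objects could be approximated by a simple hazard model: each contributing factor (format obsolescence, storage failure, loss of institutional support, …) adds to a yearly hazard rate, from which a survival curve and an expected “lifetime” follow. All factor names and rates below are invented placeholders, not empirical findings.

```python
# A minimal actuarial-style sketch: combine per-factor yearly hazard rates
# into a survival curve and an expected lifetime for a digital object.
# All factors and rates are invented placeholders, not empirical data.

HAZARDS = {
    "format_obsolescence": 0.03,           # assumed yearly probability contribution
    "storage_failure": 0.01,
    "loss_of_institutional_support": 0.02,
}

def yearly_survival(hazards):
    """Probability of surviving one year under independent yearly hazards."""
    p = 1.0
    for rate in hazards.values():
        p *= (1.0 - rate)
    return p

def survival_curve(hazards, years=50):
    """P(object still accessible after t years) for t = 0..years."""
    p = yearly_survival(hazards)
    return [p ** t for t in range(years + 1)]

def expected_lifetime(hazards):
    """Expected number of years until 'death' for a geometric lifetime model."""
    hazard = 1.0 - yearly_survival(hazards)
    return 1.0 / hazard if hazard > 0 else float("inf")

curve = survival_curve(HAZARDS)
print(f"P(alive after 10 years) = {curve[10]:.2f}")
print(f"Expected lifetime = {expected_lifetime(HAZARDS):.1f} years (toy model)")
```

The hard research question is, of course, not the arithmetic but how to obtain defensible hazard rates for factors such as format obsolescence in the first place.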

Obsolescence needs to be understood in a contextual form: an object is obsolete to a user if the user does not possess the means to access the content of that object at that moment. If it is possible to recover these means, the object becomes accessible to the user. The question, most of the time, will be that of the cost of doing so. Hence, we obviously have to acknowledge the limitations of such analogies. Objects aren’t really dead (usually)… or are they?

As you will have noticed, we did not arrive at new answers, but at quite a few new questions. It is clear that experimentation, simulation and prediction still have a long way to go in this field, but they also hold the promise of tremendous insights. Much more systematic approaches are required. There is a clear need to build, publish, test, and falsify models that attempt to explain the critical cause-effect relationships of the phenomena in our domain. So, off to work…
