Software Archiving for EaaS

The typical digital artefact or complex object does not function (render, execute, …) without a certain software environment. Emulation-as-a-Service (EaaS) provides original environments running in platform emulators. Depending on the (complex) object to be handled, several software components are required to reproduce an original environment. Often, these components are proprietary and require a software license. Both the software itself and its licenses need to be preserved to enable the reproduction of the original environments. Several issues are linked to software licenses. Licensing conditions can change over time, which will definitely influence EaaS, as licenses (and software "patents") expire or local and remote license servers become unavailable. Another interesting point, massively disputed by some software vendors, is the development of a second-hand software market.

Software Archive of Standard Components

Software components required to reproduce original environments for certain (complex) digital objects can be classified in several ways. There is standard software, such as operating systems and off-the-shelf applications sold in (significant) numbers to customers. There might exist different releases and various localized versions (with the user interaction part translated into different languages, as is the case for Microsoft Windows or Adobe products), but otherwise the copies are exactly the same. Such software should be described uniquely and kept in a software archive of standard components.

There are several ideas on software identification and description already discussed in this blog (e.g. by Andrew Jackson). DOIs would definitely be helpful to tag software, much as ISBNs identify books and other media. These tags would be useful for tool registries like TOTEM, too. Ideally, such software archives are managed by the relevant (national) memory institutions. As the archive's content is comparatively small and well described by the tags, the workload can easily be shared (federated) among several institutions. There are different ways to stock these archives. Legal deposit, as is well established for books and other media, is one option. Alternatively, software components could be collected on demand upon object ingest. This option is discussed and demonstrated e.g. by the bwFLA project. It provides the necessary interfaces to a software archive, so that all required software components can be collected and described. This is done via observed installation processes, which record all the user interaction required to install a certain component. Such additional information is to be stored alongside the standard metadata, such as license keys. The user can then directly validate the successful rendering of the object to verify that all relevant components have been captured.
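To make the idea of uniquely tagged components a little more concrete, here is a minimal Python sketch of what an archive record and a federated lookup across institutions might look like. The schema is entirely hypothetical: the field names, the `InstallStep` action vocabulary and the identifier format are illustrative assumptions, not taken from bwFLA, TOTEM or any DOI registry. The recorded installation steps stand in for the observed user interaction described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class InstallStep:
    """One recorded user interaction from an observed installation (hypothetical)."""
    action: str      # e.g. "click" or "type"; illustrative vocabulary only
    target: str      # dialog element the action applies to
    value: str = ""  # text typed, if any


@dataclass
class SoftwareComponent:
    """Archive record for a standard software component (hypothetical schema)."""
    identifier: str  # persistent tag, e.g. a DOI-like identifier
    name: str
    release: str
    language: str    # localized version, e.g. "de" or "en"
    license_keys: list[str] = field(default_factory=list)
    install_steps: list[InstallStep] = field(default_factory=list)


class FederatedArchive:
    """Resolve component identifiers across several institutional archives."""

    def __init__(self) -> None:
        # institution -> (identifier -> record)
        self._members: dict[str, dict[str, SoftwareComponent]] = {}

    def register(self, institution: str, component: SoftwareComponent) -> None:
        self._members.setdefault(institution, {})[component.identifier] = component

    def resolve(self, identifier: str) -> tuple[str, SoftwareComponent] | None:
        # Ask each member institution in turn; the first holder wins.
        for institution, holdings in self._members.items():
            if identifier in holdings:
                return institution, holdings[identifier]
        return None  # no institution holds this component
```

Because every component carries a stable tag, an institution only needs to hold its share of the records; any EaaS node can resolve a tag without knowing in advance which member archive stores it.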

Unfortunately, general, coordinated software archiving is still a partially unresolved issue. There are several activities going on, e.g. at the National Archives of New Zealand or the National Library of Australia. These activities are very valuable to the whole community, as software producers often do not archive their products for long. Additionally, some companies leave the market and not all of their assets are maintained. There are initiatives like vetusware.com which try to tackle this problem but operate in a legally problematic domain. They might disappear because of take-down requests or simply because they run out of funding. Other sources are specialized archives like browsers.evolt.org for web browsers. The drive-by software archiving run by the Internet Archive might not capture all relevant software, as many components were not freely and openly available for download. Especially for older and less popular platforms, it becomes increasingly difficult to get hold of obsolete software. Nevertheless, storing and maintaining software components is a prerequisite for the whole approach. For this reason, memory institutions should have special rights to archive software.

Licensing

Every running instance of an original environment requires a certain set of licenses, depending on the installed or used software. If, e.g., a set of presentation slides with embedded audio, video and spreadsheets needs to be rendered, the licenses for the operating system and the presentation software are required. Additionally, audio and video codecs as well as an appropriate spreadsheet renderer need to be obtained and installed to make the presentation of the object complete. For EaaS, a license management component is required to match the number of available licenses to the requested original environments. The sources of the licenses could differ and could depend on the user (and institution) requiring access to a certain digital object in its original environment. In a federated EaaS environment run by different institutions, the sharing and handling of licenses becomes an interesting topic, especially if national borders are crossed (e.g. because software vendors try to maintain separate markets with different pricing).
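The matching task of such a license management component can be sketched as a simple pool allocator. The Python below is a hypothetical illustration, not an existing EaaS or bwFLA interface: it assumes licenses are counted per component as concurrent-use seats, and it ignores per-user, per-institution and per-country license sources.

```python
from __future__ import annotations

from collections import Counter


class LicenseManager:
    """Match available licenses against requested original environments.

    Sketch only: each software component has a pool of concurrently usable
    licenses, and an environment may start only if *every* component it
    needs has a free license (all-or-nothing allocation).
    """

    def __init__(self, pools: dict[str, int]) -> None:
        self._free: Counter[str] = Counter(pools)   # component -> free licenses
        self._active: dict[str, Counter[str]] = {}  # session -> licenses held

    def start(self, session_id: str, components: list[str]) -> bool:
        if session_id in self._active:
            raise ValueError(f"session {session_id!r} is already running")
        needed = Counter(components)
        # Unknown components count as zero free licenses (Counter default).
        if any(self._free[c] < n for c, n in needed.items()):
            return False  # refuse rather than start an incomplete environment
        self._free.subtract(needed)
        self._active[session_id] = needed
        return True

    def stop(self, session_id: str) -> None:
        """Return the licenses held by a finished session to the pools."""
        self._free.update(self._active.pop(session_id))

    def available(self, component: str) -> int:
        return self._free[component]
```

For the presentation example above, with made-up component labels and pool sizes:

```python
mgr = LicenseManager({"os": 2, "presentation": 1, "av-codecs": 5, "spreadsheet": 1})
needs = ["os", "presentation", "av-codecs", "spreadsheet"]
mgr.start("user-1", needs)  # True
mgr.start("user-2", needs)  # False: the only presentation license is in use
mgr.stop("user-1")
mgr.start("user-2", needs)  # True
```

The all-or-nothing rule matters because an environment missing, say, the codecs would render the object incompletely while still tying up the other licenses.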

Within the realm of (national) libraries and archives, the licenses obtained through legal deposit might suffice. For a more open and general service, other ways of licensing are required. Either the software producers offer a specific type of license for that purpose, or specifically acquired licenses (e.g. from the pre-owned license market) are used. Another option is to obtain licenses (from the original user or producer of the object) when ingesting the particular object. This might be the case for finished (scientific) projects or end-of-life office environments in companies or government organizations. At the moment, licenses are often just thrown away like used IT equipment. In the future, a more elaborate digital lifecycle management should be put in place: from the planning stage of a project onward, the licensing of all required components should be secured for the complete intended lifecycle of a particular object.

Custom Made Software Components

A (federated) archive of standard components does not make sense for every software component. In many domains, custom-made software and user programming play a significant role. These could be scripts or applications written by scientists to run their analysis on gathered data, run specific computations or extend existing standard software packages. Other examples are software tools written for government offices or companies to produce certain forms or to implement and configure business processes. Such software has to be taken care of and stored alongside the preserved object. The same applies to complex setups of standard components with many very specific configurations. In these cases, it could make sense to preserve the system as a whole (see the blog post on full system preservation).

Pre-Produced and On-Demand Original Environments

EaaS allows services to be centralized and efforts to be shared. This could be especially useful for re-using pre-produced original environments of standard components. Depending on the type of user (whether the object is rendered within the premises of the memory institution, or the user is a commercial entity or a private person), different ways of (re)producing original environments could be chosen:

  • Complete environments together with the metadata required to run them in the chosen virtual machine or emulator. This would be the method to deploy for imaged complete systems.
  • Reproduce the complete environment from standard components using the license information delivered by the user together with the object to render. This may take a while as the setup procedure needs to be completed. The bwFLA project started to implement workflows to gather all the required metadata and user interaction to automatically reproduce such steps.
  • Re-use existing environments from a "cache" (pre-produced environments). This should be possible for in-house use, or as an external service if the required type and number of licenses are available. Here, some legal concerns might prove problematic, as many licenses do not explicitly allow software lending.
  • Partially re-use pre-configured environments if licenses are less problematic and just add the problematic/proprietary component.

Several ways have been described to automatically reproduce certain environments, e.g. for Windows operating systems (link) or as researched within the bwFLA context. Nevertheless, these procedures take time to complete and extend the time span until an artefact or original environment can be presented to the user.
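The choice among the options listed above could be expressed as a small decision function. This Python sketch is a hypothetical illustration: it assumes the licensing situation of a request can be reduced to a few yes/no flags (all names are made up here), and it simply prefers the fastest applicable option, since reproducing an environment from standard components takes the longest. It also assumes, for the partial re-use case, that the license for the added proprietary component is supplied by the user.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    FULL_IMAGE = "deploy imaged complete system"
    CACHED = "re-use pre-produced environment"
    PARTIAL = "re-use base environment, add proprietary component"
    REPRODUCE = "reproduce from standard components"


@dataclass
class Request:
    imaged_system: bool    # object was preserved together with a full system image
    in_house: bool         # rendered within the memory institution's premises
    cache_hit: bool        # a matching pre-produced environment exists
    lending_allowed: bool  # licenses permit use of cached copies by external users
    base_cache_hit: bool   # a cached base lacking only the proprietary part exists
    user_license: bool     # user delivered license information with the object


def choose_strategy(req: Request) -> Strategy | None:
    """Pick the fastest option the (hypothetical) licensing flags permit."""
    if req.imaged_system:
        return Strategy.FULL_IMAGE
    if req.cache_hit and (req.in_house or req.lending_allowed):
        return Strategy.CACHED
    if req.base_cache_hit and req.user_license:
        return Strategy.PARTIAL
    if req.user_license:
        return Strategy.REPRODUCE  # slowest: the setup procedure must run
    return None  # no licensed way to produce the environment
```

The order of the checks encodes the trade-off described above: cached environments are fast but raise lending concerns for external users, while full reproduction avoids those concerns at the cost of setup time.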


3 Comments

  1. bram van der werf
    April 5, 2013 @ 2:17 pm CEST

    Hi Euan,

    Fully agree with your observation. Preservation of digital objects is about avoiding risk and potential damage. Once rendering issues occur, one needs to be able to take the right action. Keeping objects and their respective formats as they are until you experience rendering problems is probably the safest way to avoid damage. A combination of having legacy software available and supporting technologies such as emulation seems very appropriate and feasible.

    It would take some negotiations with software vendors to waive licenses of their legacy software, but I am convinced that if we are able to clearly explain purpose and legal context this will be a good option for them as well. Enterprises like to be seen as "responsible citizens". Enterprises do not want to exploit their legacy software anymore, so it would mean that a trusted organization would need to develop a kind of escrow service within the right legal and financial framework. 

    With the above in mind, OPF conducted a survey (sponsored by Microsoft Research) amongst its members in summer 2012. We asked if members would like to have access to legacy software and if this would represent an economic value to them. Most liked the idea but did not consider it important and, more importantly, would only value such a service if it were free.

    To run a reliable escrow service for legacy software, the hosting organization needs to test this legacy software periodically against new technology stacks. Hosting costs are not dramatic, but it always takes technically qualified people to manage and maintain the service. A free service is therefore not an option at all. Based on the survey, we parked the plan to further develop the idea.

    I would be very happy to re-open the initiative once there is a more realistic view on the value and associated costs of running such a service.

  2. Dirk von Suchodoletz
    April 4, 2013 @ 7:25 am CEST

    To "test the appetite of the community", we started a demo service available to OPF members. This one demonstrates the use for curation of digital art, but more use cases are to follow. Another use case/demo in preparation is a migration service using emulation. Exactly because of the still unclear licensing issues, we did not open the service to the general public. Another demo with more background information will be available at the upcoming hackathons in Copenhagen and Chapel Hill.

  3. ecochrane
    April 5, 2013 @ 2:41 am CEST

    EaaS in general seems to have huge potential for disrupting the digital preservation landscape. Combined with a central software archiving and licensing agency that worked across jurisdictions to provide software as needed, the applicability of the concept is immense.

    While the potential is obvious to me, it might be worthwhile to gauge support in the community in order to be better prepared for engaging with the licensing issues. Have you considered a survey of the community to investigate the appetite for uptake of such services?
