Organisational readiness for Open Source
The demand for mature tools and services that support the digital preservation process is strong and growing stronger – and for good reason.
Sure, a substantial number of research and grant-funded projects have delivered software and tools – and these tools are freely available on SourceForge, in accordance with the requirements of funders. But what has happened to them? Most have been abandoned and become orphans. This is why I sometimes half-jokingly call SourceForge a “software cemetery”.
The tools and open source projects that have “survived” beyond the grant period have done so thanks to some degree of prolonged investment in maintenance and development by one or two institutions that “adopted” the software after the project ended. Typical examples are DROID (National Archives UK), JHOVE (JSTOR/Harvard) and FITS (Harvard).
It is claimed that some of these tools are widely deployed in the community and firmly embedded in institutional digital preservation processes. But the number of software downloads is not a measure of deployment. You will read in most digital preservation survey reports that these same tools are not meeting the needs of the community. At conferences, you will hear complaints about the performance of the tools. But, most strikingly, when visiting the sites where these tools can be downloaded for free, you will see no signs of an active user community reporting bugs and submitting feature requests. The forums are silent. The source code is sometimes absent, and there are neither community-building approaches nor procedures in place for committing code to the open source project.
Why is it so difficult for open source digital preservation tools and software to achieve maturity? I contend that this is because, in our community, we talk a lot about open source but don’t actually “do open source”. Most institutions prefer to act as critical “consumers” of open source software instead of being constructive “contributors”. Most developers of open source tools are not actively engaged in a shared open source project; they are developing for the needs of the institution that employs them.
In this blog post I explain how the community might benefit more from a reciprocal approach to Open Source, and what the organisational consequences of such an approach would be.
Understanding Open Source
At the heart of the “consumer” behaviour lies a very limited understanding of what open source development entails.
I mentioned the funders of software development projects, who have realised that sharing software deliverables as open source increases the chances of re-use. What they fail to understand, however, is that dumping code on SourceForge does not guarantee sustainability.
Within research and cultural heritage institutions, and in particular at middle and higher management levels, open source is quite popular. To a certain extent this is due to bad experiences with vendor lock-in in past years, but it also fits the spirit of the time. The polarised stance against vendor solutions and in favour of open source solutions is based on quite naïve assumptions. Moreover, it is not conducive to constructive collaboration based on trust, which lies at the basis of successful partnerships with vendors and open source communities alike. On a more opportunistic note, open source is also perceived as “software for free” and thus an attractive option to reduce the costs of ICT solutions. What managers fail to understand is that consuming open source is not free, and that it can actually be very costly if they do not have the right expertise and skills in house to integrate open source into their ICT environments.
The ICT staff (system administrators, software developers and their managers) in such organisations are largely selected to manage standard office automation environments; as a result, they have little affinity with open source and lack the skill set needed to carry out open source development. They use a lot of stable and mature open source tools and software on a daily basis, such as Apache web servers, but again, in such cases they act as consumers and are hardly aware of the development model driving the product. As software consumers, they are by default more inclined to use commercial solutions because of the support they get and the SLA-based guarantees.
In academic research units, developers are usually employed on a project basis to build tools that support the short-term needs of research (e.g. a database, a visualisation tool, etc.). These developers often work on their own and are largely dependent on open source software – again, as consumers. Ask your favourite developer what he thinks of open source development and you will be surprised to hear mostly negative reactions. Open source developers are often regarded by their (non-open-source) colleagues as small entrepreneurs who deliberately write bad code so that they can sell their expertise to consumers who have trouble making the software work in their environment. Much of the misunderstanding stems from a fear of losing control and a fear of sharing code with peers.
All these observations lead me to the conclusion that there is a widespread misconception of what open source software development entails. As long as this is the case, there will continue to be too little cultural and organisational readiness in the community to really embrace such an approach for digital preservation.
Doing Open Source the right way
We can distinguish between three types of open source activities: 1) using available open source software (consumer), 2) developing software and making it freely available as open source (contributor) and 3) community-wide open source development (open source project).
From the above, it is clear that in the digital preservation community we primarily act as consumers (1) and contributors (2). Both are unilateral forms of activity. The third type of activity, however, is reciprocal and therefore much more effective and interesting. Most successful open source software efforts are of this type; to name just a few: Apache, Linux, Firefox and Drupal.
What are the characteristics of community-wide open source development?
First of all, such activity is not a temporary, grant-driven project. Although misleadingly referred to as an “open source project”, it is not a project at all but rather a process. A project is by definition limited and constrained by scope, funds and deadlines. A process, in terms of quality management, is a continuous cycle of improvement. Open source is a software development process. The software is incrementally improved in cycles (inception, elaboration, construction, transition), which are the Digital Age incarnation of Quality Circles and PDCA (Plan, Do, Check, Act) – see my previous blog post.
Digital preservation is not a project either: it has no predefined deliverables or predetermined results. Preservation is a continuous process that tries to respond to the challenges of the day. The underlying software tools need to be developed in close relation with the preservation process and practices. The digital preservation process and the software development process go hand in hand. Remember, it is all about learning by doing. And it is about responsiveness to change, which in software development translates into short, fast-moving iterative cycles.
Secondly, the open source approach is all about sharing and collaborating. It presumes a shared interest and a shared purpose. It leads to shared benefits and shared rights. An open source community collaborates at all levels: strategy, software development, testing and maintenance. Most importantly, there is agreement on shared requirements. In the digital preservation community we tend to cultivate our differences and to think in terms of “what are the benefits of open source to my organisation?”. Even national libraries, which form a closely knit community, are unable to join hands in defining their process requirements. Each is focused on its own specific, customised Ingest process. Each chooses and implements different approaches and divergent solutions, ranging from open source to vendor and in-house solutions. There is little open exchange of lessons learned from which the community as a whole can benefit. These are not signs of a strong community that is able to develop and maintain its own robust and sustainable solutions. It is the sharing by many that makes the load bearable: digital preservation is a task that no organisation can carry out on its own, neither the big ones nor the small ones. For open source and commercial solutions alike, the same economic principles apply. It is all about scale. The more users choose the same solution, the more economical it becomes. Our goal should be to turn digital preservation tools into commodities.
Thirdly, open source is based on trust. It is an environment without contracts, SLAs and formal liabilities. All community members are equally responsible and share the successes and the failures. All are peers. There is no vendor-customer relationship. All contribute and consume, the big players and the small ones alike. They share their experts and resources on the basis of reciprocity. Experts and leaders in this environment are selected on the basis of merit, not on the basis of seniority or institutional affiliation. The open source approach has to do with the ability of organisations to learn from and contribute to their peers without expecting to get something back. It is all about unselfishness and trust.
Building constructive and successful partnerships with vendors is also based on trust, not on SLAs. Defining joint requirements as a user group of vendor solutions is also necessary for turning “bespoke” commercial tools into commodities. In many ways, open source and vendor approaches are alike in terms of the preconditions necessary to be successful and to achieve economies of scale. There are, however, a few fundamental differences between a vendor and an open source approach. One difference has to do with innovation. A commercial solution is constrained by SLAs, revenue models, etc. It will try to avoid bleeding-edge technology and will tend to be driven by more conservative demands, to keep the customer base happy. In contrast, the free and collaborative nature of open source communities can be more conducive to out-of-the-box thinking and to responding to technology trends. Another difference has to do with the investment of resources. At the end of the day, one buys products and services from a vendor, but one invests expertise and personnel in open source solutions. The benefit of open source is not immediate: it is an investment. More specifically, it is a long-term investment in people, an investment in the organisation. This is why the title of this blog post is “Organisational readiness for Open Source”.
Are we ready?
One would expect the open source approach to be a perfect fit for the public sector: no spending of tax money on expensive commercial solutions, the opportunity to deploy its own resources in optimal ways, etc. In reality, however, the public sector has institutionalised bureaucratic organisational and financial practices that conflict in every respect with open source practices: measures that favour the outsourcing of tasks and expertise, thresholds on public expenditure and requirements to tender, limited flexibility to deploy human resources, limited investment possibilities, etc. It is often far more difficult for public sector institutions to be involved in open source activities than it is to buy commercial solutions.
The underlying philosophy of open source originates from the Free Software movement and the concept of the digital commons – which is based on trust, diversity and reciprocity. The digital commons only exist by virtue of self-control, collaboration, intellectual freedom and freedom to act. During the last decade many public institutions have embraced the use of Open Source Software and started to adopt Creative Commons licenses to make cultural heritage freely available in the digital commons. Europeana’s advocacy for CC0-licensing is an illustrative example. Yet, the logic of rights & obligations and the drive to control, regulate, standardize and register is deeply embedded in the organizational DNA of public sector institutions. Will the organizational culture in the public sector be able to adapt to the uncontrolled, trust-based open source software practice?
I have highlighted many aspects and characteristics that demonstrate why the digital preservation community is probably not yet ready to embrace a full open source approach, but more importantly, why it might not even be a desirable or realistic objective. Still, there are compelling arguments for the digital preservation community to start working together as an open source software community: namely, to foster innovation, to break through the resistance of industry and memory organisations to change, and to invest in a shared pool of experts and skilled people. The OPF Hackathons serve as a venue for digital preservation practitioners, (open source) tool developers and vendor participants: a venue where the needs, the possibilities and the constraints are brought together with the goal of arriving at deployable solutions.
Ideally, the digital preservation community will make use of the best of both worlds: the open source solutions to drive innovation and commercial solutions to deploy and commoditise robust services.
garymcgath
January 28, 2013 @ 1:00 pm CET
Thanks for shedding some realistic light on open source. Too many people think of it just as a resource that's available at no cost; I've been guilty of that myself too often. The airy claim that if there's a problem with an open-source application you can just change it flies in the face of practical reality too. Open source runs into the "tragedy of the commons" problem: the optimum strategy is to let other people do the contributing. Not all institutions can usefully participate in maintaining an open-source product, but it would be nice if "tipping" for OS software became as habitual a practice as tipping in a restaurant.
I need to make one factual correction. JHOVE didn't survive beyond its initial grant period because of "adoption" by any institution. Harvard did not budget any time for its continuing support, but only allowed me to spend some of my working time continuing to support it, when the demands of budgeted projects weren't more urgent. Since August it's been completely separate from Harvard and JSTOR. There was a community of sorts for JHOVE at one time, with four or five people making significant contributions. Participation has gradually dropped off, perhaps because JHOVE is getting to be an old product.
bram van der werf
January 28, 2013 @ 12:46 pm CET
Thanks Johan,
Testing is very important. In commercial software development projects it is often a major part of the project. The main reason is to reduce liability and potential legal claims. Bugs and fixes are seen as major vehicles to drive product improvement. And bugs reported by users and practitioners should be taken seriously.
We really need to change the idea that there is a Digital Preservation developer world and a practitioner world; we are all in the same community. A tool / solution cannot exist without real users, and real users will always discover bugs. Mature Open Source initiatives are almost without exception able to deal with bugs, issues and feature requests reported by both users and developers.
In SCAPE we evaluate and test tools from all over the world; how many bugs did we report to the developers? Maybe a performance issue is not a bug, but it can at least be reported as a feature request.
I know you have the good habit to do this and I would really like to ask you to encourage your colleagues to do the same.
bram van der werf
January 28, 2013 @ 12:20 pm CET
Thanks Andy,
Agree with what you wrote and "uncontrolled" would have been better worded as "less formal control".
I also wrote that "Experts and leaders in this environment are selected on the basis of meritocracy and not on the basis of seniority or institutional affiliation". We know that in many Open Source initiatives this actually implies a very firm, transparent and open type of informal control with robust and documented processes.
The main reason for this slip of the pen was to stress the cultural difference between the Public/Institutional world and the Open Source world. There is a lot of literature about how organisations deal with change, and practice shows that cultural change is the most challenging. Are we able to modify some of the institutional formal control DNA and replace it with a sense of community, merit and reciprocity?
johan
January 28, 2013 @ 12:13 pm CET
One small (but I think important) addition: your description of 'community-wide open source development' also includes non-development activities, which can be as simple as testing software and reporting any issues and bugs back to the developers. You briefly mention this further on, but I think people often underestimate the importance of this.
Just an example to illustrate my point here: When I did my initial evaluation of identification tools for SCAPE in 2011, I made sure to report my findings back to the tool developers. What really surprised me was that most of them responded that it was quite unusual for them to receive any feedback at all! That doesn't stop many people from endlessly complaining about these tools (as mentioned also by Bram). With this in mind the slow pace (and sometimes lack) of any progress isn't exactly surprising.
Last year I did a report on the EPUB format that included some recommendations for improvement of the Epubcheck validation tool. This got picked up by its developers, and the recommendations from that report (more elaborate metadata extraction, XML output handler) are all included in the latest version of Epubcheck. Similarly, while doing some tests with the Apache Preflight PDF/A validator I came across a number of bugs. I reported those back to the developers, who fixed them shortly thereafter. This is just to illustrate that it is possible to contribute to open source development even without any actual 'development' whatsoever, but that does require a change from passive complaining (which I think is far too common in the digital archiving community) to a more constructive involvement.
andy jackson
January 28, 2013 @ 11:25 am CET
Nice post, but I would disagree that open source projects are 'uncontrolled'. Successful open source projects are very firmly, but very openly, controlled – indeed, ensuring an open meritocracy is precisely what the Apache Foundation 'incubation' period is all about. As you say, open source projects are a great way of carrying out more experimental and exploratory work, but then again, every successful OS project has something like a 'stable' and a 'development' branch, to keep the current and future users happy (so to speak).
However, true engagement with an open source project does mean relinquishing (some) control, and being willing to work with a broader community to establish a common path. To do so, compromise may well be necessary. My suspicion is that the uncertainties introduced when linking project funding to an external community are all too easy to see and to recoil from. We can combat this by being clear about the potential benefits of open source projects, and by structuring our open source projects so that they are openly managed and demonstrably dependable.