Organisational readiness for Open Source
The demand for mature tools and services that support the digital preservation process is strong and growing stronger – and for a good reason.
Sure, a substantial number of research and grant-funded projects have delivered software and tools – and these tools are freely available on SourceForge, in accordance with the requirements of funders. But what has happened to them? Most have been abandoned and become orphans. This is why I sometimes half-jokingly call SourceForge a “software cemetery”.
The tools and open source projects that have “survived” beyond the grant period have done so thanks to some degree of prolonged investment in maintenance and development by one or two institutions that “adopted” the software after the project ended. Typical examples are DROID (National Archives UK), JHOVE (JSTOR/Harvard) and FITS (Harvard).
It is claimed that some of these tools are widely deployed in the community and firmly embedded in institutional digital preservation processes. But the number of software downloads is not a measure of deployment. You will read in most digital preservation survey reports that these same tools are not meeting the needs of the community. At conferences, you will hear complaints about the performance of the tools. But most strikingly, when you visit the sites where these tools can be downloaded for free, you will see no sign of an active user community reporting bugs and submitting feature requests. The forums are silent. The open source code is sometimes even absent, and there are neither community-building approaches nor procedures in place for committing code to the open source project.
Why is it so difficult for open source digital preservation tools and software to achieve maturity? I contend that this is because, in our community, we talk a lot about open source but we don’t actually “do open source”. Most institutions prefer to act as critical “consumers” of open source software instead of being constructive “contributors”. Most developers of open source tools are not actively engaged in a shared open source project; they are developing for the needs of the institution that employs them.
In this blog I explain how the community might benefit more from a reciprocal approach to Open Source, and what the organisational consequences would be.
Understanding Open Source
At the heart of the “consumer” behaviour lies a very limited understanding of what open source development entails.
I mentioned the funders of software development projects, who have realised that sharing software deliverables as open source increases the chances of re-use. What they fail to understand, however, is that dumping code on SourceForge is no guarantee of sustainability.
Within research and cultural heritage institutions – and in particular at middle and higher management levels – open source is quite popular. To a certain extent this is due to bad experiences with vendor lock-in in past years, but it also fits the spirit of the times. The polarised stance against vendor solutions and for open source solutions is based on quite naïve assumptions. Moreover, it is not conducive to constructive collaboration based on trust – which lies at the basis of successful partnerships with vendors and open source communities alike. On a more opportunistic note, open source is also perceived as “software for free” and thus an attractive option for reducing the costs of ICT solutions. What the managers fail to understand is that consuming open source is not free: it can actually be very costly if they do not have the right expertise and skills in-house to integrate open source into their ICT environments.
The ICT staff (system administrators, software developers and their managers) in such organisations are largely selected to manage standard office automation environments; as a result they have little affinity with open source and lack the skill set needed for open source development. They use a lot of stable and mature open source tools and software on a daily basis, such as Apache web servers, but here too they act as consumers and are hardly aware of the development model driving the product. As software consumers, they are by default more inclined to use commercial solutions because of the support they get and the SLA-based guarantees.
In academic research units, developers are usually employed on a project basis to build tools that support the short-term needs of research (e.g. a database, a visualisation tool, etc.). These developers often work on their own and are largely dependent on open source software – again, as consumers. Ask your favourite developer what he thinks of open source development and you will be surprised to hear mostly negative reactions. Open source developers are often regarded by their colleagues (non-open-source developers) as small entrepreneurs who deliberately write bad code so that they can sell their expertise to consumers who have trouble making the software work in their environment. A lot of the misunderstanding has to do with fear of losing control and fear of sharing code with peers.
All these observations lead me to the conclusion that there is a widespread misconception of what open source software development entails. As long as this is the case, there will continue to be too little cultural and organisational readiness in the community to really embrace such an approach for digital preservation.
Doing Open Source the right way
We can distinguish three types of open source activity: 1) using available open source software (consumer), 2) developing software and making it freely available as open source (contributor) and 3) community-wide open source development (open source project).
From the above, it is clear that in the digital preservation community we primarily act as consumers (1) and contributors (2). Both are unilateral forms of activity. The third type of activity, however, is reciprocal and therefore much more effective and interesting. The most successful open source software activities are of this type – Apache, Linux, Firefox and Drupal, to name just a few.
What are the characteristics of community-wide open source development?
First of all, such activity is not a temporary, grant-driven project. Although misleadingly referred to as an “open source project”, it is not a project at all but rather a process. A project is by definition limited and constrained by scope, funds and deadlines. A process, in terms of quality management, is a continuous cycle of improvement. Open source is a software development process. The software is incrementally improved in cycles (inception, elaboration, construction, transition), which are the Digital Age incarnation of Quality Circles and PDCA (Plan, Do, Check, Act) – see my previous blog.
Digital preservation is not a project either: it has no predefined deliverables or predetermined results. Preservation is a continuous process that tries to respond to the challenges of the day. The underlying software tools need to be developed in close relation with the preservation process and practices. Both the digital preservation process and the software development process go hand in hand. Remember, it is all about learning by doing. And it is about responsiveness to change – in software development this translates into short, fast-moving iterative cycles.
Secondly, the open source approach is all about sharing and collaborating. It presumes a shared interest and a shared purpose. It leads to shared benefits and shared rights. An open source community collaborates at all levels: strategy, software development, testing and maintenance. Most importantly, there is agreement on shared requirements. In the digital preservation community we tend to cultivate our differences and to think in terms of “what are the benefits of open source to my organisation?”. Even national libraries, which form a closely knit community, are unable to join hands in defining their process requirements. Each is focused on its own specific and customised ingest process. Each chooses and implements different approaches and divergent solutions – ranging from open source and vendor products to in-house development. There is little open exchange of lessons learned from which the community as a whole can learn. These are not signs of a strong community that is able to develop and maintain its own robust and sustainable solutions. It is the sharing by many that makes the load bearable – digital preservation is a task that no organisation can carry out on its own, not the big ones, not the small ones. For open source and for commercial solutions alike, the same economic principles apply. It is all about scale. The more users choose the same solution, the more economical it becomes. Our goal should be to turn digital preservation tools into commodities.
Thirdly, open source is based on trust. It is an environment without contracts, SLAs and formal liabilities. All community members are equally responsible and share the successes and the failures. All are peers. There is no vendor-customer relationship. All contribute and consume, the big players and the small ones alike. They share their experts and resources on the basis of reciprocity. Experts and leaders in this environment are selected on merit, not on seniority or institutional affiliation. The open source approach has to do with the ability of organisations to learn from and contribute to their peers without expecting to get something back. It is all about unselfishness and trust.
Building constructive and successful partnerships with vendors is also based on trust, not on SLAs. Defining joint requirements as a user group of vendor solutions is also necessary for turning “bespoke” commercial tools into commodities. In many ways, open source and vendor approaches are alike in terms of the pre-conditions necessary to be successful and to achieve economies of scale. There are, however, a few fundamental differences between a vendor and an open source approach. One difference has to do with innovation. A commercial solution is constrained by SLAs, revenue models and the like. It will try to avoid bleeding-edge technology and will tend to be driven by more conservative demands – to keep the customer base happy. In contrast, the free and collaborative nature of open source communities can be more conducive to out-of-the-box thinking and responding to technology trends. Another difference has to do with the investment of resources. At the end of the day, one buys products and services from a vendor, but one invests expertise and personnel in open source solutions. The benefit of open source is not immediate: it is an investment. More specifically, it is a long-term investment in people, an investment in the organisation. This is why the title of this blog is “Organisational readiness for Open Source”.
Are we ready?
One would expect the open source approach to be a perfect fit for the public sector: no tax money spent on expensive commercial solutions, the opportunity to deploy its own resources in optimal ways, and so on. In reality, however, the public sector has institutionalised bureaucratic organisational and financial practices that conflict in every respect with open source practice: measures that favour the outsourcing of tasks and expertise, thresholds on public expenditure and requirements to tender, limited flexibility to deploy human resources, limited investment possibilities, and so on. It is often far more difficult for public sector institutions to be involved in open source activities than it is to buy commercial solutions.
The underlying philosophy of open source originates from the Free Software movement and the concept of the digital commons, which is based on trust, diversity and reciprocity. The digital commons only exists by virtue of self-control, collaboration, intellectual freedom and the freedom to act. During the last decade many public institutions have embraced the use of open source software and started to adopt Creative Commons licenses to make cultural heritage freely available in the digital commons. Europeana’s advocacy for CC0 licensing is an illustrative example. Yet the logic of rights and obligations and the drive to control, regulate, standardise and register are deeply embedded in the organisational DNA of public sector institutions. Will the organisational culture in the public sector be able to adapt to uncontrolled, trust-based open source software practice?
I have highlighted many aspects and characteristics that demonstrate why the digital preservation community is probably not yet ready to embrace a full open source approach – and, more importantly, why that might not even be a desirable or realistic objective. Still, there are compelling arguments for the digital preservation community to start working together as an open source software community: to foster innovation, to break through the resistance to change of industry and memory organisations, and to invest in a shared pool of experts and skilled people. The OPF Hackathons serve as a venue where digital preservation practitioners, (open source) tool developers and vendor participants come together – a venue where the needs, the possibilities and the constraints are brought together with the goal of arriving at deployable solutions.
Ideally, the digital preservation community will make use of the best of both worlds: open source solutions to drive innovation and commercial solutions to deploy and commoditise robust services.
Building Premissh: Part 3 – Coding for Digital Curation – Bish Bash Backup
May 28, 2024 @ 10:23 am CEST
[…] could have the impact required to gain this significant level of community support (see this Open Preservation Foundation blog). This would depend on others being interested in the project, and having a sufficient level of […]
mauricederooij
January 30, 2013 @ 12:12 am CET
As a technical digital preservation researcher and maintainer of FIDO, the OPF open source file format identification tool, I see it like this:
Accepting bug fixes, improvements and suggestions is a very diligent process which involves more than just changing some lines of code. There are many technical aspects to it, such as maintaining stability, maintainability and, most important of all, credibility, i.e. how things are done, can they be verified, are they done the right way. The latter is non-technical as well. One commit I received removed “annoying stuff” that did not break the result per se, but breached the verifiability of results by taking away verbosity about the identification method used.
Next to that, there is a bit of diplomacy involved as well; we are a community of international organisations after all… If I start bitching, or the other way around, that is generally not good for relations…
Talking about the other way around: I have submitted questions, suggestions and bug reports to the maintainers of the tools named in Bram’s post, but progress is slow. I understand and respect that quite well, though, because I have experienced the same the other way around.
As for time for FIDO: I have been granted time to keep maintaining the tool, but unfortunately cannot make it a high-priority task. Needless to say, this has a lot to do with funding.
As Bram implied, open source is not free; rather, it is an investment worth making…
garymcgath
January 28, 2013 @ 2:03 pm CET
I can't say what JSTOR is doing, but Harvard is definitely using JHOVE. It's one of the tools used by FITS, which is in constant use with the Digital Repository Service (DRS).
FITS presents the other side of the coin; in bringing FITS into last fall's online Hackathon, I found a certain level of reluctance at Harvard to have outsiders messing with it, even though it's available as open source. There have been some moves since then toward a more "open" approach to FITS, but I haven't heard of anything definite yet.
bram van der werf
January 28, 2013 @ 1:39 pm CET
Thanks Gary,
The fact that JHOVE can almost be considered "abandoned" makes the case for cultural change in the digital preservation community even stronger. From its start, OPF has always welcomed open source initiatives like JHOVE and JHOVE2. OPF has always been willing to moderate and steward further development and to make tangible efforts (hackathons, webinars, wiki, GitHub) to mobilise its community of developers and practitioners. The main condition for me has always been that there is at least a solid base of active users and some committers, preferably with someone from the "old" development team involved as lead developer.
And yes, we need to learn that nothing comes for free! Contributors in the commons do need sponsorship. But we also need a real user community that has actually deployed JHOVE as part of its preservation workflow and experiences operational benefits. I am convinced that once we meet this condition, there are ways to sponsor some bug fixing and feature requests.
Is it fair to conclude that Harvard and JSTOR are not using JHOVE themselves?