Strategic objective/theme: ICT 6-4.1 – Digital Libraries and Digital Preservation
Table of Contents
B1 Concept and objectives, long term integration, Joint Programme of Activities
B1.1 Concept and project objective(s)
B1.2 Long term integration
B1.3 Joint Programme of Activities
B2 Implementation
B2.1 Management structure and procedures
B2.2 Beneficiaries
B2.3 Consortium as a whole
B2.4 Resources to be committed
B3 Impact
B3.1 Strategic impact
B3.2 Spreading excellence, exploiting results, disseminating knowledge
B4 References
B1 Concept and objectives, long term integration, Joint Programme of Activities
B1.1 Concept and project objective(s)
Digital media have become the dominant way that we create, shape and exchange information. Governments, businesses, research organisations and memory institutions, as well as individuals, have become completely dependent on digital information. This dependence comes with a number of major risks because of the many unresolved challenges in the long-term management, access and preservation of this information.
This is explicitly recognised in the Challenge 4 work programme text: "(This) increasingly complex content needs to be safeguarded for future access. Preservation needs to be intelligently planned, capturing and selection of content need to be automated and hardware and software dependencies must be overcome. Keeping the associated semantics as well as the digital objects, should guarantee the integrity and authenticity of the information as originally recorded." The objective of this project can be stated simply: to look across the excellent work in digital preservation carried out in Europe and to bring it together under a common vision. The success of the project will be seen in the subsequent coherence and general direction of travel of research in digital preservation, in an agreed way of evaluating that research, and in the existence of an internationally recognised Virtual Centre of Excellence.
Within this broad aim, one difficulty is that making information accessible in the long term requires a multi-disciplinary approach across several stakeholder groups. Our first step is therefore to bring these various communities closer together as a Network of Excellence.
The socio-economic impact of the knowledge generated by research activities has long been recognised because of its importance in stimulating innovation, which leads to wealth creation, growth in employment and more sustainable social development. As Commissioner Viviane Reding stated in 2007, "... we need to learn from each other and find together the best means to strengthen our effort in ICT research and to ensure the best use of ICT ..." and "There is a need for a synchronised effort to overturn the existing inertia and drive forward growth and competitiveness".
EU innovation policy acknowledges the need for better coordination between Member States, the EC, industry and academic research communities, but action is still required to align the research they are collectively responsible for to industrial and societal needs and expectations. The involvement of industrial stakeholders - technology providers, ICT suppliers and integrators, content providers and especially leading edge users - is vital for testing and benchmarking innovative solutions in realistic settings, but too few corporate market players have committed to being involved in driving new RTD endeavours.
This lack of industrially-led coherence is particularly significant within the digital preservation community, which spans the varied needs of the cultural, scientific and business communities as well as the vast array of public administration services. Needs are fragmented not only by organisation type but also by content type: from simple rendered documents and images through to highly-structured business/government records or civil engineering records such as maintenance specifications of new nuclear facilities, and from semantically-rich descriptions of content through to the preservation of scientific datasets and their experimental context. Even within a single business sector such as manufacturing, efforts like the aerospace industry's LOTAR project and its work on governance in data sustainment run the risk of becoming isolated from the mainstream.
One very good example of this fragmentation is the great diversity in the use of (what should be) persistent identifier systems, with different technical implementations and a huge disparity in guarantees of persistence. Sometimes this is the result of work taking place in isolation (for example, in small archives), but even large, well-funded initiatives addressing long-term problems are often resourced through short-term, project-oriented mechanisms, with little thought about sustainability beyond the need to 'top up' the funding with the next project. In some cases this results in good work being discontinued while, in others, extravagant claims lead to continued investment in industrially-irrelevant activities.
Duplicated effort due to fragmentation is potentially wasted effort. Lack of scrutiny and serious debate about the potential for innovation within ideas proposed for research funding is also potentially wasteful. Both reduce the capacity to meaningfully address the real problems of today, tomorrow and the next millennium.
Stakeholders in the emerging digital preservation (DP) market understand that DP is an important Information Society issue. When DP practices penetrate mainstream markets, social benefits will become visible, economies of scale will materialise, sustainability will be secured, and the impact of DP will be multiplied across larger volumes of digital content from more sources and of more types. This virtuous cycle will enable consolidation of the DP market as an economic sector of the Information Society.
The APARSEN NoE brings together an extremely diverse set of practitioner organisations and researchers in order to bring coherence, cohesion and continuity to research into barriers to the long-term accessibility and usability of digital information and data, exploiting our diversity by building a long-lived Virtual Centre of Digital Preservation Excellence.
Some of our partners are already working in this way at a national level, for example the Digital Preservation Coalition in the UK and Nestor (represented here by DNB) in Germany. There are also transnational organisations such as LIBER, which brings together university libraries across Europe. While these do a good job, they tend to be library-focussed, and a broader, more diverse, Europe-wide collaboration is needed if we wish to achieve the profound restructuring that is required.
The Alliance for Permanent Access (APA) was formed in 2006 by a small core group of 11 organisations from the major research laboratories, international organisations and national libraries, in recognition that trying to organise the many hundreds of organisations working in this area from the start would be too difficult. The founder members included ESA, CERN, BL, KB, CCLRC (now STFC), STM Association, NESTOR and DPC, all of whom are members of this consortium. The Alliance's initial five-year Strategic Action Programme, presented to the EC and national governments in 2005, ends this year. APARSEN embodies the Alliance's strategy for the next five years: broadening its membership and taking a leading role in aligning the digital preservation research agenda to address the barriers to permanent access to digital information.
The next step builds on the membership of the APA, supplementing it with important players from industry and academia in order to obtain a critical mass that can consolidate the otherwise fragmented research capacity across Europe. Moreover, by including the key data producers, publishers, funders, libraries, repositories and commercial interests at both national and Europe-wide level, with a huge and very varied set of users, we have the potential to exert considerable influence on national research activities and their funding.
The recent final report from the High Level Expert Group on Digital Libraries, entitled Digital Libraries: Recommendations and Challenges for the Future, stated: "A general policy framework, including sustainable custody and funding/business models, needs to be established by the key stakeholders in science and science information and national and EU policymakers. The aim is to establish the roles and responsibilities in building a European Digital Information Infrastructure that allows the access and re-use of research data and ensures their long term preservation."
This project supports those aims and furthermore has close links with the EU e-Research Infrastructures through the recently formed High-Level Expert Group on Scientific Data e-Infrastructure (HLEG-SDI). This group has the mandate to define a vision for 2030 and to produce a detailed action plan for creating the infrastructures needed to support the use and re-use of the pervasive and persistent deluge of data with which the Information Society will have to deal. These links allow us to see the direction of future demands and therefore help to ensure that current research can provide the necessary solutions.
Additional guidance is available from the PARSE.Insight Roadmap, supported by surveys and case studies that attracted a very large number of responses. These show a consensus on how re-structured research activities can lead to future preservation services that support re-use today as well as re-use by future generations. This consensus appears consistent with the views of the HLEG-SDI, whose final report is due in the summer of this year.
We believe that a small investment now in APARSEN will bring great benefits to Europe, because the cost of digital preservation is much less than the cost of recollecting data or the opportunity cost of not having the data. More specifically, we will aim to ensure that the benefits of digital preservation/curation are realised effectively and efficiently:
by ensuring information availability to promote research and innovation for the knowledge society
by providing best practice.
Figure 1 APARSEN positioning
This figure shows the way in which APARSEN provides a route between research funded through EU, academic, international organisations and commercial channels and the implementation of the infrastructure needed to support the vision for 2030, in which preservation and re-use of the torrent of data, on which we all depend, is absolutely fundamental.
It is equally necessary to meet the inevitable demand for expertise in digital preservation. We will therefore lay the foundations for the longer-term embedding of digital preservation expertise across Europe by putting in place graduate-level courses in many countries through our university and research funding members. We expect our many contacts across the world to enable us both to import the best ideas from outside Europe and to influence their development.
The Joint Programme of Activities (JPA) detailed below takes as its starting point the widely reviewed APA research programme, together with the closely related PARSE.Insight project's Roadmap, and is greatly supplemented by the research activities of our non-APA collaborators.
The several thousand responses to the PARSE.Insight survey and case studies, across disciplines and stakeholders and from around the world, showed overwhelmingly that many threats to digitally encoded information were at the forefront of people's minds. The following table lists these threats, the types of solution needed to counter them, and example implementations from FP6 and FP7 projects.
Threat: Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved.
Requirement for solution: Ability to create and maintain adequate Representation Information.
Example implementations: RepInfo toolkit, Packager and Registry, to create and store Representation Information. In addition, an Orchestration Manager and Knowledge Gap Manager help to ensure that the RepInfo is adequate.

Threat: Non-maintainability of essential hardware, software or support environment may make the information inaccessible.
Requirement for solution: Ability to share information about the availability of hardware and software and their replacements/substitutes.
Example implementations: Registry and Orchestration Manager, to exchange information about the obsolescence of hardware and software, amongst other changes. The Representation Information will include such things as software source code and emulators.

Threat: The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity.
Requirement for solution: Ability to bring together evidence from diverse sources about the Authenticity of a digital object.
Example implementations: Authenticity toolkit, which will allow one to capture evidence from many sources that may be used to judge Authenticity.

Threat: Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future.
Requirement for solution: Ability to deal with Digital Rights correctly in a changing and evolving environment.
Example implementations: Digital Rights and Access Rights tools, which allow one to virtualise and preserve the DRM and Access Rights information that exists at the time the Content Information is submitted for preservation.

Example implementations: Persistent Identifier system, which will allow objects to be located over time.

Threat: The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future.
Requirement for solution: Brokering of organisations to hold data, and the ability to package together the information needed to transfer information between organisations ready for long-term preservation.
Example implementations: Orchestration Manager, which will, amongst other things, allow the exchange of information about datasets that need to be passed from one curator to another.

Threat: The ones we trust to look after the digital holdings may let us down.
Requirement for solution: Certification process so that one can have confidence about whom to trust to preserve data holdings over the long term (see RAC).
Example implementations: The Audit and Certification standard to which CASPAR has contributed will allow a certification process to be set up.
We believe that this programme of activities will form a good basis on which to consolidate our various research programmes, but we recognise that new ideas will (must!) be developed, and so this programme does not claim to be all-inclusive. A significant fund is earmarked for as-yet unidentified collaborators and for integrating new ideas.
We can benefit from our diversity by testing our theories and techniques about digital preservation with digital objects outside the experience of the originators of those theories and techniques. In this way we will understand the limitations of each approach more clearly, and we may also be able to extend the approaches so that they become more widely applicable. For example, the concept of significant properties has been widely used in the library community but not in the science community; however, testing the concept against science data has allowed a richer view of it to be developed, such that it can be applied to many science communities and also linked to another key concept, namely authenticity. We also strive to make access to the holdings of our data producer and provider members more easily customisable by challenging them with different sets of users, beginning with the alignment of the access portals and access APIs of our science data repositories. We expect this also to test the requirements for the metadata (in this case Representation Information) needed by users from different domains. In many cases the current users of data already have the knowledge to understand and use those digital resources; by challenging our capabilities with users from different disciplines we strive to mimic future users who lack that tacit knowledge, and as a by-product we expect to open up those resources for greater re-use right now.
By combining our expertise and facilities we will strive to reinforce the capacity of our organisations, and others, to preserve their digital content more effectively and cost-efficiently, safeguarding the authenticity and integrity of those holdings. For example, when one organisation can no longer support a particular dataset or collection, which is bound to happen from time to time, the brokering service that we are researching, and our partners are implementing, offers the possibility of handing on custodianship to another organisation. At the same time, the coherence we will bring to our understanding of what evidence of authenticity is required will allow that vital evidence to be transferred effectively, thereby safeguarding confidence in authenticity. In this way we can significantly reduce the loss of irreplaceable information and, as explained above, create new opportunities for its re-use in a broader set of user communities, increasing the possibility of serendipitous discoveries by bringing fresh eyes to each domain.
We will strive to make the research effort in digital preservation in Europe more coherent and, by challenging our individual preconceptions, stronger, by continually testing our advances in the state of the art, where appropriate using competitions and awards. Because the consortium includes many major data holders with an immediate need for these new digital preservation techniques, we expect to be able to guarantee the impact of our research results. Building on its members' already firm commitments, the APA's natural expansion goal is that the coherence the Network of Excellence fosters will continue into the future; our links to the developing e-Research infrastructures will also help this continuity. Through our commercial partners we will strive to foster a critical mass of competitive suppliers of DP services and tools, needed to meet the increasing demand as institutional and corporate DP strategies are put in place within the next decade. By widening the scope of these efforts, we aim to bring in mainstream players who will build on the efforts of pioneers and early adopters, so that DP solutions become a permanent feature of the Information Society. Finally, by working closely with the developing ISO audit and certification process, which if successful will itself be a major force in re-shaping the digital preservation environment, we expect to truly stabilise the landscape and prevent the erosion of digital capital in Europe and throughout the world.