Carthage Smith
Directorate for Science, Technology and Innovation
This chapter considers how digital technologies that have arisen out of publicly funded scientific research are now rapidly transforming the practice of research and enabling open science. This transformation is apparent across all of the three main pillars of open science: dissemination of scientific information, access to research data and engagement with stakeholders from outside of research. Recent developments and analysis are presented for each of these areas. This is followed by a discussion of what these developments mean for the governance of science as a whole, including for international co‑ordination and co‑operation. The chapter builds on earlier work by the OECD’s Working Party on Innovation and Technology Policy and the report “Making open science a reality” and synthesises findings from recent work by the OECD Global Science Forum.
Digital technologies are transforming science. Much discussion about the digital economy focuses on the dominant role of a small number of multinational companies. In this context it is easy to overlook the fact that public sector science is at the origin of the digital revolution and continues to play a critical role in shaping it. The World Wide Web was first developed at the European Laboratory for Particle Physics in Switzerland to meet the needs of particle physicists. Foundational work on the Internet was supported by the Defence Advanced Research Projects Agency and the National Science Foundation in public laboratories in the United States. Academic researchers are playing a key role in developing the next generation of digital technologies – from quantum computing to biological storage of data. At the same time, science itself is being radically transformed by the digital technologies it has helped create.
Digitalisation is affecting all stages of the scientific process – from agenda setting and experimentation to knowledge sharing and public engagement. In so doing, it is facilitating the transition towards a new paradigm of open science. The transformative and sometimes disruptive effects of digital technologies are apparent across all fields of science, but manifest differently in different communities. Scientific domains that have historically been data intensive and co-operative, such as particle physics or astronomy, face different challenges to much of medical research or social sciences that have been less data-centric. In contrast, these latter fields have a stronger history of societal engagement, which is also being transformed by digitalisation.
Open Science, in its broadest sense, is about making the scientific process more open and inclusive for all relevant actors. There are three main pillars: open access (OA) to scientific publications and information; enhanced access to research data; and broader engagement with stakeholders from within and beyond the scientific community. Strengthening these three pillars could increase the efficiency and effectiveness of science, and accelerate the translation of scientific findings into innovations and socio-economic benefits. Achieving this and realising the full benefits of open science, while minimising the associated risks, will require new policies and careful balancing of mandates and incentives. It will also require long-term strategic investment in digital infrastructure and skills.
This chapter considers the three pillars of open science: how digitalisation is changing established practices, the opportunities and challenges that this entails and what this means for policy. It then discusses the meaning of these developments for the governance of science as a whole, including for international co‑ordination and co‑operation. The chapter builds on earlier work by the OECD’s Working Party on Innovation and Technology Policy and the report “Making open science a reality” (OECD, 2015). It synthesises some key findings from recent work done by the OECD Global Science Forum. This work includes an overall framework for open science (Dai, Shin and Smith, 2018) and specific policy reports relating to new forms of data and ethics (OECD, 2016), data repositories (OECD, 2017a, 2017b); agenda setting (OECD, 2017c); and access to research infrastructures (OECD, 2017d). These reports are complemented by insights from other recent OECD activities, including workshops, surveys and references to other relevant information.
The results of scientific research have traditionally been published in specialist scientific journals, following review of submitted manuscripts by peers. The costs of managing the review process, journal production and distribution have been recovered by charging readers (or academic libraries). Over time, a large and profitable industry has grown up around scientific publishing, and many professional scientific societies have come to depend on publishing income to offset the costs of other services that they provide for their communities. As the scientific community has grown, the number of scientific journals has massively increased, together with the overall subscription charges to access these journals. Even within academia, only the better-endowed institutions, mainly in developed countries, have been able to keep up with this expansion. With the advent of the World Wide Web and online publishing, the marginal costs of disseminating scientific information have been reduced almost to zero, opening up new possibilities for broader and more inclusive access to scientific information. New OA publishing models have emerged (gold, green, hybrid, etc.), and pre-print servers (such as arXiv.org in physics or bioRxiv.org in biology), mega-journals (such as PLOS One), institutional repositories and online scientific information aggregators (such as PubMed Central or LENS.org) are making access to scientific information easier and more inclusive. This transition to new science publishing models has raised concerns about the quality and sustainability of the scientific record. Ensuring both has been an important part of the added value that commercial publishers, in partnership with scientific societies, have provided, and this role has been integrated into their traditional business models. In the new OA publication era, it is less clear how editorial and peer-review processes will work and how the academic record will be maintained and updated.
Estimates for the costs of publication vary considerably. Better information will be required to move away from a reader-pays market model towards a high-quality and sustainable upstream (author-pays) model. It is notable in this regard that cOAlition S – a consortium of research funders – has identified the lack of transparency around OA publication costs and fees as an obstacle to promoting OA publishing (Science Europe, n.d.).
As new actors enter the science publishing arena, there is considerable concern about the growth in predatory online journals. These journals charge authors for publication, but carry out little or no review and quality control. Their publications are contaminating the scientific record and can undermine public trust in science. An online catalogue of predatory journals created by the librarian Jeffrey Beall in 2008 became an important reference site for the scientific community. Since the catalogue ceased to be updated in 2017, its absence has been lamented and there have been a number of subsequent efforts to revive it (Weebly.com, n.d.). Predatory journals must be publicly identified, and researchers discouraged from seeking publication in them.
The sheer volume of scientific papers is overloading both legitimate journals and researchers who try to keep up with them. The growth in scientific papers is in keeping with the expansion of the global scientific community. It also partly reflects academic incentive systems and the “publish or perish” dynamic. Scientists have reached “peak reading” and too many research papers are of inadequate quality.1 Even the most prestigious journals are having problems with quality assurance and the number of retractions is increasing. In some research fields, including life sciences and psychology, there is a reproducibility crisis, with many peer-reviewed and published findings being impossible to replicate. High-profile cases of scientific misconduct have become apparent in publications across all areas of science. Online forums such as Retraction Watch are helping the research community to identify questionable publications and the Committee on Publication Ethics is providing valuable guidelines to assist editors in dealing with these, but the numbers continue to increase (Brainard and You, 25 October 2018).
While digital tools cannot address the underlying causes of information overload and lack of scientific rigour, they can help manage these issues. Information and communication technology (ICT) can assist in organising, sharing and analysing large volumes of scientific information. Emerging tools and platforms enable researchers to rapidly identify and access papers that correspond to their interests (e.g. IRIS.AI, n.d.). Articles can be automatically “recommended” to scientists based on previous online search histories. Anti-plagiarism software, combined with data linkage systems such as Crossref, is helping editors and publishers with quality control. These tools, however, depend on the broad adoption of standards and unique digital identifiers, which can be supported at the policy level. For example, several research funders have joined publishers in mandating the use of Open Researcher and Contributor IDs (ORCIDs) for individual researchers.
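Where these identifier systems expose public interfaces, they can be queried programmatically. As a rough illustration, the sketch below builds a query against the public Crossref REST API for works linked to an ORCID iD and extracts DOIs from a response. The endpoint and `filter=orcid:` parameter follow Crossref's public documentation; the ORCID shown is the fictitious example iD used in ORCID's own documentation, and the sample response is invented.

```python
import json
from urllib.parse import urlencode

CROSSREF_API = "https://api.crossref.org/works"

def works_by_orcid_url(orcid: str, rows: int = 5) -> str:
    """Build a Crossref REST API query for works linked to an ORCID iD."""
    return f"{CROSSREF_API}?{urlencode({'filter': f'orcid:{orcid}', 'rows': rows})}"

def extract_dois(response_json: str) -> list:
    """Pull the DOIs out of a Crossref /works JSON response."""
    message = json.loads(response_json)["message"]
    return [item["DOI"] for item in message.get("items", [])]

if __name__ == "__main__":
    # 0000-0002-1825-0097 is ORCID's documented example iD (a fictitious person)
    print(works_by_orcid_url("0000-0002-1825-0097"))
    sample = json.dumps({"message": {"items": [{"DOI": "10.1000/example"}]}})
    print(extract_dois(sample))
```

Linking a researcher's ORCID to the DOIs of their outputs in this way is what makes automated accreditation and quality-control workflows possible at scale.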
Digitalisation is creating new possibilities for peer review, which remains at the core of the scientific publishing process. In some fields, including physics and astronomy, there is a tradition of making results available on line for open review and comment prior to formal publication. Pre-print archives and open peer review are now being tested in other fields, including life sciences (Cold Spring Harbor Laboratory, 2018). Looking to the future, this could be imagined as one part of a tiered process for publication, with more scientific information being shared earlier and commented on by the community and only a fraction of this eventually being formally published in journals. Some fields are also testing post-publication peer review, which can potentially help ensure the quality and rigour of the scientific record. Sites such as Pubpeer (Pubpeer Foundation, n.d.) are playing an important role in enabling the community to report and discuss concerns about published results. Technologies such as blockchain can potentially help ensure the fidelity of peer review, while accelerating the process and rewarding reviewers (Blockchain for Peer Review, 2019).
As indicated previously, the predominant model for communicating scientific information to date has been the release of peer-reviewed publications at the end of the research process. However, much useful information that is generated during research, including negative results that may be important with regard to reproducibility, is never shared. While at one level scientists are overloaded with information, at another level the information that can be readily accessed is often inadequate to critically evaluate, replicate and build on what is published. Again, digital technologies can help address this challenge. Online open lab notebooks can provide access to the primary experimental data and information linked to publications, and also help to ensure appropriate accreditation. The landmark publication of the detection of gravitational waves that led to a Nobel Prize in Physics in 2017, for example, was accompanied by OA to the experimental records in a Jupyter notebook. The scientific article of the future may be more than just a narrative with summary results. It may also include direct links to all supporting data and a record of the process by which those data were generated and analysed (Schapira, 2018).
The publication of scientific articles in journals is intimately coupled to the evaluation and rewards systems for science. This means that changes to publication practices can directly affect scientific careers. This is critically important in the current transition period, when many science funders are mandating OA publication (Science Europe, n.d.) but promotion and tenure, and, in some cases, institutional funding, continue to be largely determined by publication in high-impact, pay-for-access, journals. Mandates need to be matched by incentives and changes to current evaluation systems if the transition to OA publication is to be accelerated. A stronger focus on article-based metrics rather than journal impact factors is one way to assist this transition.
Digitalisation also provides opportunities to communicate scientific results and information in different ways that can complement or even replace traditional scientific articles in journals. Not all scientific disciplines are equally dependent on scientific articles as their main means of communicating results. In some areas of social sciences books are the main output of academic work and in computing sciences, conference proceedings are the most important mechanism for sharing results. Again, digitalisation and online tools can increase access to these outputs.
The use of social media, such as Facebook and Twitter, has transformative potential across all fields of science. Already, science blogs (e.g. LSE, n.d.) are becoming essential information sources, and increasingly cited, in scientific articles. The publication of scientific papers is now frequently accompanied by tweets. Alternative metrics or “altmetrics” are being developed to measure the impact of traditional scientific publications via their uptake in social media networks. Such metrics can clearly provide interesting information. However, further experimentation is required to test what kind of impact they are really measuring and how their deployment in evaluations might affect scientific behaviour and trust in science.
Data that is used in research and/or generated by research is the lifeblood of the science enterprise. Some fields of science are facing a reproducibility crisis and OA to the data [and code] that provides the basis for published scientific results is important as it allows for verification of those same results. Secondary analysis of data and application of the same data in different research fields can provide new scientific insights. Greater access to data can help to make science more inclusive and productive by allowing new actors to engage in the scientific process. The integration of data from diverse sources is important for science to be able to address complex societal challenges. Research data can also be an important substrate for innovation and economic growth. This is particularly the case when data are combined with mathematical algorithms, models and high-performance computing.
The OECD first advocated for greater access to data from publicly funded research in 2006. Since then, both the rationale and the tools for enabling greater access have been strengthened considerably. The OECD Principles and Guidelines for Access to Research Data from Public Funding (OECD, 2007) laid out 13 overarching principles that have stood the test of time. More recently, with an added emphasis on open science, the essence of this earlier normative work has been distilled into four concise principles: research data should be findable, accessible, interoperable and re-usable (FAIR). The FAIR principles have been widely adopted across countries, and the focus is now on how they can be implemented at the operational level, where issues such as standards, security and protection of privacy need to be addressed. Funding, infrastructure and skills are also limiting factors. It is increasingly recognised that, as data volumes increase, the costs of stewarding these data become prohibitive, while, at the same time, much of the data probably has little secondary value. The mantra is moving towards making research data “as open as possible and as closed as necessary”, as opposed to making all data open to everyone (OECD, 2018a).
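At the operational level, FAIR compliance is largely a matter of complete, machine-readable metadata. The sketch below is a deliberately simplified illustration of that idea: the field names are invented for this example (production services use rich schemas such as DataCite), but it shows how a repository might flag the gaps that keep a dataset from being findable, accessible or re-usable.

```python
# Illustrative field names only; real repositories use schemas such as DataCite.
REQUIRED = {
    "identifier",   # Findable: a persistent identifier such as a DOI
    "title",        # Findable: human-readable description
    "license",      # Re-usable: explicit terms of reuse
    "format",       # Interoperable: an open, documented format
    "access_url",   # Accessible: retrievable by a standard protocol
}

def fair_gaps(record: dict) -> set:
    """Return the FAIR-relevant fields missing or empty in a metadata record."""
    return REQUIRED - {k for k, v in record.items() if v}

record = {
    "identifier": "doi:10.1234/placeholder",  # hypothetical DOI
    "title": "Ocean temperature series, 1990-2010",
    "license": "CC-BY-4.0",
    "format": "text/csv",
    "access_url": None,  # not yet deposited in a repository
}
print(sorted(fair_gaps(record)))  # the record is not yet Accessible
```

Automated checks of this kind are one concrete way in which the "as open as possible" aspiration is translated into repository practice.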
Online data and associated services have dramatically changed many fields of science, from genomics to astronomy. The Global Earth Observation System that combines huge amounts of data from space, ocean and terrestrial observation devices, is essential for understanding the planet we live on and how it is changing. Social networking data is providing new insights into human behaviour and even the spread of disease (HealthMap, n.d.). Nevertheless, and despite broad agreement on the FAIR principles, a number of significant obstacles inhibit access to data. These include: i) costs and business models for data repositories; ii) trust and transnational barriers; iii) privacy and ethical considerations; iv) access to cyber-infrastructure and skills for data management and analysis; and v) incentives and rewards. The first three obstacles are considered in the paragraphs that follow, while the fourth and fifth are discussed at the end of this chapter.
Research data repositories are the main focus for implementation of FAIR data principles. However, as data volumes and user demands expand, the costs of data management are straining research budgets. Recent analysis of almost 50 data repositories, across diverse areas of research, has identified key actions to improve long-term sustainability of these critical infrastructures (OECD, 2017e).
Hence, repositories need to be considered as an integral part of the infrastructure for research, and they need to have clearly articulated business models (Figure 3.1). This, in turn, affects how they are funded and, in particular, how public funding is allocated. Many valuable data resources start out with short-term project funding but then struggle to be sustainable. Mandates for OA need to be matched with incentives, including appropriate funding. Opportunities for cost optimisation, including scale effects and technological advances, need to be actively pursued. Where the commercial sector provides repositories and associated data services for research, these should be consistent with the aim of enhanced long-term access. Monopoly arrangements, which can have longer-term negative consequences, should be avoided.
International data networks play an important role to assure data quality across borders. The sharing of research data across national borders is critical for many areas of science and, in most cases, this depends not just on single global data repositories but on federated international data networks. Examples of such networks include the multidisciplinary World Data System, the International Virtual Observatory Alliance (IVOA) in astronomy and the Inter-University Upper Atmosphere Global Observation Network. These networks can play an important role in data quality assurance, with membership being conditional on compliance with agreed standards and recognised accreditation systems (e.g. Data Seal of Approval, n.d.).
As is the case for individual repositories, these networks also need to have well-defined business models and value propositions. However, several additional challenges are associated with the establishment and maintenance of such networks (OECD, 2017b). The main barriers to sharing data across borders are the lack of policy coherence and trust between different communities.
Despite the growing acceptance of the FAIR principles as an aspirational aim, at the operational level there is considerable discordance around what data should be available to whom, and how – there is an absence of commonly agreed legal and ethical frameworks for sharing different types of public research data. Although no one model fits all, a number of organisational issues need to be addressed for networks to operate effectively. These range from aligning different objectives and user needs to governance arrangements. Ensuring inclusiveness and respecting cultural differences and capacity limitations can be problematic. Cutting across all of this are issues related to the adequacy of funding, and there is a need for funders to participate in relevant international discussions and fora, such as the Research Data Alliance, to improve co-ordination of their strategies and support for data infrastructure.
While the technical issues should not be underestimated, establishing trust is perhaps the main obstacle to enhancing data access and implementing the FAIR principles. This applies both from the perspective of the data provider and the user (Box 3.1). In recent OECD work on sharing scientific data and information during crises, it was striking that lack of trust was identified as the major obstacle to cross-sectoral and transnational co-operation (OECD, 2018b). There are a number of policy actions that can be taken to address issues of trust. Some of these relate to technology, such as blockchain, or the adoption of standards and processes, e.g. the use of safe havens for working on sensitive data. However, trust is fundamentally a sociological issue and building trust requires dialogue and shared understanding.
A number of governments and research funding agencies are beginning to mandate increased sharing and/or OA to research data. However, four key issues impede data sharing by researchers. Each of these is amenable to policy interventions:1
Trust. This obstacle refers to the mutual mistrust between scientists (Do I trust the data? Will I receive credit for my data if someone else uses them? Will my data be used appropriately?). In the case of personal data, the need to ensure trust between human subjects/patients and users is also important. Where commercial sector users are involved, then issues of trust can be further amplified.
Policy options: Put in place processes for data tracking and citation; adopt trusted repository accreditation systems and support international data networks; strengthen ethics committees by including data experts; and organise public dialogues on personal data and privacy and develop consensus around key issues, such as consent, anonymisation and commercial use.
Good practices: Data Seal of Approval for repository certification; establishment of the Ada Lovelace Institute in the United Kingdom to ensure data and artificial intelligence (AI) work for people and society (Nuffield Foundation, n.d.).
Burden. This obstacle relates to the time, expertise, and resources required by providers to make their data available and the time invested by users to discover available data.
Policy options: Develop national strategic plans, including long-term funding plans, for sustainable research data infrastructure (data repositories and services); require data management plans and provide funding to implement them in association with grant awards; provide dedicated funding to develop new data services; and identify and address data skills gaps in the research workforce.
Good practices: European Open Science Cloud Strategic Implementation Roadmap; Australian National Data Service Skills support services (ANDS, 2018).
Motivation, credit and reward. There is little incentive for scientists to make their data openly available. While publication of research results is critical for career advancement, there is little reward for developing and sharing useful data resources.
Policy options: Develop new indicators/measures for data sharing and incorporate these into institutional assessments and individual researcher evaluation processes; promote the use of unique digital identifiers for individual researchers and for data sets to enable citation and accreditation; and develop attractive career paths for data professionals, who are necessary for the long-term stewardship of research data and provision of services.
Good practices: Open Research Funders Group work on incentivising the sharing of research outputs through research assessment (ORFG, n.d.).
Governance and legal frameworks. A lack of understanding and clear guidance on data privacy regulations can inhibit data sharing by scientists. Likewise, in the absence of clear guidelines and relevant expertise, institutional review boards (IRBs) may act as a barrier to data sharing.
Policy options: Identify and support trusted brokers to mediate access to data; support the development of standardised data management plans and data use agreements; where appropriate, involve lay persons/patients in governance and oversight structures; encourage citizen science projects; and ensure that IRBs include the necessary expertise in data science, including legal aspects.
Good practices: National Services Scotland Safe Haven for secure access to health service data (ISD Scotland, 2018); Science Europe initiative for the development of domain-specific data management protocols (Science Europe, 2018).
1. This was identified at an INCF-OECD workshop on data sharing in dementia research, Stockholm, September 2015.
Trust is a particularly important issue with regards to access to personal data. New forms of personal data are becoming available in digital format from many sources, ranging from supermarket transactions to social media. These have enormous potential value for research, particularly when combined with administrative data, public health records or more traditional population survey data. Such data combinations can provide important new understanding into human behaviour, economic systems and the social determinants of health and well-being (OECD, 2013).
The rapid advance of technology is raising ethical questions about the use of personal data that go beyond the scope of existing legal agreements. Several legal frameworks, most notably the General Data Protection Regulation (GDPR) in Europe, provide some guidance and establish agreed limits on the use of personal data. However, the technology is developing so fast that new possibilities for data use in research raise ethical dilemmas that transcend these frameworks. Something can be legal without being ethically acceptable. Indeed, this is implicit in the GDPR’s provision for ex ante Data Protection Impact Assessments (DPIAs) when a process involving data is likely to pose a high risk to people’s rights and freedoms.
Increasing concern about data privacy and security has created an urgent need for science to adapt its governance and review mechanisms (OECD, 2016). The previously accepted requirements for the use of human subject data in research were informed consent and anonymisation. However, both of these are now being questioned as a consequence of advances in ICTs. For example, is it possible to get informed consent for specific purposes from all the individuals in very large sets of social media data? Can personal data from one source be truly anonymised when its linkage to other personal data is required for research?
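The second question is not hypothetical: records stripped of names can often be re-identified by linking them to other datasets on shared "quasi-identifiers" such as postcode, birth year and sex. The toy example below (all names and values are invented) shows the mechanics of such a linkage attack in a few lines.

```python
# "Anonymised" health records: direct identifiers removed, but
# quasi-identifiers (postcode, birth year, sex) retained for research use.
health_records = [
    {"postcode": "1010", "birth_year": 1971, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "2020", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
]

# A separately published dataset carrying names alongside the same
# quasi-identifiers (e.g. an openly available register).
public_register = [
    {"name": "A. Example", "postcode": "1010", "birth_year": 1971, "sex": "F"},
]

QUASI = ("postcode", "birth_year", "sex")

def reidentify(records, register):
    """Link the two datasets on their shared quasi-identifiers."""
    index = {tuple(p[k] for k in QUASI): p["name"] for p in register}
    matches = []
    for r in records:
        key = tuple(r[k] for k in QUASI)
        if key in index:
            # The "anonymised" record is now attached to a name.
            matches.append({"name": index[key], **r})
    return matches

print(reidentify(health_records, public_register))
```

Because the linkage needed for re-identification is exactly the linkage needed for legitimate research, anonymisation alone cannot resolve the tension described above.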
There is a critical role for institutional review boards and/or research ethics committees to ensure oversight of what research is being conducted with new forms of personal data – or in the language of the GDPR – to carry out DPIAs. These bodies need to be empowered, supported and have the expertise necessary to assess the balance between protecting personal privacy and ensuring the public good. Social consensus will need to be established both within and beyond the scientific community as to what the appropriate limits are on the use of new forms of data.
It is difficult to foresee everything and, along the way, mistakes will certainly be made. Transparency and accountability will be critical to building a consensus on the use of new forms of data. Policy makers play an important role in ensuring the right governance frameworks are in place and supporting the necessary consultation and consensus building processes. In the United Kingdom, for example, public consultation has been important in establishing the core policies and operations of the Administrative Data Research Network (Verwulgen, 2017).
Broader engagement in science is the third pillar of open science. Digitalisation is opening the scientific process to a variety of societal actors, including patient groups, citizen scientists, non-governmental organisations, industry and policy makers. This shift has considerable potential to improve the quality, relevance and translation into practice of scientific research. Societal engagement can take place across the research process – from agenda setting to co-production of research and dissemination of scientific information. Depending on the emphasis, societal engagement encompasses concepts such as responsible research and innovation and transdisciplinary research. Engagement, which depends on access to scientific information and data, is being transformed by the use of digital tools.
Many countries promote citizen engagement to help ensure research relevance and promote transparency and trust in science. If science is to provide solutions for pressing societal challenges, then arguably it needs to be more closely engaged with society. In this context, digitalisation is providing powerful new tools to assist with societal engagement.
The first, and perhaps most critical, step in citizen engagement is to frame research agendas and set priorities for research investment. Recent OECD work has focused on this, including an in-depth analysis of key features and lessons learned from a variety of open agenda-setting exercises for research (OECD, 2017c). These exercises ranged from broadly focused citizen consultations and dialogues to inform international and national agenda setting through to more local and community-specific co-design processes.
OECD (2017c) identified ten key issues for consideration in designing effective open agenda-setting processes. These begin with clear articulation of the rationale for a consultation; selection of an appropriate methodological approach; and consideration of resource implications and impact assessment. If these three areas are addressed, then open agenda setting can make research more relevant and may also generate new research questions. A case in point is the Great New Zealand Science Project, a national campaign to define research priorities. Citizens expressed the need for more research on care issues as opposed to drug development for the elderly. There is a substantial body of work and tested methods for citizen engagement (Engage2020, n.d.; PARTICIPEDIA, n.d.). As the interest in open agenda setting expands, these previous experiences can provide valuable lessons.
Research infrastructures (RIs) provide a variety of shared services to the research community in all fields of science. Digitalisation is changing the operations of these infrastructures in many ways. Several of these RIs are at the forefront of the big data revolution, including the development of related hardware, software and standards. RIs are at the centre of many issues relating to open science – from information and data management to data security, privacy protection, analysis and training, and citizen science. Indeed, the growth of data and the policy emphasis on FAIR data are putting the financial sustainability of many RIs at risk (OECD, 2017e).
At a more mundane level, RI managers, funders and potential users are faced with a simple and persistent challenge: identifying what RIs exist, what they can do and how they can be accessed. Scientists are likely familiar with the main RIs used routinely in their own field. However, access to facilities and resources in other research areas is increasingly required. Other potential RI users – from companies, the public sector or civil society – can find it difficult to explore the possibilities of RIs related to their interests. Optimising the use of RIs depends first on accessing systematically collected, up-to-date information. This is where digitalisation potentially provides a solution.
Recently published work (OECD, 2017d) included an in-depth analysis of eight initiatives that are using dedicated digital platforms to promote broader access to, and more effective use of, RIs. These platforms ranged from digital catalogues providing standardised metadata on the resources available in a specific scientific domain, to national and regional service platforms enabling virtual access or online reservations of facilities.
Greater co‑operation across borders on definitions, standards and interoperability of digital platforms is needed to provide sustainable, high-quality services for users. The OECD work identified seven areas requiring attention, the most important of which is international co‑operation around definitions, standards and interoperability. Different countries and institutions are developing ad hoc solutions to meet their own specific data aggregation needs, with limited long-term planning and co‑operation with other actors. The Mapping of the European Research Infrastructure Landscape initiative, for example, consulted broadly with the community to develop a set of definitions, glossaries and RI classifications, together with a metadata model. These have all been made openly available for other users, but take-up has been limited.
RIs, such as telescopes, can provide a focus for citizen science, i.e. the engagement of people who are not professional scientists in research processes. In the field of astronomy, for example, lay persons are helping to classify images of the night sky that are shared on line (Zooniverse, n.d.). More broadly, many fields are promoting citizen science as a way of both addressing unique issues and of promoting public trust in science. Digitalisation is rapidly changing what is feasible, enabling new approaches to crowdsourcing and access to untapped intellectual resources to solve problems (OECD, 2015; Dai, Shin and Smith, 2018).
Beyond data collection and analysis, ICT can also help engage the networked public in novel forms of discovery. For instance, in 2011, players of an online protein-folding game – Foldit – outperformed scientists by discovering the structure of a protein involved in the Mason-Pfizer monkey virus. This discovery was facilitated by complex software that permitted visualisation of protein shapes, allowing the employment of shape recognition and modification skills by persons not necessarily trained in biochemistry (University of Washington, 2012).
At the more applied level, many companies are using online crowd sourcing platforms, such as InnoCentive, to help solve technological challenges, with significant prizes being awarded to problem solvers. Hackathons, that bring together interested actors, on line, are a common way of addressing software development challenges and are increasingly being organised in association with traditional scientific congresses. Through Kaggle, which is owned by Google, data scientists and users get together on line to find solutions to problems presented by research teams and private companies.
Opening up science to engage new actors from civil society raises new issues in terms of preserving quality, ensuring proper attribution and ethics (Bonney et al., 2014). Engaging the right audience and promoting effective participation can be a particular challenge, especially when dealing with issues that are value laden. From the policy perspective, defining where citizen science approaches might be most valuable in specific contexts and how best to achieve this will require careful consideration (Box 3.2).
Citizen science is a relatively recent, diverse and evolving approach to research. It presents both opportunities and challenges. Among other issues, more needs to be known about the following:
The quality of scientific output. Concerns exist that valid scientific methods are not followed in some projects managed by non-scientists and that quality control through peer review is often absent.
The types of science project for which citizen science might best be used. Not all research lends itself to citizen participation, which can significantly increase (or decrease) the overall cost of projects.
The trade-off between participant anonymity and the opportunity to earn peer recognition through publication.
The financial implications of crowdsourcing science. In particular, might financial incentives be used to attract firms and participants with specialised talent? Who owns the outputs if they have potential commercial value?
How the efficiency of citizen science might be improved. For instance, software might be used to track participant performance in some tasks, possibly avoiding the need for other participants to replicate these tasks.
There are many actors – institutions and individuals – with different roles and responsibilities in the scientific enterprise. These actors also often have different, and even contradictory, expectations for science. For instance, a dean in a top-ranked research university may be primarily interested in high-impact publications credited to his/her institution. Conversely, the ministry that provides research funding may be more interested in open data for innovation. Moreover, in the digital world, distance and location matter less than access to data and information. This increases the emphasis on international collaboration and/or competition and presents new challenges for the governance of science as a whole. While digitalisation could make science both more inclusive and more productive, in the transition from the old to the new a number of important policy issues need to be addressed. These cut across the whole of science and are manifest at different scales, from local to global, as discussed in the following paragraphs.
Policy makers can play an important role in promoting the development and implementation of frameworks, common definitions and standards. As ICTs develop and open new possibilities, it is becoming clear that formal legal frameworks, IP regulations and standard-setting processes are lagging. Commercial actors, and, in some instances, specific research communities are establishing de facto standards for operating in the digital world. These are determining how information, data and technologies are used. In the best cases, community standards are adopted that ensure interoperability and openness. The International Virtual Observatory Alliance (IVOA), for example, established community standards that have enabled researchers and interested citizens across the world to use astronomical data. In other cases, standards may reflect specific interests and severely limit access to and usage of scientific data and information. Likewise, ownership and licensing arrangements for digital information and data can either promote openness or limit access and reuse. With regards to text and data mining, for example, several countries have recently revised their copyright regulations to reduce restrictions on such mining for research. Maintaining an optimal balance between protecting IP (which can promote innovation) and openness (to improve the efficiency and effectiveness of research) is an ongoing policy challenge. It is interesting that in some areas of medical research, public and private sector actors are building new open science partnerships in which OA and sharing of data, information and downstream technologies are the norm (e.g. SGC, n.d.).
Ensuring the provenance and traceability of scientific data and information is important with regards to quality assurance, accountability and accreditation. Digital traces of individuals, research groups, institutions and their scientific outputs are becoming an essential part of the evaluation and impact assessment processes for science (see Chapter 7). These depend on the use of open persistent identifiers, such as digital object identifiers (DOIs) for research outputs and ORCID iDs for researchers. Policy makers can play an important role in promoting the routine use of such identifiers.
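One practical property of such persistent identifiers is that they are machine-checkable: ORCID iDs, for instance, embed an ISO/IEC 7064 MOD 11-2 check digit that any system can verify before ingesting a researcher record. A minimal sketch of that validation (the sample iD is ORCID's own published test identifier):

```python
# Sketch: validating the checksum of an ORCID iD.
# An ORCID iD is 16 characters (shown in groups of four, e.g.
# 0000-0002-1825-0097); the last character is an ISO/IEC 7064
# MOD 11-2 check digit computed over the first 15 digits.

def orcid_check_digit(base_digits: str) -> str:
    """Compute the MOD 11-2 check digit for the 15 base digits."""
    total = 0
    for d in base_digits:
        total = (total + int(d)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Check a hyphenated ORCID iD string against its check digit."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    return orcid_check_digit(chars[:15]) == chars[15]

# ORCID's documented sample iD validates; a one-digit change does not.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

Repository and funder systems can apply checks like this at data entry, catching transcription errors before bad identifiers propagate through evaluation pipelines.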
Mandates and incentives are valuable policy tools to promote open science, provided that they are used carefully. Mandates and incentives are often most effective when used in tandem (OECD, 2015). This is illustrated by the recent launch of Plan S (Science Europe, n.d. b), which aims for full and immediate OA to publications from publicly funded research. All recipients of research funding from any of the coalition partners promoting Plan S will be mandated to publish in compliant journals. At the same time, funders are working together on new incentives for open research. Plan S, for example, refers to the San Francisco Declaration on Research Assessment (DORA, n.d.), which states that research needs to be assessed on its own merits rather than on the basis of the venue in which it is published. The proposals for Plan S include a transition period during which journals are expected to become compliant. However, it is clear that reward and recognition systems for research will need to value OA publishing if this is to be broadly adopted by the academic community.
Similarly, for data sharing, new indicators and measures will be required not only to monitor how mandates for enhancing access are being implemented but also to incentivise activities to implement FAIR data (Ali‑Khan, 2018). And societal engagement activities will require similar incentivisation. It is notable that the latest development of the Research Excellence Framework for the evaluation of UK higher education institutions puts increasing emphasis on scientific outputs other than journal publications (REF2021, n.d.).
Digitalisation is transforming science very rapidly and this raises issues with regards to the skills that are required by the current and future scientific workforce. Digital skills are high on the education agenda in all OECD countries and, from an economic perspective, having an appropriately trained population is considered to be one of the key determinants of future productivity and growth (OECD, 2017f). From a research policy perspective, the key questions are: what are the additional or specific digital skills that are, or will be, required for data-intensive science? How do these skill needs map onto the scientific workforce? How will the necessary skills be provided, and what does this mean for science education and training?
“Data scientist” is a generic term that encompasses many different skills and roles (see Table 3.1 and also Chapter 2) and, although the needs vary from one field to another, there is a general consensus that more data scientists are required in public research. It has been suggested that for the European Open Science Cloud to be effective, 500 000 data experts will need to be trained over the next five to ten years. The difficulty of meeting such a shortfall is compounded by a number of factors. It is not clear exactly what the needs are, while, at the same time, a plethora of new education and training courses for digital skills for science are appearing. Looking from the opposite perspective, it is not known whether these new educational and training courses are adequately addressing the real needs and gaps. Data science, in its different manifestations (Table 3.1), often does not align well with existing academic credit and reward systems that depend on publication outputs as opposed to code and data products. New career structures and professions will need to be developed, e.g. for data stewards. Moreover, there is intense competition from the commercial sector for digitally skilled individuals who, in “hot” areas such as AI, can earn salaries well above what is offered in academia. A strategic approach that takes all these various factors into account is required, and this should consider how public and private actors can work together to develop human capital in ways that are mutually beneficial.
Data scientist | A data scientist is a practitioner of data science. It is a generic term that encompasses many fields of specialised expertise.
Data analyst | This is someone who knows statistics. Analysts may know programming or may be expert in Excel. Either way, they can build models based on low-level data. Most importantly, they know which questions to ask of the data.
Data engineer | Operating at a level close to the data, data engineers write the code that handles data and moves them around. They may have some machine-learning background.
Data steward | A data steward is a person responsible for the management of data objects, including metadata. These people think about managing and preserving data. They are information specialists, archivists, librarians and compliance officers.
Research software engineer | A growing number of people in academia combine expertise in programming with an intricate understanding of research. Research software engineers may start as researchers who spend time developing software to progress their research. They may also come from a more conventional software development background and be drawn to research by the challenge of using software to further it.
Note: Depending on the field of research some of these roles may be combined in a given individual. They may be supporting or service provision roles or fully embedded in research projects.
Source: This categorisation and the definitions are derived from the ongoing work of an OECD-GSF Expert Group on Digital Skills for Data Intensive Science and more detailed glossaries for digital science (CASRAI, n.d., Research Data Domain website, https://dictionary.casrai.org/Category:Research_Data_Domain; Science Europe, n.d. a, “Science Europe Data Glossary Main Page”, http://sedataglossary.shoutwiki.com/wiki/Main_Page).
There is a need for long-term strategic planning and effective co‑operation across countries and continents. Many countries are making significant investments in the digital infrastructure necessary to support science. This includes data repositories, as well as cyber-infrastructure such as high-performance or cloud computing. They are also investing heavily in “next-generation” technologies such as quantum computing. Within Europe, there is a major initiative to integrate these national initiatives within the European Open Science Cloud. Similar developments are taking place in the United States and, on a smaller scale, in Africa and other regions. These initiatives include both public and commercial sector service providers, e.g. for data storage and computing, and it will be important going forward to ensure their long-term sustainability and adaptability and avoid the “lock-in” that can arise when effective monopolies develop. Global bodies, such as the Research Data Alliance (RDA, n.d.), that bring together data scientists and policy makers to develop community standards, technical fixes and social networks have an important role to play.
Building trust both within the scientific community and between science and society remains the most pressing and difficult challenge for science in the digital world. With regards to the use of personal data in science, the challenges are fairly well known. Solutions are being developed and tested, including new mechanisms of governance and engagement with the public. However, there is a more ubiquitous challenge for science as a whole that will require more complex multifactorial solutions. While open science holds great promise, it arrives at a time when trust in experts is being questioned and “alternative facts” are becoming common currency in social networks and political forums. Open science means more transparency and accountability, but it also means more scrutiny and more questioning from actors for whom access to science was previously restricted. As witnessed in relation to debates on climate change or the safety of vaccination, some groups will readily appropriate, distort or re-interpret scientific information and data to their own ends. It is critically important in today’s more open environment that the integrity of science itself is maintained, and that science is rigorous and published research results are reproducible. Digital technologies such as blockchain and AI can potentially assist in this quest (see Chapter 1). However, appropriately skilled research personnel and the right incentives and reward structures will be even more critical.
By enabling a new paradigm of open science, digitalisation is disrupting long-established scientific practices, norms and institutions. Recent work from the OECD and many other organisations has demonstrated that digitalisation, which has its origins in public research, is also having a huge impact on how this research is being conducted. This is opening up exciting new opportunities and, at the same time, throwing up new challenges.
As with any disruptive change, different actors resist some of the emerging directions. Commercial publishers have served the scientific community well over many decades. They are understandably reluctant to change their business models. Scientists have built careers around “ownership” of data collections and are reluctant to share. Universities are used to being assessed on the number and/or quality of publications rather than data outputs or citizen engagement. Academic peer review, evaluation and promotion processes have been similarly focused on research excellence. Career paths have been designed for researchers in traditional scientific disciplines; they are poorly adapted to the new inter- and transdisciplinary opportunities of the digital world or the need to attract highly skilled data scientists in research support roles. Research funders are used to funding large-scale RIs over the long term. However, their mechanisms are less well adapted to the multitude of distributed data resources and services upon which research increasingly depends.
Science systems as a whole are having to adapt rapidly and this inevitably entails a mix of some things that are completely new, adjustments in much of what already exists and renewal of what cannot or does not adapt. Of course, science continually evolves and so this scenario is not unique. However, the extent and depth of the impacts of digitalisation on science and the speed of change are likely beyond what science systems have experienced since World War II. Strategic planning, flexibility and careful development and implementation of policies will be necessary to ensure that we build on the best of the past in taking forward the future. There is also an opportunity that should be grasped to address and correct some of the emerging problems in science, including lack of reproducibility, lack of diversity in academia and precarity in research careers. This transition is occurring at a critical moment when trust in science needs to be assured. Achieving this will require vision, policy action and joint commitment from multiple stakeholders with an interest in the scientific enterprise. It needs to be both top-down (policy driven) and bottom-up (community led).
Different fields of science and different organisations and countries are at different stages of adaptation to open science in the digital world. This provides important opportunities for mutual learning, exchange of good practices and co‑operation. Scientific policy makers need to crowdsource within their own community, engaging other relevant actors as necessary, to identify existing or new solutions to support the positive evolution of science in the digital age.
Ali-Khan, S.E. et al. (2018), “Defining success in open science [version 2; referees: 2 approved]”, MNI Open Research, Vol. 2/2, https://doi.org/10.12688/mniopenres.12780.2.
ANDS (2018), ANDS/Working with Data – SKILLS, Australian National Data Service, www.ands.org.au/working-with-data/skills (accessed 27 May 2019).
Begley, C.G. and L.M. Ellis (2012), “Raise standards for preclinical cancer research”, Nature, Vol. 483, Nature Research, Springer, pp. 531-533, www.nature.com/nature/journal/v483/n7391/full/483531a.html.
Blockchain for Peer Review (n.d.), “Towards a fairer and more transparent peer review process”, webpage, www.blockchainpeerreview.org/ (accessed 27 May 2019).
Bonney, R. et al. (2014), “Next steps for citizen science”, Science, Vol. 343/6178, American Association for the Advancement of Science, Washington, DC, pp. 1436-1437, http://dx.doi.org/10.1126/science.1251554.
Brainard, G. and J. You (25 October 2018), “What a massive database of retracted papers reveals about science publishing’s ‘death penalty’”, Science News blog, www.sciencemag.org/news/2018/10/what-massive-database-retracted-papers-reveals-about-science-publishing-s-death-penalty.
CASRAI (n.d.), Research Data Domain website, https://dictionary.casrai.org/Category:Research_Data_Domain (accessed 27 May 2019).
Cold Spring Harbor Laboratory (n.d.), BioRxiv: The Preprint Server for Biology website, www.biorxiv.org/ (accessed 27 May 2019).
Dai, Q., E. Shin and C. Smith (2018), “Open and inclusive collaboration in science: A framework”, OECD Science, Technology and Industry Working Papers, No. 2018/07, OECD Publishing, Paris, https://doi.org/10.1787/2dbff737-en.
Data Seal of Approval (n.d.), Data Seal of Approval website, www.datasealofapproval.org/en/ (accessed 27 May 2019).
DORA (n.d.), Improving How Research is Assessed website, https://sfdora.org/ (accessed 27 May 2019).
Engage2020 (n.d.), Action Catalogue website, http://actioncatalogue.eu/ (accessed 27 May 2019).
HealthMap (n.d.), HealthMap website, www.healthmap.org/en/ (accessed 27 May 2019).
IRIS.AI (n.d.), “Research discovery with artificial intelligence”, https://iris.ai/ (accessed 27 May 2019).
ISD Scotland (2018), Use of the NSS National Safe Haven, www.isdscotland.org/Products-and-Services/EDRIS/Use-of-the-National-Safe-Haven/ (accessed 27 May 2019).
LSE (n.d.), “LSE blogs expert analysis and debate from LSE website”, webpage, http://blogs.lse.ac.uk/ (accessed 27 May 2019).
Noorden, R.V. (5 February 2014), “Scientists may be reaching a peak in reading habits”, Nature News blog, www.nature.com/news/scientists-may-be-reaching-a-peak-in-reading-habits-1.14658.
Nuffield Foundation (n.d.), “The Ada Lovelace Institute”, webpage, www.nuffieldfoundation.org/ada-lovelace-institute (accessed 27 May 2019).
OECD (2018a), “Enhanced access to publicly funded data for STI”, in Science, Technology and Innovation Outlook 2018, OECD Publishing, Paris, https://doi.org/10.1787/sti_in_outlook-2018-en.
OECD (2018b), Scientific Advice During Crises: Facilitating Transnational Co-operation and Exchange of Information, OECD Publishing, Paris, https://doi.org/10.1787/9789264304413-en.
OECD (2017a), “Business models for sustainable research data repositories”, OECD Science, Technology and Industry Policy Papers, No. 47, OECD Publishing, Paris, https://doi.org/10.1787/302b12bb-en.
OECD (2017b), “Co-ordination and support of international research data networks”, OECD Science, Technology and Industry Policy Papers, No. 51, OECD Publishing, Paris, https://doi.org/10.1787/e92fa89e-en.
OECD (2017c), “Open research agenda setting”, OECD Science, Technology and Industry Policy Papers, No. 50, OECD Publishing, Paris, https://doi.org/10.1787/74edb6a8-en.
OECD (2017d), “Digital platforms for facilitating access to research infrastructures”, OECD Science, Technology and Industry Policy Papers, No. 49, OECD Publishing, Paris, https://doi.org/10.1787/8288d208-en.
OECD (2017e), “Strengthening the effectiveness and sustainability of international research infrastructures”, OECD Science, Technology and Industry Policy Papers, No. 48, OECD Publishing, Paris, https://doi.org/10.1787/fa11a0e0-en.
OECD (2017f), OECD Digital Economy Outlook 2017, OECD Publishing, Paris, https://doi.org/10.1787/9789264276284-en.
OECD (2016), “Research ethics and new forms of data for social and economic research”, OECD Science, Technology and Industry Policy Papers, No. 34, OECD Publishing, Paris, https://doi.org/10.1787/5jln7vnpxs32-en.
OECD (2015), “Making open science a reality”, OECD Science, Technology and Industry Policy Papers, No. 25, OECD Publishing, Paris, https://doi.org/10.1787/5jrs2f963zs1-en.
OECD (2013), “New data for understanding the human condition: International perspectives”, OECD, Paris, www.oecd.org/sti/inno/new-data-for-understanding-the-human-condition.pdf.
OECD (2007), OECD Principles and Guidelines for Access to Research Data from Public Funding, OECD Publishing, Paris, https://doi.org/10.1787/9789264034020-en-fr.
ORFG (n.d.), Open Research Funders Group website, www.orfg.org/ (accessed 27 May 2019).
PARTICIPEDIA (n.d.), PARTICIPEDIA website, https://participedia.net/ (accessed 27 May 2019).
Pubpeer Foundation (n.d.), Pubpeer – The Online Journal Club website, https://pubpeer.com/static/about (accessed 27 May 2019).
RDA (n.d.), Research Data Alliance website, www.rd-alliance.org/ (accessed 27 May 2019).
REF2021 (n.d.), Research Excellence Framework website, www.ref.ac.uk/ (accessed 27 May 2019).
Schapira, M. (2018), “Open lab notebooks to increase impact and accelerate discovery”, 26 January, Springer Nature, Data Dialogues, https://researchdata.springernature.com/users/81403-matthieu-schapira/posts/29655-open-lab-notebooks-to-increase-impact-and-accelerate-discovery.
Science Europe (n.d. a), “Science Europe Data Glossary Main Page”, webpage, http://sedataglossary.shoutwiki.com/wiki/Main_Page (accessed 27 May 2019).
Science Europe (n.d. b), “Why Plan S: Open access is fundamental to the scientific enterprise”, webpage, www.coalition-s.org/why-plan-s/ (accessed 27 May 2019).
Science Europe (2018), “Practical guide to the international alignment of research data management”, https://www.scienceeurope.org/our-resources/practical-guide-to-the-international-alignment-of-research-data-management/.
SGC (n.d.), Structural Genomics Consortium website, www.thesgc.org/ (accessed 27 May 2019).
University of Washington (2012), “Gamers succeed where scientists fail: Molecular structure of retrovirus enzyme solved, doors open to new AIDS drug design”, ScienceDaily, 19 September, www.sciencedaily.com/releases/2011/09/110918144955.htm.
Verwulgen, I. (2017), “The ADRN and the public’s voice: Making administrative data available for research while gaining public trust”, International Journal of Population Data Science, Vol. 1/1, p. 155, Swansea University, United Kingdom, https://doi.org/10.23889/ijpds.v1i1.174.
Weebly.com (n.d.), Beall’s List of Predatory Journals and Publishers website, https://beallslist.weebly.com/ (accessed 27 May 2019).
Zooniverse (n.d.), Zooniverse website, www.zooniverse.org/ (accessed 27 May 2019).
← 1. In 2016, more than 1.2 million new papers were published in the biomedical sciences alone, bringing the total number of peer-reviewed biomedical papers to over 26 million. However, the average scientist reads only about 250 papers a year (Noorden, 5 February 2014). By some measures the quality of scientific literature has been in decline. Some recent studies have found that the results of most preclinical biomedical papers could not be reproduced (Begley and Ellis, 2012).