Michael Keenan, Dmitry Plekhanov, Fernando Galindo-Rueda, and Daniel Ker
Directorate for Science, Technology and Innovation, OECD
This chapter is based on a project of the OECD Committee for Scientific and Technological Policy exploring digital science and innovation policy (DSIP) and the challenges it faces. DSIP initiatives refer to the adoption or implementation by public administrations of new or reused procedures and infrastructures, relying on an intensive use of digital technologies and data resources, to support the formulation and delivery of science and innovation policy. The chapter focuses on three issues in particular. First, it examines the need to ensure interoperability, so that diverse datasets can be linked and analysed to aid policy making. Second, it looks at preventing potential misuses of DSIP systems in research assessment practices. Third, it explores management of the roles of non-government actors, particularly the private sector, in developing and operating DSIP infrastructure components and services.
As scientific research and innovation increasingly leave a digital “footprint”, datasets are becoming ever larger, more complex and available at higher speed. At the same time, technological advances – in machine learning (ML) and natural language processing, for example – are opening new analytical possibilities. Science, technology and innovation (STI) policy can benefit from these dynamics (Box 7.1). Policy makers can harness the power of digitalisation to link and analyse datasets covering diverse areas of policy activity and impact. For example, initiatives already experiment with semantic technologies to link datasets, with artificial intelligence to support big data analytics, and with interactive visualisation and dashboards to promote data use in the policy process.
Over 2017 and 2018, the OECD mapped the landscape of digital science and innovation policy (DSIP) initiatives in OECD countries and partner economies. The OECD DSIP project aimed to help policy makers and researchers assess the transformational potential and possible pitfalls of using digital tools and sources in science and innovation policy making. The project also sought to facilitate learning between countries that are planning, developing or using DSIP systems. The project was carried out under the supervision of the OECD Committee for Scientific and Technological Policy (CSTP) and its Working Party of National Experts on Science and Technology Indicators.
The project included a survey of DSIP initiatives that provides much of the evidence used in this chapter. The survey had three elements:
CSTP delegates identified, and characterised to a basic level, 61 DSIP initiatives in their countries.
Of these 61 initiatives, 39 DSIP initiative managers completed a questionnaire providing further details on the characteristics of their systems, including the data they use, the ways they link data and the main challenges they face.
The OECD Secretariat conducted 20 follow-up interviews with DSIP initiative managers to understand better the origins and dynamics of their systems.
The OECD Secretariat carried out further interviews with leaders of not-for-profit organisations, e.g. Open Researcher and Contributor ID (ORCID) and euroCRIS, which maintains the Common European Research Information Format (CERIF). It also met with senior managers from corporate DSIP solutions providers, including Microsoft and Elsevier. The project also included a case study of Norway’s DSIP landscape, as described in Box 7.4.
Figure 7.1 provides a stylised conceptual view of a DSIP initiative and its main components. All of these elements interact in nationally specific ways, reflecting each country’s history and institutional set-up. The main elements consist of various input data sources. These feed into a data cycle enabled by interoperability standards, including unique, persistent and pervasive identifiers (UPPIs). DSIP systems can perform a number of functions and typically serve a mix of users. Box 7.2 outlines several examples of DSIP initiatives from across the world.
Data are predominantly sourced from a mix of administrative data sources held by funding agencies (e.g. databases of grant awards) and organisations that perform research, development and innovation (RD&I). These include Current Research Information Systems (CRIS) in universities, and proprietary bibliometric and patent databases. Some DSIP systems have grown out of these databases. Through integration with external platforms or development of add-on services, they have evolved into infrastructures that can deliver comprehensive data analysis on research and innovation activities. Other systems have been established from the ground up. Several DSIP systems harvest data from the web to build a picture of the incidence and impacts of science and innovation activities. Web sources include, but are not limited to, company websites and social media.
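To give a flavour of the web-harvesting approach described above, the sketch below fetches a single page and extracts its title using only the Python standard library. The URL is a placeholder, and a real DSIP harvester would of course crawl many sources and apply far more robust parsing, de-duplication and rate limiting; this is a minimal illustration only.

```python
from html.parser import HTMLParser
import urllib.request

class TitleParser(HTMLParser):
    """Collect the text content of the page's <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def page_title(url: str) -> str:
    """Download one page and return its title."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()

# Usage (against any reachable page):
# print(page_title("https://www.example.com/"))
```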
DSIP infrastructures can increase the scope, granularity, verifiability, communicability, flexibility and timeliness of policy analyses. They can lead to the development of new STI indicators (Bauer and Suerdem, 2016), the assessment of innovation gaps (Kong et al., 2017), strengthened technology foresight (Kayser and Blind, 2017) and the identification of leading experts and organisations (Shapira and Youtie, 2006; Johnson, Fernholz and Fosci, 2016; Gibson et al., 2018). Furthermore, in some countries, researchers and policy makers have started to experiment with natural language processing and ML. They are using these techniques to track emerging research topics and technologies (Wolfram, 2016; Mateos-Garcia, 6 April 2017) and to support RD&I decisions and investments (Yoon and Kim, 2012; Park, Yoon and Kim, 2013; Yoon, Park and Kim, 2013). Box 7.3 outlines the range of goals set for DSIP initiatives.
In Belgium, the Flemish Department of Economy, Science and Innovation, in co‑operation with data providers and information technology partners, developed the Flanders Research Information Space (FRIS) in 2011. It aims to accelerate innovation, support science and innovation policy making, share information on publicly funded research with citizens and reduce the administrative burden of research reporting. FRIS is a single window on all Flemish research and can be used by government agencies in several ways. First, it is a tool to improve the visibility of research funding programmes. Second, it is a resource for in-depth analyses of scientific and technological trends and for the development of statistical indicators on STI.
In Brazil, the National Council for Scientific and Technological Development established the Lattes Platform with support from the Ministry of Science and Technology, the Ministry of Education and the government body “Co‑ordination for the Improvement of High-Level Personnel”. The platform supports policy design and formulation, management of research funding programmes and strategic planning. It integrates a variety of digital resources of Brazilian government agencies and higher education institutions (HEIs). Aside from visualising Brazilian STI datasets, the Lattes Platform enables the design of add-on analytical solutions to better serve the needs and expectations of science and innovation policy makers.
In Poland, the Ministry of Science and Higher Education launched the POL-on system with financial support from the European Union and the technical assistance of three private companies. POL-on is an integrated information system for higher education. It supports the work of the Ministry of Science and Higher Education, as well as other ministries and institutions of science and higher education. Its main task is to create a comprehensive database covering Polish scientific institutions, universities and research. Information collected through the system supports the decision making of the Ministry of Science and Higher Education regarding Polish universities and research units. Parts of the datasets collected by the system are made available to the public.
In Argentina, the Ministry of Science, Technology and Productive Innovation uses SICYTAR (Sistema de Información de Ciencia y Tecnología Argentino) to evaluate and assess STI policy initiatives, project teams and individual researchers. The system aggregates several databases covering researchers’ curriculum vitae; funded research and development (R&D) projects; information on public and private institutions performing R&D activities in Argentina; and information on large research equipment.
In Estonia, a number of stakeholders launched the Estonian Research Information System (ETIS). These include the Ministry of Education and Research, the Estonian Science Foundation, the Scientific Competence Council, public organisations that perform RD&I, and the Archimedes Foundation. Based on multi-partner co‑operation, ETIS serves as a large-scale national digital system that unites data management efforts. HEIs use ETIS as an internal system for research information management and as a tool to showcase their research. Public funders use the system to evaluate and process grant applications. ETIS is also used in national research assessments and evaluations by providing data on STI indicators, e.g. R&D revenue per research and teaching staff member and the percentage of women among scientists.
In the Netherlands, the National Academic Research and Collaborations Information System (NARCIS) collects data from multiple sources. These include funder databases, CRISs, institutional repositories of research performers and the Internet. Data on research outputs, projects, funding, human resources and policy documents collected by NARCIS inform policy makers about research in the Netherlands and support monitoring of the openness of access to data. Funders also use the system to identify research gaps and improve resource planning. NARCIS also serves as an important research directory, providing researchers, journalists, and the domestic and international public with information on the status and outputs of Dutch science.
In Norway, the research-reporting tool Cristin collects information from research institutions, the Norwegian Centre for Research Data and ethics committees. Cristin serves as a resource for the performance-based funding model of the Ministry of Education and Research. It provides numerous users from government, industry, academia and civil society with verified information on the current status of Norwegian research.
In Japan, the National Graduate Institute for Policy Studies designed the SciREX Policymaking Intelligent Assistance System (SPIAS) to strengthen national evidence-informed STI policy making. SPIAS uses big data and semantic technologies to process data on research outputs and impacts, funding, R&D-performing organisations and research projects, with a view to mapping the socio-economic impacts of research. SPIAS has been used to analyse leading Japanese scientists’ performance before and after receiving grants from the Japan Science and Technology Agency. It has also been used to assess the impact of regenerative medicine research in Japan, and to map emerging technologies.
In Spain, Corpus Viewer, developed by the State Secretariat for Information Society and Digital Agenda, processes and analyses large volumes of textual information using natural language processing techniques. Policy makers use results to monitor and evaluate public programmes, and to formulate science and innovation policy initiatives. The system is restricted to government officials.
Optimisation of administrative workflows. Digital tools can help streamline potentially burdensome administrative procedures and deliver significant efficiency gains within agencies. These benefits can also extend to those using public agencies’ services, including researchers or organisations applying for (or reporting on) the use of research grants. For example, they can use unique, persistent identifiers to link their research profiles to grant applications. As the digital gateway to the Estonian research system, ETIS (Box 7.2) incorporates tools for grant application submissions and research reporting, thereby streamlining administrative workflows at Estonian research-performing organisations.
Improved policy formulation and design. Digitalisation offers new opportunities for more granular and timely data analysis to support STI policy; this should improve the allocation of research and innovation funding. Furthermore, DSIP systems often link data collected by different agencies. In this way, they provide greater context to policy problems and interventions, and offer possibilities for more integrated interagency policy design at the research or innovation system level. To give a country example: the Japanese Ministry of Education, Culture, Sports, Science and Technology and the National Institute of Science and Technology Policy have launched the SciREX data and information infrastructure to improve STI policy formulation and design. The system provides datasets to support STI policy studies. It aims to improve the accountability and transparency of public investments in R&D and strengthen the methodological frameworks used in policy evaluations.
Support of performance monitoring and management. DSIP systems offer the possibility of collating real-time policy output data. For example, in Colombia, the SCIENTI Technological Platform has developed STI indicators and metrics that support the monitoring and assessments of government-funded research. DSIP systems can allow more agile short-term policy adjustments. They can improve insights into the policy process for accountability and learning in the medium to long term, so that evaluation becomes an open and continuous process. Policy makers and delivery agencies can consider the circumstances that make it possible and meaningful to use other digitally enabled data resources, such as altmetrics of research outputs and impacts (Priem et al., 2010; Sugimoto and Larivière, 2016). They can also rely on other data collection approaches (e.g. web scraping) to complement and enhance existing approaches to assessing research.
Anticipatory intelligence. Technologies like big data analytics can help detect patterns in data that could be useful for policy, e.g. emerging research areas, technologies, industries and policy issues. Digital technologies can also support short-term forecasting of policy issues and contribute to strategic policy planning (Choi et al., 2011; Zhang et al., 2016; Peng et al., 2017; Yoo and Won, 2018). For instance, DSIP systems could identify labour demand in specific STI fields and address potential mismatches on the supply side of the labour market. In the Russian Federation, for example, the Institute for Statistical Studies and Economics of Knowledge of the National Research University Higher School of Economics has developed the iFORA system to support foresight studies. Underpinned by advanced computational techniques, iFORA analyses large volumes of administrative data and web data to provide insights on STI breakthroughs, weak signals of change, centres of excellence and emerging technologies.
General information discovery. DSIP systems often include data on a wide range of inputs, outputs and activities. Policy makers and funders can use these data to identify leading experts in a given field (e.g. identify reviewers for project proposals), as well as centres of excellence (Guo et al., 2012; Sateli et al., 2016). This kind of information also helps researchers and entrepreneurs to identify new partners for collaboration and commercialisation. For example, the Ministry of Business, Innovation and Employment of New Zealand has developed the New Zealand Research Information System (NZRIS). It aims to raise the quality of RD&I data and improve information discovery on issues related to research and innovation. The NZRIS provides information on levels of public investments in different research areas, research collaboration networks, and leading researchers and organisations. In doing so, it aims to accelerate research commercialisation and foster close partnerships between academia and industry.
Promotion of inclusiveness in STI policy agenda setting. DSIP systems can contribute to debate with stakeholders on policy options by providing detailed information about a policy problem in an accessible way, e.g. through interactive data visualisation. The increased transparency provided by DSIP systems can empower citizens by providing them with knowledge about the nature and impacts of ongoing research and innovation. Thus, DSIP may be instrumental in building trust and securing long-term sustainable funding for research and innovation. Costa Rica, for example, has launched the Hipatia platform to help citizens better understand national scientific capabilities and the impacts of publicly funded research. Hipatia is an integrated platform created atop a variety of Costa Rican administrative databases. As a “one-stop shop” for research information in Costa Rica, Hipatia aims to improve the transparency and accountability of publicly funded research.
Source: Based on OECD (forthcoming a), Digital Science and Innovation Policy and Governance.
Realising the potential of DSIP involves overcoming several possible barriers. In their responses to the OECD questionnaire, DSIP administrators identified data quality, interoperability, sustainable funding and data protection regulations as the biggest challenges facing their initiatives (Figure 7.2). Other challenges cited less often were access to data, the availability of digital skills and trust in digital technologies. Policy makers wishing to promote DSIP face further systemic challenges. These include overseeing fragmented DSIP efforts and multiple (often weakly co-ordinated) initiatives (see Box 7.4, which summarises a case study of Norway’s DSIP ecosystem); ensuring responsible use of data generated for other purposes; and balancing the benefits and risks of private sector involvement in providing DSIP data, components and services.
The Norwegian Ministry of Education and Research requested the OECD Secretariat to conduct a case study of the Norwegian landscape for DSIP. This study took place in the context of the OECD DSIP project. It involved an extensive literature review of policy issues and technological trends. The authors also analysed policy documents and reports related to Norway’s DSIP landscape. In addition, they interviewed key stakeholders during a one-week mission to Norway in April 2018. Interviewees included data providers, regulators, administrators, and developers of digital infrastructures and their users.
The case study describes Norway’s DSIP landscape, including its initiatives and main actors, the objectives pursued and their outcomes, the resources devoted and perspectives for future development. It shows that Norway has built substantial capabilities in the preservation, access and use of comprehensive administrative datasets that could power analytical solutions used in DSIP systems.
The DSIP landscape in Norway comprises a number of digital infrastructures that collect, preserve and provide access to data on research and innovation activities. These include a digital infrastructure for sharing datasets across Norwegian government agencies; databases of the Research Council of Norway and Innovation Norway that include data on research inputs and outputs; and the Health&Care21 research and innovation monitor, which aims to facilitate decision making on healthcare research.
One of the key elements of the Norwegian DSIP landscape is Cristin, Norway’s national CRIS. Cristin is interoperable with several external digital systems managed by Norwegian government agencies and effectively serves as a major data hub on Norwegian research. Cristin provides the evidence base on which the Norwegian government performs its assessments of research performance. Apart from government bodies, all higher education institutions, research hospitals and public research institutions that receive public funding use the system to support research and strategic planning.
A distinguishing feature of Norway is its trust-based social consensus. Individuals and organisations are willing to share data about themselves with the government to improve the quality of policy making and to create more value for citizens. High levels of trust, accountability and transparency in the Norwegian government, combined with a consensus-based culture of decision making, create an excellent environment for developing DSIP initiatives.
Nevertheless, there is considerable fragmentation of efforts around DSIP. For example, several Norwegian ministries and agencies are experimenting with ML algorithms. They wish to extract actionable knowledge from fragmented datasets to support the development of statistical indicators. These, in turn, could help steer science and innovation policy initiatives in a more effective and efficient way. In some cases, these experiments – often in co‑operation with external providers – have already helped design early versions of DSIP solutions. However, such efforts could benefit from a more systematic approach, involving greater co‑ordination across government.
Source: OECD (forthcoming b), “OECD case study of Norway’s digital science and innovation policy and governance landscape”.
Research and innovation activities are, by their nature, highly pervasive and are shaped by a large number of stakeholders. As a result, data on the incidence and impacts of research and innovation are dispersed across a variety of public and private databases and the web. Harvesting these datasets from external sources requires the development of common data formats and other interoperability enablers including, but not limited to, application programming interfaces (APIs), ontologies, protocols and UPPIs.
An integrated and interoperable system leads to a considerable reduction in the reporting and compliance burden, freeing up time and money for research and innovation. In addition to the reduced administrative burden, interoperability allows quicker, cheaper and more accurate data matching. This, in turn, makes existing analyses less costly and more robust, and facilitates new analyses. Interoperability can produce more timely and detailed insights, enabling more responsive and tailored policy design. Furthermore, the gradual emergence of internationally recognised identifiers makes it easier to track the impacts of research and innovation activities across borders, and map international partnerships.
Interoperability raises several types of questions. On a technical level, policy makers must ask what kind of digital system can be put in place to make existing and new data interoperable. On a semantic level, they must grapple with metadata and language issues. With respect to governance, they must reflect on how all stakeholders can be aligned to agree upon an interoperability system. A specific issue concerns the role and effectiveness of data standards, particularly in a mixed ecosystem containing both legacy and new systems. In this regard, some DSIP systems use national identifiers (IDs) – e.g. business registration and social security numbers – as well as country-specific IDs for researchers (Figure 7.3).
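The technical level is the most tractable of these: once two administrative datasets carry the same identifier, linking them is a routine join. The sketch below illustrates this with two synthetic datasets joined on a shared national business ID; all field names and records are hypothetical.

```python
# Minimal sketch: linking two synthetic administrative datasets on a
# shared national business ID. All field names and records are hypothetical.

grants = [
    {"business_id": "BR-001", "programme": "Energy R&D", "grant_eur": 250_000},
    {"business_id": "BR-002", "programme": "Health Tech", "grant_eur": 120_000},
]

patents = [
    {"business_id": "BR-001", "patent_count": 4},
    {"business_id": "BR-003", "patent_count": 1},
]

# Index one dataset by the shared identifier, then join the other against it.
patents_by_id = {p["business_id"]: p for p in patents}

linked = []
for grant in grants:
    match = patents_by_id.get(grant["business_id"])
    linked.append({**grant, "patent_count": match["patent_count"] if match else 0})

for row in linked:
    print(row)
```

Without a shared, well-maintained identifier, the same join degenerates into error-prone fuzzy matching on names and addresses, which is precisely the problem UPPIs (Table 7.1) are designed to remove.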
Table 7.1. Examples of interoperability enablers

| Type | Examples |
| --- | --- |
| UPPIs for STI actors | Open Researcher and Contributor ID (ORCID); Digital object identifier (DOI); Global Research Identifier Database (GRID); International Standard Name Identifier (ISNI); Ringgold ID |
| Author IDs generated by publishers/indexers | ResearcherID; Scopus Author ID |
| Management standards for data about STI | Common European Research Information Format (CERIF); Consortia Advancing Standards in Research Administration Information (CASRAI) Dictionary; VIVO ontology |
| Protocols | Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) |
Source: OECD (2018), OECD Science, Technology and Innovation Outlook 2018: Adapting to Technological and Societal Disruption, https://doi.org/10.1787/sti_in_outlook-2018-en.
In recent years, attempts have been made to establish international standards and vocabularies to improve the international interoperability of DSIP infrastructures (Table 7.1). These include UPPIs, which assign a standardised code unique to each research entity, persistent over time and pervasive across various datasets. Box 7.5 sets out the desirable characteristics for successful UPPIs. Some UPPIs exist as an integral part of, or support for, commercial products such as publication/citation databases, research information systems, supply-chain management services, etc. Others exist solely to provide a system of identifiers for wide adoption and use. One example is ORCID, which aims to resolve name ambiguity in scientific research by developing unique identifiers for individual researchers. These systems provide a simple register of UPPIs and basic associated identity information (e.g. name and organisational affiliation for individuals, name and location for organisations). In addition, they often directly include, or incorporate links to, a wide range of further information. For example, ORCID records allow details of education, employment, funding and research works to be added manually or brought in by linking to other systems including Scopus and ResearcherID.
McMurry, Winfree and Haendel (6 July 2017) propose various desirable characteristics for identifiers. These have been adapted here for the specific use case under consideration (identifying individuals and organisations):
Defined. The identifier should follow a formal pattern (a regular expression) that also determines the total set of assignable identifiers; this facilitates validation and use, including by machines (a minimal validation sketch follows this box).
Persistent, stable. The identifier should stay the same over time wherever possible, and should never be deleted; this avoids difficulty locating records. To support this, the identifier format should not embed unnecessary detail or information liable to change (a random alphanumeric code of fixed length and structure achieves this).
Unambiguous. The identifier must relate to no more than one entity locally, to avoid confusion between different entities. The identifier format chosen should seek to avoid ambiguity. For example, if an alphanumeric identifier is used, only one of the number zero and the letter “o” should be allowed, as these are easily confused by users.
Unique. One entity should ideally be associated with no more than one identifier (and identifiers should never be “recycled” to apply to another entity).
Version-documented. Where important changes occur, these should be clearly logged and, if necessary, new identifiers issued.
Web-friendly. The identifier should avoid characters that perform specific functions in HTML and exchange formats (e.g. XML), such as “:”, “/” and “.”, to make identifiers easier to use, search, etc.
Web-resolvable. The identifier must be resolvable to a web address where the data or information about the entity can be accessed. In practice, this means the identifier should consist of a uniform resource identifier (URI) pattern (e.g. http://orcid.org/) and a local id relating to the specific record (e.g. 0000-0002-2040-1464). When used together, the URI and local id create a resolvable web address (e.g. http://orcid.org/0000-0002-2040-1464). This allows the identifier to be easily checked to ensure it relates to an actual record and that the record relates to the correct entity.
Free to assign. The identifier should ideally be assigned at no cost; this reduces barriers to adoption.
Open access (OA) and use. The identifier and appropriate metadata (e.g. the name of the entity to which it relates) should be transparently referenceable and actionable (e.g. in a public index or search) anywhere, by anyone and for any reason; this enables integration on the basis of practical usefulness.
Documented. The identifier scheme, its operation, etc. should be clearly documented; this enables users to understand the system and encourages consistent use. Documented privacy and dispute resolution policies are also important factors.
Source: McMurry, Winfree and Haendel (6 July 2017), “Bad identifiers are the potholes of the information superhighway: Take-home lessons for researchers”, http://blogs.plos.org/biologue/2017/07/06/bad-identifiers-potholes-of-information-superhighway/.
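To make several of these properties concrete, the sketch below validates an ORCID-style identifier: a regular expression enforces the defined pattern, the ISO/IEC 7064 MOD 11-2 check character that ORCID documents guards against transcription errors, and the URI pattern makes the identifier web-resolvable. This is an illustrative sketch under those assumptions, not ORCID’s reference implementation.

```python
import re

# ORCID iDs are four groups of four characters; the final character is a
# check character computed with ISO/IEC 7064 MOD 11-2, so it may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")

def orcid_checksum(base_digits: str) -> str:
    """Compute the MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def validate_orcid(identifier: str) -> bool:
    """Check the 'defined' (pattern) and 'unambiguous' (checksum) properties."""
    if not ORCID_PATTERN.match(identifier):
        return False
    digits = identifier.replace("-", "")
    return orcid_checksum(digits[:15]) == digits[15]

def resolve(identifier: str) -> str:
    """Build the resolvable web address ('web-resolvable' property)."""
    return f"https://orcid.org/{identifier}"

print(validate_orcid("0000-0002-2040-1464"))  # True (the example used in the box)
print(resolve("0000-0002-2040-1464"))         # https://orcid.org/0000-0002-2040-1464
```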
As a UPPI system gains traction, there may be a “network effect”, whereby each additional registrant increases the value of the system to all users. Eventually, the UPPI system may become a generally expected way for entities to unambiguously identify each other. This results in strong incentives to join for those not yet registered.
Besides UPPIs, APIs have become an industry standard for integrating data. They enable machine-to-machine interactions and data exchanges. Within the framework of digital government initiatives, several countries have begun deploying APIs across the whole landscape of government websites and databases, improving data reuse. Better access to administrative datasets has positive impacts on the functionality and reliability of the analyses delivered by DSIP systems.
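As an illustration of such machine-to-machine exchange, the sketch below retrieves records from a REST-style endpoint. The endpoint URL and response fields are hypothetical; many government open-data APIs follow this general pattern of query parameters and paginated JSON responses.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical REST endpoint; real open-data APIs differ in detail but
# typically accept query parameters and return paginated JSON.
BASE_URL = "https://data.example.gov/api/v1/grants"

def fetch_grants(year: int, page: int = 1) -> dict:
    """Retrieve one page of grant records as parsed JSON."""
    query = urllib.parse.urlencode({"year": year, "page": page})
    with urllib.request.urlopen(f"{BASE_URL}?{query}") as response:
        return json.load(response)

# Usage (would require the hypothetical endpoint to exist):
# payload = fetch_grants(2018)
# for record in payload["results"]:
#     print(record["business_id"], record["amount_eur"])
```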
Aside from government agencies and other public funders, RD&I-performing organisations store a significant share of research and innovation data. The Common European Research Information Format (CERIF) and the metadata formats developed by the Consortia Advancing Standards in Research Administration Information (CASRAI) were originally designed to serve the data management needs of HEIs. Some DSIP systems use them to harvest curated data from research institutes and apply these data directly in analysis (Box 7.6).
CERIF is a standard maintained by the international not-for-profit organisation euroCRIS since 2002. It ensures uniform management and exchange of research information by providing data models (entities, attributes and relationships), exchange models, metadata models and controlled vocabulary terms. CERIF covers related information on publications, projects, organisations, equipment, events, individuals, language, facilities, patents, products and services (Jörg et al., 2012). An important feature of CERIF is that it provides connectivity among different metadata standards by enabling conversion of one standard into another (Jeffery and Asserson, 2016).
CASRAI is an international not-for-profit organisation founded in 2006. It helps key stakeholders in data curation to develop standard agreements that make research information exchange more efficient. These agreements entail standards for managing the full data cycle. Implementation of CASRAI standards can help organisations improve data quality, interoperability and accessibility. They do this by filtering information (agreements on report format templates) and disambiguating it (agreements on shared glossaries). CASRAI is mainly used in Europe, the United States and Canada; the rest of the world tends to use other standards. Even so, a large number of digital tools use CASRAI standards in one way or another. For example, ORCID uses CASRAI research-output report formats and glossaries, and Snowball Metrics uses CASRAI standard information agreements (CASRAI, 2016).
Semantic ontologies can also help address the problem of interoperability in DSIP infrastructures. Launched in 2003 by Cornell University, the VIVO project develops open-source software and an ontology for research information, enabling federated search for research partners. The VIVO ontology covers organisations, researchers, activities and their relationships. It builds linkages among various data items to provide a consistent and connected perspective on research, and enables more effective data reuse. In a similar vein, other initiatives such as Semantic Web for Research Communities and Advanced Knowledge Technologies also provide ontologies for scientific research.
Source: OECD (forthcoming a), Digital Science and Innovation Policy and Governance.
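Protocols such as OAI-PMH (Table 7.1) are what make this kind of harvesting routine. The sketch below issues a standard ListRecords request and extracts Dublin Core titles. The repository URL is a placeholder, but the verb, the metadataPrefix parameter and the XML namespaces are defined by the OAI-PMH 2.0 specification.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Placeholder repository endpoint; any OAI-PMH 2.0 compliant repository
# accepts the same verb and metadataPrefix parameters.
ENDPOINT = "https://repository.example.org/oai"

# Namespaces defined by the OAI-PMH 2.0 specification and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def harvest_titles(endpoint: str = ENDPOINT) -> list:
    """Issue a ListRecords request and return the Dublin Core titles."""
    query = urllib.parse.urlencode(
        {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    )
    with urllib.request.urlopen(f"{endpoint}?{query}") as response:
        tree = ET.parse(response)
    return [element.text for element in tree.findall(".//dc:title", NS)]

# Usage (against a real repository endpoint):
# for title in harvest_titles():
#     print(title)
```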
In recent years, research funders, research-performing institutes and researchers have faced increasing pressure to demonstrate the value and impact of research. Budgetary discussions implicitly or explicitly compare the value of the marginal dollar placed in science versus other policy areas. All policy areas try to make their best possible case, and data-based assessment has become a core component of evidence-based policy and strategy discussions. As a particular class of evidence-based assessment, data-driven assessments are responding to the complexity of research and innovation systems, and the need for more efficient and faster decisions. They use the opportunity provided by the digital trace of many scientific research activities, as well as growing data processing capacities.
However, there are significant risks that the procedures of data-based assessment fail to meet their intended objectives. A key risk is that data availability, rather than policy relevance, comes to drive assessment: decisions rest on whatever data happen to be available in a quantitative, apparently compact form. Data-based assessment can provide a valid perspective only as long as the available data encompass all the relevant parts of the phenomena of interest. Two steps can address this concern. First, policy makers need a broad sense of what science actors of different types do and the extent to which existing data capture these activities and their outcomes. Second, they need to identify to what extent such data can actually be deployed for assessment. This will depend on their accessibility and interoperability with other data sources.
The level of analysis at which impact is examined is critical. One of the great advantages of digitalisation is the technical ability to operate with large, linked databases at very fine levels of granularity, so that information is not necessarily lost in the process of aggregation. This micro-level view has, however, contributed to a loss of clarity about what can legitimately be inferred from such data. One prominent example of a disconnect between data users and producers is the confusion between using data to assess performance at the level of individual researchers, their institutions, and the country or broader area as a whole, as highlighted in Figure 7.4. For example, the journal Impact Factor was born of the need to inform librarians’ decisions about which titles to acquire and store, but over time it became a surrogate measure of the quality of individual researchers and their research outputs. Despite extensive academic discussion of the limitations of journal-based metrics (Moed et al., 2012), these continue to be widely used. Such misuses of data have generated calls for concerted efforts to create an open, sound and consistent system for measuring all activities that contribute to academic productivity.
DSIP infrastructures could reinforce existing misuses of data, which could distort the incentives and behaviour of individual researchers and research-performing organisations (Hicks et al., 2015; Edwards and Roy, 2017). But DSIP also brings with it the promise that one day most, if not all, relevant dimensions of research activity and interaction might be represented digitally. This can be described as the “promise of altmetrics”. Some argue the emergence of web-based new data sources, especially those generated within online social media platforms, can provide timelier insights into relevant and hitherto unknown dimensions (Priem et al., 2010).
It has been argued that altmetrics could support the assessment of increasingly important, non-traditional scholarly products like datasets and software, which are under-represented in the citation indices frequently used for assessment. Altmetrics could also reward impacts on wider audiences outside the publishing core, such as practitioners or the public in general. The altmetrics movement promotes the use of metrics generated from social media platforms as a source of evidence of research impact that is broader and timelier than citations. Altmetrics have also been advanced as part of the infrastructure required to facilitate open science, and as an aid to filtering the fast-growing amounts of information outside or at the margins of traditional peer-review mechanisms. However, as with more traditional metrics, such as citation counts, questions remain over the extent to which altmetrics qualify as signals of research impact.
More than half of the DSIP systems surveyed play a role in research assessment. Nearly 90% collect information on research outputs and more than one-third gather information on research impacts (Figure 7.5). Some, like the Cristin system in Norway, the Lattes Platform in Brazil and the METIS system in the Netherlands, are the primary sources of data for national research assessments. Few use altmetrics in their research assessments.
Non-government actors are emerging as one of the main forces for digitalisation of science and innovation policy. Solutions developed by commercial companies and not-for-profit organisations can provide governments with essential capabilities in data management and analytics at an agreed cost and within a required timeframe.
The private sector exerts manifold impacts on the development of DSIP initiatives. DSIP systems can potentially use digital products and services designed by the private sector as building blocks; these can extend the functionality and increase the value of DSIP systems for their stakeholders. The private sector designs technological architectures, develops digital tools for data management and provides consulting services related to launching and maintaining digital infrastructures. However, co‑operation between the public and private sectors is multidimensional. It is not confined to the purchase of off-the-shelf solutions; there is also considerable co‑operation in developing new solutions. For instance, administrators of the Flanders Research Information Space (FRIS) are co‑operating with IBM to develop a web-scraping tool that retrieves information on research activities scattered across the web.
The large academic publishers Elsevier and Holtzbrinck Publishing Group, together with the analytics firm Clarivate Analytics, are particularly active. They are developing and bundling a mix of products and services into platforms that mimic many features of fully fledged DSIP systems (Figure 7.6 shows the example of Elsevier). Several products developed by these firms, including bibliographic databases, unique identifiers and organisational CRIS (Box 7.2), are often key components in governments’ DSIP systems.
In addition, digital giants like Alphabet and Microsoft, and national technology companies such as Baidu (People’s Republic of China) and Naver (Korea), have all designed platforms to search academic outputs. The impact of these companies on the digitalisation of science and innovation policy is so far limited. However, given their coverage of information on research outputs, these platforms could become key elements in national DSIP systems. For instance, the Academic Knowledge API of Microsoft Academic Graph enables the retrieval of information on publications, citations, contributors, institutions, fields of study, journals and conferences (API, n.d.; Microsoft, n.d.). Developers of DSIP systems can use these data for further analysis, which could spark competition with other established commercial databases of bibliographic information (such as Scopus). Academic search engines (Google Scholar, Microsoft Academic, Baidu Scholar and Naver Academic) can collect information on research publications and citations, which could potentially support research assessments and international benchmarking of research participants, including university rankings (Daraio and Bonaccorsi, 2017; Kousha, Thelwall and Abdoli, 2018).
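As an indication of how such data can be retrieved programmatically, the sketch below queries the Academic Knowledge API’s evaluate endpoint. The endpoint, query-expression syntax, attribute names and header follow the public documentation cited above as of the time of writing; the subscription key is a placeholder, and these details may since have changed.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and parameters as described in the Academic Knowledge API
# documentation (API, n.d.); the subscription key is a placeholder.
ENDPOINT = "https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate"
SUBSCRIPTION_KEY = "<your-subscription-key>"

def evaluate(expr: str, count: int = 5) -> dict:
    """Retrieve publication entities matching a query expression."""
    query = urllib.parse.urlencode(
        # Ti = title, Y = year, CC = citation count (documented attributes).
        {"expr": expr, "count": count, "attributes": "Ti,Y,CC"}
    )
    request = urllib.request.Request(
        f"{ENDPOINT}?{query}",
        headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Usage: titles, years and citation counts for one (normalised) author name.
# payload = evaluate("Composite(AA.AuN=='jane doe')")
# for entity in payload["entities"]:
#     print(entity.get("Ti"), entity.get("Y"), entity.get("CC"))
```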
Another group of firms active in the DSIP area are providers of research administration tools for public funders and research-performing organisations. These tools provide the evidence base for national research assessments and support decisions on allocation of public funding. Some of these companies are involved in consultancy projects to support evidence-informed science and innovation policy making. Science-Metrix, a subsidiary of the Canadian research information management firm 1Science Inc., is a case in point. It was commissioned in 2018 to develop methods and indicators of research and innovation activities for the US National Science Foundation (Côté et al., 2018).
Harnessing these private sector developments for use in public DSIP systems has many potential benefits. Solutions can be implemented quickly and at an agreed cost, sparing the public sector the need to develop the necessary in-house skills beforehand. Private companies can also promote interoperability through their standards and products, which can expand the scope and scale of data within a DSIP system. But there are also risks. For example, outsourcing data management activities to the private sector may result in a loss of control over the future development of DSIP systems. In addition, reliance on proprietary products and services may lead to discriminatory access to data, even if these concern research activities funded by the public sector. Finally, the public sector’s adoption of commercial standards for metrics may drive the emergence of private platforms exhibiting network effects that are difficult to contest.
Charities and not-for-profit organisations also contribute to DSIP, as shown above in the discussion of interoperability enablers. These organisations can also directly fund and design DSIP solutions. For instance, the Alfred P. Sloan Foundation has financially supported projects to collect systematic evidence on impacts of publicly funded research (e.g. ETOILE, UMETRICS), to provide free access and sharing of research outputs (e.g. arXiv.org, FORCE11, Impactstory) and to aid data disambiguation (improvement of citations, development of unique identifiers). In another example, an Australian-based not-for-profit social enterprise, Cambia, in co‑operation with Queensland University of Technology, has launched the Lens, an open platform for “innovation cartography”. The platform aggregates data from databases of several national and international patent offices and scholarly datasets including PubMed, Crossref and Microsoft Academic to provide OA to disambiguated and linked patent information (Lens, n.d.). A number of add-on tools and services provide actionable intelligence that decision makers can use. For example, policy makers can use Lens PatCite to identify, disambiguate and link scientific articles cited in patents (Lens PatCite, n.d.).
Because they are free and offer high levels of functionality, digital solutions designed by not-for-profit organisations are widely adopted by public organisations, as well as commercial firms. Indeed, in many cases, they serve as important elements of commercial DSIP solutions, enhancing their functionality and contributing to their interoperability. Administrators of several surveyed DSIP systems opted mostly for open-source software and free digital solutions to better ensure the financial sustainability of their operations and to mitigate the risks of vendor lock-in.
The digital transformation of STI policy and its evidence base is still in its early stages. This means STI policy makers can take an active stance in shaping DSIP ecosystems to fit their needs. This will require strategic co-operation, through significant interagency co-ordination and sharing of resources (such as standard digital identifiers), and a coherent policy framework for data sharing and reuse in the public sector. Since several government ministries and agencies formulate science and innovation policy, DSIP ecosystems should be founded on the principles of co-design, co-creation and co-governance (OECD, 2018).
This chapter has highlighted some of the challenges facing DSIP. Interoperability remains a major barrier, despite the recent proliferation of identifiers, standards and protocols. Policy makers have an opportunity to influence the development of international UPPI systems in terms of target populations, information captured, compatibility with statistical systems, and especially adoption both by entities and by potential users. In particular, international efforts related to data documentation and the development of standards for metadata could be consolidated to improve data interoperability.
DSIP systems can help broaden the evidence base on which research is assessed by, for example, incorporating altmetrics. They can also empower a broad group of stakeholders to participate more actively in the formulation and delivery of science and innovation policy. However, there is also the danger that these systems reinforce existing data misuses. DSIP systems should uphold and endorse recent initiatives that promote best practices in the responsible use of data. These include the San Francisco Declaration on Research Assessment (ASCB, n.d.) and Leiden Manifesto (Hicks et al., 2015).
Finally, governments can usefully co‑operate with the private and not-for-profit sectors in developing and operating DSIP systems. However, they should ensure public data remains outside of “walled gardens” and open for others to readily access and reuse. They should also avoid vendor lock-ins, deploying systems that are open and agile. In a fast-changing environment, this will provide governments with greater flexibility to adopt new technologies and incorporate unexploited data sources in their DSIP systems.
API (n.d.), “Academic knowledge”, webpage, https://docs.microsoft.com/en-us/azure/cognitive-services/academic-knowledge/home (accessed 16 July 2018).
ASCB (n.d.), “San Francisco Declaration on Research Assessment”, webpage, www.ascb.org/dora (accessed 18 August 2016).
Bauer, M.W. and A. Suerdem (2016), “Relating ‘science culture’ and innovation”, presentation at the OECD Blue Sky Forum on Science and Innovation Indicators, Ghent, 19-21 September.
CASRAI (2016), “CASRAI Impacts – 10-year anniversary, Building bridges for research information users”, CASRAI.
Choi, S. et al. (2011), “SAO network analysis of patents for technology trends identification: A case study of polymer electrolyte membrane technology in proton exchange membrane fuel cells”, Scientometrics, Vol. 88/3, Springer International Publishing, Cham, Switzerland, pp. 863-883, https://doi.org/10.1007/s11192-011-0420-z.
Côté, G. et al. (2018), Bibliometrics and Patent Indicators for the Science and Engineering Indicators 2018 – Technical Documentation, Science-Metrix, Montreal.
Daraio, C. and A. Bonaccorsi (2017), “Beyond university rankings? Generating new indicators on universities by linking data in open platforms”, Journal of the Association for Information Science and Technology, Vol. 68/2, Wiley Online Library, pp. 508-529.
Edwards, M. and S. Roy (2017), “Academic research in the 21st century: Maintaining scientific integrity in a climate of perverse incentives and hypercompetition”, Environmental Engineering Science, Vol. 34/1, Mary Ann Liebert Inc., pp. 51-61, http://online.liebertpub.com/doi/pdf/10.1089/ees.2016.0223.
Gibson, E. et al. (2018), “Technology foresight: A bibliometric analysis to identify leading and emerging methods”, Foresight and STI Governance, Vol. 12/1, National Research University Higher School of Economics, Moscow, pp. 6-24.
Guo, Y. et al. (2012), “Text mining of information resources to inform forecasting innovation pathways”, Technology Analysis & Strategic Management, Vol. 24/8, Routledge, London, pp. 843‑861, https://doi.org/10.1080/09537325.2012.715491.
Hicks, D. et al. (2015), “Bibliometrics: The Leiden manifesto for research metrics”, Nature, Vol. 520/7548, Nature Research, Springer, pp. 429-431, www.nature.com/news/bibliometrics-the-leiden-manifesto-for-research-metrics-1.17351.
Jeffery, K. and A. Asserson (2016), “Position paper: Why CERIF?”, presentation at smart descriptions and smarter vocabularies workshop, Amsterdam, 30 November-1 December, www.w3.org/2016/11/sdsvoc/SDSVoc16_paper_15.
Johnson, R., O. Fernholz and M. Fosci (2016), “Text and data mining in higher education and public research. An analysis of case studies from the United Kingdom and France”, Association des directeurs et personnels de direction des bibliothèques universitaires et de la documentation, Paris, https://adbu.fr/competplug/uploads/2016/12/TDM-in-Public-Research-Final-Report-11-Dec-16.pdf.
Jörg, B. et al. (2012), “CERIF 1.3 full data model (FDM): Introduction and specification”, 28 January, EuroCRIS, www.eurocris.org/Uploads/Web%20pages/CERIF-1.3/Specifications/CERIF1.3_FDM.pdf.
Kayser, V. and K. Blind (2017), “Extending the knowledge base of foresight: The contribution of text mining”, Technological Forecasting and Social Change, Vol. 116C, Elsevier, Amsterdam, pp. 208‑215.
Kong, D. et al. (2017), “Using the data mining method to assess the innovation gap: A case of industrial robotics in a catching-up country”, Technological Forecasting and Social Change, Vol. 119, Elsevier, Amsterdam, pp. 80-97.
Kousha, K., M. Thelwall and M. Abdoli (2018), “Can Microsoft Academic assess the early citation impact of in-press articles? A multi-discipline exploratory analysis”, Journal of Informetrics, Vol. 12/2, arXiv:1802.07677, Cornell University, pp. 287-298.
Lens (n.d.), “About the Lens”, webpage, https://about.lens.org/ (accessed 31 August 2018).
Lens PatCite (n.d.), “Lens PatCite” webpage, www.lens.org/lens/patcite (accessed 31 August 2018).
Mateos-Garcia, J. (6 April 2017), “We are building a formidable system for measuring science – but what about innovation?”, Nesta blog, www.nesta.org.uk/blog/we-are-building-a-formidable-system-for-measuring-science-but-what-about-innovation/.
McMurry, J., L. Winfree and M. Haendel (6 July 2017), “Bad identifiers are the potholes of the information superhighway: Take-home lessons for researchers”, PLOS Biologue Community blog, http://blogs.plos.org/biologue/2017/07/06/bad-identifiers-potholes-of-information-superhighway/.
Microsoft (n.d.), “Microsoft Academic Graph”, webpage, www.microsoft.com/ (accessed 16 July 2018).
Moed, H.F. et al. (2012), “Citation-based metrics are appropriate tools in journal assessment provided that they are accurate and used in an informed way”, Scientometrics, Vol. 92/2, Springer, pp. 367-376.
OECD (forthcoming a), Digital Science and Innovation Policy and Governance, OECD Publishing, Paris.
OECD (forthcoming b), “OECD case study of Norway’s digital science and innovation policy and governance landscape”, OECD Science, Technology and Innovation Policy Papers, OECD Publishing, Paris.
OECD (2018), OECD Science, Technology and Innovation Outlook 2018: Adapting to Technological and Societal Disruption, OECD Publishing, Paris, https://doi.org/10.1787/sti_in_outlook-2018-en.
Park, H., J. Yoon and K. Kim (2013), “Using function-based patent analysis to identify potential application areas of technology for technology transfer”, Expert Systems with Applications, Vol. 40/13, Elsevier, Amsterdam, pp. 5260-5265.
Peng, H. et al. (2017), “Forecasting potential sensor applications of triboelectric nanogenerators through tech mining”, Nano Energy, Vol. 35, pp. 358-369, Elsevier, Amsterdam, https://doi.org/10.1016/j.nanoen.2017.04.006.
Priem, J. et al. (2010), “Altmetrics: A Manifesto”, webpage, http://altmetrics.org/manifesto (accessed 5 February 2017).
Sateli, B. et al. (2016), “Semantic user profiles: Learning scholars’ competences by analyzing their publications”, in A. González-Beltrán, F. Osborne and S. Peroni (eds.), Semantics, Analytics, Visualization. Enhancing Scholarly Data, SAVE-SD 2016, Lecture Notes in Computer Science, Vol. 9792, Springer, Cham, Switzerland, https://doi.org/10.1007/978-3-319-53637-8_12.
Shapira, P. and J. Youtie (2006), “Measures for knowledge-based economic development: Introducing data mining techniques to economic developers in the state of Georgia and the US South”, Technological Forecasting and Social Change, Vol. 73/8, Elsevier, Amsterdam, pp. 950-965.
Sugimoto, C. and V. Larivière (2016), “Social media indicators as indicators of broader impact”, presentation at the OECD Blue Sky Forum on Science and Innovation Indicators, Ghent, 18 September, www.slideshare.net/innovationoecd/sugimoto-social-media-metrics-as-indicators-of-broader-impact.
Wolfram, D. (2016), “Bibliometrics, information retrieval and natural language processing: Natural synergies to support digital library research”, in Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries, ACL Anthology, www.aclweb.org/anthology/.
Yoo, S.H. and D. Won (2018), “Simulation of weak signals of nanotechnology innovation in complex system”, Sustainability, Vol. 10/2, MDPI, Basel, p. 486, https://doi.org/10.3390/su10020486.
Yoon, J. and K. Kim (2012), “Detecting signals of new technological opportunities using semantic patent analysis and outlier detection”, Scientometrics, Vol. 90/2, Springer, pp. 445-461.
Yoon, J., H. Park and K. Kim (2013), “Identifying technological competition trends for R&D planning using dynamic patent maps: SAO-based content analysis”, Scientometrics, Vol. 94/1, Springer, pp. 313‑331.
Zhang, Y. et al. (2016), “Technology roadmapping for competitive technical intelligence”, Technological Forecasting and Social Change, Vol. 110, Elsevier, Amsterdam, pp. 175-186, https://doi.org/10.1016/j.techfore.2015.11.029.