Stefaan G. Verhulst
The GovLab at New York University
Development Co-operation Report 2021
30. Reusing data responsibly to achieve development goals
Abstract
To harness and accelerate the value of data for development, new mechanisms and partnerships to access and reuse data that have already been collected will need to be established. Data collaboration is a cost‑effective and innovative way to multiply the development impact of data. By combining and triangulating data from various sources, data collaboratives can generate new insights and overcome data inequalities. Establishing and operationalising structures and frameworks for responsible use and reuse of data, including addressing concerns about data misuse, should be an urgent priority to truly unlock the promise of digital data for development.
Key messages
Developing countries can offset their limited resources and data capacity through data‑collaboration and data reuse partnerships.
Unless concerns about data misuse and privacy are addressed, data collaboration will not reach its full potential to inform and advance development.
Development actors and other stakeholders should help develop data governance frameworks that balance risk and rewards of data use and reuse and engage the public in creating accountability mechanisms.
Development co-operation can support data capacity building, expanded digital literacy and processes to better identify needs and priorities for data reuse.
For better or for worse, the ongoing digital transformation makes it easier to capture a wide variety of data points and store and analyse them. The challenge is to use this proliferation of data wisely, responsibly and in the public interest. There are many examples of data being used for development objectives – to improve agricultural outcomes, direct humanitarian aid where it is most needed, manage migrant flows and measure illiteracy, to name just a few. But capacity to access, use and govern data that have already been collected varies widely across countries.
The unique value of digital data is that they can be repurposed. Data reuse offers opportunities for low‑income countries and others to share data costs and generate new insights and knowledge that can be put to work for sustainable development. Data collaborations are still stymied by concerns over data privacy, possible misuse and uneven governance. But data sharing, if done wisely and responsibly, can lead to better decision making in the public interest and for development. Support from development actors is needed for a framework for responsible, systemic and sustainable data reuse. The rise of social media, the Internet of Things, and the growing incursion of artificial intelligence and machine learning into everyday life have led to a process of datafication that can effect positive social change – if leveraged responsibly (Lupton and Williamson, 2017[1]).
Greater data capacity leads to greater development impact
Private and public actors generally collect data for particular purposes, typically commercial or administrative. Some of the most common are customer profiling (Poullet, 2021[2]), tracking movement and locations (The GovLab and Cuebiq, 2021[3]), and targeting social and other government services (Verhulst, Young and Zahuranec, 2019[4]). But capacity to collect, use and govern data varies widely across countries, mirroring global economic disparities. Building capacity in low-income countries to generate and use data can help achieve broader Sustainable Development Goals: economic development can lead to increased data capacity; greater data literacy is essential for development; and functional access to data for reuse can potentially spur economic development. The following examples highlight the value of digital data for development, from informing economic decision making to directing humanitarian aid:
Sharing data for better agricultural decisions in Colombia: The Ministry of Agriculture and the Colombian Climate and Agricultural Sector (Clima y Sector Agropecuario Colombiano) shared data and insights with farmers on the economics and agronomy of rice cultivation, enabling them, for example, to avoid planting crops that would fail. This project allowed farmers to sustain their traditional lifestyles and yielded an estimated USD 3.6 million in savings in the year following the launch of the initiative (Young and Verhulst, 2017[5]).
Mapping population movements to direct humanitarian aid in Haiti: Following the 2010 cholera outbreak in Haiti, the telecom provider Digicel Haiti shared data with researchers at the Karolinska Institute in Sweden and Columbia University in New York. Using anonymised data from 2 million mobile phones, the researchers established population movement patterns that helped make aid delivery more effective and efficient. Similar methods have been used elsewhere (Young and Verhulst, 2016[6]).
Measuring illiteracy in Senegal through cell phone records: The international development company Knuper acquired call detail record data of some 9 million subscribers in Senegal from the telecommunications company Orange Sonatel. Knuper used the data in a study to determine the usability of call detail records to improve measurements of illiteracy in developing countries. The project is a good illustration of data repurposing (The GovLab, 2019[7]).
Crowdsourcing new uses for data in West Africa: In Côte d’Ivoire and Senegal, Orange Telecom hosted the Data 4 Development Challenge, an international competition that offered anonymised data to researchers seeking to address development problems, effectively crowdsourcing expertise and insights to determine new and previously unrecognised uses for privately held data (The GovLab, 2017[8]). Among the winners in Senegal were projects exploring the potential of mobile phone data for electrification planning, how mobile phone access affects millet prices and how waterborne parasites spread through human movement.
The rush to acquire data and data capacity should not, however, lead to data (re)use being seen as a zero‑sum game. Collaboration is key to successful data initiatives – ones that generate relevant insights and lead to genuine, and positive, social change.1
Data collaboratives offer a model for responsible reuse of data
The added value of digital data is that they can be reused by others for similar or different purposes (Verhulst and Young, 2018[9]), multiplying their potential value to development. An emerging model for data reuse is the data collaborative, a form of partnership between data holders and data users (as well as data scientists and those who can act upon the insights generated) to reuse disparate forms of data to generate new insights that can serve the public good (Verhulst et al., 2019[10]; Young and Verhulst, 2020[11]). This new approach to systematic, sustainable and responsible reuse of data has universal applicability. The GovLab has collected more than 200 examples of data collaboratives, many of them in low-income countries.2
Data collaboratives can be cost-effective, innovative and inclusive
The data collaborative approach offers three broad advantages:
1. More cost efficient. It is expensive to collect, store and use data, especially once the cost of analysing them is factored in. A 2020 study for McKinsey Digital estimated that a mid-sized institution in the United States is spending nearly USD 250 million on data annually, with spending rising by almost 50% per year (Grande et al., 2020[12]). Providing data for the United Nations High Level Panel’s envisaged “data revolution” could cost a staggering USD 254 billion, according to a study by Jerven (2014[13]) for the Copenhagen Consensus Center. For low-income countries, spending of such magnitude is unimaginable. Data reuse can bring down the financial costs of data initiatives. The McKinsey Digital report cited the example of a bank that reduced its data costs by 20% by reusing data and, more generally, improving data governance (Grande et al., 2020[12]).
2. Generates fresh insights for better policy. Combining data from various sources by breaking down silos has the potential to lead to new and innovative insights that can help policy makers take better decisions. Satellite data originally collected to predict weather can help manage crop prices and address poverty and hunger (Young and Verhulst, 2016[6]); cell phone data can be used to measure population movements, which in turn can help control migrant flows and address existing or emerging pandemics (The GovLab, 2017[14]). Digital data can also be triangulated with existing, more traditional sources of information (e.g. census data) to generate new insights and help verify the accuracy of information.
3. Overcomes inequalities and asymmetries. Social and economic inequalities, both within and among countries, are often mapped onto data inequalities (UN, 2020[15]; World Bank, 2021[16]; Vieira, 24 February 2018[17]; Alonso, Kothari and Rehman, 2 December 2020[18]). The cost of producing data and the technology needed to process these data are increasingly burdensome for low- and middle-income countries. In data collaboratives, these costs and analytical tools and techniques can be shared. For example, cloud computing, which allows analytical and other technical tools to be easily shared and accessed, can play a vital role in enabling the transfer of skills and technologies across actors and countries.
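The triangulation described under the second advantage can be sketched in a few lines. The example below is a minimal illustration, not a method taken from the cited sources: the function name `triangulate`, the 15% tolerance and the region figures are all hypothetical. It compares a digitally derived population estimate with census figures and flags regions where the two sources diverge enough to warrant a closer look.

```python
from dataclasses import dataclass


@dataclass
class RegionCheck:
    region: str
    census: int
    estimate: int
    relative_gap: float  # |estimate - census| / census
    flagged: bool        # True when the gap exceeds the tolerance


def triangulate(census: dict[str, int],
                estimates: dict[str, int],
                tolerance: float = 0.15) -> list[RegionCheck]:
    """Cross-check digital estimates against census figures.

    Only regions present in BOTH sources are compared. The tolerance
    is a hypothetical threshold, not a recommended value.
    """
    checks = []
    for region in sorted(census.keys() & estimates.keys()):
        c, e = census[region], estimates[region]
        # A census figure of zero cannot be compared proportionally,
        # so it is always flagged for manual review.
        gap = abs(e - c) / c if c else float("inf")
        checks.append(RegionCheck(region, c, e, gap, gap > tolerance))
    return checks


# Made-up figures for illustration only.
census_counts = {"Region A": 1000, "Region B": 2000, "Region C": 500}
mobile_estimates = {"Region A": 1100, "Region B": 1500, "Region D": 300}

for check in triangulate(census_counts, mobile_estimates):
    status = "review" if check.flagged else "consistent"
    print(f"{check.region}: gap {check.relative_gap:.0%} ({status})")
```

A flagged region does not show which source is wrong. It only signals where the two sources disagree, which is the point at which local expertise is needed to interpret the difference.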
Concerns over governance and misuse of data stymie greater collaboration
Despite its promise, data collaboration is not yet widespread, particularly in low- and middle-income countries. Concerns persist about weak regulation and the potential misuse of shared data, and the evidence base on data reuse remains limited. These obstacles highlight the need to ensure that data are managed responsibly at each step, from collection to storage to use and reuse. The challenges to greater data collaboration fall into three main categories:
1. Finding the right governance model. The challenge for most countries today is not whether to regulate the digital sphere. Rather, it is how to design regulatory and institutional frameworks that can unleash the positive potential of data while limiting their potential for harm. Governance remains, at best, a work in progress. To the extent such frameworks exist, they often suffer from regulatory capture, political pressures, and insufficient knowledge or skills on the part of policy makers (Verhulst and Sloane, 2020[19]). These problems may be particularly acute in low-income countries, where regulatory capacity and independence are often weaker.
2. Addressing concerns about misuse. Concerns over data misuse and privacy remain among the most significant obstacles to greater data collaboration. These concerns are often valid, whether they are raised by data holders, data subjects or data reusers themselves. A multi-pronged strategy to address them should focus on raising awareness within organisations of the risks of data misuse and establishing effective institutional and legal frameworks to ensure accountability and responsible data reuse.
3. Building and sharing evidence from data reuse. Not enough is known – and shared – about how data are being reused, what works and doesn’t work in data collaborations, and emerging lessons and principles for success (Verhulst et al., 2019[10]). A systematised knowledge base could help reduce duplication of effort, a daunting challenge when resources are scarce, and inform more successful initiatives. The GovLab’s repository of case studies is contributing to building a solid knowledge base.
Maximising the positive potential of data reuse
The absence of clear data-sharing frameworks limits the possibilities for data collaboration to enhance development. Rather than maximising the potential of data reuse and minimising its possible harm, the existing fragmented, ad hoc regulations and policies often do the opposite. Fresh approaches are needed – to protect privacy and prevent data misuse, improve decision making, and develop the necessary human resources to effectively manage data.
Replace outdated data governance mechanisms and structures
Existing models and policies to protect privacy are largely outdated and often predicated on a risk-reduction rather than a rewards-maximisation approach.3 Policy makers, whether in government or private bodies, need new ways of balancing risk and reward, reinvigorated institutional models and forms, and fresh ways of ensuring accountability. The following are some of the specific features required:
Innovative risk assessment and mitigation methods across the data life cycle can better balance risk and reward (Young, Campo and Verhulst, 2019[20]).
Data responsibility by design approaches can ensure that privacy and other protections are built into technical and institutional architectures, e.g. integrated technical means to prevent or mitigate violations such as differential privacy and other privacy-enhancing tools (The GovLab, 2021[21]).
The development and dissemination of model data-sharing agreements (Contracts for Data Collaboration, 2021[22]) could provide templates for organisations seeking to share data or access shared data. Low-income countries, which may lack the technical and human resources to design such agreements, could find these especially valuable.
Establishing ethics review boards to oversee how data are being reused and “data stewards” who can steer the process of sharing and reusing data can address data governance challenges (Verhulst et al., 2020[23]).
A global governance framework that smooths cross-border data flows for development and other social purposes is essential.4
Greater public engagement through citizen assemblies, awareness-raising campaigns and educational strategies can help establish a so-called “social license” to reuse data and avoid one‑size-fits-all approaches, an issue of particular salience in low-income countries (Young et al., 2020[24]).
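As an illustration of the privacy-enhancing tools mentioned above, the following is a minimal sketch of the Laplace mechanism, a standard way of achieving differential privacy for a simple count. It is not drawn from the cited sources: the function name `dp_count`, the `epsilon` value and the example count are assumptions chosen for illustration only.

```python
import random


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    A counting query has sensitivity 1: adding or removing one person
    changes the result by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    # The difference of two independent exponential draws with mean
    # `scale` follows a Laplace distribution centred on zero.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Hypothetical example: number of subscribers observed in one district.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
released = dp_count(true_count=1200, epsilon=0.5, rng=rng)
print(f"Released count: {released:.1f}")
```

A smaller `epsilon` adds more noise and gives stronger protection at the cost of accuracy. Choosing its value is therefore a governance decision about balancing risk and reward as much as a technical one.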
Improve decision making on data reuse priorities
Data sharing is a largely reactive process, driven less by public need than by what data are available or shared. Data collaboration will, however, have more impact if driven by demand rather than supply. This entails asking the right questions to identify priorities and share data accordingly.5 An effective questions-based approach to data sharing combines expert knowledge with broad public engagement to pinpoint priority public needs that data can address. Such a “new science of questions” is needed in every context, but it is arguably most relevant for low-income countries where (sometimes contradictory) public priorities compete for limited resources and difficult trade-offs must be made among competing Sustainable Development Goals.6 Asking the right questions can help establish a more systematic, unbiased and scientific approach to identifying needs and channelling scarce public resources. A science of questions is also essential to ensure that goals and initiatives are contextually sensitive – always an important concern for development projects.
Increase availability of talent and create new professional and institutional positions, such as data stewards
There are a variety of technical means (e.g. digital auditing mechanisms, decision provenance mapping tools) to help strengthen any framework for responsible data use, but data governance ultimately relies on people. Yet, technical advances are outpacing capacity to keep up. Low-income countries in particular need support to bolster their capacity to oversee responsible and systemic data sharing. Several factors are particularly important.
First, training and education can be integrated with existing formal educational systems and supplemented by more flexible, local initiatives; for example, those designed by civil society organisations to raise general awareness among the population of the risks and opportunities of data sharing (Young, Campo and Verhulst, 2019[20]).
Second, capacity building should encompass a mix of goals for different segments of the population. While countries benefit from more and better-trained data scientists, for instance, policy makers, business leaders, journalists and other societal groups also require training and upskilling. Increasing overall data literacy among the general population should be a key goal to raise citizen awareness and increase trust and buy-in of data collaboration.
Third, ensuring accountability and oversight of data and data-sharing initiatives requires the creation of new institutional positions. Organisations can create roles for individuals or bodies as data stewards empowered to oversee how data are managed, identify opportunities for data sharing and enforce accountability across the data chain. These positions are increasingly common in private organisations, but are equally important for government, civil society and educational institutions (Verhulst et al., 2020[23]).
Development actors should actively support responsible data governance frameworks to advance sustainable development
Capacity to produce and use data can lead to more informed policies, and reusing data in collaborative partnerships can be a cost-effective way to generate new insights and decisions on development. The promise and challenges posed by data access and data reuse are both heightened in low-income countries. Their more limited human and financial resources can undermine data governance, leave privacy unprotected and data misuse unchecked, and lead to missed opportunities to improve the well-being of their citizens.
Thus, establishing and operationalising a framework for responsible, systemic and sustainable data reuse should be an urgent priority for policy makers and all stakeholders involved in development. Updated and innovative governance mechanisms to manage data can proactively address risks and maximise the positive potential of data. The creation and training of data stewards can help create the human capital needed to design and implement responsible data collaborations that are fit-for-purpose. These governance mechanisms and professions need to be developed in a strategic and collaborative way that recognises the role data can play for private and public organisations across society.
References
[18] Alonso, C., S. Kothari and S. Rehman (2 December 2020), “How artificial intelligence could widen the gap between rich and poor nations”, IMF Blog, https://blogs.imf.org/2020/12/02/how-artificial-intelligence-could-widen-the-gap-between-rich-and-poor-nations (accessed on 21 September 2021).
[22] Contracts for Data Collaboration (2021), “C4DC”, web page, https://contractsfordatacollaboration.org (accessed on 21 September 2021).
[12] Grande, D. et al. (2020), “Reducing data costs without jeopardizing growth”, McKinsey Digital, https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/reducing-data-costs-without-jeopardizing-growth (accessed on 21 September 2021).
[13] Jerven, M. (2014), Benefits and Costs of the Data for Development Targets for the Post-2015 Development Agenda, Copenhagen Consensus Center, Lowell, MA, https://www.copenhagenconsensus.com/sites/default/files/data_assessment_-_jerven.pdf.
[1] Lupton, D. and B. Williamson (2017), “The datafied child: The dataveillance of children and implications for their rights”, New Media & Society, Vol. 19/5, pp. 780-794, http://dx.doi.org/10.1177/1461444816686328.
[2] Poullet, Y. (2021), Profiling in the Age of AI, AIEthicsCourse.org, https://aiethicscourse.org/lectures/profiling-in-the-age-of-ai (accessed on 20 September 2021).
[21] The GovLab (2021), “Data responsibility journey: Risks & responsibilities throughout the data lifecycle”, web page, https://dataresponsibilityjourney.org (accessed on 21 September 2021).
[7] The GovLab (2019), “Knuper data upcycling in Senegal”, Data Collaboratives Cases, https://datacollaboratives.org/cases/knuper-data-upcycling-in-senegal.html (accessed on 21 September 2021).
[8] The GovLab (2017), “Orange Telecom Data for Development Challenge (D4D)”, Data Collaboratives Cases, https://datacollaboratives.org/cases/orange-telecom-data-for-development-challenge-d4d.html (accessed on 21 September 2021).
[14] The GovLab (2017), “Tracking malaria in Namibia with cell phone data”, Data Collaboratives Cases, https://datacollaboratives.org/cases/tracking-malaria-in-namibia-with-cell-phone-data.html (accessed on 21 September 2021).
[3] The GovLab and Cuebiq (2021), The Use of Mobility Data for Responding to the COVID19 Pandemic: DATA4COVID19 Deep Dive, Open Data Institute, London, http://theodi.org/wp-content/uploads/2021/04/Data4COVID19_0329_v3.pdf.
[15] UN (2020), Global Issues: Big Data for Sustainable Development, United Nations, New York, NY, https://www.un.org/en/global-issues/big-data-for-sustainable-development (accessed on 21 September 2021).
[19] Verhulst, S. and M. Sloane (2020), “Realizing the potential of AI localism”, Project Syndicate, https://www.project-syndicate.org/commentary/local-regulation-of-artificial-intelligence-uses-by-stefaan-g-verhulst-1-and-mona-sloane-2020-02?barrier=accesspaylog (accessed on 21 September 2021).
[9] Verhulst, S. and A. Young (2018), Toward an Open Data Demand Assessment and Segmentation Methodology, The GovLab, New York, NY, https://thegovlab.org/static/files/publications/Data+Demand.pdf.
[10] Verhulst, S. et al. (2019), Leveraging Private Data for Public Good: A Descriptive Analysis and Typology of Existing Practices, The GovLab, New York, NY, https://datacollaboratives.org/static/files/existing-practices-report.pdf.
[4] Verhulst, S., A. Young and A. Zahuranec (2019), “Circular data for a circular city: Value propositions for economic development”, The Circular City Research Journal, Vol. 1, http://files.thegovlab.org/Circular_Data.pdf.
[23] Verhulst, S. et al. (2020), “Wanted: Data stewards: (Re-)defining the roles and responsibilities of data stewards for an age of data collaboration”, The GovLab Blog, https://blog.thegovlab.org/post/wanted-data-stewards-re-defining-the-roles-and-responsibilities-of-data-stewards-for-an-age-of-data-collaboration (accessed on 21 September 2021).
[17] Vieira, H. (24 February 2018), “Without urgent action big data may widen inequality”, LSE Blogs, https://blogs.lse.ac.uk/businessreview/2018/02/24/without-urgent-action-big-data-may-widen-inequality (accessed on 21 September 2021).
[16] World Bank (2021), World Development Report 2021: Data for Better Lives, World Bank, Washington, DC, https://doi.org/10.1596/978-1-4648-1600-0.
[20] Young, A., S. Campo and S. Verhulst (2019), Responsible Data for Children: Synthesis Report, United Nations Children’s Fund/The GovLab, New York, NY, https://rd4c.org/assets/rd4c-synthesis-report.pdf.
[5] Young, A. and S. Verhulst (2017), “Aclímate Colombia: Open data to improve agricultural resiliency”, Open Data for Developing Economies Case Studies, The GovLab, New York, NY, https://odimpact.org/case-aclimate-colombia.html.
[6] Young, A. and S. Verhulst (2016), “Aclímate Colombia: Open data to improve agricultural resiliency”, Open Data for Developing Economies Case Studies, The GovLab, https://odimpact.org/case-aclimate-colombia.html (accessed on 20 September 2021).
[11] Young, A. and S. Verhulst (2020), “Data collaboratives”, in The Palgrave Encyclopedia of Interest Groups, Lobbying and Public Affairs, Palgrave Macmillan, New York, NY, https://doi.org/10.1007/978-3-030-13895-0_92-1.
[24] Young, A. et al. (2020), The Data Assembly: Responsible Data Re-Use Framework, The GovLab, New York, NY, https://thedataassembly.org/files/nyc-data-assembly-report.pdf.
Notes
1. Over several years of research and work regarding data sharing and reuse, this author has boiled down the importance of collaboration into the following three maxims or principles: 1) the data one needs, one likely doesn’t possess; 2) the domain and data expertise one needs, someone else probably possesses; and 3) the computational power and technical infrastructure required to process the data likely demand access to third-party platforms.
2. The benefits and challenges outlined in this section draw from case studies and several hours of interviews, desk analysis and other research conducted over the years by The GovLab. Its case study repository is available at: http://datacollaboratives.org/explorer.html.
3. On the difficulties of balancing risk and reward, see, for example: https://www.eesc.europa.eu/en/news-media/news/big-data-how-minimise-risks-while-maximising-benefits-all. On rewards maximisation, discussion often involves consideration of the harms and dangers of what Open Data Watch calls “open by default” data policies; see, for example: https://opendatawatch.com/publications/maximizing-access-to-public-data-striking-the-balance.
4. For a discussion on the need to smooth cross-border data flows, see: https://www2.itif.org/2017-cross-border-data-flows.pdf and https://www.cigionline.org/publications/data-different-why-world-needs-new-approach-governing-cross-border-data-flows.
5. For more information, see the 100 Questions project web page at: https://the100questions.org.