Fraud is by nature a hidden activity, so how can authorities detect and mitigate risks effectively? This report identifies ways for Spain’s General Comptroller of the State Administration (Intervención General de la Administración del Estado, IGAE) to tackle this challenge, using state-of-the-art machine learning models, and to target its control activities effectively at the highest fraud risks found in public grants and subsidies.
There are few reliable figures for fraud at the country level, given the complexities of measuring something that is intentionally concealed. Countries often rely on broader proxy measurements, such as the extent of reported irregularities in specific programmes or sectors. Nonetheless, available figures suggest considerable challenges and fraud risks for governments. For instance, in many countries that assess the extent of fraud in social benefit programmes, such as France, the United Kingdom and the United States, estimates of fraud reach into the hundreds of millions of euros. In its 32nd Annual Report on the protection of the European Union’s financial interests: Fight against fraud 2020, the European Commission reported EUR 375 million in irregularities identified as fraudulent, linked to revenue and expenditure. Fraud levels in EU Member States are likely to be much higher when national funds and public expenditures are taken into account.
Control bodies, such as the IGAE, are on the frontline of governmental efforts to prevent and detect fraud. They have a unique government-wide vantage point to spot fraud risks and strengthen the effectiveness, efficiency and economy of government spending through ex-ante and ex-post evaluations. To do this job effectively in a digital age, oversight bodies face considerable pressure to keep pace with both evolving risks and new technologies. In Spain, as in other EU Member States, the Recovery, Transformation and Resilience Plan puts specific emphasis on the need to improve the mechanisms and tools to prevent, detect and correct for the risks inherent in public grants, including fraud, corruption, conflicts of interest and double funding.
In this context, the IGAE and the OECD, with the support of the European Commission, worked together to identify ways for the IGAE to strengthen its assessments of fraud risks in public grants and subsidies, with the ultimate goal of more targeted control activities. The project focused on supporting the IGAE in making use of existing data, and identifying ways that it could expand its analysis to consider new data sources, fraud risks and methodologies. Chapter 1 briefly describes the IGAE’s context and mandate, as well as its approach for assessing risks and planning its control activities. It also highlights several overarching considerations for the IGAE to enhance its use of data and analytics for assessing grant fraud risks, regardless of whether it adopts the machine learning model in Chapter 2. These include:
Strengthen data governance and management for assessing grant fraud risks, starting with quick wins like improving its data dictionaries, clarity of unique identifiers and data controls specifically for fraud risk analysis.
Build capacity for data-driven risk assessments, in particular by developing structured datasets and, ideally, a capacity that brings together expertise in grant-making processes, fraud risks, analytics and visualisation.
Beware of pitfalls concerning composite risk indicators as well as biases, which can include biases in machine learning models.
Chapter 2 lays out a proof-of-concept for a data-driven risk model for the IGAE to adopt in part or in its entirety. The methodology makes use of data at the IGAE’s disposal, thereby implicitly accounting for the IGAE’s current context. The machine learning model accounts for risks across the grant cycle to the extent the data allow. The process of developing the proof-of-concept for the risk model led to several insights and the identification of areas for improvement, including the following:
Establish a ready-made dataset for fraud risk identification, which this project has started as a pilot and can form the basis for future risk analysis with fewer investments in resources and time.
Expand the IGAE’s use of indicators across the entire grant cycle, including enhancing data and indicators that go beyond descriptive features and reveal behaviours (e.g. conflicts of interest).
Invest in continuous improvement of the machine learning risk model, if adopted, to ensure a truly random sample, account for new data and risks, and address biases, among other considerations.
Consider network analyses and a broader set of methodologies, including those that take advantage of company data.
Finally, Chapter 3 offers a roadmap for complementing existing IGAE grants data in order to improve its risk assessment models. Specifically, it outlines datasets that can be matched to existing IGAE grants data, thereby enhancing the analytical sophistication and improving the precision of the IGAE’s risk assessment. The guidance and recommendations in the report draw from OECD fact-finding interviews, analyses of the IGAE’s context and available data, the experiences of other government entities and international leading practices.