The use of AI can help integrity actors prevent and detect corruption and fraud more efficiently by drawing insights from large and complex datasets that would otherwise be impossible to analyse. However, poor data quality can limit or undermine these efforts, potentially resulting in wasted resources or scepticism about the benefits of leveraging AI. Poor data quality has implications for a range of anti-corruption and anti-fraud activities. For instance, the pre-processing of data used for conducting fraud and corruption risk assessments, including assessing and addressing reliability issues, can be even more time-consuming than conducting the “analytics” itself (OECD, 2019[79]). Similarly, many integrity actors face the routine challenge of managing missing or error-prone data across a variety of critical data sources, such as registries for asset declarations or lobbying disclosures, as well as public procurement data.
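As a minimal sketch of what such pre-processing can involve, the example below profiles basic reliability issues in a hypothetical public procurement extract before any risk analytics are run. The file name, column names (supplier_id, contract_value, award_date) and checks are assumptions for illustration, not a prescribed method.

```python
import pandas as pd

# Hypothetical procurement extract; the file name and column names are
# illustrative only and would differ for each data source.
contracts = pd.read_csv("procurement_contracts.csv", parse_dates=["award_date"])

# Basic reliability profile: share of missing values and cardinality per column,
# plus the number of fully duplicated records.
report = pd.DataFrame({
    "missing_share": contracts.isna().mean(),
    "n_unique": contracts.nunique(),
})
print(report)
print(f"{contracts.duplicated().sum()} fully duplicated rows")

# Flag records that cannot be used reliably, e.g. missing supplier identifiers
# or non-positive contract values, before any risk analytics are run.
unusable = contracts[contracts["supplier_id"].isna() | (contracts["contract_value"] <= 0)]
print(f"{len(unusable)} of {len(contracts)} records need review before analysis")
```

Even a simple profile of this kind helps explain why data preparation often consumes more time than the analytics themselves.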
Moreover, when the data used to train models are unreliable or incomplete, existing assumptions and biases can be perpetuated and even exacerbated (Adam and Fazekas, 2018[80]). These problems can arise due to pre-existing societal bias in the data, incomplete data, small sample sizes, errors in the definition of variables, or the omission or inclusion of flawed variables or proxies (OECD, 2019[9]; OECD, 2023[81]). The use of synthetic data (artificially generated data) for training AI models attempts to overcome some of the quality issues inherent in many complex datasets (Lee, 2024[82]), but there is still no foolproof measure to mitigate these risks. Flawed data matching and the use of error-prone algorithms can have profound consequences (The Royal Commission into the Robodebt Scheme, 2023[83]).
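One hedged way to surface such bias before training is to compare historical flag rates across the categories of a potential proxy variable, as in the sketch below. The dataset, the "region" proxy and the "flagged" outcome column are hypothetical, and a small disparity does not establish that the data are free of bias.

```python
import pandas as pd

# Hypothetical training data: each row is a past case with a recorded
# "flagged" outcome and a "region" column that may act as a proxy variable.
cases = pd.read_csv("historical_cases.csv")

# Flag rate per category of the proxy variable: large gaps suggest that a
# model trained on these labels may simply reproduce past flagging patterns.
flag_rates = cases.groupby("region")["flagged"].mean().sort_values()
print(flag_rates)

overall = cases["flagged"].mean()
disparity = flag_rates.max() / max(flag_rates.min(), 1e-9)
print(f"Overall flag rate: {overall:.2%}; max/min ratio across regions: {disparity:.1f}")

# A large ratio does not prove bias, and a small one does not rule it out,
# but it signals where closer review of the source data is needed.
```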
Issues concerning data quality and AI can also exacerbate existing challenges around trust in public institutions and the auditability of decision making. AI tools can be seen as ‘black box’ systems, taking an input and producing an output while the process in between is neither visible nor easy to interpret. There is therefore a risk that the public may find it difficult to understand how and by whom decisions in public institutions are being made, with unintended impacts on the integrity and transparency of the process. Public institutions may find it hard to provide meaningful explanations of those AI processes, especially when security issues or intellectual property rights prevent them from doing so (International Public Sector Fraud Forum, 2020[73]). For many integrity actors, these challenges related to the interpretability and explainability of results can undermine the very principles they are meant to uphold, such as transparency and accountability in public decision making.
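As an illustrative sketch of one common explainability technique, the example below computes permutation importance for a tabular risk-scoring model using scikit-learn, showing which inputs drive the scores on held-out data. The synthetic dataset and feature names are assumptions, and a global importance summary is only one part of a meaningful explanation, not a complete explainability framework.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a tabular fraud-risk dataset (illustrative only).
X, y = make_classification(n_samples=2000, n_features=6, n_informative=3, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance: how much does shuffling each feature degrade
# performance on held-out data? This gives a global, model-agnostic view
# of which inputs drive the risk scores.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, mean_imp in sorted(zip(feature_names, result.importances_mean),
                             key=lambda item: item[1], reverse=True):
    print(f"{name}: {mean_imp:.3f}")
```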
In addition, it can be difficult for audit and integrity bodies to audit AI systems that make decisions based on poorly trained models, or decisions derived experimentally or intuitively from big data, and to understand how particular AI tools work. Where auditors do not have the appropriate level of access or expertise, it can be difficult to verify that AI systems are functioning as intended and that the necessary risk assessment and treatment mechanisms are in place (OECD, 2023[81]). The challenges associated with explaining and auditing AI systems could make it harder for governments to nurture trust in public decision making and for the public to be confident that policymaking is effective and serving the public interest (OECD, 2019[9]; OECD, 2023[81]).
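As a hedged sketch of one basic audit check, the example below re-runs a deployed model's scores against a held-out audit sample and compares false-positive rates across a subgroup variable. The audit file, column names, decision threshold and tolerance are all assumed for illustration; real audit procedures would cover far more than this single test.

```python
import pandas as pd

# Hypothetical audit sample with true outcomes, model scores and a subgroup column.
audit = pd.read_csv("audit_sample.csv")  # columns: outcome, score, subgroup (illustrative)

THRESHOLD = 0.5      # decision threshold documented by the system owner (assumed)
MAX_FPR_GAP = 0.05   # tolerated gap in false-positive rates between subgroups (assumed)

audit["predicted"] = (audit["score"] >= THRESHOLD).astype(int)

def false_positive_rate(df: pd.DataFrame) -> float:
    """Share of true negatives that the model nonetheless flags."""
    negatives = df[df["outcome"] == 0]
    return float(negatives["predicted"].mean()) if len(negatives) else float("nan")

fpr_by_group = audit.groupby("subgroup").apply(false_positive_rate)
print(fpr_by_group)

gap = fpr_by_group.max() - fpr_by_group.min()
print(f"False-positive rate gap across subgroups: {gap:.3f} "
      f"({'within' if gap <= MAX_FPR_GAP else 'exceeds'} the assumed tolerance)")
```

Running such checks presupposes that auditors have access to the model's outputs, the underlying data and the documented decision rules, which is precisely the access and expertise gap described above.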