Algorithm |
Algorithms are exact sequential sets of commands that are performed on a defined input to generate an output in a clearly defined format. Algorithms can be represented in plain language, diagrams, computer code and other languages. |
Beneficiary/Grant recipient/Grantee |
Any individual or organisation that receives grants to support its operations (also referred to as a recipient, beneficiary, or grantee). |
Conflict of interest |
A conflict of interest involves a conflict between the public duty and private interests of a public official, in which the public official has private-capacity interests which could improperly influence the performance of their official duties and responsibilities. |
Control |
Any action taken by management, the board, and other parties to manage risk and increase the likelihood that established objectives and goals will be achieved.1 |
Corruption |
The active or passive misuse of the powers of public officials (appointed or elected) for private financial or other benefits. |
Data analytics |
A process of inspecting, cleaning, transforming, and modelling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision-making. |
Data architecture |
Data architecture is composed of models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and in organisations. |
Data cleaning |
A set of procedures designed to identify and correct, when possible, any data errors, inconsistencies and unclear data features. |
Data dictionary |
A data catalogue that describes the contents of a database. Information is listed about each field in the attribute tables and about the format, definitions and structures of the attribute tables. A data dictionary is an essential component of metadata information. |
Data governance |
Data governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods. |
Double funding |
A scenario in which identical activities and costs are funded twice through the use of public funds. |
Ex-ante control |
A control that aims to reduce the possibility of an undesirable outcome. |
Ex-post control |
A control meant to identify errors after an event. |
Fraud |
Fraud is an economic crime involving deceit, trickery or false pretences, by which someone gains unlawfully. An actual fraud is motivated by the desire to cause harm by deceiving someone else, while a constructive fraud is a profit made from a relation of trust. |
Grant |
Grants are transfers made in cash, goods or services for which no repayment is required. |
Machine learning |
A subset of artificial intelligence in which machines leverage statistical approaches to learn from historical data and make predictions in new situations. |
Misappropriation |
Acts involving the theft or misuse of an organisation’s assets. |
Network analysis |
A set of integrated techniques to identify relations among actors and to analyse the social structures or patterns that emerge from the recurrence of these relations. |
Positive Unlabelled/PU bagging |
Positive unlabelled (PU) learning is a semi-supervised machine learning technique that allows working with highly unbalanced data. PU learning can be used when the majority of all available observations are unlabelled. |
Random Forests |
Random forest is a commonly used machine learning algorithm which combines the output of multiple decision trees to reach a single result. It handles both classification and regression problems. |
SHAP values |
SHAP (SHapley Additive exPlanations) values express the average marginal contribution of each predictor to the predicted outcome. |
Supervised (machine learning) |
Supervised learning is a subcategory of machine learning and artificial intelligence. It is defined by its use of labelled datasets to train algorithms to classify data or predict outcomes accurately. As input data are fed into the model, it adjusts its weights until the model has been fitted appropriately. |
Test dataset |
A randomly selected sample of the dataset which is used to evaluate the quality (e.g. prediction accuracy) of the model estimated on the training dataset. |
Training dataset |
A randomly selected sample of the dataset which is used to estimate (‘train’) the machine learning model. The training and test datasets are mutually exclusive, that is, each observation belongs to either the training or the test dataset, but not both. |
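
The relationship between the training and test datasets defined above can be illustrated with a minimal sketch. It assumes a plain random split using only the Python standard library. The function name `split_train_test`, the 20% test share, the seed and the integer grant identifiers are all hypothetical choices made for illustration and do not come from the source.

```python
import random


def split_train_test(observations, test_fraction=0.2, seed=0):
    """Randomly partition observations into mutually exclusive
    training and test datasets (hypothetical helper for illustration)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    shuffled = list(observations)      # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled items form the test set, the rest the training set,
    # so every observation lands in exactly one of the two datasets.
    return shuffled[n_test:], shuffled[:n_test]


# Assumed example data: 100 grant records identified by integers.
grant_ids = list(range(100))
train, test = split_train_test(grant_ids, test_fraction=0.2, seed=42)
```

Because the split slices a single shuffled list, no observation can appear in both datasets, which is the mutual exclusivity the definition requires. In practice a library routine would normally be used instead of a hand-written helper.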