Variation in the coverage of PISA and TIMSS/PIRLS means that school and classroom change effect sizes are not available for all education systems across all of the questions asked. Furthermore, data are missing when certain questions (or questionnaires) were omitted at the national level at certain points in time. This is not an issue when reporting responses to a single question, but it does pose a potential problem when combining information across questions. In order to analyse as many countries as possible whilst keeping a wide range of questions in the analysis, it has been necessary to manage the missing data through a combination of deletion and estimation processes.
An iterative process has been used to manage observations (education systems) and variables (questions) with missing data, and some systems/countries and questions have had to be omitted in the construction of an index:
1. Education systems that had effect size data for fewer than 20% of the potential question set were excluded.
2. Following this, questions with high proportions of missing data were dropped. Specifically, those questions with effect size missing for more than 50% of the remaining database were excluded.
3. Education systems with less than 60% valid data on the remaining questions were then excluded from the analysis.
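The three deletion steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the data layout (a dictionary of effect sizes per education system, with `None` marking a missing value), the function names `valid_share` and `filter_missing`, and the treatment of values lying exactly on a threshold are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: data[system][question] is an effect size, or None when missing.
EffectSizes = Dict[str, Dict[str, Optional[float]]]


def valid_share(values) -> float:
    """Share of non-missing entries in a collection of effect sizes."""
    values = list(values)
    return sum(v is not None for v in values) / len(values) if values else 0.0


def filter_missing(data: EffectSizes, questions: List[str]) -> Tuple[List[str], List[str]]:
    # Step 1: exclude systems with valid data for fewer than 20% of all questions.
    systems = [s for s in data
               if valid_share(data[s].get(q) for q in questions) >= 0.20]

    # Step 2: exclude questions missing for more than 50% of the remaining systems
    # (equivalently, keep questions that are valid for at least 50% of them).
    kept_questions = [q for q in questions
                      if valid_share(data[s].get(q) for s in systems) >= 0.50]

    # Step 3: exclude systems with less than 60% valid data on the remaining questions.
    kept_systems = [s for s in systems
                    if valid_share(data[s].get(q) for q in kept_questions) >= 0.60]
    return kept_systems, kept_questions


# Made-up example: four systems (A-D) and five questions (q1-q5).
questions = ["q1", "q2", "q3", "q4", "q5"]
data = {
    "A": {"q1": 0.1, "q2": 0.2, "q3": 0.3, "q4": 0.4, "q5": 0.5},
    "B": {"q1": 0.1, "q2": 0.2, "q3": 0.3, "q4": None, "q5": None},
    "C": {"q1": 0.1, "q2": None, "q3": None, "q4": None, "q5": None},
    "D": {"q1": None, "q2": None, "q3": None, "q4": None, "q5": None},
}
kept_systems, kept_questions = filter_missing(data, questions)
print(kept_systems, kept_questions)
```

In this made-up example, system D is removed at step 1, questions q4 and q5 at step 2, and system C at step 3. The order of the steps matters: C survives step 1 but is removed once its coverage is measured against the reduced question set.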
Following the deletion process, some of the remaining education systems still had portions of missing data. Data were typically missing when an education system had not participated in one of the surveys. As information for a whole dataset was missing, it was not possible to undertake an imputation at the indicator level. However, it was possible to estimate the effect of a missing dataset on the final index.
The estimation process uses information from countries having all the data points in order to estimate the impact of including a dataset on the index computation. We use this information to adjust the indices of countries missing one dataset. The process goes as follows:
1. For education systems with all the information available, a subset of indices was computed, each one of them excluding one of the datasets from the index computation ($I_{-A}$, where $A$ denotes the excluded dataset). The index including all the data was also calculated ($I$). For instance, if other countries missed PISA, countries with all the information available will have an index excluding PISA ($I_{-\mathrm{PISA}}$) and one with PISA ($I$).
2. The ratios of complete index to sub-indices were calculated for each country ($R_A = I / I_{-A}$).
3. The cross-country mean ratio of full index to every sub-index was computed, giving us a dataset factor effect for each potential missing data source ($F_A = \overline{R_A}$).
4. Finally, countries missing data from one source ($A$) had their index computed with all their information available ($I_{-A}$). This index is then corrected by multiplying it by the dataset factor of the corresponding missing database, giving us the final composite index ($I^{*} = I_{-A} \times F_A$).
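The estimation process can be illustrated with a short Python sketch. It takes the indices as already computed, since the index formula itself is not described here. The function names `dataset_factors` and `adjust_index`, the dictionary layout and the numbers are hypothetical and serve only to show the arithmetic of the ratio, the cross-country mean and the correction.

```python
from statistics import mean
from typing import Dict


def dataset_factors(full_index: Dict[str, float],
                    sub_indices: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Cross-country mean ratio of full index to sub-index, per excluded dataset.

    full_index[system] is the index computed with all datasets.
    sub_indices[dataset][system] is the index computed without that dataset.
    Only systems with complete information should be passed in.
    """
    return {
        dataset: mean(full_index[system] / sub[system] for system in sub)
        for dataset, sub in sub_indices.items()
    }


def adjust_index(partial_index: float, missing_dataset: str,
                 factors: Dict[str, float]) -> float:
    """Correct the index of a system missing one dataset by that dataset's factor."""
    return partial_index * factors[missing_dataset]


# Made-up figures: two systems (X and Y) with complete information.
full_index = {"X": 30.0, "Y": 20.0}
sub_indices = {"PISA": {"X": 25.0, "Y": 20.0}}  # indices recomputed without PISA

factors = dataset_factors(full_index, sub_indices)

# A third system did not take part in PISA; its index on the available data is 20.0.
final_index = adjust_index(20.0, "PISA", factors)
print(factors, final_index)
```

With these figures the two ratios are 1.2 and 1.0, so the PISA factor is their mean and the partial index of 20.0 is scaled up by that factor. A factor above 1 indicates that, among systems with complete data, including the dataset tends to raise the index.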