To ensure that results from the computer-delivered tasks are comparable with those from the paper-based tasks used in previous PISA assessments (and still in use in countries that administer paper instruments), the invariance of item characteristics was investigated, using statistical procedures, for the test items common to the two administration modes.
Most importantly, these included a randomised mode-effect study in the PISA 2015 field trial that compared students’ responses to paper-based and computer-delivered versions of the same test items across equivalent international samples¹. The goal was to examine whether test items presented in one mode (e.g. the paper-based assessment, PBA) function differently when presented in another mode (e.g. the computer-based assessment, CBA). For the majority of items, the results of the mode-effect study supported comparability across the two modes of assessment: there were very few samples with significant differences in difficulty or discrimination parameters between CBA and PBA. For some items, however, the computer-delivered version was found to have a different relationship with student proficiency from the corresponding, original paper version. Such items were assigned different difficulty parameters (and sometimes different discrimination parameters) in countries that delivered the test on computer, while all remaining items retained common parameters across modes. In effect, this partial invariance approach both accounts for and corrects the potential effect of mode differences on test scores.
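To illustrate how such a partial invariance approach can work in principle, the sketch below contrasts invariant items, which receive a single set of parameters for both modes, with non-invariant items, which retain mode-specific parameters. It assumes a two-parameter logistic (2PL) response model; the item names, parameter values and fixed flagging threshold are purely hypothetical, and operational PISA scaling relies on formal model-fit statistics rather than a simple cut-off.

```python
import numpy as np

def p_correct(theta, a, b):
    """Two-parameter logistic (2PL) item response function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item parameters estimated separately in each mode
# (a = discrimination, b = difficulty); names and values are invented.
items = {
    "M01": {"a_pba": 1.10, "b_pba": -0.20, "a_cba": 1.08, "b_cba": -0.22},
    "M02": {"a_pba": 0.95, "b_pba":  0.50, "a_cba": 0.97, "b_cba":  1.05},  # harder on screen
    "M03": {"a_pba": 1.30, "b_pba":  1.10, "a_cba": 1.31, "b_cba":  1.12},
}

THRESHOLD = 0.25  # illustrative cut-off; real decisions use model-fit statistics

for name, p in items.items():
    invariant = (abs(p["b_pba"] - p["b_cba"]) < THRESHOLD
                 and abs(p["a_pba"] - p["a_cba"]) < THRESHOLD)
    if invariant:
        # Scalar-invariant item: one common set of parameters for both modes.
        a = (p["a_pba"] + p["a_cba"]) / 2
        b = (p["b_pba"] + p["b_cba"]) / 2
        print(f"{name}: invariant, common a={a:.2f}, b={b:.2f}, "
              f"P(correct | theta=0) = {p_correct(0.0, a, b):.2f}")
    else:
        # Non-invariant item: mode-specific parameters absorb the mode effect,
        # so it no longer distorts proficiency estimates in either mode.
        print(f"{name}: NOT invariant, "
              f"P_pba = {p_correct(0.0, p['a_pba'], p['b_pba']):.2f}, "
              f"P_cba = {p_correct(0.0, p['a_cba'], p['b_cba']):.2f}")
```

In this stylised example, item M02 is noticeably harder on screen than on paper, so a single common difficulty would bias scores in one mode; freeing its parameters by mode removes that bias while leaving the invariant items to carry the link between the scales.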
Table I.A5.2 shows the number of anchor items that support the reporting of results from the computer-based and paper-based assessments on a common scale. The large number of items with common difficulty and discrimination parameters (i.e. “scalar invariant” items) indicates a strong link between the scales. This strong link corroborates the validity of mean comparisons across countries that delivered the test in different modes.
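As a stylised illustration of how scalar-invariant anchor items tie two scales together, the sketch below applies a classical mean-sigma linking transformation to a set of invented anchor-item difficulties. Operational PISA scaling instead uses concurrent calibration of all responses, so this is a didactic simplification rather than the procedure actually applied; all values are hypothetical.

```python
import numpy as np

# Hypothetical difficulty parameters of scalar-invariant anchor items,
# as estimated on the paper-based (PBA) and computer-based (CBA) scales.
b_pba = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_cba = np.array([-1.1, -0.3, 0.2, 0.9, 1.6])

# Mean-sigma linking: find the linear transformation
#   theta_cba = A * theta_pba + B
# that aligns the anchor-item difficulties across the two scales.
A = b_cba.std() / b_pba.std()
B = b_cba.mean() - A * b_pba.mean()

theta_pba = 0.35                       # a proficiency estimate on the PBA scale
theta_on_cba_scale = A * theta_pba + B
print(f"A = {A:.3f}, B = {B:.3f}, linked theta = {theta_on_cba_scale:.3f}")
```

The more anchor items there are, and the more stable their parameters across modes, the more precisely the linking transformation is determined; this is why the large number of scalar-invariant items in Table I.A5.2 matters for cross-mode comparisons.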
At the same time, Table I.A5.2 also shows that a large number of items used in the PISA 2022 computer-based test of mathematics and, to a lesser extent, of reading and science, were not delivered on paper. Caution is therefore required when drawing conclusions about the meaning of scale scores from paper-based tests whenever the evidence supporting these conclusions is based on the full set of items. For example, the proficiency of students who sat the PISA 2022 paper-based test of mathematics should be described in terms of the PISA 2012 proficiency levels, not the PISA 2022 proficiency levels. Hence, even though PISA 2022 developed a description of the skills of students who scored below Level 1b in mathematics, it remains unclear whether students who scored within the range of Level 1c on the paper-based tests have acquired these basic mathematics skills.