The methodology underpinning the analysis of performance trends in international studies of education is complex. To ensure that PISA results are comparable across different assessment years, a number of conditions must be met.
In particular, successive assessments of the same subject must include a sufficient number of common assessment items, and these items must retain their measurement properties over time, so that results can be reported on a common scale. The set of common items must also adequately cover the different aspects of each domain's assessment framework.
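To illustrate how common items make a common scale possible, consider the mean-sigma method of linear linking (a stylised textbook example, not PISA's actual scaling procedure, which estimates an item response theory model on the full response data). If the common items receive difficulty estimates $b^{\text{old}}$ and $b^{\text{new}}$ when calibrated separately in two cycles, proficiency estimates $\theta$ from the new calibration can be placed on the old scale through

\[
\theta^{*} = A\,\theta + B, \qquad
A = \frac{\sigma\!\left(b^{\text{old}}\right)}{\sigma\!\left(b^{\text{new}}\right)}, \qquad
B = \mu\!\left(b^{\text{old}}\right) - A\,\mu\!\left(b^{\text{new}}\right),
\]

where $\mu(\cdot)$ and $\sigma(\cdot)$ denote the mean and standard deviation of the common items' difficulty estimates in each calibration. The transformation is only trustworthy if the common items retain their measurement properties: an item whose difficulty has drifted between cycles distorts $A$ and $B$ for reasons unrelated to students' proficiency.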
Furthermore, the samples of students assessed in different years must be equally representative of the target population; only results from samples that meet the strict standards set by PISA can be compared over time. As a result, some countries and economies cannot compare all of their PISA results over time, even though they participated in successive PISA assessments.
Even when PISA samples accurately reflect the target population (that of 15-year-olds enrolled in grade 7 or above), changes in enrolment rates and demographics can affect the interpretation of trends. For this reason, Chapter 9 in this volume also discusses contextual changes alongside trends in performance, and presents adjusted trends that account for changes in the student population in addition to the basic, non-adjusted performance trends.
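As an illustration of such an adjustment (a simplified sketch; the exact adjustment variables and method used in Chapter 9 may differ), trends can be reweighted so that the composition of the student population is held constant. If the population in each year $t$ is partitioned into groups $g$ (for example, by gender, age and immigrant background), with group mean scores $\bar{x}_{g,t}$ and group shares $w_{g,t}$, the adjusted mean replaces each year's shares with those of a fixed reference year $t_{0}$:

\[
\bar{x}^{\text{adj}}_{t} = \sum_{g} w_{g,t_{0}}\,\bar{x}_{g,t}
\qquad\text{whereas}\qquad
\bar{x}_{t} = \sum_{g} w_{g,t}\,\bar{x}_{g,t}.
\]

Any gap between $\bar{x}_{t}$ and $\bar{x}^{\text{adj}}_{t}$ then indicates how much of the observed trend reflects changes in the composition of the student population rather than changes in performance within groups.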
Comparisons over time can also be affected by changes in assessment conditions or in the methods used to estimate students’ performance on the PISA scale. In particular, from 2015 onward, PISA introduced computer-based testing as the main mode of assessment. It also adopted a more flexible model for scaling response data, and treated items that were left unanswered at the end of test forms as if they were not part of the test, rather than as incorrectly answered. (Such items were considered incorrect in previous cycles for the purpose of estimating students’ position on the PISA scale.) Instead of re-estimating past results based on new methods, PISA incorporates the uncertainty associated with these changes when computing the significance of trend estimates (see the section on “link errors” below, and Chapter 2).
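Concretely, the link error behaves as an additional, independent source of uncertainty. When comparing mean scores from two assessments, the standard error of the difference combines the sampling standard errors of the two estimates with the link error in quadrature:

\[
\sigma\!\left(\hat{\mu}_{2018}-\hat{\mu}_{2015}\right)
= \sqrt{\sigma^{2}\!\left(\hat{\mu}_{2018}\right)
      + \sigma^{2}\!\left(\hat{\mu}_{2015}\right)
      + \sigma^{2}_{\text{link}(2015,\,2018)}}\,.
\]

(The years shown are only an example; a separate link error applies to each pair of assessments and each domain.) Because the link error can be substantial, a difference that appears large relative to sampling error alone may still fall short of statistical significance once the linking uncertainty is taken into account.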
Finally, comparisons of assessment results across years that correspond to different assessment frameworks may also reflect the shifting emphasis of the test. For example, differences between PISA 2015 (and earlier) and PISA 2018 results in reading, or between PISA 2012 and PISA 2018 results in science, reflect not only how well students master the common assessment items used for linking the assessments (which reflect the earlier assessment framework), but also students' relative performance (compared with other students, in other countries) on aspects of proficiency that are emphasised in the most recent assessment framework.