Annex A8. Student engagement with the PISA 2022 Creative Thinking assessment

Performance on school tests reflects what students know and can do, but it also reflects how quickly students process information and how motivated they are to do well on the test. To encourage students who sit the PISA test to do their best through to the end of the assessment, schools and students are reminded how important the study is for their country. At the beginning of the test session, the test administrator reads a script that includes the following sentence:
“This is an important study because it will tell us about what you have been learning and what school is like for you. Because your answers will help influence future educational policies in <country and/or education system>, we ask you to do the very best you can.”
However, many students view PISA as a low-stakes assessment: they can refuse to participate in the test with no negative consequences and do not receive any feedback on their performance. There is a risk, therefore, that students do not do their best on the test (Wise and DeMars, 2010[1]).
Several studies in the United States have found that student performance on assessments, such as the National Assessment of Educational Progress (NAEP), depends on how they are administered. One study showed that students did not perform as well under regular low-stakes conditions as when they received financial rewards tied to their performance or were told their results would count towards their grades (Wise and DeMars, 2005[2]). In contrast, a study in Germany found no difference in effort or performance measures between students who sat a PISA-based mathematics test under the standard PISA test-administration conditions and students who sat the test under alternative high-stakes conditions tied to performance (Baumert and Demmrich, 2001[3]). In the latter study, experimental conditions included promising feedback on performance, providing monetary incentives contingent on performance, and letting students know that the test would count towards their grades. The difference in conclusions reached by these two studies suggests that students’ motivation on low-stakes tests such as PISA differs significantly across countries. The only existing multi-country study on the effect of incentives on test performance found that offering students monetary incentives to do well on a test such as PISA – something that is not possible within regular PISA procedures – led to improved performance among students in the United States, while students in Shanghai (China) performed equally well with or without incentives (Gneezy et al., 2017[4]).
Differences in student engagement in a given test often reveal important variations in test-administration conditions. For example, in 2018, students predominantly concentrated in a small number of schools in a few regions of Spain exhibited anomalous response patterns, performed below expectations, and reported low levels of engagement with the test. Further investigation revealed that the regions in which these schools were located had conducted their high-stakes exams for 10th-grade students earlier in the year than in the past. This meant that the testing period for these exams coincided with the end of the PISA testing window. Students were more negatively disposed towards PISA in schools where the PISA testing day was closer to that of high-stakes exams (OECD, 2020[5]).
Summing up, differences in countries’ and economies’ mean scores in PISA, and comparisons between PISA 2022 results and results from prior assessments, may reflect differences not only in what students know and can do but also in how motivated they were to do their best. Put differently, PISA does not measure students’ maximum potential but what students actually do in situations where their individual performance is monitored only as part of their group’s performance.
This annex computes several indicators of student engagement specifically with the PISA 2022 Creative Thinking items. These indicators rely on non-invasive behavioural measures (i.e. students’ interactions with the test forms). Other indicators of engagement with the PISA test more broadly, using PISA 2022 data from the mathematics, reading and science assessments as well as from the student questionnaire module, are described in Annex A8 of the PISA 2022 Results (Volume I) report (OECD, 2023[6]). As with the indicators described in that annex, the intention of constructing indicators of engagement is not to suggest adjustments to PISA mean scores or performance distributions but to provide richer context for interpreting cross-country differences and trends in performance.
Behavioural indicators of disengagement with the creative thinking items
A number of approaches have been developed to assess differences in students’ motivation in low-stakes tests between individuals or groups (e.g. across countries and economies), some of which are based on behavioural indicators (Buchholz, Cignetti and Piacentini, 2022[7]). Behavioural indicators rest on the idea that disengaged respondents do not provide responses that reflect their best judgement or capabilities on the questions asked in the test.
In general, creative work requires task engagement (OECD, 2023[8]). Unlike simple knowledge recognition or reproduction tasks, most tasks in the PISA 2022 Creative Thinking test require students to develop and submit a written or a visual artefact. The complexity of this artefact may vary, from one or a few words to more extended written or visual compositions. In all cases, students must invest time and effort in reading the task prompt, understanding the stimulus material, and actively constructing a response in the format required. This, in turn, implies a minimum level of engagement with and time spent on each task.
In order to examine test-taking effort and potentially identify students who demonstrate disengagement with the creative thinking items, three sets of indicators have been constructed. These include:
Students who rapidly move through a test item without spending a sufficient amount of time to provide a valid response (“rapid responders”);
Students who spend a short amount of time on an item, relative to other students in the same country, and who do not submit a valid response (“relative rapid responders”);1
Students who do not submit a valid response (i.e. missing responses) after spending any length of time on an item.
The first two indicators combine time-on-task information with information on the quality (or lack thereof) of student responses. Measures of engagement based on time-on-task suppose that there is a minimum amount of time that students should spend on any given item in order to purposefully engage with its content and provide a valid response that reflects their capabilities. In the context of creative thinking, complex cognitive processing takes time. For example, a meta-analysis of performance in divergent thinking tasks concluded that performance increased linearly with more time spent on task, up to a certain point where performance gains slowed (Paek et al., 2021[9]). It is therefore reasonable to assume that, in general, students who do not spend a minimum period of time on a task will not have been able to adequately engage in the processes of creative thinking. In turn, both non-responses (i.e. missing responses) and responses that were not reflective of any skill in creative thinking (i.e. responses that achieved no credit) can be considered invalid.
The third indicator relies only on the absence of valid responses as an indicator of disengagement. It supposes that students who do not submit a response to a given item are disengaged as they have made no attempt to provide a response.
Rapid responding behaviours
The first indicator examined here identifies “rapid responding behaviours”. This refers to students who, after being shown an item, quickly move on to the next item without submitting a valid response. For this indicator, the time-on-task threshold is uniformly set to 30 seconds for all items included in the analysis: if a student spends no more than 30 seconds on an item and either does not submit a response (i.e. missing) or submits a response that achieves no credit (i.e. an inappropriate response), then that student is considered to exhibit rapid responding behaviour for that item.
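To make the construction of this indicator concrete, the sketch below shows one way the flag could be computed from item-level log data. The data layout and column names (student_id, time_on_task, score) are illustrative assumptions for this sketch, not the structure of the actual PISA database, and the published figures additionally apply PISA survey weights.

```python
import numpy as np
import pandas as pd

# Illustrative data: one row per student-item interaction. The column
# names and values are hypothetical, not the actual PISA log-file layout.
responses = pd.DataFrame({
    "student_id":   [1, 1, 2, 2, 3, 3],
    "item_id":      ["CT01", "CT02"] * 3,
    "time_on_task": [12.0, 95.0, 240.0, 18.0, 25.0, 160.0],  # seconds
    "score":        [np.nan, 1, 2, 0, 0, np.nan],  # NaN = missing, 0 = no credit
})

THRESHOLD_SECONDS = 30  # uniform threshold used for this indicator

# An "invalid" response is one that is missing or achieved no credit.
invalid = responses["score"].isna() | responses["score"].eq(0)

# Rapid responding: no more than 30 seconds on the item AND an invalid response.
responses["rapid_responding"] = (
    responses["time_on_task"] <= THRESHOLD_SECONDS
) & invalid

# Share of items seen by each student on which the behaviour was exhibited;
# these shares would then be aggregated to the country/economy level.
print(responses.groupby("student_id")["rapid_responding"].mean())
```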
On average across OECD countries, rapid responding behaviours were demonstrated on around 4% of items seen by students (Table III.A8.1). In some countries and economies, this percentage was significantly higher: in Albania and Cyprus, for example, students demonstrated this behaviour on over 15% of all items they saw.

By task grouping (ideation process and domain context)
Table III.A8.2 and Table III.A8.5 show the percentage of rapid responding behaviours across countries/economies by ideation process and by domain context. In general, students exhibited slightly more rapid responding behaviours when tackling evaluate and improve ideas tasks (4.3% of tasks encountered) than generate creative ideas (3.5%) or generate diverse ideas (3.6%) tasks (Table III.A8.2). In most countries and economies, the share of rapid responding behaviours was relatively consistent across task types by ideation process, with differences in the frequency of such behaviours rarely exceeding 2 percentage points between two ideation processes.
When it comes to the domain context, rapid responding behaviours were most frequently observed in visual expression tasks (5.9% of tasks encountered) and least often in social problem-solving tasks (3.3%), on average across the OECD – although differences across domains were also small, in general (Table III.A8.5). In a few countries/economies, differences in the share of rapid responding behaviours observed across tasks in different domains exceeded 5 percentage points. In North Macedonia and Baku (Azerbaijan), students exhibited rapid responding behaviour in over 20% of the items in the visual domain, but around 8 percentage points less frequently in written expression tasks. In Cyprus and Albania, students showed the highest rate of rapid responding behaviours in scientific problem-solving tasks.

By student characteristics (gender and socio-economic status)
On average across all task groupings, girls and advantaged students record significantly fewer rapid responding behaviours than boys and disadvantaged students, respectively (Tables III.A8.8, III.A8.11, III.A8.14, III.A8.17). In general, boys exhibit rapid responding behaviours on around 2 percentage points more items than girls. Gender differences vary across tasks in different domain contexts, being on average higher in written and visual expression tasks and lower in scientific and social problem-solving tasks. Particularly large gender differences in rapid responding behaviours across all items are observed in Albania (around 10 percentage points) and the Palestinian Authority (around 9 percentage points).
Interestingly, in North Macedonia, while students exhibited rapid responding behaviours in around 14% of all visual expression tasks encountered, there are no significant differences in the rate of these behaviours between boys and girls for these tasks – despite significant gender differences observed in tasks across the other three domains.
On average across the OECD, advantaged students exhibit rapid responding behaviours on around 4 percentage points fewer items than disadvantaged students. Relative to their advantaged peers, disadvantaged students tend to display these behaviours more frequently in evaluate and improve ideas items than in generate creative ideas or generate diverse ideas tasks.
Across domain contexts, differences between the shares of students from advantaged and disadvantaged backgrounds who exhibit rapid responding behaviours are remarkably consistent (between 3 and 4 percentage points) on average across the OECD. However, patterns vary considerably in each country/economy (Table III.A8.17).
Rapid responding behaviours relative to national peers
The threshold for identifying the “minimum” amount of time to spend on a task that is conducive to productive engagement may also be set in relation to the characteristics and demands of a given task, and/or in relation to the effort of other students within the same country/economy. Features that may influence the minimum time required include the required response format (e.g. a single-word answer vs. a visual composition vs. an extended paragraph), the familiarity of the task content, and the length of the task instructions and stimulus material. In addition to the characteristics of tasks, adaptations of task content into different languages will also affect the length and potential complexity of the task instructions. Moreover, for most items in the test, students are required to produce a written artefact as a response: the time required to produce such responses may be influenced by the relative complexity and form of the national language.
In sum, lengthier and more complex content will take longer to process and produce than simpler and shorter content. The second indicator of engagement examined here therefore operationalises rapid responding behaviours differently from the first indicator, namely by identifying rapid responses relative to the national sample. For each item, students in the bottom quarter of time-on-task are considered to have spent relatively little time on the item compared to peers within their country/economy. Students in the bottom quarter of time-on-task are considered to exhibit “relatively rapid responding behaviour” for an item if they either do not submit a response (i.e. missing) or submit a response that achieves no credit (i.e. an inappropriate response). This indicator thus takes into account the fact that the minimum threshold of “reasonable” time spent on a task may differ across country and language groupings.
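As before, a minimal sketch of how this relative indicator could be computed, assuming the same hypothetical item-level data layout as the previous sketch (the column names and the per-item, per-country grouping are illustrative, not the actual PISA database structure):

```python
import numpy as np
import pandas as pd

# Same hypothetical layout as in the previous sketch, with a country column.
responses = pd.DataFrame({
    "country":      ["A"] * 4 + ["B"] * 4,
    "item_id":      ["CT01"] * 8,
    "time_on_task": [15.0, 60.0, 90.0, 200.0, 40.0, 120.0, 180.0, 300.0],
    "score":        [np.nan, 0, 1, 2, 0, np.nan, 1, 2],
})

# 25th percentile of time-on-task, computed per item within each
# country/economy, so the cut-off adapts to national and language context.
q1 = responses.groupby(["country", "item_id"])["time_on_task"].transform(
    lambda s: s.quantile(0.25)
)

invalid = responses["score"].isna() | responses["score"].eq(0)

# Relatively rapid responding: in the bottom quarter of national
# time-on-task for the item AND no valid response submitted.
responses["relatively_rapid"] = (responses["time_on_task"] <= q1) & invalid
print(responses[["country", "time_on_task", "relatively_rapid"]])
```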
On average across OECD countries, relatively rapid responding behaviours were demonstrated on 14% of items seen by students (Table III.A8.1). This indicator should be expected to be higher across countries and economies than the indicator of rapid responding behaviours, given that the implied “minimum” time threshold is generally higher than 30 seconds. Nonetheless, large differences in the share of relatively rapid responding behaviours can be observed across countries/economies. In Albania, students demonstrated this behaviour in 39% of all tasks they encountered, and in North Macedonia, Baku (Azerbaijan), Bulgaria and Jordan, students did so in more than a quarter of all tasks encountered. Conversely, in Latvia*, Macao (China), Kazakhstan, Singapore and Estonia, students in the bottom quarter of time-on-task showed relatively rapid responding behaviours in less than 10% of all items they encountered.
By task grouping (ideation process and domain context)
Table III.A8.3 and Table III.A8.6 show the percentage of relatively rapid responding behaviours across countries/economies by ideation process and by domain context. In general, students exhibited slightly fewer relatively rapid responding behaviours when tackling generate creative ideas tasks (12% of tasks encountered) than generate diverse ideas tasks (15%) or evaluate and improve ideas tasks (14%) (Table III.A8.3). In most countries and economies, the share of relatively rapid responding behaviours was relatively consistent across task types by ideation process.
When it comes to the domain context, relatively rapid responding behaviours were least frequently observed in visual expression tasks (11% of tasks encountered) and most frequently observed in scientific problem-solving tasks (16%) (Table III.A8.6). In a few countries/economies, however – the Palestinian Authority, North Macedonia, Albania, Panama, Baku (Azerbaijan) and Uzbekistan – students exhibited relatively rapid responding behaviours more frequently in the visual expression domain than in the other domains.
By student characteristics (gender and socio-economic status)
As with rapid responding behaviours, girls and advantaged students record significantly fewer relatively rapid responding behaviours than boys and disadvantaged students, respectively, on average across the OECD (Tables III.A8.9, III.A8.12, III.A8.15, III.A8.18). In general, boys exhibit relatively rapid responding behaviours on around 5 percentage points more items than girls (when considering items across all task groupings). Particularly large gender differences in relatively rapid responding behaviours across all items are observed in the Palestinian Authority (19 percentage points) and Albania (around 17 percentage points).
Gender differences in relatively rapid responding vary most across tasks in different domain contexts, with differences between boys and girls reaching around 6 percentage points in written expression tasks and less than 3 percentage points in scientific problem-solving tasks. In 26 countries and economies, there were no significant gender differences in the share of relatively rapid responding behaviours in the scientific problem-solving domain. In social problem solving, the gender difference in this indicator is largest in the Palestinian Authority (13 percentage points), Qatar (12 percentage points) and the United Arab Emirates (10 percentage points).
On average across the OECD, advantaged students exhibit relatively rapid responding behaviours on around 11 percentage points fewer items than disadvantaged students. Differences in relatively rapid responding behaviours between advantaged and disadvantaged students are particularly large in Bulgaria (around 22 percentage points on average), Romania (over 20 percentage points) and Israel (around 18 percentage points).
The gap between disadvantaged and advantaged students tends to be smaller in generate diverse ideas items (about 10 percentage points) than in the other two ideation processes. Across domain contexts, differences between the shares of students from advantaged and disadvantaged backgrounds who exhibit relatively rapid responding behaviours are highest in scientific problem-solving tasks (about 13 percentage points), on average across the OECD, and lowest in visual expression tasks (7 percentage points). However, as with the indicator for rapid responding behaviours, patterns vary considerably in each country/economy (Table III.A8.18).
Non-responding behaviours
The third indicator of engagement examined here refers to non-responding behaviours – in other words, the percentage of items for which students in a country/economy did not submit any response. A lower share of non-responding behaviours indicates that students within a country/economy have at least made some attempt to engage with tasks, although this measure is not sensitive to other forms of satisficing behaviour that might also indicate disengagement (e.g. off-task exploration, inappropriate responses, random guessing). On the other hand, this indicator includes students who spent a sufficient amount of time on an item but who were unable to attempt any response (i.e. students of low proficiency).
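This indicator is the simplest of the three to construct, as the sketch below illustrates under the same hypothetical data layout as the earlier sketches (column names are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical layout as before: NaN marks an item that was viewed but
# received no response at all.
responses = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "score":   [np.nan, 1, 0, np.nan, np.nan, 2],
})

# Non-responding: no response submitted, regardless of time spent.
responses["non_response"] = responses["score"].isna()

# Country-level percentage of items encountered with no response submitted
# (unweighted here; the published figures apply PISA survey weights).
print(responses.groupby("country")["non_response"].mean() * 100)
```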
Students across the OECD did not submit a response to around 6% of all test items they viewed, on average (Table III.A8.1). As with the other indicators discussed above, the share of non-responding behaviours varies significantly across countries and economies. In 20 participating countries/economies, non-responding behaviours are observed on over 10% of all items encountered by students. Large shares of missing responses are observed in Baku (Azerbaijan) (23%), Albania (21%), Jamaica* (18%) and North Macedonia (18%). Conversely, in Singapore, less than 2% of items encountered by students had no response submitted.
By task grouping (ideation process and domain context)
Table III.A8.4 and Table III.A8.7 show the percentage of non-responding behaviours across countries/economies by ideation process and by domain context. In general, students most frequently failed to give a response to items asking them to evaluate and improve ideas (8% of tasks encountered), compared to generate creative ideas (5%) or generate diverse ideas (6%) tasks (Table III.A8.4). The percentage of non-responses to evaluate and improve items is particularly large in Baku (Azerbaijan) (26%), Albania (24%), North Macedonia (21%) and Jamaica (21%).
When it comes to tasks across different domain contexts, students failed to provide a response most frequently for scientific problem-solving tasks, by some margin: across OECD countries, students did not provide a response for nearly 9% of tasks encountered in this domain, compared to 5% of tasks in the social problem-solving domain (Table III.A8.7). In some countries/economies, students did not provide a response for over one-fifth of all items encountered in a domain: this was the case for students in Albania and Baku (Azerbaijan) in the written expression domain; for students in the Philippines, Morocco, Albania, Baku (Azerbaijan) and Uzbekistan in the visual expression domain (with the percentage reaching 36% in Uzbekistan); and for students in North Macedonia, Albania, Baku (Azerbaijan), Cyprus, Jamaica and Bulgaria in the scientific problem-solving domain.
By student characteristics (gender and socio-economic status)
As with the first two indicators of engagement examined in this annex, girls record fewer non-responses than boys – although on average across the OECD, differences between boys and girls in this indicator are small (albeit significant). Nonetheless, gender differences of over 8 percentage points in favour of girls, when considering all items together, can still be observed in the Palestinian Authority and Albania. However, when examining non-responding behaviours of boys and girls on different task types, differences are not significant in some cases: for example, in evaluate and improve items and in the two problem-solving domain contexts (Table III.A8.10 and Table III.A8.13). In fact, in Latvia*, girls provide significantly more non-responses than boys in the problem-solving domains.
In most countries/economies, differences in non-responding behaviours between boys and girls are largest in the written expression domain, followed by the visual expression domain (or vice versa). However, in Albania, differences between boys and girls are largest in the social problem-solving domain, and in Saudi Arabia, Brunei Darussalam, Chinese Taipei, Macao (China), Qatar, Uzbekistan and France, differences are largest in the scientific problem-solving domain (Table III.A8.13).
In terms of differences in non-responding behaviours amongst students with different socio-economic backgrounds, disadvantaged students exhibit this behaviour on around 7 percentage points more items than advantaged students. The gap between disadvantaged and advantaged students is widest in evaluate and improve ideas items (8 percentage points), compared to generate creative ideas (5 percentage points) and generate diverse ideas (6 percentage points) tasks. Across domain contexts, differences between the shares of students from advantaged and disadvantaged backgrounds who exhibit non-responding behaviours are highest in scientific problem-solving tasks, on average across the OECD, and lowest in visual expression tasks. However, as with the indicator for rapid responding behaviours, patterns vary considerably in each country/economy (Table III.A8.19).
Annex A8 tables
Table III.A8.1. Engagement with the items in the creative thinking test
Table III.A8.2. Engagement of rapid responders with the creative thinking items across ideation processes
Table III.A8.3. Engagement of relative rapid responders with the creative thinking items across ideation processes
Table III.A8.4. Engagement of no responders with the creative thinking items across ideation processes
Table III.A8.5. Engagement of rapid responders with creative thinking items across domain contexts
Table III.A8.6. Engagement of relative rapid responders with creative thinking items across domain contexts
Table III.A8.7. Engagement of no responders with creative thinking items across domain contexts
Table III.A8.8. Engagement of rapid responders with creative thinking items across ideation processes, by gender
Table III.A8.9. Engagement of relative rapid responders with creative thinking items across ideation processes, by gender
Table III.A8.10. Engagement of no responders with creative thinking items across ideation processes, by gender
Table III.A8.11. Engagement of rapid responders with creative thinking items across domain contexts, by gender
Table III.A8.12. Engagement of relative rapid responders with creative thinking items across domain contexts, by gender
Table III.A8.13. Engagement of no responders with creative thinking items across domain contexts, by gender
Table III.A8.14. Engagement of rapid responders with creative thinking items across ideation processes, by socio-economic status
Table III.A8.15. Engagement of relative rapid responders with creative thinking items across ideation processes, by socio-economic status
Table III.A8.16. Engagement of no responders with creative thinking items across ideation processes, by socio-economic status
Table III.A8.17. Engagement of rapid responders with creative thinking items across domain contexts, by socio-economic status
Table III.A8.18. Engagement of relative rapid responders with creative thinking items across domain contexts, by socio-economic status
Table III.A8.19. Engagement of no responders with creative thinking items across domain contexts, by socio-economic status
References
[3] Baumert, J. and A. Demmrich (2001), “Test motivation in the assessment of student skills: The effects of incentives on motivation and performance”, European Journal of Psychology of Education, Vol. 16/3, pp. 441-462, https://doi.org/10.1007/bf03173192.
[7] Buchholz, J., M. Cignetti and M. Piacentini (2022), “Developing measures of engagement in PISA”, OECD Education Working Papers, https://doi.org/10.1787/19939019.
[4] Gneezy, U. et al. (2017), Measuring Success in Education: The Role of Effort on the Test Itself, National Bureau of Economic Research, Cambridge, MA., https://doi.org/10.3386/w24004.
[6] OECD (2023), Annex A8: How much effort do students put into the PISA test?, OECD Publishing, Paris, https://doi.org/10.1787/53f23881-en.
[8] OECD (2023), PISA 2022 Creative Thinking framework, OECD Publishing, Paris, https://doi.org/10.1787/471ae22e-en.
[10] OECD (2023), PISA 2022 Innovative Domain Test Design and Test Development, OECD Publishing, Paris, https://www.oecd.org/pisa/data/pisa2022technicalreport/PISA-2022-Technical-Report-Ch-18-PISA-Proficiency-Scale-Construction-Domains-Creative-Thinking.pdf.
[5] OECD (2020), Annex A9. A note about Spain in PISA 2018: Further analysis of Spain’s data by testing date (updated on 23 July 2020), OECD Publishing, Paris, https://www.oecd.org/pisa/PISA2018-AnnexA9-Spain.pdf.
[9] Paek, S. et al. (2021), “Is more time better for divergent thinking? A meta-analysis of the time-on-task effect on divergent thinking”, Thinking Skills and Creativity, Vol. 41, pp. 1-15, https://doi.org/10.1016/j.tsc.2021.100894.
[1] Wise, S. and C. DeMars (2010), “Examinee noneffort and the validity of program assessment results”, Educational Assessment, Vol. 15/1, pp. 27-41, https://doi.org/10.1080/10627191003673216.
[2] Wise, S. and C. DeMars (2005), “Low examinee effort in low-stakes assessment: Problems and potential solutions”, Educational Assessment, Vol. 10/1, pp. 1-17, https://doi.org/10.1207/s15326977ea1001_1.
Note
← 1. Three items were excluded from the analysis of rapid and relative rapid responding behaviours based on the response type required, as it was considered that some students could reasonably respond to these items within a short period of time. For two of the three items, students were able to select a response to a previous question, akin to a multiple-choice mechanism. For the remaining excluded item, students were asked to generate a very short written artefact. These three items are the same as those excluded from the post data-adjudication treatment of invalidating responses submitted within 15 seconds, as described in the PISA 2022 Technical Report (OECD, 2023[10]).