Beyond Proficiency
Chapter 5. Measuring disengagement in the Survey of Adult Skills
Abstract
This chapter uses timing indicators to estimate and analyse disengagement with the Survey of Adult Skills assessment. The analysis shows that the incidence of disengagement varies substantially across countries. Respondents with low levels of education and low familiarity with information and communications technology (ICT) are more likely to be disengaged, and respondents are more likely to be disengaged with items that appear in the second module of the assessment. Disengagement strongly reduces the probability of giving a correct answer, which results in disengaged individuals performing worse in the assessment. This relationship holds at both individual and country levels.
Introduction
Using log-file data to explore disengagement with the Survey of Adult Skills, a product of the Programme for the International Assessment of Adult Competencies (hereafter referred to as “PIAAC”), this chapter analyses the distribution of a measure of disengagement based on time on task across test items, countries and respondents, following an approach similar to that of Goldhammer et al. (2016[1]). It also explores the correlation of this measure with indicators that capture other aspects of disengagement.
What is disengagement?
For the purposes of this chapter, participants in an assessment are considered disengaged when they do not devote sufficient effort (or take sufficient care) in responding to test questions to ensure that test results fairly represent their proficiency. Some variation is expected in respondents’ efforts to answer test items. However, while comparisons of proficiency across individuals are inevitably influenced by differences in the amount of effort exerted, they are unambiguously biased once participants are disengaged. It is difficult to provide a rigorous definition of “sufficient or reasonable effort”. In practice, the choice of definition is necessarily driven by considerations related to what can be reliably measured in order to make valid comparisons.
In this chapter, disengagement is analysed in binary terms: respondents are regarded as either engaged or disengaged, with no attempt made to measure the degree or intensity of disengagement.1 One consequence of this choice is that no account is taken of variation in the intensity of disengagement, from the extreme case of a respondent who refuses to carry on and skips all remaining items to that of a respondent who skips a specific item because answering is seen as taking too much time, even though he/she has a good chance of providing a correct answer.
It is difficult to operationalise the notion of disengagement. First of all, effort has a subjective dimension, and individuals’ perceptions of how much effort a task takes will vary. Inferring effort levels from the observation of certain actions undertaken by respondents requires taking these actions as indicators of effort, without taking into account how they are individually perceived. For instance, playful respondents could view the PIAAC assessment as a kind of game and give their best, while actually enjoying the whole process. Nonetheless, in comparing effort across individuals, it is necessary to equate effort with some instrumental actions and processes.
Second, and more importantly, instrumental actions in the course of a cognitive assessment are difficult to observe because they are mental in nature. From the point of view of the respondent, dealing with an item is a succession of choices and a sequence of actions, generally starting with reading the question. Each action in this sequence is associated with a cost in time (time spent), a cost in effort and a change in the probability of providing a correct answer to the item. At any point in a respondent’s deliberations prior to providing a response, he/she will weigh the costs (in time and effort) and benefits (demonstrating his/her “true” ability) of pursuing any further action against the costs associated with moving to the next item or withdrawing from the assessment.
Personal commitment and the cost of effort vary across individuals. Personal commitment depends strongly on cultural factors (how interviewers and respondents interact and the respondent’s own desire to perform well); fatigue and the cost of effort depend on environmental factors (such as distractions); and levels of effort depend on item characteristics. One of the virtues of an adaptive assessment, such as PIAAC, is its positive impact on engagement. Adaptive allocation of items alleviates the detrimental effects of fatigue by limiting the frequency of situations in which the participant has to struggle with difficult items for which he/she is unlikely to get the correct answer.
An alternative to action-based measurement would be to consider self-reported measures of effort. This type of measure is not available in PIAAC, and it is difficult to compare these types of measures across countries and individuals, due to their subjective nature. In particular, if it is true that differences in disengagement across countries are driven by differences in perceptions of what constitutes sufficient or reasonable effort, we might expect self-assessed effort scales to be biased accordingly.
Building on the analysis undertaken in Chapter 2 on time on task, this chapter examines the question of disengagement. Time on task does not provide information on how respondents are using their time and thus cannot distinguish between time spent on task-related actions and time spent on activities or actions unrelated to answering a test item. Nonetheless, it can be safely assumed that a respondent who decides to spend more time on an item is, at the very least, not decreasing the effort exerted to answer an item and may even be increasing the effort.
Why disengagement matters
In PIAAC, disengagement may arise simply because of the low-stakes nature of the assessment. Unlike exams or competitions, performance in PIAAC has no consequences for individual respondents and is not related to any kind of incentive (reward or punishment) to exert high levels of effort. In addition, participants do not receive any feedback about their performance, either during the test or on completion. Nonetheless, by agreeing to participate in PIAAC, respondents can be regarded as having entered into some kind of implicit contract to make a minimum effort during the assessment. As participation in PIAAC is not obligatory, respondents must be sufficiently motivated to agree to devote a fair amount of time to the assessment and hence to make a reasonable effort to respond seriously to the various questions.
Interviewers play a major role in gaining the agreement of respondents and in ensuring that participants take the assessment seriously. From this point of view, participants in PIAAC start the assessment with a reasonably high level of personal commitment, and disengagement occurs once the cost of participation in time and effort starts to be deemed too high.
Respondents’ disengagement matters mostly because it is a source of undesirable variation in estimates of proficiency. Disengagement may mean that respondents do not demonstrate their true level of proficiency, which will affect the validity of inferences that can be made from the assessment. In addition, different levels of disengagement between subgroups within countries and between countries may reduce the validity of comparisons.
However, the relationship between disengagement and performance is a complex question that remains beyond the scope of this chapter for a number of reasons. First, disengagement in PIAAC can only be measured with indicators that partially capture the spectrum of disengagement. As a result, any causal impact of a disengagement indicator on performance would only deliver a partial answer. Second, disengagement and low performance are linked in a complex relationship that cannot be easily disentangled. Third, PIAAC proficiency scores already partially account for disengagement by ignoring (in the underlying model) items on which respondents spent less than five seconds without giving an answer. In particular, following the literature on response latencies (Wise and Kong, 2005[2]; Wise and DeMars, 2005[3]), it was decided that instances in which the interaction between the respondent and the item was very brief are not informative, so they are coded as non-reached items rather than missing items. Nonetheless, respondents in such situations were also strongly disengaged, making almost no effort to give a correct answer to the item. As a result, PIAAC proficiency scores are computed on a sample that already excludes the most extreme cases of item disengagement.
The degree to which external factors, such as motivation, influence the results of low-stakes assessments is an active and growing area of research. One approach consists of comparing the performance of similar respondents in high- and low-stakes testing situations. Using an assessment similar to the Programme for International Student Assessment (PISA), Gneezy et al. (2017[4]) conducted experiments in schools in Shanghai and the United States. They showed that a significant proportion of the gap observed between the two countries in official PISA rankings disappears when students are offered monetary incentives.
Another approach consists of decomposing test scores into two components, one capturing initial performance and the other capturing decline in performance during the test (Borghans and Schils, 2012[5]). Initial performance is often interpreted as the true ability of the individual, as it is assumed not to be contaminated by fatigue effects or by a decrease in motivation. Decline in performance during the test is often interpreted as a non-cognitive skill, such as the ability of the respondent to remain motivated or to endure fatigue (Borgonovi and Biecek, 2016[6]; Zamarro, Hitt and Mendez, 2016[7]; Anghel and Balart, 2017[8]; Balart, Oosterveen and Webbink, 2015[9]; Brunello, Crema and Rocco, 2018[10]).
Measuring disengagement at the item level
Rapid item skipping
The simplest indicator of disengagement is rapid skipping of an item. Respondents who spend less than a very short amount of time on an item (i.e. do not give themselves enough time to even read and take full note of the item) can be considered to be disengaged. The analysis in this chapter is based on a threshold of five seconds, below which respondents are considered disengaged. This ensures consistency with the PIAAC rule about rapid omission. However, no account is taken of whether or not the respondent provided an answer.
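As an illustration of how such a tabulation can be produced from the log files, the sketch below computes, for each country, the share of item interactions below five seconds and, among those, the shares with an answer and with a correct answer (the structure of Table 5.1). It is a hedged sketch rather than the official PIAAC processing code: it assumes a long-format data frame `log_df` with one row per respondent-item interaction and hypothetical columns `country`, `time_on_task` (in seconds), `answered` and `correct`, which are not the actual PIAAC variable codes.

```python
# Hedged sketch (not the official PIAAC processing code): flag item
# interactions lasting less than five seconds and tabulate them by country,
# mirroring the structure of Table 5.1. Column names are assumptions.
import pandas as pd

RAPID_SKIP_THRESHOLD = 5.0  # seconds, consistent with the PIAAC rapid-omission rule

def rapid_skipping_by_country(log_df: pd.DataFrame) -> pd.DataFrame:
    df = log_df.copy()
    df["rapid"] = df["time_on_task"] < RAPID_SKIP_THRESHOLD

    def summarise(g: pd.DataFrame) -> pd.Series:
        rapid = g[g["rapid"]]
        return pd.Series({
            "share_below_5s": g["rapid"].mean(),             # share of all interactions
            "share_answered_given_rapid": rapid["answered"].mean(),
            "share_correct_given_rapid": rapid["correct"].mean(),
        })

    return df.groupby("country").apply(summarise).sort_values("share_below_5s")
```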
Table 5.1 shows the proportion of item interactions for each country in which respondents spent less than five seconds and, among them, the proportion with an answer and the proportion with a correct answer. It thus gives a first account of disengagement across countries. The proportion of items that are rapidly skipped varies from 0.7% in Norway to 4% in Spain and Italy. For most countries, respondents who spent less than five seconds on an item did so without giving an answer. The proportion of these items that receive an answer is generally below 5%. This confirms that, in the overwhelming majority of cases, item interactions that last less than five seconds are not productive.
Table 5.1. Rapid item skipping across countries
| | Proportion of all items with time on task below 5 seconds | Among items below 5 seconds: proportion with an answer | Among items below 5 seconds: proportion with a correct answer |
|---|---|---|---|
| Austria | 0.8% | 1.5% | 0.2% |
| Germany | 0.8% | 3.8% | 0.3% |
| Denmark | 2.1% | 0.1% | 0.0% |
| Belgium (Flanders) | 1.5% | 2.4% | 1.2% |
| Estonia | 0.1% | | |
| Spain | 4.0% | 34.2% | 21.0% |
| Finland | 0.8% | 0.9% | 0.1% |
| France | 2.2% | 1.5% | 0.3% |
| England / Northern Ireland (United Kingdom) | 1.7% | 3.1% | 0.5% |
| Ireland | 2.2% | 1.5% | 0.3% |
| Italy | 4.0% | 5.7% | 2.7% |
| Netherlands | 0.9% | 4.1% | 1.7% |
| Norway | 0.7% | 3.4% | 0.0% |
| Poland | 2.5% | 6.2% | 2.3% |
| Slovak Republic | 1.5% | 16.0% | 10.7% |
| United States | 1.5% | 8.8% | 3.3% |
Note: In Estonia, the number of items answered in less than 5 seconds is too small to perform an analysis.
Source: OECD (2017[11]), Programme for the International Assessment of Adult Competencies (PIAAC), Log Files, http://dx.doi.org/10.4232/1.12955.
However, two countries stand out as exceptions. In the Slovak Republic, 16% of these items were answered. The corresponding proportion was even higher in Spain, where it reaches 34%. In both countries, the share of correct answers among these rapid responses is also high: roughly two-thirds of the answers given in less than five seconds are correct. Even though some items feature a multiple-choice format that allows for random guessing, this rate is too high to be plausible.
This phenomenon of rapid correct answers, which is restricted to these two countries, is hard to explain. In particular, data from Spain feature both a high rate of rapid skipping and a high rate of correct answers. This combination would be problematic in the analysis that follows. As a result, data from Spain are excluded from the analysis conducted in the rest of this chapter. Spain also displays a high rate of rapid skipping without answers, suggesting that disengagement in Spain is among the highest in the sample of countries.
Rapid item skipping is informative, but it fails to take into account less acute forms of item disengagement. It is intended to measure the quasi-absence of interaction (and consequently the quasi-absence of effort) between respondent and item. However, disengagement occurs not once effort is deemed non-existent, but once it is deemed insufficient. Rapid item skipping thus does not capture the range of disengaged items that falls between these extremes.
T-disengagement
A more refined but less strict concept of disengagement is to see it as a situation in which the respondent has not spent enough time on an item to provide a correct answer. Operationalisation of this definition requires defining the minimum time necessary to solve an item without resorting to random guessing.
Goldhammer et al. (2016[1]) use the relationship between the likelihood of giving a correct answer and time on task to compute an item-specific threshold below which respondents can be reasonably assumed to not have seriously attempted to solve the item (in which case they are classified as disengaged). This relationship generally starts from a zero probability of success and remains flat up to some threshold at which the probability of success starts to rise.
This chapter takes a similar approach, using the term T-disengagement to denote situations in which a respondent spends less time on an item than an item-specific threshold. For each item, the threshold is derived from the empirical relationship observed in all countries of the sample pooled together (excluding Spain, as explained above), so that thresholds are the same across all countries. For each item, this minimal time is constrained to be at least five seconds. T-disengagement is thus intended to extend rapid item skipping. After excluding Spain, the occurrence of correct answers in less than five seconds is limited to four countries (Italy, the Netherlands, Poland and the Slovak Republic) and remains a very rare event. In all the other countries, figures remain anecdotal and justify the statement that respondents who spent less than five seconds on an item are disengaged.
Even though the sample sizes for each item are reasonably large (around 15 000 on average), data at the bottom tail of the distribution of time on task (where minimal time to solve will be found) can be sparse for some items. In order to smooth the relationship between time on task and success, the following procedure is applied: 1) to compute the probability of success at time x, observations with a time on task between x and x+10 are used; 2) if this subsample contains more than 200 observations, success on the item is modelled as a linear function of time; 3) if the subsample contains fewer than 200 observations, the probability of success is not estimated. The minimal time to solve an item will eventually be the smallest x (larger than five seconds) for which the estimated probability of success is higher than 10%.
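The sketch below reconstructs this threshold search for a single item under stated assumptions: the step size of the search and the exact point at which the fitted probability is evaluated are not specified above, so the choices made here are illustrative rather than the authors’ actual implementation.

```python
# Minimal sketch of the item-level threshold search described above
# (a reconstruction under assumptions, not the code actually used).
import numpy as np

def minimal_time_to_solve(times: np.ndarray, correct: np.ndarray,
                          window: float = 10.0, min_obs: int = 200,
                          floor: float = 5.0, prob_cutoff: float = 0.10,
                          max_time: float = 300.0, step: float = 1.0) -> float:
    """Smallest time x >= `floor` (in seconds) at which the smoothed probability
    of a correct answer exceeds `prob_cutoff`; NaN if it is never reached."""
    for x in np.arange(floor, max_time, step):
        in_window = (times >= x) & (times < x + window)
        if in_window.sum() < min_obs:
            continue  # too few observations to estimate the probability at x
        # Linear probability model of success on time within the window,
        # evaluated at the left edge x (the evaluation point is an assumption).
        slope, intercept = np.polyfit(times[in_window], correct[in_window], deg=1)
        if slope * x + intercept > prob_cutoff:
            return float(x)
    return float("nan")
```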
Respondents who spent less than this minimal time may still have extracted enough information from the item to realise that they will not be able to find a solution. This could be the case, for instance, if they do not understand how the question and stimulus are related or if they do not understand the task they are required to do. This situation is referred to as rational skipping, because respondents have no reason to spend time on items that they know they will not be able to solve. T-disengagement may thus also capture situations in which effort is useless rather than insufficient.
Figure 5.1 plots the distributions of minimum time needed to solve an item and T‑disengagement rates for all numeracy, literacy and Problem Solving in Technology-Rich Environments (PSTRE) items. The time required varies between 5 seconds and 3 minutes.2 Based on these thresholds, items can be classified as either “short” or “long”. The shorter the item is, the closer disengaging with this item is to rapid item skipping. PSTRE items are much more time consuming than literacy and numeracy items, with a typical required time for solution of 1 minute. Most literacy and numeracy items can be solved in less than 30 seconds and a good proportion in less than 10 seconds. Only PSTRE items feature minimum times greater than 1 minute. T-disengagement rates vary between 0% and 30% for the most part but reach 60% for one literacy item.
The T-disengagement rate increases in close parallel with the minimum time needed to solve an item. Since the definition of this disengagement indicator is based on the minimum time, this is not at all surprising. A respondent could spend 10 seconds on a short item without being considered T-disengaged, but spend 10 seconds on a longer item and be classified as disengaged. Hence, for most items that can be solved in less than 20 seconds, the T-disengagement rate stays below 10%. Items that need more than 20 seconds to solve show higher and more variable rates of T-disengagement. For instance, items that need about 40 seconds to solve have disengagement rates varying between 10% and 25%, meaning that T-disengagement depends on item characteristics other than the time required to solve the item, such as type of display, content or difficulty. Among long items, disengagement is more common in the literacy domain. These differences may be driven by booklet selection, as the subsamples of respondents to which the various items are allocated are not strictly comparable.
Figure 5.2 highlights the importance of module order for T-disengagement. As mentioned earlier, respondents can be assigned an item in either the first or the second module, and this allocation is random. The figure plots, for each item, the difference in disengagement when the item is answered in the second module rather than in the first. When an item is in the second module, the probability that the respondent is disengaged with that item is between 1 and 10 percentage points higher than if the same item were in the first module. For short items (below 20 seconds), the difference is such that disengagement occurs twice as frequently in the second module. For longer items, the difference does not increase as fast and remains below 10 percentage points, but it is still equivalent to a 50% increase between modules.
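The module-order comparison in Figure 5.2 amounts to a difference in item-level T-disengagement rates between the two positions. A minimal sketch, assuming `log_df` also carries hypothetical columns `item_id`, `module_position` (1 or 2) and a boolean `t_disengaged` flag computed with the item-specific thresholds described above:

```python
import pandas as pd

def module_effect_by_item(log_df: pd.DataFrame) -> pd.Series:
    """Difference in T-disengagement rates (module 2 minus module 1), per item."""
    rates = (log_df
             .groupby(["item_id", "module_position"])["t_disengaged"]
             .mean()                       # disengagement rate per item and position
             .unstack("module_position"))  # columns: 1 and 2
    # Positive values: the item attracts more disengagement when it appears second.
    return rates[2] - rates[1]
```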
This relationship supports the conclusion that T-disengagement is a good indicator of lack of effort, since the increase in disengagement remains associated with very low success rates on disengaged items. Random allocation of respondents to items guarantees the absence of any selection effects. As a result, the difference between T-disengagement in the first and second module can be fully attributed to the fact of having undertaken the items in different modules (i.e. the difference captures a true “module effect”). This could be related either to fatigue or to a learning effect, namely more rapid identification of items that are likely to be too difficult for the respondent (rational skipping).
T-disengagement across countries
Figure 5.3 presents T-disengagement rates by country. Consistent with the rest of the report, the focus is on literacy and numeracy items, because PSTRE items have features that differentiate them from items in the other two domains: there are fewer of them, they are longer and they have high disengagement rates. All respondents answer 20 literacy items and 20 numeracy items. Instead of plotting the average proportion of disengaged items, the choice was made to plot the proportion of respondents who disengaged on at least a given proportion of items. This simplifies the analysis and maintains the useful dichotomy between disengaged and non-disengaged respondents.
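A minimal sketch of this respondent-level aggregation, reusing the hypothetical column names introduced earlier (`country`, `respondent_id`, `t_disengaged`) and assuming `log_df` has already been restricted to literacy and numeracy items:

```python
import pandas as pd

def share_disengaged_respondents(log_df: pd.DataFrame,
                                 min_share: float = 0.10) -> pd.Series:
    """Proportion of respondents, by country, who T-disengaged on at least
    `min_share` of their literacy and numeracy items."""
    per_respondent = (log_df
                      .groupby(["country", "respondent_id"])["t_disengaged"]
                      .mean())  # share of items on which each respondent disengaged
    flagged = per_respondent >= min_share
    return flagged.groupby(level="country").mean()

# e.g. share_disengaged_respondents(log_df, 0.10) and
#      share_disengaged_respondents(log_df, 0.20) for the two cut-offs in Figure 5.3
```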
Figure 5.3 shows the proportions of the population who T-disengaged on at least 10% and at least 20% of items. Disengagement concerns respondents in all countries, but to a varying extent. It is much less frequent in northern European countries, such as Finland, Norway or the Netherlands, where about 8% of the sample disengage on at least 4 items out of 40; this proportion approaches 35% in Italy. The same differences between countries emerge when looking at more severe cases of disengagement, in which respondents disengage on at least 20% of items. The proportion drops below 5% in Finland, Norway and the Netherlands, but it remains above 15% in Italy.
These rates give some indication of how PIAAC country scores might be affected by disengagement. They suggest that comparisons among countries with low T-disengagement rates, such as Austria, Finland, Germany, the Netherlands and Norway, are probably more reliable than comparisons involving countries characterised by higher rates of item-specific disengagement: the much higher disengagement rates in Italy, for instance, suggest that proficiency there might be underestimated relative to other countries. This does not imply that proficiency estimates for disengaged respondents do not convey important and valuable information about them. But disengagement is likely to contaminate the measurement of latent ability, as defined in the conceptual framework of the PIAAC assessment. In the end, a joint analysis of test scores and disengagement rates provides a more accurate and complete picture of the proficiency of respondents in participating countries. The remainder of the chapter further explores T-disengagement to assess the validity of this indicator.
T-disengagement and background characteristics
These important differences across countries could be driven by several factors, reflecting the manner in which respondents interact with items and determine their effort levels. This section describes the association of T-disengagement with individual background characteristics.
Figure 5.4 explores the relationship between disengaging on more than 10% of items and several background characteristics. The figures reported are averages of estimated coefficients across all available countries. For each country, the coefficients are estimated in a single ordinary least squares (OLS) regression model, so that each estimate takes into account the effect of all other covariates.
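The sketch below illustrates the kind of country-by-country linear probability model this implies; the outcome and covariate names are hypothetical placeholders rather than the actual PIAAC variable codes, and the exact specification used for Figure 5.4 may differ.

```python
# Hedged sketch: OLS (linear probability) model of T-disengaging on more than
# 10% of items, estimated separately by country; Figure 5.4 reports averages
# of the coefficients across countries. Variable names are illustrative only.
import statsmodels.formula.api as smf

FORMULA = (
    "disengaged_10pct ~ C(gender) + C(age_group) + C(education)"
    " + C(ict_home_quartile) + C(readiness_to_learn_quartile)"
    " + other_person_present + no_voluntary_work + no_influence_on_government"
)

def estimate_by_country(respondent_df):
    """Return a dict of fitted OLS results, one per country."""
    return {country: smf.ols(FORMULA, data=sub).fit()
            for country, sub in respondent_df.groupby("country")}
```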
A first result is the absence of gender differences, with female and male respondents being equally likely to be T-disengaged in all countries.
The relationship between age and T-disengagement is also relatively weak on average. Young respondents are less likely to be disengaged than respondents over age 25, while middle-aged respondents are only slightly less likely to be disengaged. The relationship between T-disengagement and age is, moreover, highly country-specific. This suggests that biological factors, such as the ability to concentrate or fatigue, while possibly contributing to the relationship between age and disengagement, are not dominant. In England / Northern Ireland (United Kingdom), in particular, T-disengagement decreases steeply with age, with the youngest respondents being 17 percentage points more likely to T-disengage than the oldest.
Education levels are negatively associated with T-disengagement. The effect is sizeable: rates are, on average, 8.5 percentage points higher for respondents with less than secondary education than for respondents with tertiary attainment. In addition, the relationship is stable across countries, although its magnitude varies. The association between T-disengagement and education might be related to several factors. One reason may be that, since respondents with higher education are, on average, more proficient, they need to answer fewer items that are (from their point of view) relatively difficult, although the adaptive nature of the assessment partly corrects for this. Another reason could be that they are more accustomed to testing and assessment environments, or have acquired more experience with them. As a result, they may experience less fatigue than other respondents, even though they spend more time on the assessment (see Chapter 3). This would also suggest that fatigue is related to cognitive demand rather than test length. In addition, more highly educated respondents could have a stronger sense of commitment to completing the assessment to the best of their ability. Nonetheless, as mentioned earlier, these differences could also be driven by rational skipping: less educated respondents may be more likely not to understand some questions, or to be aware that they are unable to solve them.
Given that the assessment was taken on a computer, familiarity with information and communications technology (ICT) can plausibly affect respondents’ motivation, fatigue and engagement. The frequency of ICT use at home is indeed strongly associated with T-disengagement. Respondents in the bottom quartile of the ICT-use index are, on average, 9 percentage points more likely to be T-disengaged than those in the top quartile. This effect seems to be concentrated in the lowest quartile and has a straightforward interpretation: respondents who pass the ICT core and take the computer-based assessment, but who are not very familiar with computers, will have more difficulty undertaking the assessment on a computer than other respondents.
The presence of another person during the assessment is associated with an increase of 1.5 percentage points in the probability of disengagement. The presence of another person is an environmental factor that might increase the cost of effort, because the respondent’s attention and focus on the assessment could be diverted by communication with the other person. However, the estimated effect is quite small, suggesting that this potential source of disturbance did not play a significant role.
Disengagement is strongly associated with readiness to learn. The readiness-to-learn index is constructed on the basis of a set of questions about the respondent’s perception of himself/herself as a curious and perseverant individual (respondents are asked questions such as “Do you get to the bottom of difficult things?”). Respondents in the lowest quartile of this index are more likely to be disengaged than those in higher quartiles, by a margin of 8 percentage points. In so far as this index captures how much respondents value finding a correct answer, this relationship illustrates another mechanism through which a respondent decides on the level of effort to exert: the less respondents value success, the lower the level of effort they are willing to accept. This association suggests that respondents choose effort levels rationally, by comparing the benefits of actions with their costs.
Respondents who do not engage in voluntary work, as well as those who agree that they have no influence on the government, are more likely to be T-disengaged. This association is not as strong as the association with readiness to learn, but it is not negligible. One important source of disengagement is insufficient commitment to the effort required by the assessment; this commitment is ultimately the reason why respondents agree to participate in the survey. Although it is difficult to know how much effort respondents are willing to accept, it is logical to assume that respondents with stronger ties to civic life would accept more.
Overall, the relationship between T-disengagement and respondent background variables seems to be in line with a simple model of how respondents choose their effort levels. Moreover, it suggests how disengagement might affect some important socio-demographic gaps. In particular, proficiency gaps by level of education might be smaller than those reported in PIAAC, and in some countries (such as England / Northern Ireland [United Kingdom]), age differences might be affected by varying levels of disengagement.
Further analysis of T-disengagement across countries
The question then arises of the importance of these background variables in shaping variations across countries and, more generally, of the sources of the differences in T‑disengagement rates.
Figure 5.5 shows country averages for three of the background factors that have the strongest association with T-disengagement (other than educational attainment): 1) the proportion of respondents who disagree that they have an influence on the government; 2) the proportion who belong to the bottom quartile of use of ICT at home; and 3) the proportion of respondents who belong to the bottom quartile of readiness to learn. For two of these factors, variations across countries are sizeable. The share of respondents who disagree that they have an influence on the government varies from 25% of the population (in Denmark) to 65% (in Italy). The proportion of respondents who fall in the bottom quarter of the readiness-to-learn index is lowest in Finland (less than 5%) and highest in the Netherlands (24%). The proportion of respondents who are in the bottom quartile of the ICT-use-at-home index features smaller variations. Country rankings on these variables do not seem to mirror T-disengagement rankings, with the notable exception of the proportion of respondents who disagree that they have an influence on government. This similarity suggests (but does not prove) that disengagement and this factor may be related.
Figure 5.6 plots raw T-disengagement rates along with rates adjusted for all the factors positively associated with T-disengagement in Figure 5.4, with the exception of age.3 The adjusted rate is thus the predicted rate of T-disengagement for the subpopulation least likely to be disengaged: a male with tertiary education who agrees that he has influence on government, participates in voluntary activity and belongs to the top quartile of the readiness-to-learn and ICT-use-at-home indices.
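A minimal sketch of this adjustment, reusing the hypothetical variable names above: the model omits age (consistent with note 3), and the reference categories are illustrative rather than the actual coding of the PIAAC background variables.

```python
import pandas as pd
import statsmodels.formula.api as smf

ADJ_FORMULA = (
    "disengaged_10pct ~ C(gender) + C(education) + C(ict_home_quartile)"
    " + C(readiness_to_learn_quartile) + other_person_present"
    " + no_voluntary_work + no_influence_on_government"
)

# Reference respondent least likely to disengage (illustrative category labels).
REFERENCE = pd.DataFrame([{
    "gender": "male", "education": "tertiary",
    "ict_home_quartile": 4, "readiness_to_learn_quartile": 4,
    "other_person_present": 0, "no_voluntary_work": 0,
    "no_influence_on_government": 0,
}])

def adjusted_rates(respondent_df) -> dict:
    """Predicted T-disengagement rate for the reference respondent, by country."""
    return {country: float(smf.ols(ADJ_FORMULA, data=sub).fit()
                           .predict(REFERENCE).iloc[0])
            for country, sub in respondent_df.groupby("country")}
```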
This adjustment typically decreases rates by 5 to 15 percentage points. Most surprisingly, there is a large decrease even in countries in which the raw rate is low. As a result, in Austria, Finland or the Netherlands, the adjusted rate falls close to zero, while in Italy it remains in the 15% to 20% range. Consequently, while T-disengagement in the first group of countries seems to be mostly a matter of personal characteristics, with disengagement confined to a fringe of the population, disengagement in the second group has an endemic component: even the subpopulation with characteristics associated with lower disengagement is likely to disengage.
Figure 5.7 shows T-disengagement across countries for respondents with low and high literacy proficiency. In all countries, a large share of respondents who score at Level 1 or below are disengaged, ranging from almost 40% in Austria to around 70% in Italy. There are two strong reasons for these high shares. Less able respondents are more often required to answer items that are difficult for them than respondents of high ability, so their propensity to T-disengage increases both because of accumulated fatigue and because they rationally skip more items. Moreover, disengaged items are items that were not successfully completed; as a result, the estimated proficiency of disengaged respondents will be mechanically lower. Across the proficiency distribution as measured in PIAAC, the lower end is therefore the most subject to disengagement bias.
The picture for high-proficiency respondents is strikingly different, featuring a pattern very similar to the one found in Figure 5.6. Among respondents with high proficiency in Austria, Finland and Norway, less than 2% are T-disengaged, while more than 10% are T-disengaged in the Slovak Republic. These differences cannot be explained by rational skipping. Rational skipping is mostly related to relative difficulty, and these rates are computed on a population with high proficiency.
Figure 5.8 plots T-disengagement rates against the literacy performance of the subsample on which the T-disengagement rate was computed. Once again, Finland on the one hand and Spain and Italy on the other stand apart. Finland features both high average literacy scores and low disengagement, while the high T-disengagement rates observed in Spain and Italy are associated with much lower literacy performance.
It is not possible to provide causal estimates of the impact of disengagement on literacy performance. Figure 5.8, however, suggests that it is possible to identify a cluster of countries that differ in terms of level of engagement (as measured by this particular indicator). This might serve as a first step in furthering understanding of the role that engagement plays in contributing to cross-country differences in proficiency in low-stakes assessments such as PIAAC.
Comparisons of T-disengagement and other indicators
The indicator described above is only one aspect of disengagement. PIAAC offers other possibilities that might help build a more detailed picture of disengagement variations across countries.
Table 5.2 presents a comparison across countries of T-disengagement with three other indicators. As mentioned earlier, rapid item skipping is a more restrictive, time-on-task-based version of item disengagement. The second column summarises it at the respondent level as the proportion of respondents who spent less than five seconds on more than 10% of items. The third indicator considers disengagement during the background questionnaire rather than during the assessment. The last indicator is based on a question about respondents’ perception of the length of the assessment. This question comes from the observation module, which is completed by the interviewer right after the interview. By recording whether the respondent felt that the length of the assessment was reasonable, this question does not indicate disengagement as such but describes one of its potential sources. All three indicators are strong predictors of T-disengagement at the individual level. On average across countries, 97% of respondents who rapidly skip items are also T-disengaged (compared to 23% of other respondents); 34% of T-disengaged respondents are among the fastest quarter on Section I of the background questionnaire (compared to 25% of other respondents); and 36% of T-disengaged respondents thought that the assessment was too long (compared to 22% of other respondents).
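A sketch of how such individual-level overlaps can be computed, assuming a respondent-level data frame with one row per respondent and hypothetical boolean columns for each indicator:

```python
import pandas as pd

def conditional_share(resp_df: pd.DataFrame, condition: str, outcome: str) -> pd.Series:
    """Mean of `outcome` among respondents with condition == True vs False."""
    return resp_df.groupby(condition)[outcome].mean()

# Share of T-disengaged among rapid skippers vs other respondents:
#   conditional_share(resp_df, "rapid_skipper", "t_disengaged_10pct")
# Share in the fastest quarter on Section I among T-disengaged vs engaged respondents:
#   conditional_share(resp_df, "t_disengaged_10pct", "fastest_quartile_section_i")
```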
Table 5.2. Comparisons of disengagement indicators across countries
| Country | Proportion who T-disengaged on more than 10% of items | Proportion who spent less than 5 seconds on more than 10% of items | Proportion among the 25% fastest on Section I of the background questionnaire | Proportion who thought the assessment was too long |
|---|---|---|---|---|
| Italy | 33.4% | 13.2% | 41.1% | 47.8% |
| Slovak Republic | 24.2% | 5.1% | 46.4% | 34.3% |
| Poland | 23.3% | 8.0% | 28.0% | 46.5% |
| France | 21.5% | 7.4% | 8.0% | 45.3% |
| Ireland | 20.4% | 6.3% | 33.3% | 26.5% |
| United States | 18.3% | 6.3% | 33.3% | 26.5% |
| England / Northern Ireland (UK) | 17.8% | 6.3% | 15.9% | 23.0% |
| Estonia | 17.0% | 6.0% | 2.0% | 26.4% |
| Denmark | 14.5% | 6.0% | 4.1% | 14.0% |
| Belgium (Flanders) | 14.0% | 4.0% | 25.0% | 15.6% |
| Germany | 12.3% | 3.1% | 15.0% | 10.2% |
| Austria | 10.3% | 2.2% | 9.4% | 21.3% |
| Norway | 9.3% | 2.8% | 4.9% | 13.6% |
| Netherlands | 9.3% | 2.8% | 12.7% | 15.8% |
| Finland | 8.4% | 3.0% | 6.1% | 13.2% |
Source: OECD (2017[11]), Programme for the International Assessment of Adult Competencies (PIAAC), Log Files, http://dx.doi.org/10.4232/1.12955.
All these indicators exhibit large variations across countries and, most importantly, these variations are closely related to those found for T-disengagement. In Austria, Finland, the Netherlands and Norway, less than 3% of the sample belongs to the category of rapid skippers, while this proportion exceeds 13% in Italy and Spain. In Denmark and Norway, less than 5% are among the fastest on Section I of the background questionnaire, compared with more than 40% in Italy and the Slovak Republic. In all countries, at least a small minority of respondents found the assessment too long. The proportion remains among the lowest (below 14%) in Finland, Germany and Norway, but it is close to a majority in Italy and Poland. For all indicators, Italy and the Slovak Republic rank among the highest, while Norway and Finland rank at the bottom.
The similarity between country rankings on these various indicators suggests that differences in disengagement across countries are robust to the choice of indicator. In particular, it confirms that disengagement seems to matter the most in Italy, Poland and the Slovak Republic.
While this chapter has focused so far on disengagement during the interview, potential participants who refuse to be surveyed can also be considered as disengaged, or more precisely, as refusing to engage. The relationship at the country level between the prevalence of disengagement during the interview and overall response rates might thus stem from a trade-off. Respondents who are at the margin of refusing to participate are also among those most likely to disengage during the survey. As a result, improving the response rate might also have the side-effect of increasing disengagement.
Figure 5.9 plots the relationship between T-disengagement and response rates at the country level. The lack of a clearly positive empirical relationship between the two rates suggests that country-specific factors determining both rates dominate any trade-off between response rates and the prevalence of disengagement among those who agree to participate. Nonetheless, the figure highlights some instructive contrasts. For instance, among countries with low T-disengagement, only Finland and Norway have a satisfactory response rate, while Austria, Denmark, Germany and the Netherlands are among the countries with the lowest response rates. France, Ireland and the United States all have high response rates and average T-disengagement rates. And while the above discussion identifies the Slovak Republic and Italy as countries where disengagement is prevalent, only the Slovak Republic has a high response rate.
Conclusions
Comparisons of performance in a cognitive assessment can produce misleading conclusions if not all participants exert a sufficient amount of effort. Without sufficient effort, performance on the assessment will not accurately represent the underlying ability of the respondent. This problem is particularly relevant in the case of low-stakes assessments, where participants do not have external incentives to perform at their best.
The information contained in log files makes it possible to more precisely observe the behaviour of respondents in the course of such assessments and to construct indicators that can be used to proxy the amount of effort exerted.
This chapter presented and analysed various indicators that can be used to classify respondents as either engaged or disengaged with assessment items. The incidence of disengagement varies substantially across countries. In Norway, Finland and the Netherlands, less than 10% of respondents are disengaged on at least 10% of items, compared to more than 20% in France, Ireland, Poland and the Slovak Republic, and more than 30% in Italy.
Low levels of education and low familiarity with ICT (proxied by the frequency of performance of ICT-related tasks in everyday life) are positively associated with the probability of being disengaged in the course of the assessment. Similarly, respondents who report that they are generally less perseverant are also more likely to be disengaged.
Respondents are also more likely to be disengaged with items that appear in the second module of the assessment rather than in the first. This is consistent with the findings discussed in Chapter 3 that respondents tend to spend less time on items positioned in the second module.
Not surprisingly, disengagement strongly reduces the probability of giving a correct answer, which results in disengaged individuals performing worse in the assessment. This relationship holds at both the individual and the country level.
Indicators of disengagement are, therefore, very useful in two respects. On the one hand, disengagement provides important information on the respondent and can be used to proxy a variety of individual traits (such as conscientiousness or the ability to endure fatigue) that are likely to be important determinants of real-life economic and non-economic outcomes. On the other hand, these traits are not part of the skills cognitive assessments typically try to measure. As a result, the presence of disengagement (or any kind of difference in the effort respondents exert during an assessment) biases the results of assessments and can make comparison of results across countries problematic. In this sense, information on the extent of disengagement is a useful complement to actual estimates of proficiency that can be used to make more accurate comparisons across countries.
References
[8] Anghel, B. and P. Balart (2017), “Non-cognitive skills and individual earnings: New evidence from PIAAC”, SERIEs, Vol. 8/4, pp. 417-473, http://dx.doi.org/10.1007/s13209-017-0165-x.
[9] Balart, P., M. Oosterveen and D. Webbink (2015), “Test scores, noncognitive skills and economic growth”, IZA Discussion Paper, No. 9559, The Institute for the Study of Labor, Bonn, http://ftp.iza.org/dp9559.pdf.
[5] Borghans, L. and T. Schils (2012), The Leaning Tower of Pisa: Decomposing Achievement Test Scores into Cognitive and Noncognitive Components, http://www.sole-jole.org/13260.pdf.
[6] Borgonovi, F. and P. Biecek (2016), “An international comparison of students’ ability to endure fatigue and maintain motivation during a low-stakes test”, Learning and Individual Differences, Vol. 49, pp. 128-137, http://dx.doi.org/10.1016/j.lindif.2016.06.001.
[10] Brunello, G., A. Crema and L. Rocco (2018), “Testing at length if it is cognitive or non-cognitive”, Discussion Paper Series, No. 11603, IZA, Bonn, http://ftp.iza.org/dp11603.pdf.
[4] Gneezy, U. et al. (2017), “Measuring success in education: The role of effort on the test itself”, NBER Working Paper, No. 24004, National Bureau of Economic Research, Cambridge, MA, http://dx.doi.org/10.3386/w24004.
[1] Goldhammer, F. et al. (2016), “Test-taking engagement in PIAAC”, OECD Education Working Papers, No. 133, OECD Publishing, Paris, http://dx.doi.org/10.1787/5jlzfl6fhxs2-en.
[11] OECD (2017), Programme for the International Assessment of Adult Competencies (PIAAC), Log Files, GESIS Data Archive, Cologne, http://dx.doi.org/10.4232/1.12955.
[12] OECD (2015), OECD Survey of Adult Skills (PIAAC) (Database 2012, 2015), http://www.oecd.org/skills/piaac/publicdataandanalysis/.
[3] Wise, S. and C. DeMars (2005), “Low examinee effort in low-stakes assessment: Problems and potential solutions”, Educational Assessment, Vol. 10/1, pp. 1-17, http://dx.doi.org/10.1207/s15326977ea1001_1.
[2] Wise, S. and X. Kong (2005), “Response time effort: A new measure of examinee motivation in computer-based tests”, Applied Measurement in Education, Vol. 18/2, pp. 163-183, http://dx.doi.org/10.1207/s15324818ame1802_2.
[7] Zamarro, G., C. Hitt and I. Mendez (2016), “When students don’t care: Reexamining international differences in achievement and non-cognitive skills”, EDRE Working Paper, No. 2016-18, SSRN, Rochester, NY, http://dx.doi.org/10.2139/ssrn.2857243.
Notes
← 1. In a sense, and with all the caveats discussed in previous chapters, time on task could be interpreted as a continuous measure of the effort respondents exert in solving the items and, therefore, as a measure of the degree of engagement.
← 2. Three minutes were required for two PSTRE items that are not shown in Figure 5.1.
← 3. Age is excluded here, because the effect of age is not homogeneous and the choice of a reference is therefore not natural. For instance, while old respondents are the least disengaged in England / Northern Ireland (United Kingdom), this is not true in all countries.