Computer-based administration of large-scale assessments makes it possible to collect a much richer set of information on test takers than pencil-and-paper tests do. In principle, it is possible to record every interaction between the computer user interface on which the test is taken and a server.
This information about the actions undertaken in the course of the assessment can help policy makers, researchers and educators to better understand the cognitive strategies used by respondents and the underlying causes of low and high performance, and thus to design appropriate interventions.
The information contained in log files (also referred to as process data) can also be used to investigate aspects of respondents’ ability, attitudes and behaviour, over and above the cognitive constructs that test items are designed to measure. For example, timing information can be used as a proxy for test-takers’ motivation, engagement and perseverance. As performance in a test is always the combined outcome of the ability of the respondent and the effort exerted in the course of the assessment, information on the motivation and engagement of respondents is essential for interpreting differences in observed performance, especially when respondents have no stakes in the assessment.
The analysis and interpretation of process data are, however, not straightforward. As log files are records of the interaction between respondents and items, interpretation of the information contained in log files is necessarily item-dependent. Moreover, existing log files contain only a subset of the respondent-computer interactions, and the choice of which information to record was usually not informed by considerations about the usefulness of the data for subsequent analysis. Finally, many of the actions that a respondent undertakes while solving an assessment item cannot be recorded in log files.
This report, based on data from the Survey of Adult Skills, a product of the Programme for the International Assessment of Adult Competencies (PIAAC), focuses on the analysis of timing indicators. These have the advantage of being available for all items and of being easier to interpret in a consistent way across different items. The analysis concentrates on three indicators: 1) time on task (the total time spent on an item by the respondent); 2) time to first interaction (the time elapsed between the moment the item is presented to the respondent and the moment at which he/she first interacts with the testing platform); and 3) time since last action (the time elapsed between the respondent’s last interaction with the platform and the moment at which he/she moves on to the next item). The analysis is limited to the domains of literacy and numeracy because it is only in these domains that timing indicators can be safely generalised across multiple items. The domain of Problem Solving in Technology-Rich Environments provides richer information, because its items are much more interactive, but the interpretation of that information then becomes largely dependent on the content and context of specific items.
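To make these definitions concrete, the sketch below derives the three indicators from a simplified stream of time-stamped log events. The event structure, field names and example values are hypothetical illustrations of the computation described above; they do not reflect the actual PIAAC log-file format.

```python
def timing_indicators(events):
    """Derive the three timing indicators for a single item.

    Assumes `events` is a chronologically ordered list of (timestamp, action)
    pairs, where the first event marks the presentation of the item, the last
    event marks the move to the next item, and everything in between is a
    respondent interaction with the platform (illustrative convention only).
    Timestamps are in seconds.
    """
    presented_at, _ = events[0]
    moved_on_at, _ = events[-1]
    interaction_times = [t for t, _ in events[1:-1]]

    time_on_task = moved_on_at - presented_at
    time_to_first_interaction = (
        interaction_times[0] - presented_at if interaction_times else None
    )
    time_since_last_action = (
        moved_on_at - interaction_times[-1] if interaction_times else None
    )
    return time_on_task, time_to_first_interaction, time_since_last_action


# Hypothetical event stream for one item: presented at t=0, first click at
# t=4, answer entered at t=32, respondent moves on at t=40.
events = [(0, "item_presented"), (4, "click"), (32, "type_answer"), (40, "next_item")]
print(timing_indicators(events))  # (40, 4, 8)
```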
A first important finding of this report relates to the unexpectedly large cross-country differences in the amount of time respondents spent on the PIAAC assessment. Overall time spent on the assessment is positively correlated with average performance, and negatively correlated with the incidence of missing answers.
Large differences are also found at the individual level. Time spent on the assessment tends to increase with the age and the education level of respondents, despite the fact that older individuals also display a higher propensity to skip items. Gender differences are small. Respondents reporting greater familiarity with information and communications technology (ICT) tend to complete the assessment more rapidly, but the difference disappears after controlling for other observable characteristics. Familiarity with ICT is also associated with a shorter time to first interaction and a longer time since last action. Nonetheless, large differences persist between individuals with similar socio-demographic characteristics.
The time spent on different items is closely related to intrinsic item characteristics, most notably item difficulty. Respondents devoted a significantly smaller amount of time to items administered in the second half of the assessment. This was accompanied by an increase in the proportion of missing answers and a decrease in performance. Respondents tend to spend the most time on items that are challenging but feasible, while spending little time on items that, in relation to their estimated proficiency, are either very easy or very difficult.
Timing information can be used to construct indicators of disengagement. Respondents are considered disengaged from an item if they spend too little time on it. In such cases, it can be assumed that the respondent did not devote the effort necessary to understand the item and skipped it without even evaluating his/her chances of answering it correctly. The incidence of disengagement varies substantially across countries. Disengagement is more likely to be observed on items presented in the second half of the assessment, consistent with the analysis of time allocation across items. Adults with low levels of education and adults who are less familiar with ICT are more likely to become disengaged in the course of the assessment.
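A minimal sketch of such an indicator is given below, assuming a simple threshold rule on time on task: a response is flagged as disengaged when the time spent falls below a minimum-effort cut-off. The 5-second default is an arbitrary placeholder, not the rule applied in the report.

```python
def is_disengaged(time_on_task_seconds, threshold_seconds=5.0):
    """Flag a single item response as disengaged when the respondent's time
    on task falls below a minimum-effort threshold (placeholder value)."""
    return time_on_task_seconds < threshold_seconds


def disengagement_rate(times_on_task, threshold_seconds=5.0):
    """Share of a respondent's item responses flagged as disengaged."""
    if not times_on_task:
        return 0.0
    flags = [is_disengaged(t, threshold_seconds) for t in times_on_task]
    return sum(flags) / len(flags)


# Hypothetical respondent who rushed through two of five items.
print(disengagement_rate([42.0, 3.1, 57.5, 2.4, 18.9]))  # 0.4
```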
Research using log files is still in its infancy. PIAAC was the first large-scale international assessment delivered primarily on computers. The information available from PIAAC has been used in a number of analyses. It has contributed to the understanding of what can be drawn from this type of data, as well as aided in the exploration of substantive issues, such as test-engagement and respondents’ cognitive strategies.
By capitalising on the lessons learned from these data and the results of this report, future large-scale assessments will likely be able to improve their design and maximise the research potential of log files. The information contained in current log files will be useful in improving the design of new items. Test developers will strive to design interactive items that will enrich the content of future log files, greatly enhancing their analytical potential. It will be particularly important to prespecify theoretical constructs or competing theoretical hypotheses that could be measured or tested using the information recorded in log files. It should also be made clearer whether the purpose of log files is to better measure the underlying cognitive constructs, or whether they can be used to proxy for other dimensions of respondents’ skills, such as personality traits or attitudes.
Large-scale assessments have often been criticised for not taking into account the effort and motivation of test takers, and for being silent about policy actions that can help improve individual skills. Log files carry the analytical promise of improving large-scale assessments on both dimensions.