Beyond Proficiency
Chapter 1. Overview
Abstract
This report describes the content and characteristics of process data generated in the course of the Survey of Adult Skills, a product of the Programme for the International Assessment of Adult Competencies (hereafter referred to as “PIAAC”) and stored in log files, with examples of how these recently released data might be analysed. The potential of process data to provide information relevant to improving cognitive assessments and to serve as a window into test-takers’ behaviour has been known for at least 30 years (Bunderson, Inouye and Olsen, 1989[1]), but until recently there has been little progress in their analysis. This is partly due to the complexity of the data and the lack of well-documented data sets accessible to social science researchers in readily usable formats.
The public release of PIAAC log files, along with documentation and dedicated software to import them, follows the release of similar data from the OECD Programme for International Student Assessment (PISA). This report aims to contribute to these recent advances.
This chapter explains the value and limitations of process data and discusses the information available from PIAAC log files, focussing on timing indicators. Chapter 2 provides background on what log files are and how they complement traditional proficiency scores, describing specific features of the PIAAC log files and how the design of the assessment affects the interpretation of the information they contain. Chapter 3 presents a descriptive analysis of the indicators that can be extracted from the PIAAC log files, focussing on timing indicators. Chapter 4 examines how respondents allocate time to the different tasks they face in the course of the assessment. Chapter 5 discusses how the information contained in the log files can be used to construct indicators of test disengagement.
The value of log files…
Computer-based administration is increasingly the norm for large-scale assessments. This has been made possible by technological developments and the increasing familiarity of test-takers with computers and digital devices. Computer-based administration makes it more efficient to administer, manage and monitor surveys, and it also reduces the risk of human error.
More importantly, for the purposes of this report, computer-based administration makes it possible to collect a richer set of information on test-takers. In principle, it is possible to capture a complete record of communication between the user interface and the server. This means that it is possible to observe not only a respondent’s final answer to a specific assessment item, but also all interactions with the testing platform as he/she answers the question. Moreover, as all events recorded are associated with a timestamp, it is possible to compute the amount of time elapsed between these events.
Information about respondents’ actions in the course of the assessment is potentially useful to understand their cognitive strategies. In this sense, log files can be seen as a “window into students’ minds” (Greiff, Wüstenberg and Avvisati, 2015[2]). They can help policy makers, researchers and educators to better understand the underlying causes of low and high performance, and thus to design appropriate interventions.
Over and above the cognitive constructs that test items are designed to measure, the information contained in log files can be used to investigate aspects of respondents’ ability, attitudes and behaviour. For example, information on the amount of time respondents devote to the different items of the assessment has been used to compute various indicators of test-takers’ attitudes, such as motivation, engagement and perseverance. Performance in a test is always the combined outcome of the ability of the respondent and the effort exerted in the course of the assessment. In low-stakes assessments, such as PIAAC and PISA, information on the motivation and engagement of respondents is essential for interpreting differences in observed performance.
The analysis of log files, therefore, offers considerable promise in terms of enriching the information obtained in large-scale assessments. In particular, it will help to develop a more nuanced and more accurate picture of respondents’ skills. These insights will help to improve the design of assessments and, ultimately, to develop more effective training and learning programmes.
… and their limitations
The analytical potential of log files has only recently begun to be fully appreciated and exploited, although it was anticipated 30 years ago by Bunderson, Inouye and Olsen (1989[1]). Existing log files are usually unformatted files that need to be parsed and converted into structured datasets before they can be used for statistical analysis. They are not easy to analyse and should be seen as a useful but incidental by-product of the introduction of computer-based test delivery.
As log files are records of the interactions between respondents and survey items, interpretation of the information contained in the files is necessarily item-dependent. This is further complicated by the fact that existing log files contain only a subset of the respondent-computer interactions and the choice of which information to record was usually not informed by considerations about the usefulness of the data for subsequent analysis. Moreover, many of the actions that respondents undertake while solving an assessment item cannot be recorded in log files (e.g. notes taken on a piece of paper or mental reasoning). Some of this information could be collected by using devices such as webcams or eye-tracking devices, but this has not yet been done on a large scale.
Several conclusions flow from the above. First, researchers using log files need to have detailed knowledge of the characteristics of test items, including response formats, context and possibly content. Second, the item-dependent nature of test-taker/item interactions means that the same indicator can be interpreted differently for different items, so generalisations across multiple items are not straightforward. Third, the analytical utility of log files could be increased if users of the data were involved in: 1) defining the information to be captured in the files; and 2) deciding what derived variables or indicators should be included in user-accessible files for public and scientific use. In future, it will be important to consider the potential of log files to help understand cognitive processes and test-taker behaviour in the process of item design.
Information available from PIAAC log files
This report focuses on the analysis of timing indicators, which are available for all literacy and numeracy items and are easy to interpret consistently across different items. Other recent research papers based on log-file data have examined the processes by which test-takers solve test items. These analyses tend to be highly item-specific, because complex interactive items, such as those in the assessment of Problem Solving in Technology-Rich Environments (PSTRE), give rise to interactions that are particular to each item. Recent attempts to analyse log files from PSTRE items using techniques borrowed from text mining and natural language processing include He and von Davier (2015[3]; 2016[4]) and He, Borgonovi and Paccagnella (forthcoming[5]).
In this report, the analysis concentrates on three indicators: 1) time on task (the total time spent on an item by the respondent); 2) time to first interaction (the time elapsed between the moment when an item is presented to the respondent and the moment at which he/she first interacts with the testing platform); and 3) time since last action (the time elapsed between the respondent’s last interaction with the platform, typically inserting the answer, and the moment at which he/she moves on to the next item).
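To make these definitions concrete, the following minimal sketch derives the three indicators from a sequence of timestamped events recorded for a single item. The event names, data structures and values are purely illustrative and do not reproduce the actual schema of the PIAAC log files or of the datasets exported with the dedicated import software.

# Illustrative sketch (Python): deriving the three timing indicators from
# hypothetical timestamped log events for a single item.
from dataclasses import dataclass
from typing import List, Dict

@dataclass
class LogEvent:
    timestamp: float   # seconds elapsed since the item was presented
    event_type: str    # e.g. "ITEM_START", "CLICK", "KEYPRESS", "NEXT_ITEM"

def timing_indicators(events: List[LogEvent]) -> Dict[str, float]:
    """Compute time on task, time to first interaction and time since last action."""
    events = sorted(events, key=lambda e: e.timestamp)
    start, end = events[0].timestamp, events[-1].timestamp   # item shown / item left
    interactions = events[1:-1]                              # respondent-platform interactions
    first = interactions[0].timestamp if interactions else end
    last = interactions[-1].timestamp if interactions else start
    return {
        "time_on_task": end - start,
        "time_to_first_interaction": first - start,
        "time_since_last_action": end - last,
    }

# Example: first click after 4.2 seconds, answer typed at 30.5 seconds,
# respondent moves to the next item at 35.0 seconds.
events = [LogEvent(0.0, "ITEM_START"), LogEvent(4.2, "CLICK"),
          LogEvent(30.5, "KEYPRESS"), LogEvent(35.0, "NEXT_ITEM")]
print(timing_indicators(events))
# {'time_on_task': 35.0, 'time_to_first_interaction': 4.2, 'time_since_last_action': 4.5}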
Differences in timing indicators across countries and respondents
A first important finding of this report is the cross-country variation in the amount of time respondents spent on the PIAAC assessment. Respondents in Norway, Germany, Finland and Austria took the longest to complete the literacy and numeracy assessment (about 50 minutes on average). In Spain, Italy, the Slovak Republic, England / Northern Ireland (United Kingdom) and Ireland, respondents spent about 40 minutes on average. A similar picture emerges when looking at the other timing indicators.
At the country level, the overall time spent on the assessment is positively correlated with average performance and negatively correlated with the incidence of missing answers.
At the individual level, time spent on the assessment tends to increase with the age and education level of respondents, despite the fact that older individuals also display a higher propensity to skip items. Gender differences are relatively small, with women spending about one minute less than men to complete the literacy and numeracy assessment.
Respondents reporting greater familiarity with information and communications technology (ICT) tend to complete the assessment more rapidly than others, but the difference disappears after controlling for other observable characteristics. Familiarity with ICT is also associated with a shorter time to first interaction and a longer time since last action.
How respondents allocate time to different items
The time spent on different items is closely related to the intrinsic characteristics of items, most notably item difficulty.
Respondents devoted a significantly smaller amount of time to items administered in the second half of the assessment. This was accompanied by an increase in the proportion of missing answers and a decrease in performance, suggesting that the decrease in time on task is probably due to fatigue or disengagement.
Respondents appear to allocate time to items in a rational way. They tend to spend the most time on items that are challenging but feasible (for which the ex ante individual probability of giving a correct answer is close to 50%), while spending little time on items that, in relation to their estimated proficiency, are very easy or very difficult.
The analysis also shows that spending more time on an item increases the probability of giving a correct answer, although at a declining rate.
Log files can be used to capture respondents’ disengagement
Timing information can be used to construct indicators of disengagement. Respondents can be considered disengaged with an item if they spend too little time on it (on the basis of item-specific time thresholds). In such situations, it can be assumed that the respondent has not even devoted the effort necessary to understand the item and has skipped it without trying to determine if he/she was in a position to give a correct answer.
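As an illustration of how such an indicator could be computed, the sketch below flags a respondent as disengaged from an item whenever the time on task falls below an item-specific threshold, and summarises this as the share of administered items flagged as disengaged. The item identifiers and threshold values are placeholders chosen for the example, not the actual thresholds used in the analyses reported here.

# Illustrative sketch (Python): a threshold-based disengagement indicator.
# Item identifiers and thresholds are placeholders, not the item-specific
# thresholds used in this report.
from typing import Dict

def disengagement_share(time_on_task: Dict[str, float],
                        thresholds: Dict[str, float]) -> float:
    """Share of administered items on which the respondent spent less time
    than the item-specific threshold (i.e. was flagged as disengaged)."""
    flags = [time_on_task[item] < thresholds[item] for item in time_on_task]
    return sum(flags) / len(flags) if flags else 0.0

# Example: a respondent rushes through two of the four items administered.
thresholds = {"item_01": 8.0, "item_02": 5.0, "item_03": 10.0, "item_04": 7.0}  # seconds
respondent = {"item_01": 3.1, "item_02": 42.0, "item_03": 2.5, "item_04": 25.0}
print(disengagement_share(respondent, thresholds))  # 0.5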
Disengagement may occur because PIAAC is a low-stakes assessment, and respondents do not have a strong incentive to perform at their best during the test. In assessments such as PIAAC or PISA, disengagement is an undesirable phenomenon, because it can introduce variation in estimated proficiency that is unrelated to the cognitive skills that the surveys intend to measure.
At the same time, disengagement may be associated with respondents’ attitudes or intrinsic motivation, which may well be related to important outcomes in real life. A joint analysis of disengagement and actual performance helps to better interpret the results of the survey and to perform more meaningful comparisons across different countries or different socio-demographic groups.
Disengagement varies across countries and socio-demographic groups
The incidence of disengagement varies substantially across countries. In Finland, the Netherlands and Norway, less than 10% of respondents are disengaged on at least 10% of the items, compared to more than 20% in France, Ireland, Poland and the Slovak Republic, and more than 30% in Italy.
Disengagement is more likely to be observed on items presented in the second module of the assessment. This is consistent with the analysis of time allocation to different items. Adults with low levels of education and adults who are less familiar with ICT are more likely to become disengaged in the course of the assessment.
Moving forward
Research using log files is still in its infancy. PIAAC was the first large-scale international assessment delivered primarily on computers, and the information available from the PIAAC log files has already been used in a number of analyses. It has contributed to understanding what can be drawn from this type of data and aided in exploring substantive issues, such as test engagement and respondents’ cognitive strategies. However, current PIAAC log files are, to a large extent, an accidental by-product of the computer-testing platform. Neither the items nor the information stored in log files were designed with a view to maximising the analytical potential of the information collected. As a result, analysis of log files is often cumbersome and item-specific, and the information they contain often lends itself to multiple interpretations.
The release of PIAAC log files has sparked considerable interest among both researchers and policy makers, and the LogDataAnalyzer (a dedicated tool for extracting data from the PIAAC log files) has greatly facilitated access to the data.1 By capitalising on the lessons learned from these data and the results of this report, future large-scale assessments will likely be able to improve their design to maximise the research potential of log files.
Item design plays a crucial role in maximising the potential of log files. More interactive items, for instance, offer more possibilities to observe and record a variety of respondent-computer interactions. For the data to be interpretable without ambiguity, it is important to prespecify theoretical constructs or competing theoretical hypotheses that log files will be able to measure or test. In particular, it should be made clear whether the information recorded in log files is used to better measure the underlying cognitive construct (such as proficiency in literacy, numeracy, or problem solving), or whether it can be used as a proxy for other dimensions of respondents’ skills (which might include personality traits or attitudes).
Some improvements are relatively easy to achieve. Even if item design remains constant, potentially useful information that is currently not available could be recorded in future. For instance, it would be useful to track the input in text fields. Even without recording the exact content, information on the insertion or deletion of characters would provide useful insights into the approaches followed by test-takers. Similarly, for multiple-choice items, it would be useful to track how many times respondents checked a box (and which one) and whether they changed their mind before confirming the final answer.
It would also be possible to rethink the derived variables to be released in public-use files. For example, the analysis presented in this report shows that “time to first interaction” is largely dependent on item content, which limits its usefulness. On the other hand, it would be helpful to add “item position” to the public database to facilitate analysis. Prior to deciding on the content of the PIAAC public-use files for the second cycle of the study, it would be valuable to have experts review the current variables and suggest new ones, where relevant.
References
[1] Bunderson, C., D. Inouye and J. Olsen (1989), “The four generations of computerized educational measurement”, in Educational Measurement, 3rd ed., American Council on Education.
[2] Greiff, S., S. Wüstenberg and F. Avvisati (2015), “Computer-generated log-file analyses as a window into students’ minds? A showcase study based on the PISA 2012 assessment of problem solving”, Computers & Education, Vol. 91, pp. 92-105, http://dx.doi.org/10.1016/J.COMPEDU.2015.10.018.
[5] He, Q., F. Borgonovi and M. Paccagnella (forthcoming), “Using process data to understand adults’ problem-solving behaviours in PIAAC: Identifying generalised patterns across multiple tasks with sequence mining”, OECD Education Working Papers, OECD Publishing, Paris.
[3] He, Q. and M. von Davier (2015), “Identifying feature sequences from process data in problem-solving items with n-grams”, in van der Ark, L. et al. (eds.), Quantitative Psychology Research: The 79th Annual Meeting of the Psychometric Society, Springer, New York, NY, http://dx.doi.org/10.1007/978-3-319-19977-1_13.
[4] He, Q. and M. von Davier (2016), “Analyzing process data from problem-solving items with n-grams: Insights from a computer-based large-scale assessment”, in Rosen, Y., S. Ferrara and M. Mosharraf (eds.), Handbook of Research on Technology Tools for Real-World Skill Development, http://dx.doi.org/10.4018/978-1-4666-9441-5.ch029.
Note
1. For more information, see Annex A and https://tba.dipf.de/en/projects/logdataanalyzer.