One of the most important ways that assessments can function positively in education systems is by signalling the competencies that matter and illustrating the types of performances we want students to master. Because what we choose to assess inevitably ends up being taught in classrooms, assessing what matters – and doing it well – should be a priority for education policy. This volume makes the case for pursuing innovation in assessment in terms of the types of educational outcomes we should assess, the way we design tasks – capitalising on technology to generate rich and meaningful sources of data – and the processes required to ensure that assessments are valid given their intended use.
Innovating Assessments to Measure and Support Complex Skills
Executive summary
Assessments should measure what matters, not just what is easy to measure
What is worth knowing, doing and being has been the subject of intense debate over recent decades. Educational stakeholders agree that we need to support the development of complex cognitive and socio-cognitive constructs (or “21st century competencies”). Although frameworks describing these skills share similarities, translating this vision into practice requires aligning curriculum, pedagogy and assessment. Assessment can drive this alignment, but challenges include defining constructs and learning progressions, developing tasks that elicit valid evidence, and specifying suitable models to interpret and report that evidence.
Next-generation assessments should enable students to demonstrate what they can do in authentic contexts and evaluate how students learn new things
Better assessing 21st century competencies requires “next-generation assessments”. Insights from the learning sciences suggest several assessment design innovations that align with this goal, including: using extended performance tasks with “low floors, high ceilings”; situating assessments in authentic contexts of practice; including opportunities for exploration, discovery and invention; and providing feedback and learning scaffolds. During assessment, students should have opportunities to engage in the types of learning, decision making and problem solving that practitioners undertake in the real world.
Because 21st century competencies are strongly intertwined in practice, creating separate assessments for individual competencies may not be a productive strategy. Decisions about what to assess might be better guided by three interrelated considerations: 1) identifying a cluster of relevant activities that require students to engage in learning, problem solving and decision making; 2) identifying the context of practice and the disciplinary or cross-disciplinary knowledge required in that context; and 3) deciding whether to integrate affordances to enable students to work collaboratively or independently.
Innovation is necessary across all phases of assessment design
Assessment design is always an exercise in design science: tasks and interpretation methods must be anchored to a well-defined theoretical framework if assessments are to generate valid inferences. This is especially true for next-generation assessments of complex skills. Valid inferences about students’ capacity to engage in complex types of problem solving and to learn new things should combine top-down arguments (justified by theory) with bottom-up evidence (visible in the data). This requires close collaboration among potential users of assessments, domain experts, psychometricians, task designers, software designers and user interface (UI) experts from the outset.
Digital technologies vastly expand the assessment designer’s toolbox, but new and better measurement models are needed
Digital technologies enable new and innovative task formats (including interactive and immersive problems and environments), test features (including adaptivity and affordances for learning) and potential sources of evidence (including work products or solutions, as well as a wide array of process data capturing student behaviours and processes). Although these new sources of data are now relatively “easy” to obtain from technology-enhanced assessments, existing psychometric models do not handle their complexity well. New measurement models are needed, especially at scale – for example, “hybrid” measurement solutions that incorporate one measurement model within another.
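As a hedged illustration of what incorporating “one measurement model within another” could mean (this sketch draws on a well-known hierarchical approach from the psychometric literature, not a design endorsed by this volume), an item response model for accuracy can be joined with a lognormal model for response times through correlated person-level parameters:

```latex
% Illustrative hybrid/hierarchical measurement model:
% a 2PL item response model for response accuracy, combined with a
% lognormal model for response times, linked at the person level.
\begin{align*}
  P(X_{ij} = 1 \mid \theta_i) &= \frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}}
      && \text{accuracy of student } i \text{ on item } j \\
  \log T_{ij} &\sim \mathcal{N}\!\left(\beta_j - \tau_i,\ \sigma_j^2\right)
      && \text{time spent on item } j \\
  (\theta_i, \tau_i)^{\top} &\sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})
      && \text{person-level model linking the two}
\end{align*}
```

Here \(\theta_i\) and \(\tau_i\) denote a student’s ability and speed, \(a_j\), \(b_j\) and \(\beta_j\) are item parameters, and the covariance \(\boldsymbol{\Sigma}\) ties the two component models together; richer process data would call for correspondingly richer structures.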
Intelligent Tutoring Systems (ITS) that provide students with dynamic tasks, interactivity and feedback might provide useful inspiration for developing choice-rich tasks and innovative scoring methods. Many ITS have made advances using artificial intelligence (AI)-based technologies, such as natural language processing, to provide intelligent feedback to learners, adapt content in response to their actions and evaluate what they know and can do. While learning analytics methods increasingly intersect with educational measurement, gaps remain to be bridged between the two fields before these new methods can be used in ways that truly benefit the users of assessment.
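As a purely hypothetical sketch of the kind of pipeline such work involves (the event schema and feature names below are invented for illustration, not taken from any system discussed in this volume), raw log events from an interactive task might be aggregated into candidate process features before any measurement model is applied:

```python
from collections import defaultdict

# Hypothetical log events from a technology-enhanced task.
# Each event: (student_id, timestamp_seconds, action)
events = [
    ("s1", 0.0, "task_start"),
    ("s1", 12.4, "explore_tool"),
    ("s1", 30.1, "submit_attempt"),
    ("s1", 55.9, "use_hint"),
    ("s1", 80.2, "submit_attempt"),
    ("s1", 81.0, "task_end"),
]

def extract_features(events):
    """Aggregate raw log events into per-student process features
    (time on task, number of attempts, hint use) that could feed a
    downstream measurement or learning analytics model."""
    by_student = defaultdict(list)
    for student, t, action in events:
        by_student[student].append((t, action))

    features = {}
    for student, log in by_student.items():
        log.sort()  # order events chronologically
        times = [t for t, _ in log]
        actions = [a for _, a in log]
        features[student] = {
            "time_on_task": max(times) - min(times),
            "n_attempts": actions.count("submit_attempt"),
            "used_hint": "use_hint" in actions,
            "explored_before_first_attempt": (
                "explore_tool" in actions[: actions.index("submit_attempt")]
                if "submit_attempt" in actions else False
            ),
        }
    return features

print(extract_features(events))
```

Features like these only become evidence once they are connected to a defensible measurement model; that link is precisely where the gap between learning analytics and educational measurement still needs bridging.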
Next-generation assessments require careful validation through principled design processes and data collection and evaluation
Complex constructs are inevitably shaped by cultural norms and expectations. In large-scale assessment, generating valid evidence through complex tasks must be balanced with the need to achieve score comparability. New issues that are specific to innovative digital assessments (e.g. the relationship between digital literacy and performance, potential biases in AI-based methodologies) must also be thoroughly evaluated. It is critical that evidence to support equivalence is established both through principled design processes and through dedicated empirical studies. Process data represent a valuable source of validity evidence concerning how individuals and different student groups engage with an assessment.
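As a hedged illustration of one such empirical check (the data are invented and the volume does not prescribe this particular method), a Mantel-Haenszel analysis of differential item functioning compares two groups’ odds of success on an item after matching students on overall score:

```python
import math
from collections import defaultdict

# Hypothetical response records: (group, total_score, item_correct),
# where "ref" and "foc" label the two student groups being compared.
records = [
    ("ref", 5, 1), ("ref", 5, 0), ("ref", 3, 1), ("ref", 3, 0),
    ("foc", 5, 1), ("foc", 5, 0), ("foc", 3, 0), ("foc", 3, 0),
    ("ref", 4, 1), ("foc", 4, 0), ("ref", 4, 1), ("foc", 4, 1),
]

def mantel_haenszel_dif(records):
    """Mantel-Haenszel common odds ratio for one item: compares the
    odds of a correct response across two groups, stratified by total
    score so overall proficiency differences are controlled for."""
    strata = defaultdict(lambda: {"A": 0, "B": 0, "C": 0, "D": 0})
    for group, score, correct in records:
        cell = strata[score]
        if group == "ref":
            cell["A" if correct else "B"] += 1  # ref correct/incorrect
        else:
            cell["C" if correct else "D"] += 1  # focal correct/incorrect

    num = den = 0.0
    for cell in strata.values():
        n = sum(cell.values())
        if n == 0:
            continue
        num += cell["A"] * cell["D"] / n
        den += cell["B"] * cell["C"] / n
    if num == 0 or den == 0:
        return None  # not estimable from this (tiny) sample
    # ETS delta scale: values near 0 suggest negligible DIF.
    return -2.35 * math.log(num / den)

print(mantel_haenszel_dif(records))
```

A markedly negative value on this scale would flag the item as relatively harder for the focal group after controlling for overall proficiency; analogous stratified comparisons can be run on process features such as time on task.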
Next-generation assessments require intellectual, fiscal and political investment
Developing next-generation assessments will require the simultaneous investment of several types of capital: intellectual capital, bringing together communities of experts in learning science, measurement science and data science to solve conceptual and technical challenges; fiscal capital, to support the multidisciplinary teams required to design innovative assessments and to bring promising examples to scale; and political capital, to commit to a vision that goes beyond what is currently possible, to transform entrenched assessment practices and to assemble the fiscal capital required. International large-scale assessment programmes, like PISA, can play a pioneering role in bringing together these three forms of capital and innovating assessment at scale.