[30] AHELO (2014), Testing student and university performance globally: OECD’s AHELO, OECD, http://www.oecd.org/edu/skills-beyond-school/testingstudentanduniversityperformancegloballyoecdsahelo.htm.
[29] AHELO (2012), AHELO feasibility study interim report, OECD.
[14] Attali, Y. (2014), “A Ranking Method for Evaluating Constructed Responses”, Educational and Psychological Measurement, Vol. 74/5, pp. 795-808, https://doi.org/10.1177/0013164414527450.
[32] Bartram, D. et al. (2018), “ITC Guidelines for Translating and Adapting Tests (Second Edition)”, International Journal of Testing, Vol. 18/2, pp. 101-134, https://doi.org/10.1080/15305058.2017.1398166.
[7] Blömeke, S. et al. (2013), Modeling and measuring competencies in higher education: Tasks and challenges, Sense, Rotterdam, https://doi.org/10.1007/978-94-6091-867-4.
[18] Borowiec, K. and C. Castle (2019), “Using rater cognition to improve generalizability of an assessment of scientific argumentation”, Practical Assessment, Research and Evaluation, Vol. 24/1, https://doi.org/10.7275/ey9d-p954.
[20] Braun, H. (2019), “Performance assessment and standardization in higher education: A problematic conjunction?”, British Journal of Educational Psychology, Vol. 89/3, pp. 429-440, https://doi.org/10.1111/bjep.12274.
[17] Braun, H. et al. (2020), “Performance Assessment of Critical Thinking: Conceptualization, Design, and Implementation”, Frontiers in Education, Vol. 5, https://doi.org/10.3389/feduc.2020.00156.
[28] Cargas, S., S. Williams and M. Rosenberg (2017), “An approach to teaching critical thinking across disciplines using performance tasks with a common rubric”, Thinking Skills and Creativity, Vol. 26, pp. 24-37, https://doi.org/10.1016/j.tsc.2017.05.005.
[35] Eskola, S. (2004), “Untypical frequencies in translated language: A corpus-based study on a literary corpus of translated and non-translated Finnish”, in Translation Universals: Do They Exist?, John Benjamins, Amsterdam.
[3] Geisinger, K. (1994), “Cross-Cultural Normative Assessment: Translation and Adaptation Issues Influencing the Normative Interpretation of Assessment Instruments”, Psychological Assessment, Vol. 6/4, p. 304, https://doi.org/10.1037/1040-3590.6.4.304.
[4] Hambleton, R. (2004), “Issues, designs, and technical guidelines for adapting tests into multiple languages and cultures”, in Adapting Educational and Psychological Tests for Cross-Cultural Assessment, Lawrence Erlbaum, https://doi.org/10.4324/9781410611758.
[13] Hyytinen, H. et al. (2015), “Problematising the equivalence of the test results of performance-based critical thinking tests for undergraduate students”, Studies in Educational Evaluation, Vol. 44, pp. 1-8, https://doi.org/10.1016/j.stueduc.2014.11.001.
[8] Hyytinen, H. and A. Toom (2019), “Developing a performance assessment task in the Finnish higher education context: Conceptual and empirical insights”, British Journal of Educational Psychology, Vol. 89/3, pp. 551-563, https://doi.org/10.1111/bjep.12283.
[33] Hyytinen, H. et al. (2021), “The dynamic relationship between response processes and self-regulation in critical thinking assessments”, Studies in Educational Evaluation, Vol. 71, p. 101090, https://doi.org/10.1016/j.stueduc.2021.101090.
[26] Klein, S. et al. (2007), “The collegiate learning assessment: Facts and fantasies”, Evaluation Review, Vol. 31/5, pp. 415-439, https://doi.org/10.1177/0193841X07303318.
[34] Leighton, J. (2017), Using Think-Aloud Interviews and Cognitive Labs in Educational Research, Oxford University Press, Oxford, https://doi.org/10.1093/acprof:oso/9780199372904.001.0001.
[22] Mauranen, A. (1993), “Cultural differences in academic discourse - problems of a linguistic and cultural minority”, in The Competent Intercultural Communicator: AFinLA Yearbook.
[10] McClelland, D. (1973), “Testing for competence rather than for ‘intelligence’”, American Psychologist, Vol. 28/1, pp. 1-14, https://doi.org/10.1037/h0034092.
[16] Popham, W. (2003), Test Better, Teach Better: The Instructional Role of Assessment, ASCD, https://www.ascd.org/books/test-better-teach-better?variant=102088E4.
[21] Sahlberg, P. (2011), “Introduction: Yes We Can (Learn from Each Other)”, in Finnish Lessons: What Can the World Learn from Educational Change in Finland?, Teachers College Press, New York.
[9] Shavelson, R. (2010), Measuring college learning responsibly: Accountability in a new era, Stanford University Press, https://www.sup.org/books/title/?id=16434.
[27] Shavelson, R. (2008), The collegiate learning assessment, Forum for the Future of Higher Education, https://www.researchgate.net/publication/271429276_The_collegiate_learning_assessment.
[19] Shavelson, R., G. Baxter and X. Gao (1993), “Sampling Variability of Performance Assessments”, Journal of Educational Measurement, Vol. 30/3, pp. 215-232, https://doi.org/10.1111/j.1745-3984.1993.tb00424.x.
[11] Shavelson, R., O. Zlatkin-Troitschanskaia and J. Mariño (2018), “International Performance Assessment of Learning in Higher Education (iPAL): Research and Development”, in Zlatkin-Troitschanskaia, O. et al. (eds.), Assessment of Learning Outcomes in Higher Education: Cross-National Comparisons and Perspectives, Springer, New York, https://doi.org/10.1007/978-3-319-74338-7_10.
[15] Solano-Flores, G. (2012), Smarter Balanced Assessment Consortium: Translation accommodations framework for testing English language learners in mathematics, Smarter Balanced Assessment Consortium (SBAC), https://portal.smarterbalanced.org/library/en/translation-accommodations-framework-for-testing-english-language-learners-in-mathematics.pdf.
[25] Steedle, J. and M. Bradley (2012), Majors matter: Differential performance on a test of general college outcomes [Paper presentation], Annual Meeting of the American Educational Research Association, Vancouver, Canada.
[31] Tremblay, K., D. Lalancette and D. Roseveare (2012), “Assessment of Higher Education Learning Outcomes (AHELO) Feasibility Study”, Feasibility study report, Vol. 1, https://www.oecd.org/education/skills-beyond-school/AHELOFSReportVolume1.pdf (accessed on 1 August 2022).
[36] Tremblay, K., D. Lalancette and D. Roseveare (2012), Assessment of higher education learning outcomes feasibility study report: Design and implementation, OECD, Paris, http://hdl.voced.edu.au/10707/241317.
[24] Ursin, J. (2020), “Assessment in Higher Education (Finland)”, in Bloomsbury Education and Childhood Studies, https://doi.org/10.5040/9781350996489.0014.
[37] Ursin, J. et al. (2021), Assessment of undergraduate students’ generic skills in Finland: Findings of the Kappas! project (Report No. 2021:31), Finnish Ministry of Education and Culture, Helsinki.
[5] Wolf, R., D. Zahner and R. Benjamin (2015), “Methodological challenges in international comparative post-secondary assessment programs: lessons learned and the road ahead”, Studies in Higher Education, Vol. 40/3, pp. 1-11, https://doi.org/10.1080/03075079.2015.1004239.
[2] Zahner, D. and A. Ciolfi (2018), “International Comparison of a Performance-Based Assessment in Higher Education”, in Zlatkin-Troitschanskaia, O. et al. (eds.), Assessment of Learning Outcomes in Higher Education: Cross-National Comparisons and Perspectives, Springer, New York, https://doi.org/10.1007/978-3-319-74338-7_11.
[1] Zahner, D. and J. Steedle (2014), Evaluating performance task scoring comparability in an international testing programme [Paper presentation], Annual Meeting of the National Council on Measurement in Education, Philadelphia, PA.
[6] Zlatkin-Troitschanskaia, O., R. Shavelson and C. Kuhn (2015), “The international state of research on measurement of competency in higher education”, Studies in Higher Education, Vol. 40/3, pp. 393-411, https://doi.org/10.1080/03075079.2015.1004241.
[12] Zlatkin-Troitschanskaia, O. et al. (2019), “On the complementarity of holistic and analytic approaches to performance assessment scoring”, British Journal of Educational Psychology, Vol. 89/3, pp. 468-484, https://doi.org/10.1111/bjep.12286.