Presentation-mode effects in large-scale writing assessments
Introduction
We live in an era of ongoing digitalization, and this development also affects education and educational assessment. In the Programme for International Student Assessment (PISA) 2015, computer-based assessment was the primary assessment mode for the first time (Jerrim, Micklewright, Heine, Salzer, & McKeown, 2018; OECD, 2017). More and more writing assessments have shifted (or plan to shift) to computer-based administration as well (Applebee, 2011; Sandene et al., 2005). In recent years, many studies have investigated whether, and in which ways, computer-based writing differs from handwriting, and how these differences may affect writing products (e.g., Chan, Bax, & Weir, 2018; Cheung, 2016; Horkay, Bennett, Allen, Kaplan, & Yan, 2006). Besides these writer-related differences between handwriting and computer-based writing, there may also be rater-related differences: typewritten texts may be perceived, evaluated, and, within the scope of writing assessment, scored differently than handwritten texts. This point is highly relevant for assessment practice, especially when writing modes are mixed (computer-based and handwritten, either by choice of the writer or due to the resources available at different test locations). In such cases, both the product-relevant writing processes and the rating and scoring should be sufficiently comparable. In addition, transcribing paper-and-pencil essays is often considered in assessment practice, both to avoid the additional costs incurred when raters need more time to read essays that are barely legible and to ensure that ratings are not influenced by the quality of the handwriting.
In this study, we addressed the question of whether handwritten essays are scored differently from computer-typed transcripts of the same essays in the context of a large-scale writing assessment, which was part of a national educational study in Germany of students at the end of lower secondary education. Although the question of differences between ratings of handwritten and typed texts is not new (the initial studies go back to the 1960s), there are hardly any studies that have investigated these potential differences in standardized large-scale assessments with intensively trained raters. In addition, we investigated several potential moderators of rating differences between handwritten and computer-typed text versions that have been identified in previous studies, as well as their interplay, which has not yet been investigated systematically.
Section snippets
Construct-irrelevant variance and halo effects
In large-scale educational assessments, a valid measurement of the intended construct is crucial because results are often used to diagnose students' proficiency and can determine, for instance, access to further schooling tracks or to higher educational programs. A particular threat to valid measurement is what Messick (1990, 1995) called construct-irrelevant variance, implying that, in addition to construct-relevant aspects, other aspects are also measured during the assessment process
Text sample
The essays scored in our study are a subsample of essays written in German by students at the end of lower secondary education (9th and 10th grade) in Germany; they are derived from a large-scale writing assessment study conducted in 2011 with 2996 students. Participating students attended schools providing general education; all school types (lower, middle, and higher track) were represented. The aim of the study was the generation of normed writing tasks to assess the attainment of the
Main effect of presentation mode
Paired-sample t-tests revealed a main effect of presentation mode with an advantage for computer-typed essay variants for both content quality (t(420) = 4.22; p < .001; dz = .21) and style quality (t(421) = 4.33; p < .001; dz = .21). For both variables, computer-typed variants were rated about 0.1 scale points better. Fig. 2 displays the means and confidence intervals for the conditions.
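The reported statistics follow directly from the per-essay score differences between the two presentation modes. The following is a minimal sketch of how the paired t statistic and the effect size Cohen's dz are computed; the scores are hypothetical and not the study's data:

```python
import math

def paired_t_and_dz(scores_a, scores_b):
    """Paired-sample t statistic and Cohen's d_z for two rating conditions.

    d_z is the mean of the pairwise differences divided by their standard
    deviation; the t statistic is d_z scaled by the square root of n.
    """
    assert len(scores_a) == len(scores_b)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    sd = math.sqrt(var)
    dz = mean / sd               # standardized mean difference of the pairs
    t = dz * math.sqrt(n)        # equivalently: mean / (sd / sqrt(n))
    return t, dz
```

With 421 essay pairs, a dz of about .21 thus corresponds to a t value of roughly .21 × √421 ≈ 4.3, matching the values reported above.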
Moderation effects
In single moderation analyses for content quality ratings, we found a significant moderation effect for text
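Moderation in a design of this kind is typically tested by adding an interaction term between presentation mode and the candidate moderator to a regression of the ratings. A minimal sketch with simulated data follows; the variable names and effect sizes are illustrative assumptions, not the study's values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
mode = rng.integers(0, 2, size=n).astype(float)  # 0 = handwritten, 1 = typed transcript
quality = rng.normal(0.0, 1.0, size=n)           # candidate moderator (e.g., text quality), standardized

# Simulated ratings: a small mode advantage that shrinks as the moderator increases
rating = (3.0 + 0.10 * mode + 0.30 * quality
          - 0.15 * mode * quality
          + rng.normal(0.0, 0.3, size=n))

# OLS with an interaction term; beta[3] estimates the moderation effect
X = np.column_stack([np.ones(n), mode, quality, mode * quality])
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
```

A non-zero interaction coefficient (beta[3]) indicates that the presentation-mode effect depends on the level of the moderator.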
Discussion
In this study, we investigated whether trained raters scored the content-related and stylistic text quality of handwritten student essays from a large-scale assessment study at the end of lower secondary education higher or lower than computer-typed transcripts of these essays, and whether this presentation-mode effect is moderated by text quality, quality of handwriting, erroneousness (orthographic, grammatical, and punctuation errors), or text genre. We found a main effect
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Declaration of Competing Interest
None.
References (41)
- et al. (2018). The effects of writing mode and computer ability on L2 test-takers’ essay characteristics and scores. Assessing Writing.
- et al. (2018). Researching the comparability of paper-based and computer-based delivery in a high-stakes writing test. Assessing Writing.
- et al. (2005). The effect of variations in handwriting and print on evaluation of student essays. Assessing Writing.
- (2011). Issues in large-scale writing assessment. Journal of Writing Assessment.
- et al. (2009). Rater effects and rater inconsistency in large-scale assessment of writing ability. Paper presented at the 2009 NCME Annual Meeting & Training Sessions.
- et al. Aspekte der Kodierung von Schreibaufgaben: Vergleich holistischer und analytischer Kodierungen unter besonderer Berücksichtigung der Interraterreliabilität [Aspects of the coding of writing tasks: A comparison of holistic and analytic codings in particular consideration of interrater reliability].
- (1970). The influence of handwriting on assessment. Educational Research.
- (2015). Validitätsaspekte bei der Messung von Schreibkompetenzen [Validity aspects of measuring writing competencies].
- (2016). A comparative study of paper-and-pen versus computer-delivered assessment modes on students’ writing quality: A Singapore study. The Asia-Pacific Education Researcher.
- (1968). Are women prejudiced against women? Society.
- Does it matter if I take my writing test on computer? An empirical study of mode effects in NAEP. The Journal of Technology, Learning and Assessment.
- Bias in authentic assessment. Diagnostique.
- The influence of presentation upon examination marks. 11th Annual Conference of the Subject Centre for Information and Computer Sciences.
- PISA 2015: How big is the ‘mode effect’ and what has been done about it? Oxford Review of Education.
- The measurement of observer agreement for categorical data. Biometrics.
- Beauty is talent: Task evaluation as a function of the performer’s physical attractiveness. Journal of Personality and Social Psychology.
- The Hamburg study of achievement in written composition. National report for the IEA international study of achievement in written composition.
- The nature and correlates of law school essay grades. Educational and Psychological Measurement.
- Composition errors and essay examination grades re-examined. American Educational Research Journal.
Cited by (4)
- Feedback on teachers' text assessment: Does it foster assessment accuracy and motivation? (2024). Zeitschrift für Pädagogische Psychologie.
- Comparing Handwriting Fluency in English Language Teaching Using Computer Vision Techniques. (2023). Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).
- Inclusion and equity as a paradigm shift for artificial intelligence in education. (2022). Artificial Intelligence in STEM Education: The Paradigmatic Shifts in Research, Education, and Technology.
Thomas Canz is a postdoctoral researcher at the Faculty of Psychology at the FernUniversität in Hagen, Germany. His research projects focus on the structure, development, and measurement of reading and writing competencies.
Lars Hoffmann is a postdoctoral researcher at the Institute for Educational Quality Improvement (IQB) in Berlin, Germany. His research focuses on the diagnostic competencies of teachers and on educational quality and educational monitoring in secondary schools.
Renate Kania is a master's student in Psychology at the FernUniversität in Hagen, Germany. In addition, she completed a scientific traineeship in the section for Educational Psychology.