Presentation-mode effects in large-scale writing assessments
Introduction
We live in an era of ongoing digitalization, and this development also affects education and educational assessment. In the Programme for International Student Assessment (PISA) 2015, computer-based assessment was the primary assessment mode for the first time (Jerrim, Micklewright, Heine, Salzer, & McKeown, 2018; OECD, 2017). More and more writing assessments have shifted (or plan to shift) to computer-based administration as well (Applebee, 2011; Sandene et al., 2005). In recent years, many studies have investigated whether, and in which ways, computer-based writing differs from handwriting, and how these differences may affect writing products (e.g., Chan, Bax, & Weir, 2018; Cheung, 2016; Horkay, Bennett, Allen, Kaplan, & Yan, 2006). Besides these writer-related differences between handwriting and computer-based writing, there may also be rater-related differences: typewritten texts may be perceived, evaluated, and, within the scope of writing assessment, scored differently than handwritten texts. This point is highly relevant for assessment practice, especially when writing modes are mixed (computer-based and handwritten, either by choice of the writer or due to the resources available at different test locations). In such cases, both the product-relevant writing processes and the rating and scoring should be sufficiently comparable. In addition, transcribing paper-and-pencil essays is often considered in assessment practice, both to avoid the additional costs incurred when raters need more time to read essays that are barely legible and to ensure that ratings are not influenced by the quality of the handwriting.
In this study, we addressed the question of whether handwritten essays are scored differently from computer-typed transcripts of the same essays in the context of a large-scale writing assessment, which was part of a national educational study in Germany of students at the end of lower secondary education. Although the question of differences between ratings of handwritten and typed texts is not new (the initial studies go back to the 1960s), there are hardly any studies that have investigated these potential differences in standardized large-scale assessments with intensively trained raters. In addition, we investigated several potential moderators of rating differences between handwritten and computer-typed text versions that have been identified in previous studies, as well as their interplay, which has not yet been investigated systematically.
Section snippets
Construct-irrelevant variance and halo effects
In large-scale educational assessments, a valid measurement of the intended construct is crucial because results are often used to diagnose students' proficiency and can determine, for instance, access to further schooling tracks or to higher educational programs. A particular threat to valid measurement is what Messick (1990, 1995) called construct-irrelevant variance, implying that, in addition to construct-relevant aspects, other aspects are also measured during the assessment process
Text sample
The essays scored in our study are a subsample of essays written in German by students at the end of lower secondary education (9th and 10th grade) in Germany; they are derived from a large-scale writing assessment study conducted in 2011 with 2996 students. Participating students attended schools providing general education; all school types (lower, middle, and higher track) were represented. The aim of the study was the generation of normed writing tasks to assess the attainment of the
Main effect of presentation mode
Paired-sample t-tests revealed a main effect of presentation mode with an advantage for computer-typed essay variants for both content quality (t(420) = 4.22; p < .001; dz = .21) and style quality (t(421) = 4.33; p < .001; dz = .21). For both variables, computer-typed variants were rated about 0.1 scale points better. Fig. 2 displays the means and confidence intervals for the conditions.
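The reported statistics follow directly from the per-essay score differences between the two presentation modes. The following is a minimal sketch of how the paired t statistic and the effect size Cohen's dz are computed; the scores are hypothetical and not the study's data:

```python
import math

def paired_t_and_dz(scores_a, scores_b):
    """Paired-sample t statistic and Cohen's d_z for two rating conditions.

    d_z is the mean of the pairwise differences divided by their standard
    deviation; the t statistic is d_z scaled by the square root of n.
    """
    assert len(scores_a) == len(scores_b)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    sd = math.sqrt(var)
    dz = mean / sd               # standardized mean difference of the pairs
    t = dz * math.sqrt(n)        # equivalently: mean / (sd / sqrt(n))
    return t, dz
```

With 421 essay pairs, a dz of about .21 thus corresponds to a t value of roughly .21 × √421 ≈ 4.3, matching the values reported above.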
Moderation effects
In single moderation analyses for content quality ratings, we found a significant moderation effect for text
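Moderation in a design of this kind is typically tested by adding an interaction term between presentation mode and the candidate moderator to a regression of the ratings. A minimal sketch with simulated data follows; the variable names and effect sizes are illustrative assumptions, not the study's values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
mode = rng.integers(0, 2, size=n).astype(float)  # 0 = handwritten, 1 = typed transcript
quality = rng.normal(0.0, 1.0, size=n)           # candidate moderator (e.g., text quality), standardized

# Simulated ratings: a small mode advantage that shrinks as the moderator increases
rating = (3.0 + 0.10 * mode + 0.30 * quality
          - 0.15 * mode * quality
          + rng.normal(0.0, 0.3, size=n))

# OLS with an interaction term; beta[3] estimates the moderation effect
X = np.column_stack([np.ones(n), mode, quality, mode * quality])
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
```

A non-zero interaction coefficient (beta[3]) indicates that the presentation-mode effect depends on the level of the moderator.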
Discussion
In this study, we investigated whether trained raters scored the content-related and stylistic text quality of handwritten student essays from a large-scale assessment study at the end of lower secondary education higher or lower than computer-typed transcripts of these essays, and whether this presentation-mode effect is moderated by text quality, quality of handwriting, erroneousness (orthographic, grammatical, and punctuation errors), or text genre. We found a main effect
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Declaration of Competing Interest
None.
References (41)
- et al. (2018). The effects of writing mode and computer ability on L2 test-takers’ essay characteristics and scores. Assessing Writing.
- et al. (2018). Researching the comparability of paper-based and computer-based delivery in a high-stakes writing test. Assessing Writing.
- et al. (2005). The effect of variations in handwriting and print on evaluation of student essays. Assessing Writing.
- (2011). Issues in large-scale writing assessment. Journal of Writing Assessment.
- et al. (2009). Rater effects and rater inconsistency in large-scale assessment of writing ability. Paper presented at the 2009 NCME Annual Meeting & Training Sessions.
- et al. Aspekte der Kodierung von Schreibaufgaben: Vergleich holistischer und analytischer Kodierungen unter besonderer Berücksichtigung der Interraterreliabilität [Aspects of the coding of writing tasks: A comparison of holistic and analytic codings in particular consideration of interrater reliability].
- (1970). The influence of handwriting on assessment. Educational Research.
- (2015). Validitätsaspekte bei der Messung von Schreibkompetenzen [Validity aspects of measuring writing competencies].
- (2016). A comparative study of paper-and-pen versus computer-delivered assessment modes on students’ writing quality: A Singapore study. The Asia-Pacific Education Researcher.
- (1968). Are women prejudiced against women? Society.
- Does it matter if I take my writing test on computer? An empirical study of mode effects in NAEP. The Journal of Technology, Learning and Assessment.
- Bias in authentic assessment. Diagnostique.
- The influence of presentation upon examination marks. 11th Annual Conference of the Subject Centre for Information and Computer Sciences.
- PISA 2015: How big is the ‘mode effect’ and what has been done about it? Oxford Review of Education.
- The measurement of observer agreement for categorical data. Biometrics.
- Beauty is talent: Task evaluation as a function of the performer’s physical attractiveness. Journal of Personality and Social Psychology.
- The Hamburg study of achievement in written composition. National report for the IEA international study of achievement in written composition.
- The nature and correlates of law school essay grades. Educational and Psychological Measurement.
- Composition errors and essay examination grades re-examined. American Educational Research Journal.
Cited by (4)
- Feedback on teachers' text assessment: Does it foster assessment accuracy and motivation? (2024). Zeitschrift für Pädagogische Psychologie.
- Comparing Handwriting Fluency in English Language Teaching Using Computer Vision Techniques. (2023). Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).
- Inclusion and equity as a paradigm shift for artificial intelligence in education. (2022). Artificial Intelligence in STEM Education: The Paradigmatic Shifts in Research, Education, and Technology.
Thomas Canz is a postdoctoral researcher at the Faculty of Psychology at the FernUniversität in Hagen, Germany. His research projects focus on the structure, development, and measurement of reading and writing competencies.
Lars Hoffmann is a postdoctoral researcher at the Institute for Educational Quality Improvement (IQB) in Berlin, Germany. His research focuses on the diagnostic competencies of teachers and on educational quality and educational monitoring in secondary schools.
Renate Kania is a master's student in Psychology at the FernUniversität in Hagen, Germany. In addition, she completed a scientific traineeship in the section for Educational Psychology.