
Assessing Writing

Volume 45, July 2020, 100470

Presentation-mode effects in large-scale writing assessments

https://doi.org/10.1016/j.asw.2020.100470

Highlights

  • Raters scored computer-typed transcripts higher than handwritten original essays.

  • Presentation-mode effects were found for content quality and style quality.

  • Text quality and grammatical erroneousness moderated the presentation-mode effects.

  • Presentation-mode effects differed between text genres.

Abstract

To ensure valid measurement in large-scale assessments, it is crucial to avoid incorporating construct-irrelevant aspects. We investigated a potential source of construct-irrelevant variance, namely the presentation mode of essays (handwritten vs. computer-typed), and its influence on scoring. Further, we investigated whether the presentation-mode effect is moderated by text quality and legibility, as well as by orthographic, grammatical, and punctuation erroneousness, and whether the effect differs across text genres. Rater teams scored two versions (originally handwritten vs. computer-typed transcripts) of 430 essays written by students at the end of lower secondary education in Germany on content quality and style quality, as well as on the considered moderators. Statistical analyses showed a main effect of presentation mode: Computer-typed transcripts were scored higher than handwritten essays. This effect was moderated by text quality and grammatical erroneousness. Furthermore, the strength of the presentation-mode effect differed by text genre. We argue that both the strength and the direction of the presentation-mode effect are shaped by these and further factors.

Introduction

We live in an era of ongoing digitalization, a development that affects education and educational assessment. In the Programme for International Student Assessment (PISA) 2015, computer-based assessment was the primary assessment mode for the first time (Jerrim, Micklewright, Heine, Salzer, & McKeown, 2018; OECD, 2017). More and more writing assessments have shifted (or plan to shift) to computer-based assessment as well (Applebee, 2011; Sandene et al., 2005). In the last few years, many studies have investigated whether, and in which way(s), computer-based writing differs from handwriting, and how these differences may affect the resulting writing products (e.g. Chan, Bax, & Weir, 2018; Cheung, 2016; Horkay, Bennett, Allen, Kaplan, & Yan, 2006). Besides these writer-related differences between handwriting and computer-based writing, there may also be rater-based differences: Typewritten texts may be perceived, evaluated, and, within the scope of writing assessment, scored differently than handwritten texts. This is a highly relevant point for assessment practice, especially if a mixture of writing modes is applied (computer-based and handwritten, either by choice of the writer or due to the resources available at different test locations). In that case, the product-relevant writing processes as well as the rating and scoring should be sufficiently comparable. In addition, assessment practice often considers transcribing paper-pencil essays, both to avoid the additional costs that arise when raters need more time to read essays that are barely legible due to poor handwriting, and to ensure that ratings are not influenced by the quality of the handwriting.

In this study, we addressed the question of whether handwritten essays are scored differently from computer-typed transcripts of the same essays in the context of a large-scale writing assessment that was part of a national educational study in Germany of students at the end of lower secondary education. Although the question of differences between ratings of handwritten and typed texts is not new (the initial studies go back to the 1960s), there are hardly any studies that have investigated these potential differences in standardized large-scale assessments with intensively trained raters. In addition, we investigated different potential moderators of rating differences between handwritten and computer-typed text versions that have already been identified by previous studies, as well as their interplay, which has not yet been investigated systematically.


Construct-irrelevant variance and halo effects

In large-scale educational assessments, a valid measurement of the intended construct is crucial because results are often used to diagnose the proficiency of students and can decide, for instance, about further schooling tracks or access to higher educational programs. A special threat to valid measurement is what Messick (1990, 1995) called construct-irrelevant variance, implying that, in addition to construct-relevant aspects, other aspects are also measured during the assessment process.

Text sample

The essays scored in our study are a subsample of essays written in German by students at the end of lower secondary education (9th and 10th grade) in Germany; they are derived from a large-scale writing assessment study conducted in 2011 with 2996 students. Participating students attended schools providing general education; all school types (lower, middle, and higher track) were represented. The aim of the study was the generation of normed writing tasks to assess the attainment of the

Main effect of presentation mode

Paired-sample t-tests revealed a main effect of presentation mode with an advantage for computer-typed essay variants for both content quality (t(420) = 4.22; p < .001; dz = .21) and style quality (t(421) = 4.33; p < .001; dz = .21). For both variables, computer-typed variants were rated about 0.1 scale points better. Fig. 2 displays the means and confidence intervals for the conditions.
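For readers who want to run this type of comparison on their own rating data, the following Python sketch illustrates a paired-samples t-test together with Cohen's dz (the mean of the paired differences divided by their standard deviation), which is the effect-size measure reported above. The arrays and variable names are hypothetical placeholders, not the study's data.

```python
# Minimal sketch of a paired-samples t-test with Cohen's dz,
# assuming two rating vectors for the same essays (hypothetical data).
import numpy as np
from scipy import stats

# Hypothetical ratings of the same essays in both presentation modes
handwritten = np.array([3.0, 2.5, 4.0, 3.5, 2.0, 3.0])
typed       = np.array([3.5, 2.5, 4.0, 4.0, 2.5, 3.0])

t_stat, p_value = stats.ttest_rel(typed, handwritten)  # paired t-test

diff = typed - handwritten
dz = diff.mean() / diff.std(ddof=1)  # Cohen's dz: mean difference / SD of differences

print(f"t = {t_stat:.2f}, p = {p_value:.3f}, dz = {dz:.2f}")
```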

Moderation effects

In single moderation analyses for content quality ratings, we found a significant moderation effect for text
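The snippet above does not show the full model specification, but moderation analyses of this kind are commonly implemented as regressions that include an interaction term between presentation mode and the moderator. The sketch below illustrates that general approach; the data frame, column names, and model formula are illustrative assumptions, not the authors' specification.

```python
# Illustrative moderation analysis: does text quality moderate the
# presentation-mode effect on content ratings? (hypothetical data and columns)
import pandas as pd
import statsmodels.formula.api as smf

# Long-format data: one row per essay x presentation mode (hypothetical)
df = pd.DataFrame({
    "content_rating": [3.0, 3.5, 2.5, 2.5, 4.0, 4.0],
    "mode":           ["hand", "typed", "hand", "typed", "hand", "typed"],
    "text_quality":   [3.2, 3.2, 2.1, 2.1, 4.5, 4.5],
})

# The interaction term tests whether the mode effect depends on text quality
model = smf.ols("content_rating ~ C(mode) * text_quality", data=df).fit()
print(model.summary())
```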

Discussion

In this study, we investigated whether trained raters scored the content-related and stylistic text quality of handwritten student essays from a large-scale assessment study at the end of lower secondary education higher or lower than that of computer-typed transcripts of these essays, and whether the presentation-mode effect is moderated by text quality, quality of handwriting, erroneousness (orthographic, grammatical, and punctuation errors), and text genre. We found a main effect

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of Competing Interest

None.


References (41)

  • N. Horkay et al. (2006). Does it matter if I take my writing test on computer? An empirical study of mode effects in NAEP. The Journal of Technology, Learning and Assessment.
  • K.W. Howell et al. (1993). Bias in authentic assessment. Diagnostique.
  • J. Hughes et al. (2010). The influence of presentation upon examination marks. 11th Annual Conference of the Subject Centre for Information and Computer Sciences.
  • J. Jerrim et al. (2018). PISA 2015: How big is the ‘mode effect’ and what has been done about it? Oxford Review of Education.
  • F.J. King, F. Rohani, C. Sanfilippo, N. White. Effects of handwritten versus computer-written modes of communication on...
  • J.R. Landis et al. (1977). The measurement of observer agreement for categorical data. Biometrics.
  • D. Landy et al. (1974). Beauty is talent: Task evaluation as a function of the performer’s physical attractiveness. Journal of Personality and Social Psychology.
  • R.H. Lehmann et al. (1987). The Hamburg study of achievement in written composition. National report for the IEA international study of achievement in written composition.
  • R.L. Linn et al. (1972). The nature and correlates of law school essay grades. Educational and Psychological Measurement.
  • J.C. Marshall (1967). Composition errors and essay examination grades re-examined. American Educational Research Journal.

Thomas Canz is a postdoctoral researcher at the Faculty of Psychology at the FernUniversität in Hagen, Germany. His research projects focus on the structure, development, and measurement of reading and writing competencies.

Lars Hoffmann is a postdoctoral researcher at the Institute for Educational Quality Improvement (IQB) in Berlin, Germany. His research focuses on the diagnostic competencies of teachers and on educational quality and educational monitoring in secondary school.

Renate Kania is a master's student in Psychology at the FernUniversität in Hagen, Germany. In addition, she completed a scientific traineeship in the section for Educational Psychology.
