
Assessing Writing

Volume 42, October 2019, 100418

Evidence of fairness: Twenty-five years of research in Assessing Writing

https://doi.org/10.1016/j.asw.2019.100418

Highlights

  • Five fairness research trends identified: bias, validity, social context, legal responsibility, and ethical obligation.

  • Findings show researchers use diverse categories when gathering evidence of fairness.

  • Findings reveal that we do not share a single taxonomy for researching fairness.

  • Individual score use and contextual factors invite future research that works to structure opportunity for students.

Abstract

When Assessing Writing (ASW) was founded 25 years ago, conversations about fairness were very much in the air and illustrated sharp divides between teachers and educational measurement researchers. For teachers, fairness was typically associated with consistency and access. For educational measurement researchers, fairness was a technical issue: an assessment that did not identify the presence of β (the bias factor) was fair. Since its founding, ASW has continued to be a space where evolving discussions about fairness play out. In this article, we examine a selection of 73 ASW research studies published from 1994 to 2018 that use fairness as a category of evidence. In tracing the use of fairness and related terms across these research articles, our goal is to understand how the conversation about fairness has changed in the last quarter century. Following a literature review that situates fairness within generational, standards-based, and evidential scholarship, we analyze five trends in the journal: fairness as the elimination of bias; fairness as the pursuit of validity; fairness as acknowledgement of social context; fairness as legal responsibility; and fairness as ethical obligation. A tidy narrative that theoretical conceptualization of fairness has deepened over the ASW lifespan is not borne out by our findings. Instead, evidence suggests that the disparate stances and methodological challenges that informed early research on fairness remain. As well, the textual record suggests that we have not developed or shared taxonomies for systematically investigating questions of fairness. In our desire to make the research we present actionable, we close by calling attention to the need for theorization of fairness, the advantages of nuanced research methods, and the benefits of non-Western perspectives.

Introduction

In the first editorial to Assessing Writing (ASW) in 1994, Brian Huot wrote about the genesis of the journal, which stemmed from the 1992 New Directions in Portfolios Conference (Black, Daiker, Sommers, & Stygal, 1994; Black, Helton, & Sommers, 1994). As Huot explained in his introduction, there were few venues for writing researchers to publish on assessment despite the wealth of scholarship presented at conferences. For Huot and co-editor Kathleen Blake Yancey, the journal was a space where “the crucial relationship between pedagogy and assessment” would be valued (Huot & Yancey, 1994, p. 143). There would not be a sole focus on educational measurement perspectives; instead, as the editors established in their first editorial of the second volume, ASW would “contribute to the increasingly interesting and divergent conversations about assessment currently taking place” (Huot & Yancey, 1995, p. 1).

Indeed, early issues of the journal created a space where teachers of writing, writing program administrators, and writing researchers could find a place to work through the issues that interested them in the ways they believed were viable for their programs and students. The traditions of measurement and composition would be in conversation. Notions of fairness would be important here. There was a sense of justice—that what researchers were publishing in the journal was not solely about assessing writing more efficiently or accurately; rather, the research was about making the teaching and assessment of writing fairer. In other words, the conversation was not just about technical advances in-and-of themselves but about examining the social conditions created by and through writing assessment. In fact, one might argue that because of its resonances with justice, democracy, and social good, the very notion of fairness has been integral to ASW since its inception.

Two articles from the first issue of ASW illustrate how fairness informed the journal from its inception. In “Validity in High Stakes Writing Assessment: Problems and Possibilities,” Pamela Moss, a trained psychometrician then assistant professor at University of Michigan, wrote: “Extensive research has been conducted on how to develop and score standardized writing tasks to provide reliable, valid, and fair estimates of students’ writing abilities (e.g., Breland, Camp, Jones, Morris, & Rock, 1987; Huot, 1990; Ruth & Murphy, 1988)” (1994, p. 109). In citing Hunter Breland, a former engineer and then researcher at Educational Testing Service (ETS), in relation to Brian Huot, then assistant professor of English at University of Louisville who advocated for linking writing assessment with learning, Moss was clearly making connections between the measurement and composition communities. Moss went on to connect questions of fairness to “consequential decisions about individuals and programs,” opportunity to learn, and research on “the cognitive and social aspects of learning” (p. 110, p. 109).

If Moss was attempting to put competing notions of fairness in dialog, Michael Williamson (1994), an English professor who specialized in writing assessment and had studied with the psychometrician Michael J. Zieky at ETS, was interested in thinking about the history of education and the implications of fairness in testing beyond educational settings. In “The Worship of Efficiency: Untangling Theoretical and Practical Considerations in Writing Assessment,” Williamson associated fairness with the rise of rationalist methods at the beginning of the 20th century: “The positivist science of psychometrics that developed in the late nineteenth century connected to [a] shift in education began to provide assessment tools believed to be objective and fair because they were seen as independent of the bias of the human decisions of individual teachers” (Williamson, 1994, p. 151). Here, Williamson identified a concept of fairness associated with assessment methods designed to be distributed across settings—and, hence, often more reliable than valid or fair. He went on to locate another concept of fairness in the “bureaucratic model” of contemporary education, one in which there is a need for fairness in terms of equitable impact. In his description, fairness is related to treatment and the downstream implications of decisions related to access or exclusion (p. 151). Clearly, Williamson was thinking about the ramifications of assessment beyond a single practice and wanted teachers and researchers to think more expansively about the implications of experimental design and social consequence.

We provide these articles from the first two issues of ASW as examples to illustrate the complex discussions about fairness that researchers were already having in the field at the inception of the journal. Note that terminology about race, class, gender, or linguistic difference is not prominent in the quoted articles but that concerns about social inequality are certainly present. Emphasis on these two early articles is a useful lesson that current conversations about fairness, and the very meaning of fairness itself, are not new; rather, discussions of fairness in writing assessment are rooted in much deeper philosophical and methodological discussions in the field. These two articles, both published in 1994, therefore identify an initial and enduring concern with evidence of fairness that continues to the present writing. That concern has taken many forms, as we will show: from the U.S. founding of the journal in 1994 under Huot and Yancey until 2000; through its internationalization under the editorship of Liz Hamp-Lyons from 2002 to 2017; and continuing under the current editorship of David Slomp and Martin East and their focus on the consequential dimensions of validity, reliability, and fairness in international settings.

In the following article, written in response to the call from Slomp and East in their first issue as editors as they aimed to frame the future of writing assessment, we trace this evolving conversation in the pages of ASW from 73 selected research articles that use fairness and related terms as key words. After establishing our literature review (§2), research questions (§3) and methods (§4), we present five trends we identified in the 73 articles: fairness as the elimination of bias (§5.1); fairness as the pursuit of validity (§5.2); fairness as acknowledgement of social context (§5.3); fairness as legal responsibility (§5.4); and fairness as ethical obligation (§5.5). We then discuss our findings in terms of our research questions (§6) and conclude with recommendations (§7) and conclusions (§8) that may prove useful to writing assessment researchers.

Section snippets

Literature review

Three ways to review the scholarship related to fairness in writing assessment have informed the current study. The first way, as demonstrated by Dorans (2011), is to establish periodization through a generational approach to international educational measurement. A second way is to focus on U.S. educational measurement standards as they have changed from 1952 to 2014. A third way is to focus on changes in the way researchers have used evidence related to fairness. Other ways to review

Research questions

Informed by the literature review, four questions guided our research as we investigated fairness as an evidential category in the journal:

  • 1. How have writing assessment researchers constructed fairness?

  • 2. How have those constructions either directly or indirectly revealed categories of evidence related to fairness?

  • 3. Have there been major shifts in the use of fairness as a guiding research principle in the last 25 years?

  • 4. Are there ways that fairness has not been used by writing assessment researchers?

Methods

Drawing on Petticrew and Roberts (2006), our systematic review was based on an analysis of ASW issues from 1994 (Vol. 1) to 2018 (Vol. 39). To answer our research questions, we used a three-phase approach that included keyword analysis, categorical analysis, and interpretative analysis.

Five trends

Our findings related to the presence of and shifts in fairness research in ASW are best presented visually as well as textually. First, as Fig. 1 reveals, the 1999 edition of the Standards captured the educational measurement zeitgeist that had an impact on the journal—that discussions about fairness were nascent but would have a lasting impact on the research published in ASW. As Fig. 1 shows, though, there has been a dearth of philosophical articles related to fairness. Fig. 1 also points to the

Discussion

Based on the 73 articles we examined for this review, we offer a discussion about each of our research questions concerning the ways fairness has been defined, and has evolved, in the 25 years that ASW has been published. Based on our findings, we see that while foundational research has begun, much work remains. Before discussing our findings, we acknowledge that there are other possible ways of concluding a study such as ours. For example, using a reference corpus of the 291 articles selected

Recommendations

In the afterword to The Practical Past (2014), Hayden White wrote that we need a new understanding of history lest we wind up submitting “to the authority of those claiming the right to tell us who we are, what we are supposed to do, and what we should strive for in order to be at all” (p. 103). His recommendation is that we consider a practical past alongside the historical past. In that imaginative consideration, we gain scope, depth, and awareness. If the historical past is a scientific

Conclusion

The history of a journal like Assessing Writing—one that came about as a field was emerging—is ultimately an intellectual history. In this capacity, ASW serves as both a testament to the changing landscape of the field of writing assessment and the imperative of academic journals to lead conversations in a field. The space that the original editors, Huot and Yancey, created for making connections between pedagogy and assessment has proved to be generative over time. The generative nature of

Acknowledgements

The authors would like to thank Cherice Escobar Jones for preparation of Fig. 1 and Appendix B. We would also like to thank Robert J. Mislevy for his review of Appendix A and his consultation regarding new uses of IRT models used to provide student ability portraits. We also thank our two anonymous reviewers.


References (112)

  • G.L. Goldberg et al. (1998). A question of choice: The implications of assessing expressive writing in multiple genres. Assessing Writing.

  • L. Hamp-Lyons (2002). The scope of writing assessment. Assessing Writing.

  • R. Haswell et al. (1996). Gender bias and critique of student writing. Assessing Writing.

  • L. He et al. (2008). ESL students’ perceptions and experiences of standardized English writing tests. Assessing Writing.

  • J. Huang (2012). Using generalizability theory to examine the accuracy and validity of large-scale ESL writing assessment. Assessing Writing.

  • B. Huot (1994). Editorial: An introduction to assessing writing. Assessing Writing.

  • B. Huot et al. (1994). From the editors. Assessing Writing.

  • B. Huot et al. (1995). From the editors. Assessing Writing.

  • J.V. Jeffery (2009). Constructs of writing proficiency in US state and national writing assessments: Exploring variability. Assessing Writing.

  • A.C. Johnson et al. (2017). College student perceptions of writing errors, text quality, and author characteristics. Assessing Writing.

  • J. Li et al. (2011). Academic tutors’ beliefs about and practices of giving feedback on students’ written assignments: A New Zealand case study. Assessing Writing.

  • V. Lindhardsen (2018). From independent ratings to communal ratings: A study of CWA raters’ decision-making behaviors. Assessing Writing.

  • P. Moss (1994). Validity in high stakes writing assessment: Problems and possibilities. Assessing Writing.

  • J. Petersen (2009). “This test makes no freaking sense”: Criticism, confusion, and frustration in timed writing. Assessing Writing.

  • E. Schendel et al. (1999). Exploring the theories and consequences of self-assessment through ethical inquiry. Assessing Writing.

  • E. Spalding et al. (1998). It was the best of times. It was a waste of time: University of Kentucky students’ views of writing under KERA. Assessing Writing.

  • American Educational Research Association et al. (1985). Standards for educational and psychological testing.

  • American Educational Research Association et al. (1999). Standards for educational and psychological testing.

  • American Educational Research Association et al. (2014). Standards for educational and psychological testing.

  • Americans with Disabilities Act of 1990, Pub. L. No. 101-336, § 2, 104 Stat. 327 (1999). Retrieved...

  • J.L. Austin (1962). How to do things with words.

  • Bakhtin, M. (1934-1935/1981). Discourse in the novel. In M. Holquist (Ed.), The dialogic imagination: Four essays (pp....

  • Bakhtin, M. (1952-1953/1986). The problem of speech genres. In C. Emerson & M. Holquist (Eds.), Speech genres and other...

  • C. Bazerman. Systems of genres and the enactment of social intentions.

  • A. Beaufort (2008). College writing and beyond: A new framework for university writing instruction.

  • C.M. Berry (2015). Differential validity and differential prediction of cognitive ability tests: Understanding test bias in the employment context. Annual Review of Organizational Psychology and Organizational Behavior.

  • B. Bishop (1998). Equality was heart of reform.

  • L. Black et al. (1994). New directions in portfolio assessment: Reflective practice, critical theory, and large-scale scoring.

  • W.J. Boone et al. (2017). Rasch analysis: A primer for school psychology researchers and practitioners. Cogent Education.

  • H.M. Breland et al. (1987). Assessing writing skill. College Board research monograph No. 11.

  • J. Britton et al. (1975). The development of writing abilities.

  • J.E. Carlson et al. Item response theory.

  • Civil Rights Act of 1964, Pub. L. 88-352, 78 Stat. 241 (1964). Retrieved...

  • T.A. Cleary (1968). Test bias: Prediction of grades of Negro and White students in integrated colleges. Journal of Educational Measurement.

  • J. Cohen (1988). Statistical power analysis for the behavioral sciences.

  • F. Cram (2016). Lessons on decolonizing evaluation from Kaupapa Māori evaluation. Canadian Journal of Program Evaluation.

  • L.J. Cronbach. Five perspectives on validity argument.

  • E. Cushman (2016). Decolonizing validity. The Journal of Writing Assessment.

  • A.J. Devitt (1993). Generalizing about genre: New conceptions on an old concept. College English.

  • N.J. Dorans. Holland’s advice for the fourth generation of test theory: Blood tests can be contests.
    Mya Poe is Associate Professor of English and Director of the Writing Program at Northeastern University. Her research focuses on writing assessment and writing development with particular attention to equity and fairness. She is the co-author of Learning to Communicate in Science and Engineering (CCCC Advancement of Knowledge Award, 2012), co-editor of Race and Writing Assessment (CCCC Outstanding Book of the Year, 2014), and co-editor of Writing, Assessment, Social Justice, and Opportunity to Learn (2019). She has also guest-edited special issues of Research in the Teaching of English and College English dedicated to issues of social justice, diversity, and writing assessment. She is series co-editor of the Oxford Brief Guides to Writing in the Disciplines. In 2015–2016, she won the College of Social Sciences and Humanities Outstanding Teaching Award and the Northeastern University Teaching Excellence Award.

    Norbert Elliot is Professor Emeritus of English at New Jersey Institute of Technology. In 2016, he was appointed Research Professor in the Department of English at University of South Florida. He presently serves on the editorial boards of Assessing Writing, IEEE Transactions in Professional Communication, and Research in the Teaching of English. From 2017 to 2019, he served as Editor-in-Chief of The Journal of Writing Analytics. Most recently, he is co-author with Richard Haswell of Early Holistic Scoring of Writing: A Theory, A History, A Reflection (Utah State University Press, 2019). With Diane Kelly-Riley, he is co-editor of Improving Outcomes: Disciplinary Writing, Local Assessment, and the Aim of Fairness (forthcoming, Modern Language Association).
