Abstract
Existing approaches to measuring writing performance are insufficient in terms of both technical adequacy and feasibility for use as screening measures. This study examined the validity and diagnostic accuracy of several automated text evaluation approaches and written expression curriculum-based measurement (WE-CBM) to determine whether an automated approach improves technical adequacy. A sample of 140 fourth-grade students generated writing samples that were scored using traditional and automated approaches and examined in relation to the statewide measure of writing performance. Results indicated that validity and diagnostic accuracy were comparable for the best-performing WE-CBM metric, correct minus incorrect word sequences, and the automated scoring approaches, with the automated approaches offering potentially improved feasibility for screening. Across all scoring approaches, however, averaging scores from three time points was necessary to achieve improved validity and adequate diagnostic accuracy. Limitations, implications, and directions for future research on the use of automated scoring approaches for screening are discussed.
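The abstract's central comparison, screening with a single probe versus an average of scores across three time points, can be illustrated with a minimal R sketch of a diagnostic accuracy analysis. This is not the authors' code: the data, cut points, and the use of the pROC package are hypothetical assumptions for illustration only.

# Minimal sketch (hypothetical data, not the study's analysis): comparing the
# diagnostic accuracy of one writing probe versus a three-probe average for
# predicting a binary state-test outcome.
library(pROC)  # provides roc() and auc() for receiver operating characteristic analysis

set.seed(1)
n <- 140                                              # sample size matching the abstract
ciws <- replicate(3, rnorm(n, mean = 40, sd = 12))    # hypothetical scores from three occasions
state_fail <- rbinom(n, 1, plogis(-(rowMeans(ciws) - 40) / 12))  # 1 = below standard (simulated)

# Screening score from a single occasion versus the average of three occasions
roc_single <- roc(state_fail, ciws[, 1], quiet = TRUE)
roc_mean   <- roc(state_fail, rowMeans(ciws), quiet = TRUE)

auc(roc_single)  # area under the ROC curve for one probe
auc(roc_mean)    # averaging across occasions reduces occasion-specific error

In data of this kind, the three-probe average would typically show a higher area under the curve than a single probe, which is the pattern of improved diagnostic accuracy the abstract reports.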
Acknowledgements
This research was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A190100 awarded to the University of Houston (PI: Milena Keller-Margulis). The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.
Cite this article
Keller-Margulis, M.A., Mercer, S.H. & Matta, M. Validity of automated text evaluation tools for written-expression curriculum-based measurement: a comparison study. Read Writ 34, 2461–2480 (2021). https://doi.org/10.1007/s11145-021-10153-6