Revisiting rating scale development for rater-mediated language performance assessments: Modelling construct and contextual choices made by scale developers
Language Testing (IF 2.400), Pub Date: 2021-02-23, DOI: 10.1177/0265532221994052
Ute Knoch, Bart Deygers, Apichat Khamboonruang

Rating scale development in the field of language assessment is often considered in dichotomous terms: it is assumed to be guided either by expert intuition or by performance data. Even though quite a few authors have argued that rating scale development is rarely so easily classifiable, this dyadic view has dominated language testing research for over a decade. In this paper we refine the dominant model of rating scale development by drawing on a corpus of 36 studies identified in a systematic review. We present a model showing the different sources of scale construct in the corpus. In the discussion, we argue that rating scale designers, just like test developers more broadly, need to start by determining the purpose of the test, the relevant policies that guide test development and score use, and the intended score use when considering the design choices available to them. These considerations include the impact of such sources on the generalizability of the scores, the precision of the post-test predictions that can be made about test takers’ future performances, and scoring reliability. The most important contribution of the model is that it gives rating scale developers a framework to consider before starting scale development and validation activities.




Updated: 2021-02-23