A comparison of holistic, analytic, and part marking models in speaking assessment
Language Testing (IF 2.400), Pub Date: 2020-01-24, DOI: 10.1177/0265532219898635
Nahal Khabbazbashi, Evelina D. Galaczi

This mixed methods study examined holistic, analytic, and part marking models (MMs) in terms of their measurement properties and their impact on candidates' CEFR classifications in a semi-direct online speaking test. In phase 1, the speaking performances of 240 candidates were marked both holistically and by part. Because the phase 1 findings suggested stronger measurement properties for the part MM, phase 2 focused on a comparison of the part and analytic MMs; in that phase, the speaking performances of 400 candidates were rated both analytically and by part. Raters also provided open comments on their marking experiences. Results suggested that the choice of MM had a significant impact: approximately 30% of candidates in phase 1 and 50% in phase 2 were awarded different (adjacent) CEFR levels depending on which MM was used to assign scores. Candidates tended to receive higher CEFR levels under the holistic MM and lower CEFR levels under the part MM. Although strong correlations were found between all pairings of MMs, further analyses revealed important differences. The part MM was shown to display superior measurement qualities, particularly in allowing raters to make finer distinctions between speaking ability levels. These findings have implications for the scoring validity of speaking tests.
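The abstract's central observation, that two marking models can correlate strongly yet still place many candidates in different CEFR levels, can be illustrated with a small sketch. It assumes two sets of total scores for the same candidates and a set of cut scores that map totals to CEFR bands; the cut scores, the 0 to 50 scale, the function names (`level_index`, `pearson`, `compare_marking_models`), and the candidate scores are all hypothetical and are neither the study's data nor the authors' analysis procedure.

```python
from bisect import bisect_right
from math import sqrt
from statistics import fmean

# HYPOTHETICAL cut scores on an assumed 0-50 total-score scale; not the study's values.
CUT_SCORES = [10, 20, 30, 40]
LEVELS = ["A1", "A2", "B1", "B2", "C1"]


def level_index(score: float) -> int:
    """Index into LEVELS for a total score; a score equal to a cut score
    falls into the higher band (bisect_right)."""
    return bisect_right(CUT_SCORES, score)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def compare_marking_models(scores_a: list[float], scores_b: list[float]) -> dict[str, float]:
    """Compare CEFR classifications of the same candidates under two marking models.

    Percentages are out of all candidates: level differs at all, differs by
    exactly one band (adjacent), and is higher under model A than under model B.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both models must score the same non-empty set of candidates")
    n = len(scores_a)
    diffs = [level_index(a) - level_index(b) for a, b in zip(scores_a, scores_b)]
    return {
        "pct_different": 100 * sum(d != 0 for d in diffs) / n,
        "pct_adjacent": 100 * sum(abs(d) == 1 for d in diffs) / n,
        "pct_higher_under_a": 100 * sum(d > 0 for d in diffs) / n,
    }


if __name__ == "__main__":
    # HYPOTHETICAL totals for five candidates under two marking models (not study data).
    holistic = [12, 25, 31, 38, 44]
    part = [9, 24, 29, 38, 41]
    print(f"Pearson r = {pearson(holistic, part):.3f}")
    print(compare_marking_models(holistic, part))
    for h, p in zip(holistic, part):
        print(f"{h:>3} -> {LEVELS[level_index(h)]}   {p:>3} -> {LEVELS[level_index(p)]}")
```

With these invented numbers the two models correlate very strongly (r above 0.99), yet two of the five candidates land one CEFR band apart, both higher under the "holistic" scores. A correlation coefficient alone does not reveal such classification shifts, because it is insensitive to small systematic offsets near cut scores.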

Last updated: 2020-01-24