Investigating the Skills Involved in Reading Test Tasks through Expert Judgement and Verbal Protocol Analysis: Convergence and Divergence between the Two Methods
Language Assessment Quarterly (IF 1.4), Pub Date: 2021-03-23, DOI: 10.1080/15434303.2021.1881964
Xiaohua Liu, John Read

ABSTRACT

Expert judgement has been frequently employed with reading assessments to gauge the skills potentially measured by test tasks, for purposes such as construct validation or producing diagnostic information. Despite the critical role it plays in such endeavours, few studies have triangulated its results with other types of data such as reported test-taking processes. A lack of such triangulation may bring the validity of experts’ judgements into question and undermine the credibility of subsequent procedures that build on them. In light of this, this study compared two groups of language experts’ judgements on the content of two sets of reading test tasks with ten university students’ verbal reports on solving those tasks. It was found that convergence was achieved between the two information sources for about 53% of the test tasks on what they were mainly assessing. However, there was a bigger gap between them regarding the specific skills involved in each task. A careful examination of the discrepancies between the two sources revealed that they are attributable to a number of factors. This study highlights the need to cross-check the results of expert judgement with other data sources. Implications for future test development and research are also discussed.




Updated: 2021-03-23