Developing a level-specific checklist for assessing EFL writing
Language Testing (IF 2.400), Pub Date: 2020-05-07, DOI: 10.1177/0265532220916703
Zoltán Lukácsi

In second language writing assessment, rating scales and scores from human-mediated assessment have been criticized for a number of shortcomings including problems with adequacy, relevance, and reliability (Hamp-Lyons, 1990; McNamara, 1996; Weigle, 2002). In its testing practice, Euroexam International also detected that the rating scales for writing at B2 had limited discriminating power and did not adequately reflect finer shades of candidate ability. This study sought to investigate whether a level-specific checklist of binary choice items could be designed to yield results that accurately reflect differential degrees of ability in EFL essay writing at level B2. The participants were four language teachers working as independent raters. The study involved the task materials, operational rating scales, reported scores, and candidate scripts from the May 2017 test administration. In a mixed-methods strategy of inquiry, qualitative data from stimulated recall, think-aloud protocols, and semi-structured interviews informed statistical test and item analyses. The results indicated that the checklist items were more transparent, led to increased variance, and contributed to a more coherent candidate language profile than scores from the rating scales. The implications support the recommendation that checklists should be used for level-specific language proficiency testing (Council of Europe, 2001, p. 189).

Updated: 2020-05-07