Machine-Scored Syntax: Comparison of the CLAN Automatic Scoring Program to Manual Scoring.
Language, Speech, and Hearing Services in Schools (IF 2.2). Pub Date: 2020-03-18. DOI: 10.1044/2019_lshss-19-00056
Jenny A. Roberts, Evelyn P. Altenberg, Madison Hunter
Purpose: The results of automatic machine scoring of the Index of Productive Syntax from the Computerized Language ANalysis (CLAN) tools of the Child Language Data Exchange System of TalkBank (MacWhinney, 2000) were compared to manual scoring to determine the accuracy of the machine-scored method.

Method: Twenty transcripts of 10 children at 30 and 42 months, drawn from archival data of the Weismer Corpus from the Child Language Data Exchange System, were examined. Measures of absolute point difference and point-to-point accuracy were compared, as well as points erroneously given and missed. Two new measures for evaluating automatic scoring of the Index of Productive Syntax were introduced: Machine Item Accuracy (MIA) and Cascade Failure Rate; these measures further analyze points erroneously given and missed. Differences in total scores, subscale scores, and individual structures were also reported.

Results: Mean absolute point difference between machine and hand scoring was 3.65, point-to-point agreement was 72.6%, and MIA was 74.9%. There were large differences among subscales, with the Noun Phrase and Verb Phrase subscales generally showing greater accuracy and agreement than the Question/Negation and Sentence Structures subscales. There were significantly more erroneous than missed items in machine scoring, attributed to mistagging of elements, imprecise search patterns, and other errors. Cascade failure resulted in an average of 4.65 points lost per transcript.

Conclusions: The CLAN program showed relatively inaccurate outcomes in comparison to manual scoring on both traditional and new measures of accuracy. Recommendations for improvement of the program include accounting for second exemplar violations and applying cascaded credit, among other suggestions. It was proposed that research on machine-scored syntax routinely report accuracy measures detailing erroneous and missed scores, including MIA, so that researchers and clinicians are aware of the limitations of a machine-scoring program.

Supplemental Material: https://doi.org/10.23641/asha.11984364
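The comparison measures named above can be illustrated with a minimal sketch. This is not the authors' code, and the item labels and score values are invented for illustration; it only shows the conventional computations of absolute point difference (difference in transcript totals) and point-to-point agreement (proportion of items scored identically by machine and hand).

```python
# Hypothetical per-item IPSyn scores for one transcript.
# Keys are invented item labels; values are points awarded (0-2).
machine = {"N1": 2, "N2": 1, "V1": 2, "Q1": 0, "S1": 1}  # machine (CLAN) scores
manual  = {"N1": 2, "N2": 2, "V1": 2, "Q1": 1, "S1": 1}  # hand scores

def absolute_point_difference(machine, manual):
    """|machine total - manual total| for a transcript."""
    return abs(sum(machine.values()) - sum(manual.values()))

def point_to_point_agreement(machine, manual):
    """Proportion of items on which both scorers assign the same points."""
    items = machine.keys() & manual.keys()
    agree = sum(machine[i] == manual[i] for i in items)
    return agree / len(items)

print(absolute_point_difference(machine, manual))  # 2
print(point_to_point_agreement(machine, manual))   # 0.6
```

Averaging these quantities over all 20 transcripts would yield summary statistics of the kind reported in the abstract (mean absolute point difference, mean agreement); the MIA and Cascade Failure Rate measures are defined in the article itself and are not reconstructed here.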
