Dependency parsing of learner English
International Journal of Corpus Linguistics (IF 1.6), Pub Date: 2018-05-31, DOI: 10.1075/ijcl.16080.hua
Yan Huang 1, Akira Murakami 2, Theodora Alexopoulou 1, Anna Korhonen 1

Syntactic annotation of large-scale learner corpora currently relies mainly on “standard parsers” trained on native language data. Understanding how these parsers perform on learner data is important for downstream research and applications related to learner language. This study evaluates the performance of multiple standard probabilistic parsers on learner English. Our contributions are threefold. First, we demonstrate that the common practice of constructing a gold standard by manually correcting the pre-annotation of a single parser can introduce bias into parser evaluation, and we propose an alternative annotation method that controls for this annotation bias. Second, we quantify the influence of learner errors on parsing errors and identify the learner errors that affect parsing most. Finally, we compare the performance of the parsers on learner English and native English. Our results have useful implications for how to select a standard parser for learner English.

Updated: 2018-05-31