Exploiting native language interference for native language identification
Natural Language Engineering (IF 2.5), Pub Date: 2020-11-26, DOI: 10.1017/s1351324920000595
Ilia Markov, Vivi Nastase, Carlo Strapparava

Native language identification (NLI)—the task of automatically identifying the native language (L1) of persons based on their writings in the second language (L2)—is based on the hypothesis that characteristics of L1 will surface and interfere in the production of texts in L2 to the extent that L1 is identifiable. We present an in-depth investigation of features that model a variety of linguistic phenomena potentially involved in native language interference in the context of the NLI task: the languages’ structuring of information through punctuation usage, emotion expression in language, and similarities of form with the L1 vocabulary through the use of anglicized words, cognates, and other misspellings. The results of experiments with different combinations of features in a variety of settings allow us to quantify the native language interference value of these linguistic phenomena and show how robust they are in cross-corpus experiments and with respect to proficiency in L2. These experiments provide a deeper insight into the NLI task, showing how native language interference explains the gap between baseline, corpus-independent features, and the state of the art that relies on features/representations that cover (indiscriminately) a variety of linguistic phenomena.
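
As a rough illustration of what a punctuation-based feature might look like, the Python sketch below counts punctuation marks and punctuation n-grams in an L2 text. The function names (`punctuation_profile`, `punctuation_ngrams`) and the feature definitions are assumptions made for illustration only; the abstract does not say how the authors compute their punctuation features, and this is not their method.

```python
import string
from collections import Counter

# ASCII punctuation only; a real system would likely need Unicode marks too.
PUNCT = set(string.punctuation)


def punctuation_marks(text: str) -> list[str]:
    """Return the punctuation marks of `text` in order of appearance."""
    return [ch for ch in text if ch in PUNCT]


def punctuation_profile(text: str) -> dict[str, float]:
    """Relative frequency of each mark among all punctuation marks (hypothetical feature)."""
    marks = punctuation_marks(text)
    if not marks:
        return {}
    counts = Counter(marks)
    return {mark: n / len(marks) for mark, n in counts.items()}


def punctuation_ngrams(text: str, n: int = 2) -> Counter:
    """Counts of consecutive punctuation sequences of length `n`, ignoring all other characters."""
    marks = punctuation_marks(text)
    return Counter("".join(marks[i:i + n]) for i in range(len(marks) - n + 1))


if __name__ == "__main__":
    sample = "Hi, there; ok, bye."
    print(punctuation_profile(sample))
    print(punctuation_ngrams(sample, n=2))
```

Vectors like these could be fed to any standard classifier alongside other features; which classifier and which feature combinations work best is what the paper's experiments address, not this sketch.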

Updated: 2020-11-26