Are pre-trained text representations useful for multilingual and multi-dimensional language proficiency modeling?
arXiv - CS - Computation and Language. Pub Date: 2021-02-25, DOI: arxiv-2102.12971
Taraka Rama, Sowmya Vajjala

The development of language proficiency models for non-native learners has been an active area of NLP research for the past few years. Although language proficiency is multidimensional in nature, existing research typically considers a single "overall proficiency" when building models. Further, existing approaches consider only one language at a time. This paper describes our experiments and observations on the role of pre-trained and fine-tuned multilingual embeddings in multi-dimensional, multilingual language proficiency classification. We report experiments with three languages -- German, Italian, and Czech -- and model seven dimensions of proficiency ranging from vocabulary control to sociolinguistic appropriateness. Our results indicate that while fine-tuned embeddings are useful for multilingual proficiency modeling, no single feature achieves the best performance consistently across all dimensions of language proficiency. All code, data, and related supplementary material can be found at: https://github.com/nishkalavallabhi/MultidimCEFRScoring.
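For readers who want a concrete picture of the setup, the following is a minimal sketch (not the authors' code; see their repository for the actual experiments) of fine-tuning a multilingual encoder as a classifier for one proficiency dimension. The model name, the CEFR label inventory, and the idea of training one classification head per dimension are illustrative assumptions here.

# Minimal sketch, assuming Hugging Face `transformers` and `torch`.
# Labels, model choice, and per-dimension setup are assumptions for
# illustration, not the paper's exact configuration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CEFR_LABELS = ["A1", "A2", "B1", "B2", "C1", "C2"]  # assumed label set

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
# One such model would be fine-tuned separately per proficiency dimension,
# e.g. "vocabulary control" or "sociolinguistic appropriateness".
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(CEFR_LABELS)
)

def classify(texts):
    """Predict one proficiency dimension for a batch of learner texts."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [CEFR_LABELS[i] for i in logits.argmax(dim=-1).tolist()]

print(classify(["Ich habe gestern einen Brief geschrieben."]))

Because the encoder is multilingual, the same architecture can be fine-tuned on German, Italian, and Czech learner texts; the paper's question is whether such fine-tuned representations help consistently across all seven dimensions.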

Last updated: 2021-02-26