Natural language processing for similar languages, varieties, and dialects: A survey
Natural Language Engineering (IF 2.3), Pub Date: 2020-11-20, DOI: 10.1017/s1351324920000492
Marcos Zampieri, Preslav Nakov, Yves Scherrer

There has been a lot of recent interest in the natural language processing (NLP) community in the computational processing of language varieties and dialects, with the aim of improving the performance of applications such as machine translation, speech recognition, and dialogue systems. Here, we attempt to survey this growing field of research, with a focus on computational methods for processing similar languages, varieties, and dialects. In particular, we discuss the most important challenges when dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common data collection strategies used to compile datasets for similar languages, varieties, and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging, and parsing of similar languages, language varieties, and dialects. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages, language varieties, and dialects.

Updated: 2020-11-20