Language resources for Maghrebi Arabic dialects’ NLP: a survey
Language Resources and Evaluation (IF 1.7), Pub Date: 2020-04-25, DOI: 10.1007/s10579-020-09490-9
Jihene Younes, Emna Souissi, Hadhemi Achour, Ahmed Ferchichi

Diglossia is one of the main characteristics of the Arabic language. In Arab countries, three forms of Arabic co-exist: Classical Arabic (CA), which is mainly used in the Quran and in several classical literary texts; Modern Standard Arabic (MSA), which descends from CA and is used as the official language; and various regional colloquial varieties of Arabic, usually referred to as Arabic dialects (AD). Deemed to be among the low-resource languages, these dialects have attracted increasing interest from the NLP community in recent years. Indeed, the various Arabic dialects are increasingly used on the social web and may be transcribed in either the Arabic or the Latin script. The latter, known as Arabizi, appears to be the more frequently used script for some dialects. NLP for AD raises many challenges and requires large and appropriate language resources. In this study, we focus in particular on the Maghrebi Arabic dialects (MADs). We present a thorough review of the language resources (LRs) produced by the various works on MAD language processing. We also compile and discuss a survey of the LRs dedicated to MAD NLP that are currently available online. The LRs investigated in this work are essentially data resources, such as primary and annotated corpora, lexica, dictionaries, and ontologies.




Updated: 2020-04-25