DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings
arXiv - CS - Artificial Intelligence. Pub Date: 2020-11-22, DOI: arxiv-2011.10970
Muhammad Abdul-Mageed, Shady Elbassuoni, Jad Doughman, AbdelRahim Elmadany, El Moatez Billah Nagoudi, Yorgo Zoughby, Ahmad Shaher Iskander Gaba, Ahmed Helal, Mohammed El-Razzaz

Word embeddings are a core component of modern natural language processing systems, making the ability to evaluate them thoroughly a vital task. We describe DiaLex, a benchmark for the intrinsic evaluation of dialectal Arabic word embeddings. DiaLex covers five important Arabic dialects: Algerian, Egyptian, Lebanese, Syrian, and Tunisian. Across these dialects, DiaLex provides a testbank for six syntactic and semantic relations, namely male to female, singular to dual, singular to plural, antonym, comparative, and genitive to past tense. DiaLex thus consists of a collection of word pairs representing each of the six relations in each of the five dialects. To demonstrate the utility of DiaLex, we use it to evaluate a set of existing Arabic word embeddings as well as new ones that we developed. Our benchmark, evaluation code, and new word embedding models will be made publicly available.

Updated: 2020-11-25