A Verified Arabic-IPA Mapping for Arabic Transcription Technology, Informed by Quranic Recitation, Traditional Arabic Linguistics, and Modern Phonetics
Journal of Semitic Studies Pub Date : 2016-03-01 , DOI: 10.1093/jss/fgv035
Clare Brierley , Majdi Sawalha , Barry Heselwood , Eric Atwell

In this paper, we present a detailed mapping from the graphemes of Modern Standard Arabic (MSA) to symbols from the International Phonetic Alphabet (IPA) for automated transcription of Arabic text. This mapping is distinctive in several ways. First, the corpus used in rule development is the full text of the Qur’ān rendered in fully pointed MSA. Second, we validate our scheme via automatically generated frequency distributions of Arabic letters and diacritics over the whole corpus to anticipate and disambiguate non-trivial, compound grapheme-to-phoneme events, thus reducing the number of letter-to-sound rules. Such difficult cases include: the definite article; the letters alif, wāw, and yāʼ; the variant forms of hamza; the tanwīn case mark; and words with special pronunciations. Finally, our mapping scheme is informed by theory and practice from medieval Arabic linguistics and traditional Quranic recitation or tajwīd; we make a novel contribution with new translations for ancient terms which incorporate concepts familiar to modern phoneticians. Our principal objective in automating Arabic-IPA transcription is to generate phonemic citation forms of Arabic words to enhance Arabic dictionaries, to facilitate Arabic language learning, and for natural language engineering applications.
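
As a rough illustration of the two ideas in the abstract (frequency distributions over letters and diacritics, and a grapheme-to-IPA lookup), the following minimal Python sketch may help. It is not the authors' implementation or mapping table: the function names `grapheme_frequencies` and `toy_transcribe`, the `TOY_IPA` dictionary, and the one-word sample are all assumptions made for illustration. The lookup is deliberately context-free, so it covers none of the difficult cases listed above (definite article, alif, wāw, yāʼ, hamza, tanwīn), which would require context-sensitive rules.

```python
import unicodedata
from collections import Counter

# "bismi" in fully pointed script: ba, kasra, sin, sukun, mim, kasra.
SAMPLE = "\u0628\u0650\u0633\u0652\u0645\u0650"

# Toy, context-free lookup (hypothetical; NOT the paper's mapping table).
TOY_IPA = {
    "\u0628": "b",  # ba
    "\u0633": "s",  # sin
    "\u0645": "m",  # mim
    "\u0650": "i",  # kasra: short vowel /i/
    "\u0652": "",   # sukun: marks that no vowel follows
}


def grapheme_frequencies(text):
    """Count letters and diacritics separately in pointed text.

    Letters are characters in Unicode category Lo; diacritics are
    nonspacing marks (category Mn). Everything else is ignored.
    """
    letters, diacritics = Counter(), Counter()
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Mn":
            diacritics[ch] += 1
        elif category == "Lo":
            letters[ch] += 1
    return letters, diacritics


def toy_transcribe(text):
    """Map each grapheme independently; unknown graphemes become '?'."""
    return "".join(TOY_IPA.get(ch, "?") for ch in text)


if __name__ == "__main__":
    letters, diacritics = grapheme_frequencies(SAMPLE)
    print(letters.most_common())
    print(diacritics.most_common())
    print(toy_transcribe(SAMPLE))
```

Run over a whole corpus, counts of this kind would show how often each letter and diacritic occurs, which is the sort of evidence the abstract describes using to anticipate compound grapheme-to-phoneme events before writing rules for them.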

Updated: 2016-03-01