Hunspell for Sorani Kurdish Spell Checking and Morphological Analysis
arXiv - CS - Formal Languages and Automata Theory. Pub Date: 2021-09-14, DOI: arxiv-2109.06374
Sina Ahmadi

Spell checking and morphological analysis are two fundamental tasks in text and natural language processing and are addressed in the early stages of language technology development. Despite previous efforts, there has been no progress on open-source tools of this kind for Sorani Kurdish (also known as Central Kurdish), a less-resourced language. In this paper, we present our efforts in annotating a lexicon with morphosyntactic tags and in extracting the morphological rules of Sorani Kurdish to build a morphological analyzer, a stemmer, and a spell-checking system using Hunspell. The implementation can be used by researchers for further development in the field and can be integrated into text editors under a publicly available license.

