Building natural language processing tools for Runyakitara
Applied Linguistics Review (IF 2.1). Pub Date: 2020-07-13, DOI: 10.1515/applirev-2020-2004
Fridah Katushemererwe, Andrew Caines, Paula Buttery

This paper describes an endeavour to build natural language processing (NLP) tools for Runyakitara, a group of four closely related Bantu languages spoken in western Uganda. In contrast with major world languages such as English, for which corpora are comparatively abundant and NLP tools are well developed, computational linguistic resources for Runyakitara are in short supply. We therefore first need to collect corpora for these languages before we can proceed to the design of a spell-checker, a grammar-checker and applications for computer-assisted language learning (CALL). We explain how we are collecting primary data for a new Runya Corpus of speech and writing, outline the design of a morphological analyser, and discuss how we can use these new resources to build NLP tools. We are initially working with Runyankore-Rukiga, a closely related pair of Runyakitara languages, and we frame our project in the context of NLP for low-resource languages, as well as CALL for the preservation of endangered languages. We put our project forward as a test case for the revitalization of endangered languages through education and technology.

Updated: 2020-07-13