Machine Learning Approach to Suffix Separation on a Sandhi Rule Annotated Malayalam Data Set, South Asia Research



South Asia Research (IF 0.6), Pub Date: 2020-05-29, DOI: 10.1177/0262728020915567
Mary Priya Sebastian (1, 2), G. Santhosh Kumar (2, 3)


This article explores in depth various sandhi (joining) rules in Kerala’s Malayalam language, which play a vital role in framing of the inflected and agglutinated forms of words and their compounds. It discusses significant progress in a scientific method to generate a specific annotated data set of Malayalam words that would be useful in many Natural Language Processing tasks which involve Malayalam preprocessing. The article discusses the results and issues encountered in developing this word-splitting tool for Malayalam, mainly in the context of improving the alignments between parallel texts that form a core resource in the Machine Translation task.


Updated: 2020-05-29
