Improving stemming for Assamese information retrieval
International Journal of Information Technology, Pub Date: 2021-07-10, DOI: 10.1007/s41870-021-00718-7
Arjun Gogoi 1 , Nomi Baruah 1 , Rakhee D. Phukan 1 , Sikhar Kr. Sarma 2

Researchers have proposed several approaches to improve Assamese stemming. Such stemmers matter because their output is often used in application-oriented projects, especially in developing information retrieval (IR) systems. Assamese stemming can be defined as a process that strips a set of suffixes from words. However, this process has certain drawbacks, including vocalization ambiguity, incorrect removal, and single solution. In this paper, we propose an Assamese stemmer that addresses these drawbacks of earlier approaches and makes efficient use of the features mentioned above. All possible suffixes in the Assamese language were collected manually with the help of an Assamese linguistic expert, and the stemmer was tested on 20,000 words from 16 different articles. It achieves an accuracy of 86.16%. We also compare the accuracy of the system with other existing approaches, and our system outperforms all of them. In addition, we propose an automatic approach for evaluating and comparing Assamese stemmers that takes into account metrics related to the accuracy of results.
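To make the suffix-stripping definition and the accuracy metric concrete, here is a minimal Python sketch. It is not the stemmer proposed in the paper: the three-entry suffix list, the `min_stem_len` guard, and the function names `stem` and `accuracy` are all assumptions made for illustration, and the paper's expert-compiled suffix list is not reproduced here.

```python
import unicodedata

# Illustrative suffixes only (assumption); NOT the expert-compiled list from the paper.
SAMPLE_SUFFIXES = ("বোৰ", "ৰ", "ত")


def _nfc(text):
    """Normalize to NFC so that suffix and word compare code point by code point."""
    return unicodedata.normalize("NFC", text)


def stem(word, suffixes=SAMPLE_SUFFIXES, min_stem_len=2):
    """Repeatedly strip the longest matching suffix.

    A suffix is removed only if at least `min_stem_len` code points remain,
    a crude guard (assumption) against the "incorrect removal" problem.
    """
    word = _nfc(word)
    # Longest suffixes first; empty strings are dropped to avoid an endless loop.
    ordered = sorted((_nfc(s) for s in suffixes if s), key=len, reverse=True)
    stripped = True
    while stripped:
        stripped = False
        for suffix in ordered:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word


def accuracy(pairs, stemmer=stem):
    """Fraction of (word, gold_stem) pairs for which the stemmer returns the gold stem."""
    if not pairs:
        return 0.0
    correct = sum(1 for word, gold in pairs if stemmer(word) == _nfc(gold))
    return correct / len(pairs)
```

With this toy list, `stem("ঘৰত")` strips the final suffix, while `stem("ঘৰ")` is left unchanged because removing its last character would leave a stem shorter than `min_stem_len`. A pure length guard like this is only a stand-in; resolving incorrect removal in practice needs linguistic knowledge of the kind the abstract describes.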




Updated: 2021-07-12