A comparative review of Urdu stemmers: Approaches and challenges, Computer Science Review

A comparative review of Urdu stemmers: Approaches and challenges
Computer Science Review (IF 12.9), Pub Date: 2019-09-25, DOI: 10.1016/j.cosrev.2019.100195
Abdul Jabbar, Saif ul Islam, Shafiq Hussain, Adnan Akhunzada, Manzoor Ilahi

With the advent of the globalization era, Internet-based resources for Urdu are growing in depth and breadth faster than ever and thus require mechanisms for the computational processing of Urdu text. Information retrieval (IR) systems have become the major tool for seeking varied information on the web. Such systems use a stemmer to reduce the variant forms of a word to a common stem. Broadly speaking, current Urdu stemmers fall into two major categories: linguistic-based stemmers and statistical stemmers. In this paper, the authors explain the applications in which stemming is used as a first step and highlight the challenges of Urdu text stemming. This is the first comparative study of state-of-the-art Urdu stemmers, based on distinct features such as the approach used, main idea, limitations, rules or affixes, data set, evaluation criteria, and claimed accuracy. A comparative analysis of state-of-the-art Urdu stemmers is performed using a standard data set. Finally, the authors outline the relevant research gaps in the literature and suggest recommendations for future research on Urdu text stemming.
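
To make the idea of stemming as a first step for IR concrete, the following is a minimal sketch of a linguistic-based (affix-stripping) stemmer. It is not taken from the paper or from any of the reviewed stemmers: the suffix list, the minimum stem length, the romanized example words, and the function names `stem` and `conflate` are all assumptions chosen only for illustration. A real Urdu stemmer would operate on Urdu script and use a far larger, linguistically motivated rule set.

```python
# Toy suffix list in romanized form (hypothetical; not from any reviewed stemmer).
SUFFIXES = ("on", "en")
# Assumed guard against over-stemming very short words.
MIN_STEM_LEN = 3


def stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least MIN_STEM_LEN characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def conflate(words):
    """Group surface forms under their stem, as an IR index might do."""
    groups = {}
    for w in words:
        groups.setdefault(stem(w), []).append(w)
    return groups


if __name__ == "__main__":
    # Romanized variants of one word are expected to share a single stem.
    print(conflate(["kitab", "kitaben", "kitabon"]))
```

A statistical stemmer would instead learn such groupings from corpus evidence rather than from a hand-written suffix list; the sketch above covers only the rule-based category.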


Updated: 2019-09-25
