An approach for document retrieval using cluster-based inverted indexing
Journal of Information Science (IF 1.8), Pub Date: 2021-06-14, DOI: 10.1177/01655515211018401
Gunjan Chandwani, Anil Ahlawat, Gaurav Dubey

Document retrieval plays an important role in knowledge management because it helps us discover relevant information in existing data. This article proposes a cluster-based inverted indexing algorithm for document retrieval. First, the documents are pre-processed to remove unnecessary and redundant words. The documents are then indexed by the cluster-based inverted indexing algorithm, which integrates the piecewise fuzzy C-means (piFCM) clustering algorithm with inverted indexing. Once the documents are indexed, user queries are matched against them using the Bhattacharyya distance. Finally, the query is optimised using the Pearson correlation coefficient, and the relevant documents are retrieved. The performance of the proposed algorithm is evaluated on the WebKB and Twenty Newsgroups data sets. The analysis shows that the proposed algorithm achieves a precision of 1, a recall of 0.70 and an F-measure of 0.8235. The proposed document retrieval system retrieves the most relevant documents and speeds up the storing and retrieval of information.
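
As a rough illustration of three building blocks named in the abstract, the sketch below builds a plain inverted index, computes the Bhattacharyya distance between two discrete distributions, and computes the F-measure from precision and recall. It is not the authors' implementation: the piFCM clustering step and the Pearson-based query optimisation are omitted, the tokeniser is a simple whitespace split, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids containing it.

    Assumption: `docs` is a dict of {doc_id: text}; tokenisation is a
    lowercase whitespace split, not the paper's pre-processing.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln(sum_i sqrt(p_i * q_i)) for discrete distributions.

    Returns infinity when the distributions share no support.
    """
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.inf if coefficient == 0 else -math.log(coefficient)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy documents, invented purely for illustration.
    docs = {"d1": "fuzzy clustering index", "d2": "inverted index"}
    index = build_inverted_index(docs)
    print(sorted(index["index"]))            # ['d1', 'd2']
    print(round(f_measure(1.0, 0.70), 4))    # 0.8235
```

The last line reproduces the reported figure: with a precision of 1 and a recall of 0.70, the harmonic mean is 2 × 1 × 0.70 / 1.70 ≈ 0.8235, so the three reported metrics are mutually consistent.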




Updated: 2021-06-15