Optimal Text Document Clustering Enabled by Weighed Similarity Oriented Jaya With Grey Wolf Optimization Algorithm
The Computer Journal (IF 1.4), Pub Date: 2021-04-26, DOI: 10.1093/comjnl/bxab013
Gugulothu Venkanna, Dr K F Bharati

Owing to scientific development, a variety of challenges have emerged in the field of information retrieval. These challenges stem from the growing use of large volumes of data, which originate from large-scale distributed networks, and centralizing these data for analysis is difficult. Novel text document clustering algorithms are therefore needed to overcome the challenges in clustering, the two most important of which are clustering accuracy and clustering quality. For this reason, this paper presents an optimal clustering model for text documents that uses term frequency–inverse document frequency (TF-IDF) values as the feature set. The model places particular emphasis on initial centroid selection, so that text can be clustered automatically using a weighted similarity measure. The weighted similarity function incorporates the inter-cluster and intra-cluster similarity of both ordered and unordered documents and is used to minimize the weighted similarity among documents. The clustering model is driven by a hybrid optimization algorithm that combines the Jaya Algorithm (JA) and the Grey Wolf Optimization (GWO) algorithm; the proposed algorithm is therefore termed JA-based GWO. Finally, the performance of the proposed model is verified through a comparative analysis with state-of-the-art models. On the similarity index, the proposed model is 96.56% better than the genetic algorithm, 99.46% better than particle swarm optimization, 97.09% better than the Dragonfly algorithm, and 96.21% better than JA. These results confirm the efficiency of the proposed model.
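The abstract names TF-IDF as the feature set but does not give the exact weighting. The following is a minimal sketch, assuming whitespace tokenization, term frequency normalized by document length, and idf = log(N / df), where N is the number of documents and df is the number of documents containing the term. Cosine similarity between the resulting vectors is shown only as a common baseline; it is not the paper's weighted similarity function, whose formula the abstract does not state. The names `tfidf_vectors` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (assumed TF-IDF variant).

    tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = ["grey wolf optimizer", "grey wolf hunting", "jaya algorithm clustering"]
    vecs = tfidf_vectors(docs)
    print(cosine_similarity(vecs[0], vecs[1]))  # shares "grey" and "wolf"
    print(cosine_similarity(vecs[0], vecs[2]))  # 0.0: no shared terms
```

A term that appears in every document gets idf = log(1) = 0 and therefore contributes nothing, which is how TF-IDF down-weights ubiquitous words before any clustering takes place.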
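The abstract does not specify how JA and GWO are combined, so the sketch below is only one plausible hybrid, stated as an assumption. Every candidate tries a standard Jaya move (towards the best and away from the worst solution) and a standard GWO move (the average of positions guided by the three best wolves, alpha, beta and delta, with the coefficient a decreasing linearly from 2 towards 0), then keeps whichever of the current, Jaya and GWO positions has the lowest fitness. In the paper's setting a candidate would presumably encode the initial cluster centroids and `fitness` would be the weighted similarity objective; here a simple sphere function stands in for it. The function `hybrid_jaya_gwo` and its parameters are hypothetical.

```python
import random


def hybrid_jaya_gwo(fitness, dim, lower, upper, pop_size=20, iterations=200, seed=0):
    """Minimise `fitness` over the box [lower, upper]^dim (hypothetical hybrid).

    Each candidate tries a Jaya move and a GWO move and keeps whichever of
    {current, Jaya, GWO} has the lowest fitness.
    """
    if pop_size < 3:
        raise ValueError("GWO needs at least three candidates (alpha, beta, delta)")
    rng = random.Random(seed)

    def clip(value):
        # keep every coordinate inside the search bounds
        return min(max(value, lower), upper)

    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(x) for x in pop]

    for t in range(iterations):
        order = sorted(range(pop_size), key=lambda i: scores[i])
        best, worst = pop[order[0]], pop[order[-1]]
        leaders = [pop[k] for k in order[:3]]  # alpha, beta, delta
        a = 2.0 * (1.0 - t / iterations)  # decreases linearly from 2 towards 0

        for i in range(pop_size):
            x = pop[i]
            # Jaya move: towards the best solution, away from the worst one
            jaya = [
                clip(x[j] + rng.random() * (best[j] - abs(x[j]))
                     - rng.random() * (worst[j] - abs(x[j])))
                for j in range(dim)
            ]
            # GWO move: average of the positions suggested by the three leaders
            gwo = []
            for j in range(dim):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - x[j])
                    total += leader[j] - A * D
                gwo.append(clip(total / 3.0))
            # greedy selection: accept a move only if it lowers the fitness
            for candidate in (jaya, gwo):
                score = fitness(candidate)
                if score < scores[i]:
                    pop[i], scores[i] = candidate, score

    i_best = min(range(pop_size), key=lambda i: scores[i])
    return pop[i_best], scores[i_best]


if __name__ == "__main__":
    def sphere(x):
        # stand-in objective with its minimum 0 at the origin
        return sum(v * v for v in x)

    position, value = hybrid_jaya_gwo(sphere, dim=5, lower=-10.0, upper=10.0)
    print(value)
```

The greedy selection keeps each candidate's fitness non-increasing across iterations, which matches Jaya's usual acceptance rule; whether the authors use the same rule for the GWO step is not stated in the abstract.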

Updated: 2021-04-26