Optimal Text Document Clustering Enabled by Weighed Similarity Oriented Jaya With Grey Wolf Optimization Algorithm
The Computer Journal ( IF 1.077 ) Pub Date : 2021-04-30 , DOI: 10.1093/comjnl/bxab013
Gugulothu Venkanna, Dr K F Bharati

Owing to scientific development, the field of information retrieval faces a variety of challenges. These challenges stem from the growing use of large volumes of data, which originate from large-scale distributed networks. Centralizing these data for analysis is difficult. Novel text document clustering algorithms are therefore required to overcome the challenges in clustering, the two most important of which are clustering accuracy and quality. For this reason, this paper presents an optimal clustering model for text documents that uses term frequency–inverse document frequency (TF-IDF) as the feature set. The model concentrates on initial centroid selection, so that the proposed clustering process can cluster the text automatically using a weighted similarity measure. The weighted similarity function involves the inter-cluster and intra-cluster similarity of both ordered and unordered documents, and is used to minimize the weighted similarity among the documents. The clustering model is driven by a hybrid optimization algorithm that combines the Jaya Algorithm (JA) and the Grey Wolf Optimization algorithm (GWO); the proposed algorithm is termed JA-based GWO. Finally, the performance of the proposed model is verified through a comparative analysis with state-of-the-art models. The performance analysis shows that, for the similarity index, the proposed model is 96.56% better than the genetic algorithm, 99.46% better than particle swarm optimization, 97.09% better than the Dragonfly algorithm, and 96.21% better than JA. The analysis therefore confirms the efficiency of the proposed model.
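The pipeline the abstract outlines (TF-IDF features, a similarity-based objective over centroids, and a population-based optimizer) can be illustrated with a minimal sketch. It is not the authors' JA-based GWO: the abstract does not specify the weighted similarity function or how Jaya and GWO are hybridized, so the sketch assumes a smoothed TF-IDF variant, a toy cosine-distance objective, and the standard Jaya update rule only. All function names and the sample documents are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows): one L2-normalised TF-IDF vector per document."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenised for t in set(toks))
    # Smoothed IDF: an assumed variant, not taken from the paper.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = [tf[t] / max(len(toks), 1) * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cost(solution, rows, k):
    """Toy objective (assumption): total cosine distance of each
    document to its nearest centroid. `solution` is k centroids flattened."""
    d = len(rows[0])
    centroids = [solution[i * d:(i + 1) * d] for i in range(k)]
    return sum(1.0 - max(cosine(r, c) for c in centroids) for r in rows)


def jaya_step(pop, fit, objective, rng):
    """One standard Jaya iteration: move toward the best solution,
    away from the worst, and keep a candidate only if it improves."""
    best = pop[fit.index(min(fit))]
    worst = pop[fit.index(max(fit))]
    for i, x in enumerate(pop):
        cand = []
        for xj, bj, wj in zip(x, best, worst):
            r1, r2 = rng.random(), rng.random()
            cand.append(xj + r1 * (bj - abs(xj)) - r2 * (wj - abs(xj)))
        f = objective(cand)
        if f < fit[i]:  # greedy acceptance
            pop[i], fit[i] = cand, f
    return pop, fit


# Hypothetical sample documents, for illustration only.
docs = [
    "grey wolf optimization algorithm",
    "jaya optimization algorithm",
    "text document clustering",
    "document clustering with tfidf",
]
vocab, rows = tfidf(docs)
k = 2
rng = random.Random(0)
dim = k * len(vocab)
objective = lambda s: cost(s, rows, k)

pop = [[rng.random() for _ in range(dim)] for _ in range(10)]
fit = [objective(s) for s in pop]
initial_best = min(fit)
for _ in range(50):
    pop, fit = jaya_step(pop, fit, objective, rng)
final_best = min(fit)
```

Because candidates are accepted greedily, `final_best` can never exceed `initial_best`. A GWO component, as named in the abstract, would replace or augment the position update in `jaya_step`; how the paper blends the two is not stated here, so it is left out.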

Updated: 2021-05-04