Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm, Minimum Spanning Tree, and Hierarchical Clustering in an Applied Study.
Computational and Mathematical Methods in Medicine (IF 2.809), Pub Date: 2020-08-01, DOI: 10.1155/2020/7636857
Saeedeh Pourahmad 1, 2 , Atefeh Basirat 2 , Amir Rahimi 1, 3 , Marziyeh Doostfatemeh 2

Random selection of initial centroids (centers) for clusters is a fundamental defect of the k-means clustering algorithm: the algorithm's performance depends on the initial centroids, and it may end up in a local optimum. Various hybrid methods have been introduced to resolve this defect in the k-means clustering algorithm. Because no comparative studies have compared these methods in various aspects, the present paper compares three hybrid k-means methods that use concepts from the genetic algorithm, the minimum spanning tree, and the hierarchical clustering method. Although these three hybrid methods have received considerable attention in previous research, few studies have compared their results. Hence, seven quantitative datasets with different characteristics in terms of sample size, number of features, and number of classes were used in the present study, and eleven external and internal evaluation indices were considered for comparing the methods. The data indicated that the hybrid methods had a higher convergence rate in reaching the final solution than the ordinary k-means method. Furthermore, the hybrid method with the hierarchical clustering algorithm converged to the optimal solution in fewer iterations than the other two hybrid methods. However, the hybrid methods with minimum spanning trees and genetic algorithms may not always, or even often, be more effective than the ordinary k-means method. Therefore, despite their computational complexity, these three hybrid methods have not led to much improvement over the k-means method. Nevertheless, a simulation study is required to compare the methods and complete the conclusion.
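To make the comparison concrete, here is a minimal pure-Python sketch of ordinary k-means (Lloyd's algorithm) run from three kinds of seeds: random data points, cluster means from agglomerative clustering, and component means from a minimum spanning tree. It is not the authors' implementation. It assumes Euclidean data, average linkage for the hierarchical step, and Prim's MST with its k - 1 longest edges cut; the genetic-algorithm hybrid is omitted, and all function names and the synthetic data are hypothetical. Each run reports the iteration count and within-cluster sum of squared errors (SSE), the kind of quantities behind the convergence claims above.

```python
import math
import random


def mean(points):
    # Component-wise mean of a non-empty list of points (tuples).
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means from the given seeds; returns (centroids, labels, iterations, sse)."""
    centroids = list(centroids)
    labels = None
    for it in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # no assignment changed, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, it, sse


def random_init(points, k, seed=None):
    # Ordinary k-means seeding: k distinct data points drawn at random.
    return random.Random(seed).sample(points, k)


def hierarchical_init(points, k):
    # Agglomerative clustering with average linkage, stopped at k clusters;
    # the cluster means become the initial centroids.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return [mean(c) for c in clusters]


def mst_init(points, k):
    # Build a Euclidean minimum spanning tree with Prim's algorithm, cut its
    # k - 1 longest edges, and use the means of the k components as seeds.
    n = len(points)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            d = math.dist(points[u], points[v])
            if not in_tree[v] and d < best[v]:
                best[v], parent[v] = d, u
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, a, b in sorted(edges)[: n - k]:  # keep the n - k shortest edges
        root[find(a)] = find(b)
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return [mean(g) for g in groups.values()]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three synthetic Gaussian blobs (illustrative data, not the paper's datasets).
    data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]
    for name, seeds in [("random", random_init(data, 3, seed=1)),
                        ("hierarchical", hierarchical_init(data, 3)),
                        ("MST", mst_init(data, 3))]:
        _, _, iters, sse = kmeans(data, seeds)
        print(f"{name:<12} iterations={iters:>2} SSE={sse:.2f}")
```

Running the script prints one line per seeding strategy. The numbers depend on the data and the random seed, so a single toy run says nothing general about which hybrid is better; that is the kind of question the abstract defers to a simulation study. The seeding steps here are deliberately brute force (roughly cubic in the number of points for the hierarchical step), which illustrates the extra computational cost the hybrids pay before k-means even starts.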

Updated: 2020-08-01