An Optimized k-means Algorithm Based on Information Entropy, The Computer Journal


The Computer Journal (IF 1.4), Pub Date: 2021-05-15, DOI: 10.1093/comjnl/bxab078
Meiling Liu₁,₂, Beixian Zhang₂, Xi Li₁, Weidong Tang₁, GangQiang Zhang₁


Clustering is a widely used technique in data mining and pattern recognition applications, in which data objects are divided into groups. The k-means algorithm is one of the most classical clustering algorithms. In this algorithm, the initial cluster centers are selected randomly, which makes the clustering results unstable. To solve this problem, an optimized algorithm for selecting the initial centers is proposed. The proposed algorithm defines a dispersion degree based on entropy. All objects are first grouped into one big cluster, and the object with the maximum dispersion degree and the object with the minimum dispersion degree are selected from this cluster as the initial cluster centers. The other objects in the biggest cluster are then assigned to whichever of these initial clusters is nearest. The partition process is repeated until the number of clusters equals the specified value k. Finally, the k partitioned clusters and their centers are passed to the k-means algorithm as initial clusters and centers. Several experiments are conducted on real data sets to evaluate the proposed algorithm, comparing it with the traditional k-means algorithm and the max-min distance clustering algorithm. The experimental results show that the improved k-means algorithm is stable in selecting the initial clustering, because it selects unique initial cluster centers. The experiments also verify the optimized algorithm's effectiveness and feasibility: it reduces the number of iterations and yields more stable clustering results and higher accuracy.
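The initialization procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the formula for the dispersion degree, so `dispersion_degree` below is a hypothetical stand-in (the Shannon entropy of each object's normalized distances to the other objects in its cluster). The function names `entropy_init` and `kmeans`, the Euclidean distance, and the use of cluster means as centers are likewise assumptions made for the example.

```python
import numpy as np

def dispersion_degree(points):
    """Hypothetical entropy-based dispersion per object (NOT the paper's formula)."""
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    totals = dist.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # cluster of identical points: avoid division by zero
    p = dist / totals
    # Shannon entropy of each row, treating 0 * log(0) as 0.
    logp = np.log(p, where=p > 0, out=np.zeros_like(p))
    return -(p * logp).sum(axis=1)

def entropy_init(points, k):
    """Split the biggest cluster in two until there are k clusters; return their means."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of objects")
    clusters = [np.arange(len(points))]  # start with one big cluster
    while len(clusters) < k:
        big = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        idx = clusters.pop(big)
        disp = dispersion_degree(points[idx])
        a, b = int(np.argmax(disp)), int(np.argmin(disp))
        if a == b:  # degenerate case: every object has the same dispersion
            b = (a + 1) % len(idx)
        seeds = points[idx[[a, b]]]
        # Assign each object of the biggest cluster to the nearer of the two seeds.
        d = np.linalg.norm(points[idx][:, None, :] - seeds[None, :, :], axis=2)
        nearest = d.argmin(axis=1)
        nearest[a], nearest[b] = 0, 1  # each seed stays in its own cluster
        clusters += [idx[nearest == 0], idx[nearest == 1]]
    return np.array([points[c].mean(axis=0) for c in clusters])

def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    for iteration in range(1, max_iter + 1):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers, iteration

# Toy data (made up for illustration): two well-separated groups of three points.
demo_points = np.array([[0, 0], [0, 1], [1, 0],
                        [10, 10], [10, 11], [11, 10]], dtype=float)
demo_centers = entropy_init(demo_points, k=2)
demo_labels, demo_final, demo_iters = kmeans(demo_points, demo_centers)
print(demo_labels, demo_iters)
```

Because `entropy_init` contains no random step, repeated runs on the same data return the same initial centers, which is the stability property the abstract emphasizes. Whether this stand-in dispersion measure reproduces the paper's reported reductions in iteration count or gains in accuracy is not something the sketch can show.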


Updated: 2021-05-15
