The k-means Algorithm: A Comprehensive Survey and Performance Evaluation
Electronics (IF 2.6), Pub Date: 2020-08-12, DOI: 10.3390/electronics9081295
Mohiuddin Ahmed , Raihan Seraj , Syed Mohammed Shamsul Islam

The k-means clustering algorithm is considered one of the most powerful and popular data mining algorithms in the research community. Despite its popularity, however, the algorithm has certain limitations, including problems associated with the random initialization of the centroids, which can lead to unexpected convergence. Additionally, the number of clusters must be defined beforehand, and the results are affected by differing cluster shapes and by outliers. A fundamental problem of the k-means algorithm is its inability to handle various data types. This paper provides a structured and synoptic overview of the research conducted on the k-means algorithm to overcome these shortcomings. Variants of the k-means algorithm, including recent developments, are discussed, and their effectiveness is investigated through experimental analysis on a variety of datasets. The detailed experimental analysis, together with a thorough comparison of different k-means clustering algorithms, distinguishes our work from existing survey papers. Furthermore, the paper provides a clear and thorough understanding of the k-means algorithm and its different research directions.
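To make the initialization problem concrete, here is a minimal sketch of the standard Lloyd-style k-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points), seeded by drawing k distinct data points at random. It is an illustrative assumption, not the survey's own implementation, and the helper names `sq_dist`, `assign` and `kmeans` as well as the toy dataset are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points: List[Point], k: int, seed: int, max_iter: int = 100):
    """Hypothetical helper: Lloyd-style k-means with random initialization.

    k distinct data points are drawn at random as the initial centroids,
    so different seeds can converge to different local optima.
    Returns (centroids, labels, sse), where sse is the within-cluster
    sum of squared distances.
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    # Final assignment so labels and SSE match the returned centroids.
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


if __name__ == "__main__":
    # Hypothetical toy data: two tight vertical pairs, far apart horizontally.
    data = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
    outcomes = {round(kmeans(data, k=2, seed=s)[2], 6) for s in range(50)}
    print(sorted(outcomes))
```

On this toy data with k = 2, the run ends either at the left/right split (SSE 1.0) or, when both initial centroids come from the same vertical pair, at the top/bottom split (SSE 16.0), which is also a fixed point of the iteration. This is the kind of initialization-dependent convergence the abstract refers to; seeding schemes such as k-means++ are one commonly used way to reduce it.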

Updated: 2020-08-12