Clustering Large Datasets by Merging K-Means Solutions, Journal of Classification


Journal of Classification (IF 1.8), Pub Date: 2019-03-29, DOI: 10.1007/s00357-019-09314-8
Volodymyr Melnykov, Semhar Michael

Existing clustering methods range from simple but very restrictive to complex but more flexible. The K-means algorithm is one of the most popular clustering procedures due to its computational speed and intuitive construction. Unfortunately, the application of K-means in its traditional form based on Euclidean distances is limited to cases with spherical clusters of approximately the same volume and spread of points. Recent developments in the area of merging mixture components for clustering show good promise. We propose a general framework for hierarchical merging based on pairwise overlap between components which can be readily applied in the context of the K-means algorithm to produce meaningful clusters. Such an approach preserves the main advantage of the K-means algorithm—its speed. The developed ideas are illustrated on examples, studied through simulations, and applied to the problem of digit recognition.
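The merging idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that a K-means solution is read as a mixture of equal-weight spherical Gaussians with a common standard deviation `sigma`, and that the overlap of two components is the sum of their two misclassification probabilities, which under those assumptions equals 2Φ(-d / (2σ)) for centers a distance d apart. The function names, the single-linkage rule, and the stopping `threshold` are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def pairwise_overlap(center_a, center_b, sigma):
    """Overlap of two equal-weight spherical Gaussians sharing std `sigma`.

    Assumed definition: sum of the two misclassification probabilities,
    2 * Phi(-d / (2 * sigma)), where d is the distance between centers.
    """
    d = math.dist(center_a, center_b)
    # Phi(-x) = 0.5 * erfc(x / sqrt(2)), hence 2 * Phi(-d / (2 * sigma))
    # simplifies to erfc(d / (2 * sqrt(2) * sigma)).
    return math.erfc(d / (2.0 * math.sqrt(2.0) * sigma))


def merge_components(centers, sigma, threshold):
    """Hierarchically merge K-means components by pairwise overlap.

    Single linkage (an assumption): the overlap between two groups is the
    largest overlap over their member pairs. Merging stops once the best
    remaining overlap falls below `threshold`.
    """
    k = len(centers)
    overlap = {
        (i, j): pairwise_overlap(centers[i], centers[j], sigma)
        for i, j in combinations(range(k), 2)
    }

    def linkage(g, h):
        return max(overlap[min(i, j), max(i, j)] for i in g for j in h)

    groups = [{i} for i in range(k)]
    while len(groups) > 1:
        # Find the pair of groups with the largest overlap.
        a, b = max(
            combinations(range(len(groups)), 2),
            key=lambda p: linkage(groups[p[0]], groups[p[1]]),
        )
        if linkage(groups[a], groups[b]) < threshold:
            break
        groups[a] |= groups[b]  # a < b, so deleting index b is safe
        del groups[b]
    return sorted(sorted(g) for g in groups)


# Five hypothetical K-means centers: three along one elongated cluster,
# two forming a second cluster far away.
centers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
merged = merge_components(centers, sigma=1.0, threshold=0.1)
print(merged)
```

Because the merging step works only on the K centers rather than on the individual points, its cost does not grow with the size of the dataset, which is consistent with the abstract's remark that the approach keeps the speed of K-means.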


Updated: 2019-03-29
