DISCERN: diversity-based selection of centroids for k-estimation and rapid non-stochastic clustering
International Journal of Machine Learning and Cybernetics ( IF 5.6 ) Pub Date : 2020-09-21 , DOI: 10.1007/s13042-020-01193-5
Ali Hassani , Amir Iranmanesh , Mahdi Eftekhari , Abbas Salemi

One of the applications of center-based clustering algorithms such as K-means is partitioning data points into K clusters. In some applications, the feature space relates to the underlying problem we are trying to solve, and sometimes we can obtain a suitable feature space. Nevertheless, while K-means is one of the most efficient offline clustering algorithms, it is not equipped to estimate the number of clusters, which is needed in many practical cases. Practical methods that do estimate K are often too computationally expensive, as they require at least one run of K-means for each candidate K. To address this issue, we propose a K-means initialization similar to K-means++ that estimates K from the feature space while finding suitable initial centroids for K-means in a deterministic manner. We then compare the proposed method, DISCERN, with several of the most practical K-estimation methods, and also compare the clustering results of K-means when initialized randomly, with K-means++, and with DISCERN. The results show improvement in both K estimation and final clustering performance.
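The abstract describes a deterministic, diversity-based selection of initial centroids. As a rough illustration of that general idea (not the authors' exact DISCERN criterion), the following sketch picks the two least similar points first and then greedily adds the point least similar to any centroid chosen so far, which is deterministic, unlike the randomized sampling in K-means++:

```python
import numpy as np

def diversity_select(X, max_k):
    """Deterministically pick `max_k` mutually diverse rows of X as
    initial centroids. A farthest-point-style sketch of diversity-based
    selection; the actual DISCERN criterion and its K estimation differ."""
    # Normalize rows so pairwise similarity is cosine similarity.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T

    # Start from the single most dissimilar pair of points.
    i, j = np.unravel_index(np.argmin(S), S.shape)
    chosen = [int(i), int(j)]

    while len(chosen) < max_k:
        # For every point, its highest similarity to any chosen centroid;
        # the next centroid is the point least similar to all of them.
        sim_to_chosen = S[:, chosen].max(axis=1)
        sim_to_chosen[chosen] = np.inf  # exclude already-chosen points
        chosen.append(int(np.argmin(sim_to_chosen)))

    return X[chosen]
```

Because every step is an argmin over a fixed similarity matrix, repeated runs on the same data yield identical centroids, which is the "non-stochastic" property the title refers to. DISCERN additionally uses the similarity sequence produced during selection to estimate K itself, a step omitted here.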




Updated: 2020-09-21