k-Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint, Journal of Classification


Journal of Classification (IF 1.8), Pub Date: 2020-08-26, DOI: 10.1007/s00357-020-09370-5
Andrzej Młodak

We analyze ways of using a contiguity (neighbourhood) matrix as a constraint in clustering by the k-means and Ward methods, as well as by an approach based on distances and probabilistic assignments that aims to solve the multi-facility location problem (MFLP). Specifically, we propose special two-stage algorithms that are forms of clustering with a relational constraint. They optimize the division of a set of objects into clusters while respecting the requirement that neighbours have to belong to the same cluster. For probabilistic d-clustering, a relevant modification of its target function is suggested and studied. A wide-ranging simulation study and an empirical analysis verify the practical efficiency of these methods. The quality of clustering is assessed using indices of homogeneity, heterogeneity and correctness of clusters, as well as the silhouette index. Using these tools and similarity indices (Rand, Peirce, and Sokal and Sneath), it was shown that probabilistic d-clustering can produce better results than Ward’s algorithm. Compared with the k-means approach, probabilistic d-clustering gives rather similar results but is more robust to the creation of trivial (including empty) clusters and produces results that vary less across replications in terms of correctness, i.e. it is more predictable from the point of view of clustering quality.
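To make one of the similarity indices named above concrete, here is a minimal sketch of the standard pair-counting Rand index in plain Python. It is not taken from the paper: the function name `rand_index` and the example label vectors are hypothetical, chosen only for illustration, and the paper's own computations may differ in detail.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the fraction of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    objects in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    # Hypothetical cluster labels for six objects (illustration only).
    reference = [0, 0, 1, 1, 2, 2]
    candidate = [0, 0, 1, 2, 2, 2]
    print(rand_index(reference, candidate))  # 12 of 15 pairs agree: 0.8
```

The index depends only on co-membership, so relabelling the clusters of either partition leaves it unchanged, which is why it suits comparing outputs of different algorithms.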


Updated: 2020-08-26
