Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes
Statistics and Computing (IF 1.6), Pub Date: 2020-06-25, DOI: 10.1007/s11222-020-09958-2
Serhat Emre Akhanli , Christian Hennig

A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice of such criteria depends on the context and aim of clustering. Therefore, researchers need to consider which data-analytic characteristics the clusters they are aiming at should have, such as within-cluster homogeneity, between-cluster separation, and stability. Here, a set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature. Users can choose the indexes that are relevant in the application at hand. To measure the overall quality of a clustering (for comparing clusterings from different methods and/or with different numbers of clusters), the index values are calibrated for aggregation. Calibration is relative to a set of random clusterings on the same data. Two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.
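
The calibrate-then-aggregate idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the calibration formula, the random clustering scheme, or the aggregation rule, so everything here is an assumption made for illustration: one-dimensional data, two toy indexes (`homogeneity`, `separation`), uniformly random label assignments with the same number of clusters as the candidate, z-score calibration, and an unweighted mean as the aggregate. All function names are hypothetical.

```python
import random
import statistics


def _groups(data, labels):
    """Collect the 1-D data values of each cluster."""
    groups = {}
    for x, lab in zip(data, labels):
        groups.setdefault(lab, []).append(x)
    return groups


def homogeneity(data, labels):
    """Toy index: negative mean absolute deviation from the cluster mean (larger is better)."""
    groups = _groups(data, labels)
    total = sum(abs(x - statistics.fmean(g)) for g in groups.values() for x in g)
    return -total / len(data)


def separation(data, labels):
    """Toy index: smallest gap between cluster means (larger is better, needs >= 2 clusters)."""
    means = sorted(statistics.fmean(g) for g in _groups(data, labels).values())
    return min(b - a for a, b in zip(means, means[1:]))


def random_clustering(n_points, k, rng):
    """Assumed scheme: uniformly random labels, with every cluster kept non-empty."""
    labels = list(range(k)) + [rng.randrange(k) for _ in range(n_points - k)]
    rng.shuffle(labels)
    return labels


def calibrate(value, reference_values):
    """Assumed calibration: z-score of an index value against the random clusterings."""
    mean = statistics.fmean(reference_values)
    sd = statistics.pstdev(reference_values)
    return 0.0 if sd == 0 else (value - mean) / sd


def aggregate(data, labels, indexes, n_random=100, seed=0):
    """Assumed aggregation: unweighted mean of the calibrated index values."""
    rng = random.Random(seed)
    k = len(set(labels))
    refs = [random_clustering(len(data), k, rng) for _ in range(n_random)]
    calibrated = []
    for index in indexes:
        ref_values = [index(data, r) for r in refs]
        calibrated.append(calibrate(index(data, labels), ref_values))
    return statistics.fmean(calibrated)


data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
good = [0, 0, 0, 1, 1, 1]   # matches the two visible groups
bad = [0, 1, 0, 1, 0, 1]    # mixes the two groups
score_good = aggregate(data, good, [homogeneity, separation])
score_bad = aggregate(data, bad, [homogeneity, separation])
```

Because both indexes are put on a common scale before averaging, neither dominates the aggregate merely through its raw value range. With a fixed seed both candidates are calibrated against the same random clusterings, so `score_good` exceeds `score_bad` here.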

Updated: 2020-06-25