Efficient synthetical clustering validity indexes for hierarchical clustering
Expert Systems with Applications (IF 8.5), Pub Date: 2020-03-13, DOI: 10.1016/j.eswa.2020.113367
Qin Xu , Qiang Zhang , Jinpei Liu , Bin Luo

Clustering validation and identifying the optimal number of clusters are of great importance in expert and intelligent systems. However, the similarity measures commonly used for validation are not versatile enough to capture complex data structures; in practice, some of them are less effective than the similarity measure used by the clustering algorithm that produced the clustering results. This paper studies validity indexes for hierarchical clustering algorithms and proposes a unified validity index framework. For single-linkage agglomerative hierarchical clustering, we propose two efficient synthetical clustering validity (SCV) indexes that use the minimum spanning tree to calculate intra-cluster compactness, overcoming deficiencies in the measurements used by existing validity indexes. For general hierarchical clustering, we develop a self-adaptive similarity measure strategy and two generalized synthetical clustering validity (GSCV) indexes, which extend the proposed SCV indexes. Together, the SCV and GSCV indexes constitute a unified validity index framework in which the SCV index is a special case of the GSCV index; the framework avoids incompatibility between the similarity measures used for clustering and for validation. Experimental comparisons with state-of-the-art validity indexes on artificial and real-world data sets demonstrate the efficiency of the proposed indexes in discovering the true number of clusters and in handling various kinds of data sets, including imbalanced ones.
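The abstract does not state the SCV formulas, so the following is only a sketch of the ingredients it names, under stated assumptions. It relies on the standard fact that a single-linkage partition into k clusters is obtained by deleting the k−1 heaviest edges of the minimum spanning tree (MST). It then scores each k with a generic Dunn-style ratio: separation (smallest distance between points in different clusters) divided by compactness (longest MST edge kept inside a cluster). This ratio and the helper names `mst_edges`, `single_linkage_labels`, `separation_compactness_ratio` and `choose_k` are hypothetical illustrations, not the authors' SCV or GSCV index.

```python
import math
from itertools import combinations


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest vertex not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def single_linkage_labels(n, edges, k):
    """Drop the k-1 heaviest MST edges; the remaining components are the single-linkage clusters."""
    kept = sorted(edges)[: n - k]
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    return [find(i) for i in range(n)], kept


def separation_compactness_ratio(points, labels, kept):
    """Hypothetical Dunn-style score; NOT the paper's SCV index."""
    # Compactness: longest MST edge kept inside any cluster.
    compactness = max(w for w, _, _ in kept)
    # Separation: smallest distance between points in different clusters.
    separation = min(
        math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if labels[i] != labels[j]
    )
    return math.inf if compactness == 0 else separation / compactness


def choose_k(points, k_max):
    """Score k = 2..min(k_max, n-1) and return the best k with all scores."""
    n = len(points)
    edges = mst_edges(points)
    scores = {}
    for k in range(2, min(k_max, n - 1) + 1):
        labels, kept = single_linkage_labels(n, edges, k)
        scores[k] = separation_compactness_ratio(points, labels, kept)
    return max(scores, key=scores.get), scores
```

For two well-separated groups such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]`, `choose_k(pts, k_max=4)` picks k = 2: every kept MST edge has length 1 while the removed bridge edge has length √181. Reusing one MST for every k mirrors the abstract's point that validating single-linkage results with the same MST structure keeps the clustering and validation measures compatible.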




Updated: 2020-03-13