A neighborhood-based three-stage hierarchical clustering algorithm
Multimedia Tools and Applications (IF 3.0), Pub Date: 2021-07-29, DOI: 10.1007/s11042-021-11171-w
Yan Wang, Yan Ma, Hui Huang

Many neighborhood-based clustering algorithms have been proposed that use neighborhood information to measure the similarity between data points or subclusters. However, most of them are sensitive to differences in cluster size, shape, and density. In this paper, we propose a neighborhood-based three-stage hierarchical clustering algorithm (NTHC) that is robust to such differences. Three concepts are defined: the stability of a data point pair, the linked representatives, and the expanded representatives. Furthermore, a new measure of intercluster distance based on representatives is designed. In Stage 1, outliers are detected and removed from the data set using reverse nearest neighbors. In Stage 2, small clusters are formed by merging data points that have a stable connection on the 1-nearest neighbor graph. In Stage 3, the final partitions are obtained by iteratively merging the closest pair of clusters under the new measure of intercluster distance. The proposed algorithm is compared with 15 other clustering algorithms. Experimental results on synthetic and real data sets demonstrate that the proposed method is effective. In addition, we test for statistically significant differences among the sixteen clustering algorithms using the Friedman test. The average rank of the proposed algorithm is 4.19, which is better than that of the other algorithms.
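
The three-stage structure described in the abstract can be illustrated with a minimal Python sketch. It is not the NTHC algorithm itself: the abstract does not give the definitions of pair stability, linked and expanded representatives, or the new intercluster distance, so the sketch substitutes simple stand-ins. It assumes that an outlier is a point with too few reverse k-nearest neighbors, that a "stable connection" can be approximated by a connected component of the 1-nearest neighbor graph, and that single linkage can stand in for the representative-based distance. All function names and parameters (`k`, `min_reverse`, `n_clusters`) are hypothetical.

```python
import math
from collections import defaultdict


def knn_indices(points, k):
    """For each point, return the indices of its k nearest other points.
    Requires at least two points."""
    result = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, points[j]))
        result.append(order[:k])
    return result


def remove_outliers(points, k=2, min_reverse=1):
    """Stage 1 stand-in: drop points that appear in fewer than
    min_reverse of the other points' k-nearest-neighbor lists."""
    counts = [0] * len(points)
    for neighbors in knn_indices(points, k):
        for j in neighbors:
            counts[j] += 1
    return [p for p, c in zip(points, counts) if c >= min_reverse]


def small_clusters(points):
    """Stage 2 stand-in: connected components of the 1-NN graph.
    (Assumption: the paper's stability criterion is not reproduced.)"""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, neighbors in enumerate(knn_indices(points, 1)):
        parent[find(i)] = find(neighbors[0])

    groups = defaultdict(list)
    for i, p in enumerate(points):
        groups[find(i)].append(p)
    return list(groups.values())


def single_link(a, b):
    """Assumed stand-in for the representative-based intercluster distance."""
    return min(math.dist(p, q) for p in a for q in b)


def merge_clusters(clusters, n_clusters, distance=single_link):
    """Stage 3 stand-in: repeatedly merge the closest pair of clusters."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def three_stage_sketch(points, n_clusters, k=2):
    """Chain the three simplified stages."""
    kept = remove_outliers(points, k)
    return merge_clusters(small_clusters(kept), n_clusters)
```

Calling `three_stage_sketch(points, n_clusters=2)` on a list of coordinate tuples returns a list of clusters, each a list of points. Passing a different `distance` function to `merge_clusters` is the place where a representative-based measure such as the one proposed in the paper would plug in.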




Updated: 2021-07-29