Efficient construction of an approximate similarity graph for minimum spanning tree based clustering
Applied Soft Computing ( IF 7.2 ) Pub Date : 2020-09-15 , DOI: 10.1016/j.asoc.2020.106676
Gaurav Mishra , Sraban Kumar Mohanty

Minimum spanning tree (MST) based unsupervised learning techniques are popular because they can identify intrinsic clusters with heterogeneous structures. A key factor in their effectiveness is how to construct, in subquadratic time, a sparse similarity graph that captures local neighborhood information. In this paper, we propose a technique that efficiently uses the local nearest neighbors of data points to construct a similarity graph. The proposed approach consists of two steps. In the first step, the dataset is divided into groups using the dispersion level of the data points, and all-pairs intra-partition edges are computed. In the second step, boundary data points across neighboring partitions are used to produce inter-partition edges, which increases accuracy. The resulting graph comprises all intra- and inter-partition edges. An approximate MST of the similarity graph is constructed to show its efficacy. Experimental analyses based on graph diameter, all-pairs shortest paths, and the weight error of the MST demonstrate that the similarity graph captures shorter edges and discards the longest edges. Moreover, the quality of the approximate MST is validated by applying a clustering technique to various synthetic and real data sets with different characteristics; cluster quality analyses demonstrate that it performs satisfactorily compared with other competing approximate MST construction techniques.
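The two-step construction described in the abstract can be illustrated with a rough Python sketch. This is not the authors' algorithm: the abstract does not specify how the dispersion level is computed or how boundary points are chosen, so the sketch substitutes two simple stand-ins, both assumptions. Partitioning sorts the points along the widest coordinate axis and cuts them into fixed-size chunks, and the boundary points of a partition are taken to be the `k` points closest to the neighboring partition's centroid. All function names and the parameters `size` and `k` are hypothetical.

```python
import math
import random
from itertools import combinations


def partition(points, size):
    """Stand-in for dispersion-based grouping (assumption): sort point
    indices along the axis with the largest range, then cut into chunks."""
    dims = range(len(points[0]))
    axis = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    return [order[i:i + size] for i in range(0, len(order), size)]


def centroid(points, idx):
    """Coordinate-wise mean of the points whose indices are in idx."""
    return tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(len(points[0])))


def similarity_graph(points, size, k):
    """Return weighted edges (weight, i, j) of the sparse similarity graph."""
    parts = partition(points, size)
    edges = set()
    # Step 1: all-pairs edges inside each partition.
    for part in parts:
        edges.update(combinations(sorted(part), 2))
    # Step 2: edges between boundary points of neighboring partitions.
    # Boundary points (assumption): the k points nearest the other centroid.
    for a, b in zip(parts, parts[1:]):
        ca, cb = centroid(points, a), centroid(points, b)
        bound_a = sorted(a, key=lambda i: math.dist(points[i], cb))[:k]
        bound_b = sorted(b, key=lambda j: math.dist(points[j], ca))[:k]
        edges.update((min(i, j), max(i, j)) for i in bound_a for j in bound_b)
    return [(math.dist(points[i], points[j]), i, j) for i, j in edges]


def kruskal(n, weighted_edges):
    """Kruskal's algorithm with union-find; returns the tree's edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, i, j in sorted(weighted_edges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))
    return tree


# Usage: approximate MST of 200 random 2-D points.
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
graph = similarity_graph(pts, size=15, k=3)
approx_mst = kruskal(len(pts), graph)
```

With a partition size near the square root of the number of points n, the intra-partition step produces on the order of n^1.5 edges, which is what keeps a construction of this kind subquadratic. Because the approximate MST is built from a subgraph of the complete graph, its total weight can only equal or exceed that of the exact MST; the gap between the two is the weight error that the abstract mentions.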




Updated: 2020-09-15