Refining a k-nearest neighbor graph for a computationally efficient spectral clustering
Pattern Recognition ( IF 7.5 ) Pub Date : 2021-02-06 , DOI: 10.1016/j.patcog.2021.107869
Mashaan Alshammari , John Stavrakakis , Masahiro Takatsuka

Spectral clustering has become a popular choice for data clustering because it can uncover clusters of different shapes. However, it is not always preferable to other clustering methods because of its computational demands. One effective way to bypass these demands is to perform spectral clustering on a subset of points (data representatives) and then generalize the clustering outcome; this is known as approximate spectral clustering (ASC). ASC uses sampling or quantization to select data representatives, which makes it vulnerable to (1) performance inconsistency, since these methods have a random step in either initialization or training, and (2) loss of local statistics, because the pairwise similarities are extracted from data representatives instead of data points. We propose a refined version of the k-nearest neighbor graph, in which we keep the data points and aggressively reduce the number of edges for computational efficiency. Local statistics are exploited to keep the edges that do not violate the intra-cluster distances and to nullify all other edges in the k-nearest neighbor graph. We also introduce an optional step to automatically select the number of clusters C. The proposed method was tested on synthetic and real datasets. Compared to ASC methods, it delivered consistent performance despite a significant reduction in edges.
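The abstract does not state the exact pruning rule, so the following is only a rough sketch of the general idea of refining a k-nearest neighbor graph with local statistics. It assumes a hypothetical rule: an edge is kept when its length does not exceed the mean plus `n_std` standard deviations of the source point's own neighbor distances, and only when both endpoints agree. The function names `knn_graph` and `refine_knn_graph`, the parameter `n_std`, and the mutual-agreement step are illustrative assumptions, not the authors' method.

```python
import numpy as np


def knn_graph(X, k):
    """Return (indices, distances) of the k nearest neighbors of each row of X."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)


def refine_knn_graph(X, k, n_std=1.0):
    """Prune k-NN edges that look too long relative to local distances.

    ASSUMPTION (not from the paper): the threshold for point i is
    mean + n_std * std of the distances to its own k neighbors.
    """
    n = X.shape[0]
    idx, d = knn_graph(X, k)
    thresh = d.mean(axis=1) + n_std * d.std(axis=1)
    keep = (d <= thresh[:, None]).ravel()

    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    A = np.zeros((n, n), dtype=bool)
    A[rows[keep], cols[keep]] = True

    # ASSUMPTION: keep an edge only if both endpoints accepted it.
    return A & A.T


# Two unit squares, 10 units apart. With k=4 every point has exactly
# one neighbor in the other square before refinement.
X = np.array(
    [[0, 0], [1, 0], [0, 1], [1, 1],
     [10, 0], [11, 0], [10, 1], [11, 1]],
    dtype=float,
)
A = refine_knn_graph(X, k=4)
print(A.sum(), "directed edges kept out of", X.shape[0] * 4)
```

In this toy input the long cross-square edges exceed each point's local threshold and are dropped, while all points remain in the graph. The resulting sparse adjacency matrix could then be passed to a spectral clustering routine.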




Updated: 2021-02-15