Diffusion K-means clustering on manifolds: Provable exact recovery via semidefinite relaxations
Applied and Computational Harmonic Analysis (IF 2.5), Pub Date: 2020-03-14, DOI: 10.1016/j.acha.2020.03.002
Xiaohui Chen, Yun Yang

We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. Diffusion K-means constructs a random walk on the similarity graph whose vertices are data points randomly sampled from the manifolds and whose edges are similarities given by a kernel that captures the local geometry of the manifolds. It is a multi-scale clustering tool suitable for data with non-linear and non-Euclidean geometric features in mixed dimensions. Given the number of clusters, we propose a polynomial-time convex relaxation algorithm via semidefinite programming (SDP) to solve diffusion K-means. We also propose a nuclear norm regularized SDP that is adaptive to the number of clusters. In both cases, we show that the SDPs achieve exact recovery for diffusion K-means under suitable between-cluster separability and within-cluster connectedness of the submanifolds, which together quantify the hardness of the manifold clustering problem. We further propose localized diffusion K-means, which uses a local adaptive bandwidth estimated from the nearest neighbors. We show that exact recovery of localized diffusion K-means is fully adaptive to the local probability density and geometric structures of the underlying submanifolds.
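
The random-walk construction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and it does not cover the SDP relaxation. It assumes a Gaussian kernel with a fixed bandwidth and uses the standard diffusion distance from the diffusion-maps literature, in which squared differences of t-step transition probabilities are weighted by the inverse stationary distribution. The function names, the toy points, the bandwidth, and the number of steps are all hypothetical choices made for illustration.

```python
import math


def gaussian_kernel_matrix(points, bandwidth):
    """Pairwise similarities k(x, y) = exp(-|x - y|^2 / (2 h^2)). Assumed kernel."""
    def sq_dist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))
    return [[math.exp(-sq_dist(x, y) / (2.0 * bandwidth ** 2)) for y in points]
            for x in points]


def random_walk_matrix(kernel):
    """Row-normalize the kernel so that P[i][j] = K[i][j] / degree[i]."""
    degrees = [sum(row) for row in kernel]
    transition = [[k / d for k in row] for row, d in zip(kernel, degrees)]
    return transition, degrees


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(p, steps):
    """P^steps by repeated multiplication (steps >= 1)."""
    result = p
    for _ in range(steps - 1):
        result = mat_mul(result, p)
    return result


def diffusion_distance_sq(p_t, degrees, i, j):
    """Squared diffusion distance between points i and j after t steps.

    Differences of t-step transition probabilities are weighted by the
    inverse stationary distribution pi(z) = degree[z] / sum(degrees).
    """
    total = sum(degrees)
    return sum((p_t[i][z] - p_t[j][z]) ** 2 / (degrees[z] / total)
               for z in range(len(degrees)))


# Toy data (hypothetical): two well-separated groups on a line.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
kernel = gaussian_kernel_matrix(points, bandwidth=0.3)
P, degrees = random_walk_matrix(kernel)
P_t = mat_power(P, steps=4)

within = diffusion_distance_sq(P_t, degrees, 0, 1)   # same group
between = diffusion_distance_sq(P_t, degrees, 0, 3)  # different groups
print(within < between)
```

In this toy setting the walk started in one group almost never reaches the other, so points in the same group end up with nearly identical t-step distributions while points in different groups do not. The number of steps plays the role of the scale parameter: more steps smooth the distributions over larger neighborhoods.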




Updated: 2020-04-20