Spectral analysis of weighted Laplacians arising in data clustering
Applied and Computational Harmonic Analysis (IF 2.6), Pub Date: 2021-09-01, DOI: 10.1016/j.acha.2021.07.004
Franca Hoffmann, Bamdad Hosseini, Assad A. Oberai, Andrew M. Stuart

Graph Laplacians computed from weighted adjacency matrices are widely used to identify geometric structure in data, and clusters in particular; their spectral properties play a central role in a number of unsupervised and semi-supervised learning algorithms. When suitably scaled, graph Laplacians approach limiting continuum operators in the large data limit. Studying these limiting operators, therefore, sheds light on learning algorithms. This paper is devoted to the study of a parameterized family of divergence form elliptic operators that arise as the large data limit of graph Laplacians. The link between a three-parameter family of graph Laplacians and a three-parameter family of differential operators is explained. The spectral properties of these differential operators are analyzed in the situation where the data comprises two nearly separated clusters, in a sense which is made precise. In particular, we investigate how the spectral gap depends on the three parameters entering the graph Laplacian, and on a parameter measuring the size of the perturbation from the perfectly clustered case. Numerical results are presented which exemplify the analysis and which extend it in the following ways: the computations study situations in which there are two nearly separated clusters, but which violate the assumptions used in our theory; situations in which more than two clusters are present, also going beyond our theory; and situations which demonstrate the relevance of our studies of differential operators for the understanding of finite data problems via the graph Laplacian. The findings provide insight into parameter choices made in learning algorithms which are based on weighted adjacency matrices; they also provide the basis for analysis of the consistency of various unsupervised and semi-supervised learning algorithms, in the large data limit.
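To make the setting concrete, the sketch below illustrates the basic object the abstract refers to: a weighted adjacency matrix built from sample points, the resulting graph Laplacian, and the spectral gap that signals two nearly separated clusters. It is a minimal illustration, not the paper's construction: the three-parameter family of Laplacians studied in the paper is not reproduced here, and the Gaussian kernel, the bandwidth epsilon, and the two standard normalizations shown are assumptions made purely for this example.

import numpy as np

rng = np.random.default_rng(0)

# Two nearly separated Gaussian clusters in R^2 (illustrative data, not from the paper).
n = 200
X = np.vstack([
    rng.normal(loc=(-2.0, 0.0), scale=0.5, size=(n // 2, 2)),
    rng.normal(loc=(2.0, 0.0), scale=0.5, size=(n // 2, 2)),
])

# Weighted adjacency matrix from a Gaussian kernel; epsilon is an assumed bandwidth.
epsilon = 0.5
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq_dists / (2.0 * epsilon ** 2))
np.fill_diagonal(W, 0.0)

# Degree vector and two common (symmetric) Laplacian normalizations.
d = W.sum(axis=1)
L_un = np.diag(d) - W                               # unnormalized Laplacian
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt     # symmetric normalized Laplacian

# Eigenvalues in increasing order. With two well-separated clusters the two
# smallest eigenvalues are near zero and a gap opens before the third.
for name, L in [("unnormalized", L_un), ("normalized", L_sym)]:
    ev = np.sort(np.linalg.eigvalsh(L))
    print(f"{name}: lambda_1..4 = {ev[:4]}, gap lambda_3 - lambda_2 = {ev[2] - ev[1]:.3f}")

Running this prints two eigenvalues close to zero followed by a pronounced jump; how that gap behaves under the paper's three-parameter family of Laplacians, and under perturbations away from the perfectly clustered case, is the subject of the analysis.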

Updated: 2021-09-10