A multiscale environment for learning by diffusion, Applied and Computational Harmonic Analysis

A multiscale environment for learning by diffusion
Applied and Computational Harmonic Analysis (IF 2.6), Pub Date: 2021-11-17, DOI: 10.1016/j.acha.2021.11.004
James M. Murphy, Sam L. Polk

Clustering algorithms partition a dataset into groups of similar points. The clustering problem is very general, and different partitions of the same dataset could be considered correct and useful. To fully understand such data, it must be considered at a variety of scales, ranging from coarse to fine. We introduce the Multiscale Environment for Learning by Diffusion (MELD) data model, which is a family of clusterings parameterized by nonlinear diffusion on the dataset. We show that the MELD data model precisely captures latent multiscale structure in data and facilitates its analysis. To efficiently learn the multiscale structure observed in many real datasets, we introduce the Multiscale Learning by Unsupervised Nonlinear Diffusion (M-LUND) clustering algorithm, which is derived from a diffusion process at a range of temporal scales. We provide theoretical guarantees for the algorithm's performance and establish its computational efficiency. Finally, we show that the M-LUND clustering algorithm detects the latent structure in a range of synthetic and real datasets.
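The abstract does not spell out how diffusion exposes structure at different scales, so the following is only a minimal sketch of the general idea and not the M-LUND algorithm itself. It assumes a Gaussian kernel with a hypothetical bandwidth `sigma`, a random walk built by row-normalizing that kernel, and the standard diffusion distance from the diffusion-maps literature. The function names and the toy data are illustrative choices, not taken from the paper.

```python
import numpy as np


def transition_matrix(X, sigma):
    """Row-stochastic Markov matrix from a Gaussian kernel on the rows of X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / sigma**2)   # symmetric affinities (assumed kernel)
    degrees = W.sum(axis=1)
    P = W / degrees[:, None]           # each row sums to 1
    pi = degrees / degrees.sum()       # stationary distribution of P
    return P, pi


def diffusion_distances(P, pi, t):
    """Pairwise diffusion distances after t steps of the random walk."""
    Pt = np.linalg.matrix_power(P, t)
    diffs = Pt[:, None, :] - Pt[None, :, :]
    # Weighted L2 distance between rows of P^t, with weights 1 / pi.
    return np.sqrt(np.sum(diffs**2 / pi, axis=-1))


# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [10.0, 0.0], [11.0, 0.0], [10.0, 1.0]])
P, pi = transition_matrix(X, sigma=1.0)

for t in (1, 8, 64):
    D = diffusion_distances(P, pi, t)
    within = D[0, 1]    # two points in the same group
    between = D[0, 3]   # two points in different groups
    print(f"t={t:3d}  within={within:.3e}  between={between:.3e}")
```

In this toy setting, the within-group distance shrinks as the time parameter `t` grows while the between-group distance stays large, which is one way a single diffusion process can reveal finer structure at small times and coarser structure at larger times.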

Updated: 2021-11-24
