Multi-scale affinities with missing data: Estimation and applications, Statistical Analysis and Data Mining


Multi-scale affinities with missing data: Estimation and applications
Statistical Analysis and Data Mining (IF 2.1), Pub Date: 2021-11-05, DOI: 10.1002/sam.11561
Min Zhang, Gal Mishne, Eric C Chi


Many machine learning algorithms depend on weights that quantify row and column similarities of a data matrix. The choice of weights can dramatically impact the effectiveness of the algorithm. Nonetheless, the problem of choosing weights has arguably not been given enough study. When a data matrix is completely observed, Gaussian kernel affinities can be used to quantify the local similarity between pairs of rows and pairs of columns. Computing weights in the presence of missing data, however, becomes challenging. In this paper, we propose a new method to construct row and column affinities even when data are missing by building off a co-clustering technique. This method takes advantage of solving the optimization problem for multiple pairs of cost parameters and filling in the missing values with increasingly smooth estimates. It exploits the coupled similarity structure among both the rows and columns of a data matrix. We show these affinities can be used to perform tasks such as data imputation, clustering, and matrix completion on graphs.
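For the fully observed case mentioned in the abstract, Gaussian kernel affinities between rows and between columns can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give a formula, so the common parametrization w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) is assumed, and the function names and the bandwidth `sigma` are hypothetical. It does not reproduce the paper's method for missing data.

```python
import math


def gaussian_affinity(vectors, sigma=1.0):
    """Pairwise Gaussian kernel weights for a list of equal-length vectors.

    Assumed form: W[i][j] = exp(-||v_i - v_j||^2 / (2 * sigma^2)).
    """
    n = len(vectors)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared Euclidean distance between vectors i and j.
            sq_dist = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            W[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return W


def row_and_column_affinities(X, sigma=1.0):
    """Return (row affinities, column affinities) for a complete matrix X."""
    rows = [list(r) for r in X]
    cols = [list(c) for c in zip(*X)]  # transpose to get the columns
    return gaussian_affinity(rows, sigma), gaussian_affinity(cols, sigma)


# Toy, fully observed data matrix (illustrative values only).
X = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [8.0, 9.0, 7.0],
]
W_rows, W_cols = row_and_column_affinities(X, sigma=1.0)
```

In this toy matrix the first two rows are close, so their affinity is near 1, while the third row is far from both and its affinity to them is near 0. With missing entries the squared distances above are no longer defined, which is the difficulty the abstract says the paper addresses.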


Updated: 2021-11-05
