Asymptotic performance of PCA for high-dimensional heteroscedastic data
Journal of Multivariate Analysis (IF 1.6), Pub Date: 2018-09-01, DOI: 10.1016/j.jmva.2018.06.002
David Hong, Laura Balzano, Jeffrey A. Fessler
Principal Component Analysis (PCA) is a classical method for reducing the dimensionality of data by projecting them onto a subspace that captures most of their variation. Effective use of PCA in modern applications requires understanding its performance for data that are both high-dimensional and heteroscedastic. This paper analyzes the statistical performance of PCA in this setting, i.e., for high-dimensional data drawn from a low-dimensional subspace and degraded by heteroscedastic noise. We provide simplified expressions for the asymptotic PCA recovery of the underlying subspace, subspace amplitudes and subspace coefficients; the expressions enable both easy and efficient calculation and reasoning about the performance of PCA. We exploit the structure of these expressions to show that, for a fixed average noise variance, the asymptotic recovery of PCA for heteroscedastic data is always worse than that for homoscedastic data (i.e., for noise variances that are equal across samples). Hence, while average noise variance is often a practically convenient measure for the overall quality of data, it gives an overly optimistic estimate of the performance of PCA for heteroscedastic data.
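The abstract's main conclusion can be illustrated numerically: draw data from a one-dimensional subspace, add noise whose per-sample variance is either constant or varies while keeping the same average, and compare how well PCA recovers the subspace. The sketch below is only an informal simulation of that comparison, not the paper's asymptotic analysis; all parameter values (dimension, sample count, amplitude, the 0.1/1.9 variance split) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 300, 600, 1      # ambient dimension, samples, subspace dimension
theta = 1.2                # subspace amplitude (signal strength)
u = np.linalg.qr(rng.standard_normal((d, k)))[0]  # true subspace basis

def pca_recovery(noise_std, trials=25):
    """Average squared cosine between the true subspace and the PCA estimate."""
    vals = []
    for _ in range(trials):
        z = rng.standard_normal((k, n))                  # subspace coefficients
        noise = rng.standard_normal((d, n)) * noise_std  # per-sample noise scale
        Y = theta * (u @ z) + noise
        u_hat = np.linalg.svd(Y, full_matrices=False)[0][:, :k]
        vals.append((u[:, 0] @ u_hat[:, 0]) ** 2)
    return float(np.mean(vals))

# Both settings have average noise variance 1.0.
homo = pca_recovery(np.full(n, 1.0))
hetero = pca_recovery(np.sqrt(np.r_[np.full(n // 2, 0.1),
                                    np.full(n - n // 2, 1.9)]))
print(f"homoscedastic recovery:   {homo:.3f}")
print(f"heteroscedastic recovery: {hetero:.3f}")  # expected to be lower
```

Consistent with the paper's claim, the heteroscedastic setting yields worse subspace recovery even though the average noise variance matches the homoscedastic one.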

Updated: 2018-09-01