Right singular vector projection graphs: fast high dimensional covariance matrix estimation under latent confounding
The Journal of the Royal Statistical Society, Series B (Statistical Methodology) (IF 5.8). Pub Date: 2020-01-31. DOI: 10.1111/rssb.12359
Rajen D. Shah, Benjamin Frot, Gian-Andrea Thanei, Nicolai Meinshausen

We consider the problem of estimating a high dimensional p×p covariance matrix Σ, given n observations of confounded data with covariance Σ + ΓΓᵀ, where Γ is an unknown p×q matrix of latent factor loadings. We propose a simple and scalable estimator based on the projection onto the right singular vectors of the observed data matrix, which we call right singular vector projection (RSVP). Our theoretical analysis of this method reveals that, in contrast with approaches based on the removal of principal components, RSVP can cope well with settings where the smallest eigenvalue of ΓᵀΓ is relatively close to the largest eigenvalue of Σ, as well as when the eigenvalues of ΓᵀΓ are diverging fast. RSVP does not require knowledge or estimation of the number of latent factors q, but it recovers Σ only up to an unknown positive scale factor. We argue that this suffices in many applications, e.g. if an estimate of the correlation matrix is desired. We also show that, by using subsampling, we can further improve the performance of the method. We demonstrate the favourable performance of RSVP through simulation experiments and an analysis of gene expression data sets collated by the GTEX consortium.
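
The abstract describes the estimator only in words. As a rough illustration, the sketch below assumes that "projection onto the right singular vectors" means forming the matrix VVᵀ from the thin singular value decomposition X = UDVᵀ of an n×p data matrix with n < p, and that the columns of X have already been centred. The function names `rsvp` and `to_correlation` and the rank tolerance are hypothetical choices made for this example, not taken from the paper, and the subsampling variant is not covered.

```python
import numpy as np


def rsvp(X, tol=1e-10):
    """Projection matrix onto the row space of X (assumed form of RSVP).

    X is an n x p data matrix, assumed centred, with n < p.
    The result is meaningful only up to an unknown positive scale factor.
    """
    # Thin SVD: the rows of Vt are the right singular vectors of X.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only directions with a non-negligible singular value.
    V = Vt[s > tol * s.max()].T
    return V @ V.T


def to_correlation(S):
    """Rescale a matrix with positive diagonal to unit diagonal.

    This removes any unknown positive scale factor.
    """
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, q = 50, 200, 3
    # Simulated confounded data: idiosyncratic noise plus q latent factors.
    Gamma = rng.standard_normal((p, q))
    H = rng.standard_normal((n, q))
    X = rng.standard_normal((n, p)) + H @ Gamma.T
    R = to_correlation(rsvp(X))
    print(R.shape)
```

Because the output is a projection matrix, it is symmetric and idempotent, and it never requires choosing the number of latent factors q. Converting to a correlation matrix is one way to sidestep the unknown scale factor mentioned in the abstract.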

Updated: 2020-01-31