Estimation of the Number of Spiked Eigenvalues in a Covariance Matrix by Bulk Eigenvalue Matching Analysis
Journal of the American Statistical Association (IF 3.7), Pub Date: 2021-07-23, DOI: 10.1080/01621459.2021.1933497
Zheng Tracy Ke¹, Yucong Ma¹, Xihong Lin²,³

ABSTRACT

The spiked covariance model has gained increasing popularity in high-dimensional data analysis. A fundamental problem is determining the number of spiked eigenvalues, K. For estimating K, most attention has focused on the top eigenvalues of the sample covariance matrix, and there has been little investigation into proper ways of using the bulk eigenvalues to estimate K. We propose a principled approach to incorporating bulk eigenvalues in the estimation of K. Our method imposes a working model on the residual covariance matrix, which is assumed to be a diagonal matrix whose entries are drawn from a gamma distribution. Under this model, the bulk eigenvalues are asymptotically close to the quantiles of a fixed parametric distribution. This motivates a two-step method: the first step uses the bulk eigenvalues to estimate the parameters of this distribution, and the second step leverages these parameters to assist the estimation of K. The resulting estimator K̂ aggregates information from a large number of bulk eigenvalues. We show the consistency of K̂ under a standard spiked covariance model. We also propose a confidence interval estimate for K. Our extensive simulation studies show that the proposed method is robust and outperforms existing methods in a range of scenarios. We apply the proposed method to the analysis of a lung cancer microarray dataset and the 1000 Genomes dataset.
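The two-step idea can be illustrated in its simplest special case. The sketch below is not the authors' implementation. It assumes the residual covariance is σ²I, which is the limit of the gamma working model as the gamma shape parameter grows. In that case the bulk eigenvalues should track σ² times the quantiles of the Marchenko-Pastur law with ratio γ = p/n, and the sketch requires γ < 1. Step 1 fits σ² by least squares on the middle of the bulk. Step 2 counts the eigenvalues above the fitted bulk edge σ̂²(1 + √γ)², inflated by an ad hoc margin. The function names (`mp_quantiles`, `estimate_k`, `simulate_spiked`), the trimming fraction `alpha`, the `margin`, and the simulation settings are all hypothetical choices made for illustration.

```python
import numpy as np


def mp_quantiles(gamma, levels, grid_size=20001):
    """Quantiles of the unit-scale Marchenko-Pastur law with ratio gamma = p/n < 1."""
    if not 0 < gamma < 1:
        raise ValueError("this sketch only handles 0 < gamma < 1")
    a, b = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
    x = np.linspace(a, b, grid_size)
    dens = np.sqrt(np.clip((b - x) * (x - a), 0.0, None)) / (2 * np.pi * gamma * x)
    # Cumulative trapezoid rule on the grid, renormalised so the CDF ends at 1.
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (dens[1:] + dens[:-1]) * np.diff(x))))
    cdf /= cdf[-1]
    # Invert the CDF by linear interpolation.
    return np.interp(levels, cdf, x)


def estimate_k(X, alpha=0.2, margin=0.05):
    """Hypothetical two-step estimate of K under a sigma^2 * I residual model."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # column-centre; effective sample size is n - 1
    S = Xc.T @ Xc / (n - 1)
    gamma = p / (n - 1)
    eig = np.linalg.eigvalsh(S)      # eigenvalues in ascending order
    levels = (np.arange(1, p + 1) - 0.5) / p
    q = mp_quantiles(gamma, levels)
    # Step 1: least-squares fit of eig ~ sigma^2 * q on the middle of the bulk;
    # the spiked eigenvalues are assumed to sit above the (1 - alpha) level.
    mid = (levels >= alpha) & (levels <= 1 - alpha)
    sigma2 = np.dot(q[mid], eig[mid]) / np.dot(q[mid], q[mid])
    # Step 2: count eigenvalues above the fitted bulk edge, inflated by a margin.
    edge = sigma2 * (1 + np.sqrt(gamma)) ** 2 * (1 + margin)
    return int(np.sum(eig > edge)), sigma2


def simulate_spiked(n=1000, p=200, sigma2=2.0, spikes=(20.0, 15.0, 10.0), seed=0):
    """Draw n rows from N(0, sigma2 * I + diag(spikes, 0, ..., 0))."""
    rng = np.random.default_rng(seed)
    pop_var = np.full(p, sigma2)
    pop_var[: len(spikes)] += np.asarray(spikes, dtype=float)
    return rng.standard_normal((n, p)) * np.sqrt(pop_var)


k_hat, sigma2_hat = estimate_k(simulate_spiked())
print(k_hat, round(sigma2_hat, 3))
```

With three strong spikes on a noise level of 2, this sketch should report K̂ = 3 and σ̂² close to 2. The fixed `margin` is only a crude stand-in for a calibrated threshold. In the paper's method, the second step instead uses the parameters fitted under the gamma working model.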




Updated: 2021-07-23