Eigenvector-based sparse canonical correlation analysis: Fast computation for estimation of multiple canonical vectors
Journal of Multivariate Analysis (IF 1.4), Pub Date: 2021-06-18, DOI: 10.1016/j.jmva.2021.104781
Wenjia Wang, Yi-Hui Zhou

Classical canonical correlation analysis (CCA) requires the data matrices to be low dimensional, i.e., the number of features cannot exceed the sample size. Recent developments in CCA have focused mainly on the high-dimensional setting, where the number of features in both matrices under analysis greatly exceeds the sample size. These approaches impose penalties in optimization problems that must be solved iteratively, and they estimate multiple canonical vectors sequentially. In this work, we provide an explicit link between sparse multiple regression and sparse canonical correlation analysis, together with an efficient algorithm that estimates multiple canonical pairs simultaneously rather than sequentially. Furthermore, the algorithm naturally allows parallel computing. These properties make the algorithm much more efficient. We provide theoretical results on the consistency of the canonical pairs. The algorithm and the theoretical development are based on solving an eigenvector problem, which significantly differentiates our method from existing methods. Simulation results support the improved performance of the proposed approach. We apply eigenvector-based CCA to the analysis of GTEx thyroid histology images, to the analysis of SNP and RNA-seq gene expression data, and to a microbiome study. The real-data analyses also show improved performance compared with traditional sparse CCA.
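The link between multiple regression and CCA, and the idea of getting every canonical pair from a single eigenvector problem, can be seen in the classical low-dimensional case (sample size n larger than both feature counts p and q). The NumPy sketch below is only an illustration under that assumption. It is not the authors' sparse algorithm: it uses ordinary least squares, whereas the abstract's method builds on sparse multiple regression, and it does not reproduce the paper's consistency theory. The function name `cca_eigen` and the synthetic data are hypothetical.

The sketch defines B_x = Sxx^-1 Sxy as the coefficients from regressing Y on X, and B_y = Syy^-1 Syx as the coefficients from regressing X on Y. Under these definitions, the X-side canonical vectors are eigenvectors of B_x B_y, and the eigenvalues are the squared canonical correlations. One eigendecomposition therefore returns all k pairs at once.

```python
import numpy as np


def cca_eigen(X, Y, k):
    """Classical CCA as one eigenvector problem built from two multivariate
    least-squares regressions (hypothetical helper; assumes n > p and n > q)."""
    X = X - X.mean(axis=0)  # center so cross-products are scaled covariances
    Y = Y - Y.mean(axis=0)
    # Regress Y on X: B_x = (X'X)^{-1} X'Y, shape (p, q). Each column is an
    # independent regression, so these could be computed in parallel.
    B_x = np.linalg.lstsq(X, Y, rcond=None)[0]
    # Regress X on Y: B_y = (Y'Y)^{-1} Y'X, shape (q, p)
    B_y = np.linalg.lstsq(Y, X, rcond=None)[0]
    # M = Sxx^{-1} Sxy Syy^{-1} Syx is similar to a symmetric PSD matrix,
    # so its eigenvalues are real, lie in [0, 1], and equal rho_j^2.
    M = B_x @ B_y
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(vals.real)[::-1][:k]  # top-k pairs in one pass
    rho = np.sqrt(np.clip(vals.real[order], 0.0, 1.0))
    A = vecs.real[:, order]  # X-side canonical vectors
    B = B_y @ A              # Y-side vectors: b_j proportional to Syy^{-1} Syx a_j
    # Rescale so every canonical variate has unit sample variance
    A = A / (X @ A).std(axis=0)
    B = B / (Y @ B).std(axis=0)
    return A, B, rho


# Usage on synthetic data: one shared latent signal drives X[:, 0] and Y[:, 0]
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal(n)
X = rng.standard_normal((n, 5))
X[:, 0] += 2 * z
Y = rng.standard_normal((n, 4))
Y[:, 0] += 2 * z

A, B, rho = cca_eigen(X, Y, k=2)
U = (X - X.mean(axis=0)) @ A  # X-side canonical variates
V = (Y - Y.mean(axis=0)) @ B  # Y-side canonical variates
print(np.round(rho, 2))
print(np.round([np.corrcoef(U[:, j], V[:, j])[0, 1] for j in range(2)], 2))
```

The two printed lines should agree. The correlation between the j-th pair of variates equals the square root of the j-th eigenvalue, and the X-side variates come out mutually uncorrelated without any deflation step. In a high-dimensional setting, the two `lstsq` calls are the pieces a sparse regression would have to replace.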




Updated: 2021-07-13