Sparse semiparametric canonical correlation analysis for data of mixed types
Biometrika (IF 2.4), Pub Date: 2020-04-15, DOI: 10.1093/biomet/asaa007
Grace Yoon, Raymond J. Carroll, Irina Gaynanova

Canonical correlation analysis investigates linear relationships between two sets of variables, but it often works poorly on modern data sets because of high dimensionality and mixed data types, such as continuous, binary and zero-inflated variables. To overcome these challenges, we propose a semiparametric approach to sparse canonical correlation analysis based on the Gaussian copula. Our main contribution is a truncated latent Gaussian copula model for data with excess zeros, which allows us to derive a rank-based estimator of the latent correlation matrix for mixed variable types without estimating the marginal transformation functions. Numerical studies show that the resulting canonical correlation analysis method works well in high-dimensional settings, and we apply it to analyse the association between gene expression and microRNA data from breast cancer patients.
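The abstract does not spell out the estimator, so the following is only a minimal sketch of the general rank-based idea for the simplest case, assuming every variable is continuous. For two continuous variables under a Gaussian copula, the latent correlation is linked to Kendall's tau by the well-known bridge r = sin(pi * tau / 2). The paper's contribution, the analogous bridge for binary and truncated (zero-inflated) variables, is not reproduced here. The helper names (`kendall_tau_a`, `latent_corr_continuous`, `first_canonical_corr`) are hypothetical, and the last step is ordinary (non-sparse) CCA on the estimated matrix, not the sparse procedure the paper proposes.

```python
import numpy as np

def kendall_tau_a(x, y):
    """Kendall's tau-a for two 1-D arrays (O(n^2) memory; fine for a small sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    # Signs of all pairwise differences; the diagonal contributes zero.
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    # Each unordered pair is counted twice in the full n x n matrix.
    return (sx * sy).sum() / (n * (n - 1))

def latent_corr_continuous(X):
    """Rank-based latent correlation, continuous/continuous bridge only (assumption)."""
    p = X.shape[1]
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            tau = kendall_tau_a(X[:, j], X[:, k])
            R[j, k] = R[k, j] = np.sin(np.pi / 2 * tau)
    return R

def _inv_sqrt(S, eps=1e-8):
    """Inverse symmetric square root; clips tiny eigenvalues for numerical safety."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, eps, None)
    return (V / np.sqrt(w)) @ V.T

def first_canonical_corr(R, p1):
    """Leading canonical correlation between columns [:p1] and [p1:] of R (plain CCA)."""
    S11, S12, S22 = R[:p1, :p1], R[:p1, p1:], R[p1:, p1:]
    M = _inv_sqrt(S11) @ S12 @ _inv_sqrt(S22)
    return np.linalg.svd(M, compute_uv=False)[0]

# Usage: latent Gaussian data Z observed only through monotone transforms X.
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal(n)
Z = np.column_stack([z + 0.5 * rng.standard_normal(n) for _ in range(4)])
X = np.column_stack([
    np.exp(Z[:, 0]),                # strictly increasing transforms leave ranks unchanged
    Z[:, 1] ** 3,
    Z[:, 2],
    np.log1p(np.exp(Z[:, 3])),
])
R = latent_corr_continuous(X)
print(np.round(R, 2))
print(round(first_canonical_corr(R, p1=2), 2))
```

Because Kendall's tau depends only on ranks, the estimate computed from X matches the one computed from the untransformed Z, which is why no marginal transformation has to be estimated. An estimated matrix of this kind need not be positive semidefinite; the sketch only clips small eigenvalues when whitening, whereas a fuller implementation might first project it to the nearest positive definite matrix.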

Updated: 2020-04-15