Covariate-driven factorization by thresholding for multiblock data
Biometrics (IF 1.9), Pub Date: 2020-08-16, DOI: 10.1111/biom.13352
Xing Gao, Sungwon Lee, Gen Li, Sungkyu Jung

Multiblock data, in which multiple groups of variables from different sources are observed on a common set of subjects, are routinely collected in many areas of science. Methods for the joint factorization of such data are being developed to explore their potentially joint variation structure. Most existing work focuses on distinguishing joint components, which are shared across all data blocks, from individual components, each of which is relevant to only a single data block. We instead propose to model and estimate partially joint components, which are shared by some, but not all, data blocks. If covariates are available, possibly with their own multiblock structure, the components are further modeled as driven by the covariate information. To estimate such a covariate-driven, block-structured factor model, we propose an iterative algorithm based on thresholding, which transforms the problem of signal segmentation into a grouped variable selection problem. Simulation studies confirm that the proposed factorization accurately estimates individual and (partially) joint structures in multiblock data. In an analysis of a real multiblock genomic dataset from The Cancer Genome Atlas project, we demonstrate that the estimated block structures are straightforward to interpret and facilitate subsequent analyses.
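As a rough illustration of how thresholding can turn "which blocks share this component?" into a grouped variable selection question, here is a minimal sketch. It is not the authors' algorithm: it ignores covariates and estimates a single rank-one component only. It assumes the blocks are concatenated column-wise into one matrix `Y` and that group soft-thresholding (the group-lasso proximal operator) is applied to each block's loadings. The function names `group_soft_threshold` and `block_sparse_rank_one`, the threshold `lam`, the SVD initialization, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def group_soft_threshold(v, blocks, lam):
    """Shrink each block's sub-vector of v toward zero (group soft-thresholding).

    A block whose Euclidean norm is <= lam is set exactly to zero; otherwise it
    is scaled by (1 - lam / norm). This is the proximal operator of the
    group-lasso penalty lam * sum_k ||v_k||_2.
    """
    out = np.zeros_like(v, dtype=float)
    for idx in blocks:
        norm = np.linalg.norm(v[idx])
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * v[idx]
    return out


def block_sparse_rank_one(Y, blocks, lam, n_iter=50):
    """Hypothetical sketch: alternate between scores u and group-thresholded
    loadings v for one component u v^T of Y (n subjects x p variables, with
    the data blocks concatenated by column).

    Returns (u, v, active), where `active` lists the blocks with nonzero loadings.
    """
    # Start from the leading left singular vector of the concatenated data.
    u = np.linalg.svd(Y, full_matrices=False)[0][:, 0]
    v = np.zeros(Y.shape[1])
    for _ in range(n_iter):
        v = group_soft_threshold(Y.T @ u, blocks, lam)
        if not np.any(v):
            break  # every block was thresholded away: no component is kept
        u = Y @ v
        u /= np.linalg.norm(u)
    active = [k for k, idx in enumerate(blocks) if np.any(v[idx])]
    return u, v, active


# Simulated example: three blocks of 20 variables each. The true component is
# partially joint: it loads on blocks 0 and 1 but not on block 2.
rng = np.random.default_rng(0)
n, sizes = 50, [20, 20, 20]
bounds = np.cumsum([0] + sizes)
blocks = [np.arange(bounds[k], bounds[k + 1]) for k in range(len(sizes))]

u_true = rng.standard_normal(n)
u_true /= np.linalg.norm(u_true)
v_true = np.concatenate([np.ones(20), np.ones(20), np.zeros(20)])
Y = 5.0 * np.outer(u_true, v_true) + 0.5 * rng.standard_normal((n, sum(sizes)))

u_hat, v_hat, active = block_sparse_rank_one(Y, blocks, lam=5.0)
print("blocks sharing the component:", active)
```

In this setup the recovered support should be blocks 0 and 1, which corresponds to a partially joint component. A support that covers every block would indicate a joint component, and a support that covers a single block would indicate an individual one. The threshold `lam` governs this trade-off and would need tuning in practice.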

Updated: 2020-08-16