Estimating and Accounting for Unobserved Covariates in High-Dimensional Correlated Data
Journal of the American Statistical Association (IF 3.0), Pub Date: 2020-06-30, DOI: 10.1080/01621459.2020.1769635
Chris McKennan, Dan Nicolae

Abstract

Many high-dimensional and high-throughput biological datasets have complex sample correlation structures, which include longitudinal and multiple tissue data, as well as data with multiple treatment conditions or related individuals. These data, as well as nearly all high-throughput “omic” data, are influenced by technical and biological factors unknown to the researcher, which, if unaccounted for, can severely obfuscate estimation of and inference on the effects of interest. We therefore developed CBCV and CorrConf: provably accurate and computationally efficient methods to choose the number of and estimate latent confounding factors present in high-dimensional data with correlated or nonexchangeable residuals. We demonstrate each method’s superior performance compared to other state-of-the-art methods by analyzing simulated multi-tissue gene expression data and identifying sex-associated DNA methylation sites in a real, longitudinal twin study. Supplementary materials for this article are available online.

Updated: 2020-06-30