Accounting for unobserved covariates with varying degrees of estimability in high-dimensional biological data
Biometrika (IF 2.7), Pub Date: 2019-09-16, DOI: 10.1093/biomet/asz037
Chris McKennan, Dan Nicolae

An important phenomenon in high-throughput biological data is the presence of unobserved covariates that can have a significant impact on the measured response. When these covariates are also correlated with the covariate of interest, ignoring or improperly estimating them can lead to inaccurate estimates of and spurious inference on the corresponding coefficients of interest in a multivariate linear model. We first prove that existing methods to account for these unobserved covariates often inflate Type I error for the null hypothesis that a given coefficient of interest is zero. We then provide alternative estimators for the coefficients of interest that correct the inflation, and prove that our estimators are asymptotically equivalent to the ordinary least squares estimators obtained when every covariate is observed. Lastly, we use previously published DNA methylation data to show that our method can more accurately estimate the direct effect of asthma on DNA methylation levels compared to existing methods, the latter of which likely fail to recover and account for latent cell type heterogeneity.
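The inflation described above can be illustrated with a small simulation. This is a minimal sketch under assumed settings, not the authors' estimator: the sample size, number of features, the 0.8 dependence of the latent covariate on the covariate of interest, and the helper names `simulate` and `ols_coef_on_x` are all hypothetical choices made for illustration. Every true coefficient of interest is set to zero, so any systematic signal in the naive estimates comes from the ignored latent covariate.

```python
import numpy as np


def simulate(n=200, p=500, seed=0):
    """Simulate null data with one latent covariate (all settings are assumptions)."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n).astype(float)  # covariate of interest (e.g. case/control)
    c = 0.8 * x + rng.normal(size=n)                # latent covariate, correlated with x
    beta = np.zeros(p)                              # true coefficients of interest: all zero
    loadings = rng.normal(size=p)                   # effect of the latent covariate per feature
    noise = rng.normal(size=(n, p))
    y = np.outer(x, beta) + np.outer(c, loadings) + noise
    return x, c, y


def ols_coef_on_x(y, x, extra=None):
    """Per-feature OLS coefficient on x, optionally adjusting for one extra covariate."""
    columns = [np.ones_like(x), x]
    if extra is not None:
        columns.append(extra)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # row 1 holds the coefficients on x, one per feature


x, c, y = simulate()
naive = ols_coef_on_x(y, x)             # ignores the latent covariate
oracle = ols_coef_on_x(y, x, extra=c)   # adjusts for it as if it were observed

print("mean |estimate|, naive :", np.mean(np.abs(naive)))
print("mean |estimate|, oracle:", np.mean(np.abs(oracle)))
```

Because the true coefficients are zero, the oracle estimates scatter around zero, while the naive estimates absorb the latent covariate's effect. The oracle fit is the benchmark the abstract refers to: ordinary least squares with every covariate observed.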

Updated: 2019-09-16