High-dimensional generalized propensity score with application to omics data
Briefings in Bioinformatics ( IF 6.8 ) Pub Date : 2021-08-18 , DOI: 10.1093/bib/bbab331
Qian Gao 1 , Yu Zhang 1 , Jie Liang 1 , Hongwei Sun 2 , Tong Wang 1

Propensity score (PS) methods are popular for estimating causal effects in non-randomized studies. Drawing causal conclusions relies on the unconfoundedness assumption. This assumption is untestable and is considered more plausible when a large number of pre-treatment covariates are included in the analysis. However, previous studies have shown that including unnecessary covariates in PS models can lead to bias and efficiency loss. With the ever-increasing amounts of available data, such as omics data, there is often little prior knowledge of the exact set of important covariates. Therefore, variable selection for causal inference in high-dimensional settings has received considerable attention in recent years. However, recent studies have focused mainly on binary treatments. In this study, we considered continuous treatments and proposed the generalized outcome-adaptive LASSO (GOAL) to select covariates that yield unbiased and statistically efficient estimates. Simulation studies showed that when the outcome model was linear, the GOAL selected almost all true confounders and predictors of the outcome and excluded other covariates. The accuracy and precision of the estimates were close to ideal. Furthermore, the GOAL is robust to model misspecification. We applied the GOAL to seven DNA methylation datasets from the Gene Expression Omnibus database, which covered four brain regions, to estimate the causal effects of epigenetic aging acceleration on the incidence of Alzheimer’s disease.
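
The abstract does not spell out the GOAL algorithm, so the following is only a minimal sketch of the general outcome-adaptive LASSO idea applied to a continuous treatment, not the authors' implementation. It assumes linear outcome and treatment models, penalty weights of the form |beta_j|^(-gamma) taken from an ordinary least-squares outcome fit, gamma = 2, and a fixed penalty level instead of a data-driven choice. The function names, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def outcome_adaptive_weights(X, t, y, gamma=2.0):
    """Penalty weights |beta_j|^(-gamma) from an OLS fit of y on (1, t, X).

    Covariates that barely predict the outcome get large weights and are
    therefore penalized heavily in the treatment model.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), t, X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    beta = coef[2:]  # covariate coefficients only
    return np.abs(beta) ** (-gamma)


def adaptive_lasso(X, t, weights, lam, n_iter=200):
    """Coordinate descent for a weighted LASSO on a linear treatment model.

    Minimizes (1/2n) * ||t - X a||^2 + lam * sum_j weights[j] * |a_j|
    after centering t and the columns of X.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    tc = t - t.mean()
    a = np.zeros(p)
    resid = tc.copy()
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = Xc[:, j] @ resid / n + col_sq[j] * a[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            resid += Xc[:, j] * (a[j] - new)
            a[j] = new
    return a


# Hypothetical simulated data: columns 0-1 are confounders, 2-3 affect only
# the outcome, 4-5 affect only the treatment, 6-7 are pure noise.
rng = np.random.default_rng(0)
n, p = 2000, 8
X = rng.standard_normal((n, p))
t = X[:, 0] + X[:, 1] + X[:, 4] + X[:, 5] + rng.standard_normal(n)
y = t + X[:, 0] + X[:, 1] + X[:, 2] + X[:, 3] + rng.standard_normal(n)

w = outcome_adaptive_weights(X, t, y)
alpha = adaptive_lasso(X, t, w, lam=0.05)
selected = np.flatnonzero(alpha != 0)
print("selected covariate indices:", selected)
```

In this sketch the covariates that predict only the treatment receive very large penalty weights, which is what keeps them out of the treatment model. Whether the outcome-only predictors are retained depends on the penalty level, which is why a method of this kind needs a principled tuning criterion rather than the fixed value assumed here.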

Updated: 2021-08-18