Multiple imputation and test-wise deletion for causal discovery with incomplete cohort data
Statistics in Medicine ( IF 1.8 ) Pub Date : 2022-07-31 , DOI: 10.1002/sim.9535
Janine Witte 1, 2 , Ronja Foraita 1 , Vanessa Didelez 1, 2

Causal discovery algorithms estimate causal graphs from observational data. This can provide a valuable complement to analyses focusing on the causal relation between individual treatment-outcome pairs. Constraint-based causal discovery algorithms rely on conditional independence testing when building the graph. Until recently, these algorithms have been unable to handle missing values. In this article, we investigate two alternative solutions: test-wise deletion and multiple imputation. We establish necessary and sufficient conditions for the recoverability of causal structures under test-wise deletion, and argue that multiple imputation is more challenging in the context of causal discovery than for estimation. We conduct an extensive comparison by simulating from benchmark causal graphs: as one might expect, we find that test-wise deletion and multiple imputation both clearly outperform list-wise deletion and single imputation. Crucially, our results further suggest that multiple imputation is especially useful in settings with a small number of either Gaussian or discrete variables, but when the dataset contains a mix of both neither method is uniformly best. The methods we compare include random forest imputation and a hybrid procedure combining test-wise deletion and multiple imputation. An application to data from the IDEFICS cohort study on diet- and lifestyle-related diseases in European children serves as an illustrating example.
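To make the idea of test-wise deletion concrete, the following is a minimal sketch, not the authors' implementation. It assumes jointly Gaussian variables and uses a Fisher z test of partial correlation, a common choice of conditional independence test in constraint-based algorithms. The function name `fisher_z_testwise` and the simulated data are hypothetical and serve only as illustration. For each test, only the rows with a missing value in one of the variables involved in that test are dropped.

```python
import math
import numpy as np

def fisher_z_testwise(data, i, j, cond=()):
    """Two-sided p-value for X_i independent of X_j given X_cond (Fisher z test),
    computed on the rows that are complete for the variables in this test only."""
    cols = [i, j, *cond]
    sub = data[:, cols]
    # Test-wise deletion: drop a row only if it is missing one of the
    # variables involved in this particular test.
    sub = sub[~np.isnan(sub).any(axis=1)]
    n = sub.shape[0]
    if n - len(cond) - 3 <= 0:
        raise ValueError("too few complete rows for this test")
    # Partial correlation of the first two columns given the rest,
    # read off the inverse of the correlation matrix.
    prec = np.linalg.inv(np.corrcoef(sub, rowvar=False))
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)  # guard against log of zero
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return p, n

# Hypothetical example: chain x -> y -> z plus an unrelated variable w
# in which about half of the values are missing.
rng = np.random.default_rng(0)
N = 2000
x = rng.normal(size=N)
y = x + rng.normal(size=N)
z = y + rng.normal(size=N)
w = rng.normal(size=N)
data = np.column_stack([x, y, z, w])
data[rng.random(N) < 0.5, 3] = np.nan

p_xz_y, n_used = fisher_z_testwise(data, 0, 2, cond=(1,))  # x vs z given y
p_xy, _ = fisher_z_testwise(data, 0, 1)                    # x vs y
n_listwise = int((~np.isnan(data).any(axis=1)).sum())      # rows left under list-wise deletion
```

In this sketch the test of x against z given y uses all `N` rows, because none of the three variables has missing values, whereas list-wise deletion would discard every row in which w is missing before running any test.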

Updated: 2022-07-31