Variable Selection for Causal Effect Estimation: Nonparametric Conditional Independence Testing With Random Forests
Journal of Educational and Behavioral Statistics (IF 1.9), Pub Date: 2019-09-08, DOI: 10.3102/1076998619872001
Bryan Keller

Widespread availability of rich educational databases facilitates the use of conditioning strategies to estimate causal effects with nonexperimental data. With dozens, hundreds, or more potential predictors, variable selection can be useful for practical reasons related to communicating results and for statistical reasons related to improving the efficiency of estimators. Background knowledge should take precedence in deciding which variables to retain. However, with many potential predictors, theory may be weak, such that functional form relationships are likely to be unknown. In this article, I propose a nonparametric method for data-driven variable selection based on permutation testing with conditional random forest variable importance. The algorithm automatically handles nonlinear relationships and interactions in its naive implementation. Through a series of Monte Carlo simulation studies and a case study with Early Childhood Longitudinal Study–K data, I find that the method performs well across a variety of scenarios where other methods fail.
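As a rough illustration of the permutation-testing idea in the abstract, and not the article's implementation, the sketch below compares a predictor's random forest importance with the importances obtained after shuffling that predictor. Everything in it is an assumption made for illustration: scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_` stand in for conditional random forest variable importance, the shuffle is a simple marginal permutation rather than a conditional one, and the function names, simulated data, and number of permutations are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def forest_importance(X, y, j, seed):
    """Impurity-based importance of column j from a freshly fitted forest.

    Assumption: this replaces the conditional variable importance
    named in the abstract with a simpler, widely available measure.
    """
    forest = RandomForestRegressor(n_estimators=50, random_state=seed)
    forest.fit(X, y)
    return forest.feature_importances_[j]


def permutation_p_value(X, y, j, n_perm=19, seed=0):
    """Permutation p-value for the importance of column j.

    The null distribution is built by shuffling column j, which breaks
    any association between that column and y, and refitting the forest.
    """
    rng = np.random.default_rng(seed)
    observed = forest_importance(X, y, j, seed)
    exceed = 0
    for _ in range(n_perm):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        if forest_importance(X_perm, y, j, seed) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_perm)


# Hypothetical simulated data: column 0 drives y, columns 1 and 2 are noise.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + data_rng.normal(size=200)

p_signal = permutation_p_value(X, y, j=0)
p_noise = permutation_p_value(X, y, j=2)
print(f"p-value, signal column: {p_signal:.3f}")
print(f"p-value, noise column:  {p_noise:.3f}")
```

With 19 permutations the smallest attainable p-value is 1/20, so a variable would be retained at the 0.05 level only if its observed importance exceeds every permuted importance. A conditional scheme would instead shuffle the candidate variable within strata defined by the other covariates.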

Updated: 2019-09-08