Covariate selection for generalizing experimental results: Application to a large-scale development program in Uganda*,The Journal of the Royal Statistical Society, Series A (Statistics in Society)

The Journal of the Royal Statistical Society, Series A (Statistics in Society) (IF 1.5). Published: 2021-08-23. DOI: 10.1111/rssa.12734
Naoki Egami, Erin Hartman

Generalizing estimates of causal effects from an experiment to a target population is of interest to scientists. However, researchers are usually constrained by available covariate information. Analysts can often collect far fewer variables from population samples than from experimental samples, which has limited the applicability of existing approaches that assume rich covariate data from both experimental and population samples. In this article, we examine how to select the covariates necessary for generalizing experimental results under such data constraints. In our concrete context, a large-scale development program in Uganda, more than 40 pre-treatment covariates are available in the experiment, but only 8 of them were also measured in the target population. We propose a method to estimate a separating set (a set of variables affecting both the sampling mechanism and treatment effect heterogeneity) and show that the population average treatment effect (PATE) can be identified by adjusting for estimated separating sets. By incorporating researcher-specified constraints on which variables are measured in the population data, our algorithm requires a rich set of covariates only in the experimental data, not in the target population. Analysing the development experiment in Uganda, we show that the proposed algorithm allows PATE estimation in situations where conventional methods fail because of their data requirements.
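To make the identification claim concrete, here is a minimal Python sketch of a standard post-stratification (transport) estimator. It assumes a separating set S has already been selected and that its covariates are discrete, so each combination of values forms a stratum observed in both the experiment and the population. Under those assumptions, PATE = Σ_s Pr_pop(S = s) × (E_exp[Y | T = 1, S = s] - E_exp[Y | T = 0, S = s]). The sketch does not implement the authors' algorithm for estimating the separating set, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def stratum_effects(experiment):
    """Difference in mean outcomes (treated minus control) within each stratum.

    `experiment` is a list of (stratum, treated, outcome) tuples, where the
    stratum label encodes the values of the separating-set covariates and
    `treated` is 1 for treatment and 0 for control. Because treatment is
    randomized in the experiment, each difference estimates that stratum's
    conditional average treatment effect.
    """
    outcomes = defaultdict(lambda: {0: [], 1: []})
    for stratum, treated, outcome in experiment:
        outcomes[stratum][treated].append(outcome)
    # Keep only strata that contain both treated and control units.
    return {
        s: mean(arms[1]) - mean(arms[0])
        for s, arms in outcomes.items()
        if arms[0] and arms[1]
    }


def pate_poststratified(experiment, population_strata):
    """Weight stratum effects by each stratum's share of the target population."""
    effects = stratum_effects(experiment)
    counts = Counter(population_strata)
    total = sum(counts.values())
    missing = sorted(set(counts) - set(effects))
    if missing:
        # Identification fails without overlap: every population stratum
        # needs treated and control units in the experiment.
        raise ValueError(f"no experimental overlap for strata: {missing}")
    return sum(counts[s] / total * effects[s] for s in counts)


# Hypothetical toy data with a single binary separating-set covariate.
experiment = [
    ("urban", 1, 5.0), ("urban", 0, 3.0),
    ("rural", 1, 2.0), ("rural", 0, 1.0),
]
population = ["urban", "rural", "rural", "rural"]

# Unadjusted difference in means within the experimental sample.
sate = (mean(y for _, t, y in experiment if t == 1)
        - mean(y for _, t, y in experiment if t == 0))  # 1.5
# Stratum effects reweighted to the target population's composition.
pate = pate_poststratified(experiment, population)  # 1.25
```

In the toy data the experimental sample is split evenly between strata while the population is mostly rural, so the sample estimate (1.5) and the population-weighted estimate (1.25) differ. With continuous or high-dimensional separating sets, weighting or outcome-model estimators would take the place of the simple stratum counts used here.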

Updated: 2021-10-31