Matching One Sample According to Two Criteria in Observational Studies, Journal of the American Statistical Association



Matching One Sample According to Two Criteria in Observational Studies
Journal of the American Statistical Association (IF 3.0), Pub Date: 2021-11-17, DOI: 10.1080/01621459.2021.1981337
B Zhang, D S Small, K B Lasater, M McHugh, J H Silber, P R Rosenbaum


Abstract

Multivariate matching has two goals: (i) to construct treated and control groups that have similar distributions of observed covariates, and (ii) to produce matched pairs or sets that are homogeneous in a few key covariates. When there are only a few binary covariates, both goals may be achieved by matching exactly for these few covariates. Commonly, however, there are many covariates, so goals (i) and (ii) come apart and must be achieved by different means. As is also true in a randomized experiment, similar distributions can be achieved for a high-dimensional covariate, but close pairs can be achieved for only a few covariates. We introduce a new polynomial-time method for achieving both goals that substantially generalizes several existing methods; in particular, it can minimize the earthmover distance between two marginal distributions. The method involves minimum cost flow optimization in a network built around a tripartite graph, unlike the usual network built around a bipartite graph. In the tripartite graph, treated subjects appear twice, on the far left and the far right, with controls sandwiched between them; efforts to balance covariates are represented on the right, while efforts to find close individual pairs are represented on the left. In this way, the two efforts may be pursued simultaneously without conflict. The method is applied to our ongoing study of the relationship between superior nursing and sepsis mortality in the Medicare population. The match2C package in R implements the method. Supplementary materials for this article are available online.
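The abstract describes the network only in words. The sketch below is a rough illustration of the tripartite idea. It builds a toy network and solves it with a textbook successive-shortest-path min-cost flow written in plain Python. The layout is an assumption of this sketch and is not the match2C code:

- A source feeds each treated unit on the left.
- Left edges run from treated units to controls, with cost `lam * |a_t - a_c|` for a hypothetical key covariate `a`.
- Each control is split into two nodes joined by a capacity-1 edge, so it is used at most once.
- Right edges run from controls to a second copy of the treated units, with cost `|b_c - b_t|` for a hypothetical balance covariate `b`.
- Each right-hand treated node drains to a sink.

On the right, flow may leave a control toward any treated unit, not just its partner. At the optimum, the right-hand cost is therefore the cheapest assignment between the selected controls and the treated units. For a single covariate with absolute distance, that cost equals T times the earthmover (Wasserstein-1) distance between the two empirical marginals.

```python
"""Toy tripartite matching via min-cost flow (illustrative sketch, not match2C)."""


class MinCostFlow:
    """Successive shortest paths with Bellman-Ford; adequate for small graphs."""

    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # node -> ids of outgoing residual edges
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        """Add u->v plus its zero-capacity reverse twin; return the forward id."""
        eid = len(self.to)
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.adj[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)
        return eid

    def solve(self, s, t, need):
        """Send `need` units from s to t at minimum total cost."""
        flow, total = 0, 0.0
        while flow < need:
            dist = [float("inf")] * self.n
            via = [-1] * self.n  # residual edge used to reach each node
            dist[s] = 0.0
            for _ in range(self.n - 1):  # reverse edges can carry negative cost
                changed = False
                for u in range(self.n):
                    if dist[u] == float("inf"):
                        continue
                    for e in self.adj[u]:
                        v = self.to[e]
                        if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                            dist[v], via[v], changed = dist[u] + self.cost[e], e, True
                if not changed:
                    break
            if dist[t] == float("inf"):
                raise ValueError("infeasible: fewer controls than treated units")
            push, v = need - flow, t
            while v != s:  # bottleneck residual capacity on the path
                push = min(push, self.cap[via[v]])
                v = self.to[via[v] ^ 1]
            v = t
            while v != s:  # augment; e ^ 1 is the twin of edge e
                self.cap[via[v]] -= push
                self.cap[via[v] ^ 1] += push
                v = self.to[via[v] ^ 1]
            flow += push
            total += push * dist[t]
        return total


def tripartite_match(a_t, b_t, a_c, b_c, lam=1.0):
    """Return (total cost, pairs); pairs[i] is the control paired with treated i.

    Hypothetical inputs: `a` is a key covariate for close pairs (left side),
    `b` is a covariate whose marginal we want balanced (right side).
    """
    n_t, n_c = len(a_t), len(a_c)
    left0, cin0, cout0, right0 = 1, 1 + n_t, 1 + n_t + n_c, 1 + n_t + 2 * n_c
    src, sink = 0, 1 + 2 * n_t + 2 * n_c
    g = MinCostFlow(sink + 1)
    pair_edge = {}
    for i in range(n_t):
        g.add_edge(src, left0 + i, 1, 0)  # each treated unit gets one control
        g.add_edge(right0 + i, sink, 1, 0)  # second copy of the treated units
        for j in range(n_c):  # left: distance within the matched pair
            cost = lam * abs(a_t[i] - a_c[j])
            pair_edge[i, j] = g.add_edge(left0 + i, cin0 + j, 1, cost)
    for j in range(n_c):
        g.add_edge(cin0 + j, cout0 + j, 1, 0)  # each control is used at most once
        for i in range(n_t):  # right: free reassignment that scores marginal balance
            g.add_edge(cout0 + j, right0 + i, 1, abs(b_c[j] - b_t[i]))
    total = g.solve(src, sink, n_t)
    # A saturated left edge (residual capacity 0) marks a matched pair.
    pairs = [next(j for j in range(n_c) if g.cap[pair_edge[i, j]] == 0)
             for i in range(n_t)]
    return total, pairs


if __name__ == "__main__":
    total, pairs = tripartite_match(a_t=[0, 10], b_t=[0, 10],
                                    a_c=[0, 10, 5], b_c=[10, 0, 5])
    print(total, pairs)  # -> 0.0 [0, 1]
```

In the toy call, the pairs are exact on `a`, while `b` differs by 10 within each pair. The right side lets control 0 stand in for treated unit 1 and control 1 for treated unit 0, so the `b` marginals of the two groups still coincide and the total cost is zero. Raising `lam` puts more weight on close pairs relative to balance. Bellman-Ford is used here only because residual edges carry negative costs; a real implementation would use a faster solver.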


Updated: 2021-11-17
