Discussions
International Statistical Review (IF 2), Pub Date: 2011-08-01, DOI: 10.1111/j.1751-5823.2011.00144.x
Anastasios A. Tsiatis, Marie Davidian

We congratulate the authors (henceforth LSD) on a long overdue, detailed review of the connection between the AIPW estimators derived in the incomplete data context via semiparametric theory by Robins, Rotnitzky, and colleagues and the survey calibration estimators used widely in survey sampling. Although this connection has been noted previously (e.g. Robins & Rotnitzky, 1998; Rotnitzky, 2009), the present article appears to be the first in the statistical literature to offer a more comprehensive account. We had only a passing familiarity with this connection, and we are grateful to the editors for the opportunity to offer this discussion, whose preparation required us to acquire a deeper understanding. In what follows, we hope to complement the presentation of LSD by highlighting some further relationships and differences between the two perspectives.

We adopt the notation used by the authors and consider estimation of the population total $T$. We focus on the regression estimator $\hat{T}_{\mathrm{reg}}$, which, as in Section 2.1 of LSD, may be written equivalently as $\hat{T}(\hat{\beta})$, where

$$\hat{T}(\beta) = \hat{T} - \sum_{i=1}^{N} \left( \frac{R_i - \pi_i}{\pi_i} \right) x_i \beta, \tag{1}$$

a representation that may be more familiar to statisticians well-versed in the AIPW literature.

We reiterate and expand upon some important differences between the missing data and survey sampling contexts noted by LSD. In survey sampling, the realizations $(x_1, y_1), \ldots, (x_N, y_N)$ that comprise the population are regarded as fixed, and inference on $T = \sum_{i=1}^{N} y_i$, or, equivalently, the population mean $N^{-1}T$, is the goal. This is based on data $(x_i, R_i, R_i y_i)$, $i = 1, \ldots, N$, drawn from the population according to a fixed, known design, where $n = \sum_{i=1}^{N} R_i$, and $\Pr(R_i = 1) = \pi_i$ and $\Pr(R_i = 1, R_j = 1) = \pi_{ij}$ for $\pi_i$ known and $\pi_{ij}$ known or unknown, $i, j = 1, \ldots, N$. The $(x_i, y_i)$ may be viewed as realizations of random variables $(X_i, Y_i)$, $i = 1, \ldots, N$, representing an independent and identically distributed (iid) sample from some superpopulation; however, interest focuses on the fixed quantity $T$ (Särndal et al., 2003).

In contrast, in the incomplete data context, interest is in estimation of $\mu = E(Y)$, a parameter associated with the superpopulation. Instead of observing a realization of iid $(X_i, Y_i)$, $i = 1, \ldots, N$, we observe a realization of iid $(X_i, R_i, R_i Y_i)$, $i = 1, \ldots, N$, where $R_i$ is an indicator of whether the value of $Y_i$ is observed or missing. Ordinarily, the probabilities of observing $Y_i$ for each $i$ are not fixed by design; rather, missingness arises according to some unknown mechanism about which some assumption is made. A common assumption is that $R_i$ is conditionally independent of $Y_i$ given $X_i$, the so-called "missing at random" (MAR) assumption, under which $\pi(X_i) = \Pr(R_i = 1 \mid X_i, Y_i) = \Pr(R_i = 1 \mid X_i)$. MAR cannot be verified from the observed data, so the validity of inference on $\mu$ depends on its unknown relevance. Even if MAR is plausible, which we assume henceforth, the function $\pi(X)$ is not known and thus must be estimated based on the observed data, usually via maximum likelihood for a posited parametric model, yielding predicted values $\hat{\pi}_i$.

From the survey sampling perspective, because the $\pi_i$ are known, $\hat{T}(\beta)$ in (1) is a consistent estimator for $T$ for any fixed $\beta$, and the choice $\hat{\beta}$ given in Section 2 of LSD leading to $\hat{T}_{\mathrm{reg}}$ is meant to yield an estimator that is more precise than $\hat{T}$. In the incomplete data setting, $N^{-1}\hat{T}(\beta)$ with the $\hat{\pi}_i$ substituted for the $\pi_i$ need not be consistent for $\mu$ unless the model for $\pi(X_i)$ is correctly specified. If this model is correct, then it follows from semiparametric theory that all regular, asymptotically linear estimators for $\mu$ may be written in the form (Robins et al., 1994)

$$N^{-1} \sum_{i=1}^{N} \left\{ \frac{R_i Y_i}{\hat{\pi}_i} - \left( \frac{R_i - \hat{\pi}_i}{\hat{\pi}_i} \right) \phi(X_i) \right\}, \tag{2}$$

where $\phi(X)$ is an arbitrary function of $X$. The choice of $\phi(X)$ leading to the most precise estimator within class (2) is $\phi(X) = E(Y \mid X)$.
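
To make the form of class (2) concrete, the following is a minimal Python sketch using NumPy, offered as an illustration rather than as anything taken from LSD or from this discussion. The function name `aipw_mean` and its argument names are assumptions introduced here. The caller supplies the values of an arbitrary function phi; with phi identically zero the expression reduces to the plain inverse-probability-weighted mean.

```python
import numpy as np

def aipw_mean(y, r, pi_hat, phi):
    """Estimator of form (2): mean of R*Y/pi - ((R - pi)/pi) * phi(X).

    y      : outcomes; entries with r == 0 are ignored (may be NaN)
    r      : 0/1 indicators that y was observed
    pi_hat : known or estimated probabilities Pr(R = 1 | X), all > 0
    phi    : values phi(X_i) of an arbitrary function of X
    """
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    phi = np.asarray(phi, dtype=float)
    # Zero out unobserved outcomes so NaN placeholders never enter the sum.
    ry = np.where(r == 1, y, 0.0)
    ipw_term = ry / pi_hat
    augmentation = (r - pi_hat) / pi_hat * phi
    return float(np.mean(ipw_term - augmentation))
```

Multiplying the result by N gives the corresponding total, which matches the structure of (1) when phi is taken to be the linear predictor.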
Accordingly, a (usually parametric) model for $E(Y \mid X)$ may be posited and fitted, and the predicted values substituted for $\phi(X_i)$ in (2). Scharfstein et al. (1999) made the critical observation that such an estimator is "doubly robust" (DR), i.e. is consistent for $\mu$ as long as at least one of the posited models for $\pi(X)$ or $E(Y \mid X)$ is correct. Because of the protection afforded by this property, DR estimators have been advocated for routine use. In practice, posited models for $E(Y \mid X)$ might be linear, generalized linear, or arbitrarily nonlinear in a parameter $\beta$, depending on the nature of $Y$. Usually, the posited model is fitted using ordinary or iteratively reweighted least squares based on the pairs $(x_i, y_i)$ for which $R_i = 1$, which, under MAR, would yield a consistent estimator for $\beta$ in a correctly specified model $m(X, \beta)$, say, for $E(Y \mid X)$.

Kang and Schafer (2007) evaluated the performance of the usual DR estimator for $\mu$ in a missing data context under specific simulation scenarios with continuous $Y$, linear $m(x, \beta) = x\beta$, and $\beta$ estimated by ordinary least squares. The estimator exhibited poor performance under scenarios where the models for $\pi(X)$ and $E(Y \mid X)$ were only slightly misspecified and/or when the relative magnitudes of the $\hat{\pi}_i$ were extremely disparate, with $\hat{\pi}_i$ relatively very small for some $i$, leading Kang and Schafer to issue a strong warning against its routine use. Because of the observational nature of the data, where missingness is by happenstance rather than design, such $\hat{\pi}_i$ may be encountered in practice. These results led us (Cao et al., 2009; see also Tan, 2006, 2007 and Tsiatis & Davidian, 2007) to speculate that this poor performance may be partly a consequence of the method used to estimate $\beta$ in the posited model $m(x, \beta)$.
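
As a rough illustration of the "usual" DR estimator described above (linear m(x, β) = xβ fitted by ordinary least squares on the complete cases, then substituted into (2)), here is a small Python sketch. The function name `dr_mean_ols` and the convention that `x` is an N-by-p design matrix including any intercept column are assumptions made for this example; it does not reproduce the Kang and Schafer simulation design.

```python
import numpy as np

def dr_mean_ols(x, y, r, pi_hat):
    """Usual DR estimator of mu with m(x, beta) = x beta fitted by OLS
    on the complete cases (R_i = 1). Returns (estimate, beta_hat)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    obs = r == 1
    # OLS using only units whose outcome was observed.
    beta_hat, *_ = np.linalg.lstsq(x[obs], y[obs], rcond=None)
    m_hat = x @ beta_hat  # model predictions for all N units
    # Unobserved outcomes (possibly NaN) contribute zero to the IPW term.
    ry = np.where(obs, y, 0.0)
    est = np.mean(ry / pi_hat - (r - pi_hat) / pi_hat * m_hat)
    return float(est), beta_hat
```

When the outcome model fits the observed data exactly, each observed unit contributes its own y and each unobserved unit contributes its prediction, so the estimate does not depend on `pi_hat` at all. This is one way to see the outcome-model half of the double robustness property.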
We proposed considering the class of DR estimators in (2) with $\phi(X_i)$ replaced by $m(X_i, \beta)$. Among such estimators indexed by $\beta$, we found the value of $\beta$ that minimizes the variance of estimators within the class when $\pi(X)$ is correctly specified, regardless of whether or not $m(X_i, \beta)$ is correct, together with a means of estimating this optimal $\beta$. The estimator for the optimal $\beta$ is not ordinary or usual weighted least squares but, rather, involves, in the ideal case where the $\pi_i$ were known, a weighted regression with weights $(1 - \pi_i)/\pi_i^2$, with modification when the $\pi_i$ are estimated by $\hat{\pi}_i$ as above. Cao et al. (2009) reported simulations showing that the DR estimator incorporating this estimator for the optimal $\beta$ demonstrated vastly improved performance in the Kang and Schafer and other scenarios. Tsiatis et al. (2011) extended this idea to DR estimators in the more complex setting of longitudinal studies with monotone dropout.

We were interested to learn that the same tactic of finding the optimal $\beta$ minimizing the variance of estimators for $T$ of the form $\hat{T}(\beta)$ in (1), and an estimator for this optimal $\beta$ using a weighting scheme similar in spirit to that in Cao et al. (2009), was proposed by Montanari (1987); see also Berger et al. (2003). Given that in the survey sampling context, with the $\pi_i$ determined by design, the issue of disparate $\pi_i$ would not be as pronounced, we suspect that the gains in performance realized by such an approach may not be as dramatic. We again compliment the authors on an insightful and useful article.
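
The weighting scheme mentioned above for the ideal case of known πi can be sketched as a weighted least squares fit on the complete cases with weights (1 − πi)/πi². The Python fragment below covers only that ideal-case step for a linear model; the function name `beta_weighted` is an assumption for illustration, and the modification needed when the πi are replaced by estimates is not reproduced here.

```python
import numpy as np

def beta_weighted(x, y, r, pi):
    """Weighted least squares on complete cases with weights (1 - pi)/pi**2
    for a linear model m(x, beta) = x beta, treating pi as known."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    obs = np.asarray(r) == 1
    w = (1.0 - pi[obs]) / pi[obs] ** 2
    # Scaling rows by sqrt(w) turns the weighted problem into ordinary lstsq.
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(x[obs] * sw[:, None], y[obs] * sw, rcond=None)
    return beta
```

Compared with ordinary least squares, these weights place far more emphasis on observed units with small πi, which are the units that dominate the variance of the inverse-weighted sum.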

Updated: 2011-08-01