Improved Doubly Robust Estimation in Marginal Mean Models for Dynamic Regimes

Hao Sun; Ashkan Ertefaie; Xin Lu; Brent A. Johnson

doi:10.1515/jci-2020-0015

Open Access Published by De Gruyter December 31, 2020

From the journal Journal of Causal Inference

https://doi.org/10.1515/jci-2020-0015

Abstract

Doubly robust (DR) estimators are an important class of statistics derived from a theory of semiparametric efficiency. They have become a popular tool in causal inference, including applications to dynamic treatment regimes. The doubly robust estimators for the mean response to a dynamic treatment regime may be conceived through the augmented inverse probability weighted (AIPW) estimating function, defined as the sum of the inverse probability weighted (IPW) estimating function and an augmentation term. The IPW estimating function of the causal estimand via marginal structural model is defined as the complete-case score function for those subjects whose treatment sequence is consistent with the dynamic regime in question divided by the probability of observing the treatment sequence given the subject's treatment and covariate histories. The augmentation term is derived by projecting the IPW estimating function onto the nuisance tangent space and has mean-zero under the truth. The IPW estimator of the causal estimand is consistent if (i) the treatment assignment mechanism is correctly modeled and the AIPW estimator is consistent if either (i) is true or (ii) nested functions of intermediate and final outcomes are correctly modeled.

Hence, the AIPW estimator is doubly robust and, moreover, the AIPW is semiparametric efficient if both (i) and (ii) are true simultaneously. Unfortunately, DR estimators can be inferior when either (i) or (ii) is true and the other false. In this case, the misspecified parts of the model can have a detrimental effect on the variance of the DR estimator. We propose an improved DR estimator of causal estimand in dynamic treatment regimes through a technique originally developed by [4] which aims to mitigate the ill-effects of model misspecification through a constrained optimization.

In addition to solving a doubly robust system of equations, the improved DR estimator simultaneously minimizes the asymptotic variance of the estimator under a correctly specified treatment assignment mechanism but misspecification of intermediate and final outcome models. We illustrate the desirable operating characteristics of the estimator through Monte Carlo studies and apply the methods to data from a randomized study of integrilin therapy for patients undergoing coronary stent implantation. The methods proposed here are new and may be used to further improve personalized medicine, in general.

Keywords: causal inference; informative eligibility; missing data; treatment competing events

MSC 2010: 92B15

1 Introduction

Estimating population parameters in the presence of missing data is a common but challenging problem when one desires robust, precise estimates under a missing at random assumption [e.g., 22]. Horvitz-Thompson [1952] estimators, defined by multiplying the outcome by the reciprocal of the missingness probability and also called inverse-probability weighted (IPW) estimators, are popular because they are straightforward to implement and consistent if the missingness probability is modeled correctly. At the same time, Horvitz-Thompson estimators can be imprecise and sensitive to estimated missingness probabilities that are very close to zero. Augmented inverse probability weighted (AIPW) estimating functions [16] are constructed as the sum of the IPW estimating function and a mean-zero augmentation term. The optimal augmentation is the orthogonal projection of the IPW estimating function onto the nuisance tangent space [2, 26] and is a function of one or more regression functions, i.e. mean outcomes modeled as a function of covariates and also called outcome regression (OR) models. The AIPW estimators are doubly robust (DR), which implies they are consistent if either the missingness probability or the OR models in the augmentation term are modeled correctly. Moreover, if both models are correctly specified, the AIPW estimator is semiparametric efficient [2, 16, 22]. However, this adaptive estimator is not optimal when only some models are correctly specified [e.g., 9, 19, 27].
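To make the contrast between the IPW and AIPW constructions concrete, the following is a minimal Python sketch for the simplest setting, the mean of an outcome that is missing at random. The function names `ipw_mean` and `aipw_mean`, the convention that `pi` holds the probability that the outcome is observed, and the vector `m` of outcome-regression predictions are illustrative assumptions and do not come from the paper.

```python
def ipw_mean(y, r, pi):
    """Horvitz-Thompson (IPW) estimate of E(Y).

    y  : outcomes (any placeholder where the outcome is missing)
    r  : 1 if the outcome is observed, 0 if it is missing
    pi : probability that the outcome is observed (assumed known or fitted)
    """
    n = len(y)
    return sum(yi / pii for yi, ri, pii in zip(y, r, pi) if ri == 1) / n


def aipw_mean(y, r, pi, m):
    """AIPW estimate of E(Y): the IPW term minus a mean-zero augmentation.

    m : outcome-regression predictions of E(Y | covariates), one per subject
    """
    total = 0.0
    for yi, ri, pii, mi in zip(y, r, pi, m):
        ipw_term = yi / pii if ri == 1 else 0.0
        augmentation = (ri - pii) / pii * mi  # mean zero when pi is correct
        total += ipw_term - augmentation
    return total / len(y)
```

When the predictions `m` are close to the truth, the augmentation removes much of the variability introduced by the weights `1 / pi`; when `pi` is correct, the augmentation has mean zero for any choice of `m`, which is the source of the double robustness described above.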

In this paper, our contribution is to propose a coefficient estimator of the causal parameters in a marginal model for dynamic regimes that has better operating characteristics than the doubly robust estimator proposed by [11]. In particular, we seek to define an estimator that has two properties:

(P1) doubly robust;
(P2) minimum asymptotic variance within a class of augmented estimator assuming the PS model is correct, but the assumed OR model may or may not be correct.

The first optimal doubly robust estimators for missing data problems satisfying properties (P1)-(P2) were Tan's [2006] restricted maximum likelihood estimator (MLE) and the targeted MLE by [27]. Later, [4] proposed a competing estimator that had better finite sample performance than the estimator by [19]. Other improved DR estimators available in the literature include the augmented restricted MLE [21], the augmented OR estimator [17], numerically-derived locally efficient estimators [5], and an adaptation of Godambe's [1960] optimal estimating equation for missing data [13]. Most of the aforementioned techniques demonstrate the principles of their method in the problem of estimating a mean outcome when the outcome may be missing due to measured covariates or, analogously, estimating the average causal effect from observational data. These principles can, in theory, be extended to new problems, but the details for any given strategy are a substantial challenge when working with complex temporal data.

Our proposed estimator is developed by applying a constrained augmentation technique [4] to the DR estimator in Murphy et al. The challenge of this extension is deriving the correct expression for the constrained augmentation. Although both our estimator and the estimator by Murphy et al. possess the double robustness property in (P1), our proposed estimator has minimum asymptotic variance when the OR models are incorrect, as required by property (P2), while the estimator by Murphy et al. does not. Property (P2) is important for this problem, in particular, because it is extremely difficult to specify OR models that satisfy the nested constraints induced by the marginal model, much less to expect that they are specified correctly (see the constraints in (6) and our discussion in Section 3.3). Thus, the proposed method is a hedge against imprecise parameter estimation due to OR model misspecification, which seems more likely than not in the case of marginal models for dynamic regimes.

The remainder of this paper is organized as follows. Section 2 reviews DR estimation of causal estimands in dynamic treatment regimes via marginal models [11] whereas Section 3 details our new methodological contribution. In Section 4, we demonstrate the operating characteristics of our methods through simulation studies and illustrate the utility of our methods through application to data from an infusion study conducted at Duke University. Proofs of technical results are relegated to the Web Appendix.

2 Doubly Robust Estimation in Marginal Models for Dynamic Regimes

2.1 Notation and Assumptions

Without loss of generality, we follow much the same notation given in [11]. Let A_j, j = 1, . . . , K, be the stochastic treatment decision at time t_j and Ā_j = (A₁, . . . , A_j) be the history of treatment decisions through time t_j. Similarly, define tailoring and auxiliary variables, S_j and X_j, respectively, for j = 0, . . . , K − 1, as well as their histories S̅_j = (S₀, . . . , S_j) and X̅_j = (X₀, . . . , X_j). Let 𝒢 denote subgroups of interest. The set of all possible potential outcomes is $\{Y(\bar a_K) \mid \bar a_K \in \bar{\mathcal{A}}_K\}$, where $\bar{\mathcal{A}}_K$ is the collection of all possible treatment allocation vectors; similarly, the set of potential intermediate outcomes is $\{\bar S_K(\bar a_K) \mid \bar a_K \in \bar{\mathcal{A}}_K\}$. Hence, we assume that the potential outcome does not depend on the treatment assignment mechanism nor is affected by others’ treatment [14, 18]. We also assume a sequential randomization assumption [14], that is, the treatment assignment A_j is conditionally independent of {𝒢, S₀, . . . , S_K(ā_K₋₁), Y(ā_K₋₁)} given {L₀, A₁, L₁, . . . , A_j₋₁, L_j₋₁}, with L_j = (S_j, X_j). In words, the sequential randomization implies (i) that at no point in time does treatment assignment depend on future outcomes, and (ii) that the history of auxiliary variables X̅_j₋₁ is sufficiently rich to ensure that treatment assignment A_j does not vary systematically by intermediate or potential outcomes within levels of (Ā_j₋₁, L̅_j₋₁). The observed data is O = {𝒢, L₀, A₁, L₁, A₂, . . . , L_K₋₁, A_K, Y}, where Y = Y(Ā_K), S_j = S_j(Ā_j), X_j = X_j(Ā_j) for all j = 1, . . . , K.

Let d_j(L̅_j₋₁) be a treatment decision or treatment assignment rule at time t_j, j = 1, . . . , K, and the sequence of decision rules for the entire treatment allocation period defines the dynamic treatment regimen, i.e. d̅_K = (d₁, . . . , d_K). Now, define the product

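$$W_{\bar d_j}(\bar A_j, \bar L_{j-1}; \bar\pi_{j0}) = \prod_{m=1}^{j} \frac{I\{A_m = d_m(\bar L_{m-1})\}}{\pi_{m0}(A_m \mid \bar A_{m-1}, \bar L_{m-1})}, \qquad (j = 1, \ldots, K), \tag{1}$$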

where, at time t_m, π_m₀(· | Ā_m₋₁, L̅_m₋₁) is the treatment assignment probability in the observational data and treatment assignment in the dynamic regime is degenerate according to the treatment decision rule d_m(L̅_m₋₁). Without loss of generality, we define $\bar\pi_j = (\pi_1, \ldots, \pi_j)$ and the vector of true treatment assignment probabilities $\bar\pi_{j0} = (\pi_{10}, \ldots, \pi_{j0})$ for j = 1, . . . , K. The product in (1) plays a critical role in our ability to estimate causal estimands from the observed data and is a version of the non-stabilized inverse PS weight. Let P_{d̅_K} and E_{d̅_K} be the probability distribution and expectation under the dynamic regime, respectively, and P and E be the probability distribution and expectation in the observed data, respectively. If there is non-trivial probability that a randomly selected subject can follow the regime d̅_K in the observational data, then by Lemma 4.1 of [11], the distribution of (Y, S̅_K₋₁, Ā_K, 𝒢) under P_{d̅_K} is absolutely continuous with respect to the distribution of (Y, S̅_K₋₁, Ā_K, 𝒢) under P and a version of the Radon-Nikodym derivative is

$$E\left[W_{\bar d_K}(\bar A_K, \bar L_{K-1}; \bar\pi_{K0}) \mid Y = y, \bar S_{K-1} = \bar s_{K-1}, \bar A_K = \bar a_K, \mathcal{G} = g\right].$$
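The weight in (1) can be computed for one subject by walking through the stages in order. The sketch below is illustrative only: the function name `regime_weight` and the three-list input layout (observed actions, actions the rule would have chosen, and the probabilities of the observed actions) are assumptions made for this example.

```python
def regime_weight(actions, rule_actions, probs):
    """Non-stabilized inverse-probability weight for one subject.

    actions      : observed treatment decisions A_1, ..., A_j
    rule_actions : decisions d_m(L-bar_{m-1}) the regime would have made
    probs        : probabilities pi_m(A_m | history) of the observed decisions
    Returns 0 as soon as the subject departs from the regime; otherwise
    returns the reciprocal of the product of the assignment probabilities.
    """
    weight = 1.0
    for a, d, p in zip(actions, rule_actions, probs):
        if a != d:
            return 0.0  # indicator I{A_m = d_m} is zero
        weight /= p
    return weight
```

For example, a subject who follows the regime at both of two stages, with assignment probabilities 0.5 and 0.25, receives weight 1 / (0.5 × 0.25) = 8, while a subject who departs from the regime at any stage receives weight 0.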

2.2 Ordinary doubly robust estimation

Suppose that we are interested in estimating the parameter vector β from the marginal model,

$$E_{\bar d_K}(Y \mid \mathcal{G}) = \mu(\beta, \mathcal{G}), \tag{2}$$

where μ(β, 𝒢) is a parameterization of the conditional mean of Y given subgroups 𝒢 under the dynamic regime d̅_K. Then, under regularity assumptions outlined in Section 2.1 as well as in Lemma 4.1 of [11], the statistic

$$W_{\bar d_K}(\bar A_K, \bar L_{K-1}; \bar\pi_{K0})\, \dot\mu(\beta, \mathcal{G})\, (Y - \mu(\beta, \mathcal{G})),$$

where $\dot\mu(\beta, \mathcal{G}) = (\partial/\partial\beta)\mu(\beta, \mathcal{G})$, has mean zero at the true marginal parameter β = β₀. If the treatment assignment probabilities {π_j₀} were known a priori, this statistic could be used to define an estimator for β. However, in observational studies, the treatment assignment probabilities are unknown a priori and instead one posits a statistical model. To this end, we model the treatment assignment probabilities through a finite-dimensional vector of parameters γ and estimate it through the estimating function,

$$U_\gamma(O; \gamma) = \sum_{j=1}^{K} \sum_{a_j} I(A_j = a_j)\, u_j(a_j \mid \bar A_{j-1}, \bar L_{j-1}; \gamma), \tag{3}$$

where $u_j(a_j \mid \bar A_{j-1}, \bar L_{j-1}; \gamma) = \dot\pi_j(a_j \mid \bar A_{j-1}, \bar L_{j-1}; \gamma)\, \mathcal{V}_j^{-1}(a_j \mid \bar A_{j-1}, \bar L_{j-1}; \gamma)\, \{a_j - \pi_j(a_j \mid \bar A_{j-1}, \bar L_{j-1}; \gamma)\}$, $\dot\pi_j(a_j \mid \bar A_{j-1}, \bar L_{j-1}; \gamma) = (\partial/\partial\gamma)\pi_j(a_j \mid \bar A_{j-1}, \bar L_{j-1}; \gamma)$ and 𝒱_j(a_j | Ā_j₋₁, L̅_j₋₁; γ) is the conditional variance of a_j given Ā_j₋₁ and L̅_j₋₁. The estimating function U_γ(O; γ) may be regarded as proportional to the first-order partial derivative of the log likelihood with respect to γ for the treatment assignment mechanism. As such, EU_γ(O; γ₀) = 0 when the treatment assignment probabilities are correctly modeled. Let $\hat\gamma$ denote the maximum likelihood estimator of γ₀, i.e. the solution to the estimating equations, 0 = ℙ_nU_γ(O; γ). Then, the inverse probability weighted (IPW) estimator for β is $\hat\beta_{\mathrm{IPW}}$, the solution to the system of estimating equations,

$$0 = \mathbb{P}_n\left[\psi_{\mathrm{IPW}}(O; \beta, \bar\pi_K(\hat\gamma))\right],$$

where

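$$\psi_{\mathrm{IPW}}(O; \beta, \bar\pi_K(\gamma)) = W_{\bar d_K}(\bar A_K, \bar L_{K-1}; \bar\pi_K(\gamma))\, \dot\mu(\beta, \mathcal{G})\, (Y - \mu(\beta, \mathcal{G})), \tag{4}$$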

$\bar\pi_j(\gamma) = (\pi_1(\gamma), \ldots, \pi_j(\gamma))$ and π_j(γ) = π_j(a_j | Ā_j₋₁, L̅_j₋₁; γ) for j = 1, . . . , K. The IPW estimator $\hat\beta_{\mathrm{IPW}}$ is consistent as long as the treatment selection probabilities {π_j} are correctly modeled such that the Radon-Nikodym derivative is consistently estimated, i.e. $W_{\bar d_K}(\bar A_K, \bar L_{K-1}; \bar\pi_{K0}) = W_{\bar d_K}(\bar A_K, \bar L_{K-1}; \bar\pi_K(\gamma_0))$.
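In the simplest special case, a saturated marginal model μ(β, 𝒢) = β with no subgroups (so that the derivative of μ equals one), the IPW estimating equation has a closed-form solution: a weighted average of the outcomes. The sketch below illustrates this under those simplifying assumptions; the function name `ipw_beta` is hypothetical, and the weights are taken as already computed from fitted treatment assignment probabilities.

```python
def ipw_beta(weights, outcomes):
    """Solve 0 = sum_i W_i * (Y_i - beta) for beta.

    weights  : regime weights W_i (zero for subjects who left the regime)
    outcomes : observed outcomes Y_i
    Assumes the marginal mean model is mu(beta, G) = beta.
    """
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("no subject is consistent with the regime")
    return sum(w * y for w, y in zip(weights, outcomes)) / total_weight
```

Only subjects whose observed treatment sequence is consistent with the regime contribute, and each is up-weighted by the reciprocal of the probability of having followed that sequence.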

A concern of IPW estimators, assuming {π_j} are correctly modeled, is their efficiency. Briefly, the likelihood may be written as the product of two expressions, ℒ(O) = ℒ_OR(O)ℒ_PS(O), where $\mathcal{L}_{PS}(O) = \prod_{j=1}^{K} \pi_{j0}(A_j \mid \bar A_{j-1}, \bar L_{j-1})$ and ℒ_OR(O) is the product of conditional densities, $\prod_{j=1}^{K} P(L_j^* \in dL_j \mid \bar A_{j-1}, \bar L_{j-1})$, where Y ≡ L_K and $P(L_j^* \in B \mid \cdot) = P(L_j \in B \mid \cdot)$ for every measurable rectangle B. The causal estimand E_{d̅_K} (Y | 𝒢) is a function of the conditional densities in ℒ_OR but not {π_j₀} in ℒ_PS; hence, the treatment selection probabilities are nuisance parameters for the estimand of interest. The efficient estimator for β will be orthogonal to the score function for the nuisance parameters and, hence, the IPW estimator can be improved by subtracting from ψ_IPW its projection onto the score function of the treatment selection probabilities. The projection of ψ_IPW onto the score function of the treatment selection probabilities [15] is

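$$A(O; \beta, \bar\pi_{K0}, \bar g_{K0}) = \sum_{j=1}^{K} W_{\bar d_j}(\bar A_j, \bar L_{j-1}; \bar\pi_{j0})\, \dot\mu(\beta, \mathcal{G})\, \{g_{j0}(\bar A_j, \bar L_{j-1}) - \mu(\beta, \mathcal{G})\} - \sum_{j=1}^{K} \Big[ \sum_{a_j} \pi_{j0}(a_j \mid \bar A_{j-1}, \bar L_{j-1}) \times W_{\bar d_j}(a_j, \bar A_{j-1}, \bar L_{j-1}; \bar\pi_{j0})\, \dot\mu(\beta, \mathcal{G})\, \{g_{j0}(a_j, \bar A_{j-1}, \bar L_{j-1}) - \mu(\beta, \mathcal{G})\} \Big], \tag{5}$$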

where the regression functions are nested within one another through the following relationship: g_K₀(Ā_K, L̅_K₋₁) = E(Y | Ā_K, L̅_K₋₁) and for j = 1, . . . , K − 1,

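$$g_{j0}(\bar A_j, \bar L_{j-1}) = E\Big[ \sum_{a_{j+1}} I\{a_{j+1} = d_{j+1}(\bar L_j)\}\, g_{j+1,0}(a_{j+1}, \bar A_j, \bar L_j) \,\Big|\, \bar A_j, \bar L_{j-1} \Big]. \tag{6}$$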

Again, we adopt the notation g̅_j = (g₁, . . . , g_j) and the history of true regression models is g̅_j₀ = (g₁₀, . . . , g_j₀). In semi-parametric theory for missing data problems [22, 26], the augmentation term $A(O; \beta, \bar\pi_{K0}, \bar g_{K0})$ in (5) has mean zero by construction when it is evaluated at β = β₀ and the treatment assignment probabilities are modeled correctly, so that {π_j₀} ≡ {π_j(γ₀)}.
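As an illustration of the nesting in (6), consider the two-stage case K = 2. This specialization is a worked example written for this exposition, not a display taken from the original derivation. The stage-2 regression conditions on the full observed history, and the stage-1 regression averages the stage-2 regression, evaluated at the regime-consistent action d₂(L̅₁), over the conditional distribution of L₁:

$$g_{20}(\bar A_2, \bar L_1) = E(Y \mid \bar A_2, \bar L_1), \qquad g_{10}(A_1, L_0) = E\left[ g_{20}\{d_2(\bar L_1), A_1, \bar L_1\} \mid A_1, L_0 \right].$$

Working models g₁(·; α) and g₂(·; α) can be simultaneously correct only if they respect this relationship, which is one reason that compatible OR models become harder to specify as K grows.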

In order to use the augmentation in practice, the true regression functions {g_j₀} must be known or modeled. Often, g_j₀(Ā_j, L̅_j₋₁) is modeled parametrically or semi-parametrically as g_j(Ā_j, L̅_j₋₁; α) through the finite dimensional parameter α. For example, Murphy et al. [2001] suggested the semi-parametric estimator $\hat\alpha_{\mathrm{MLR}}$, the solution to the estimating equations, 0 = ℙ_n[U_α(O; α)],

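$$U_\alpha(O; \alpha) = \dot g_K(\bar A_K, \bar L_{K-1}; \alpha)\, (Y - g_K(\bar A_K, \bar L_{K-1}; \alpha)) + \sum_{j=1}^{K} \dot g_j(\bar A_j, \bar L_{j-1}; \alpha) \times \Big[ \sum_{a_{j+1}} I\{a_{j+1} = d_{j+1}(\bar L_j)\}\, g_{j+1}(a_{j+1}, \bar A_j, \bar L_j; \alpha) - g_j(\bar A_j, \bar L_{j-1}; \alpha) \Big], \tag{7}$$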

where $\dot g_j(\bar A_j, \bar L_{j-1}; \alpha) = (\partial/\partial\alpha) g_j(\bar A_j, \bar L_{j-1}; \alpha)$ for j = 1, . . . , K. With $\hat\alpha_{\mathrm{MLR}}$ defined through (7), the usual augmented inverse probability weighted (AIPW) estimator, $\hat\beta_{\mathrm{USUAL}}$, is defined as the solution to the system of equations

$$0 = \mathbb{P}_n\left[\psi_{\mathrm{AIPW}}(O; \beta, \bar\pi_K(\hat\gamma), \bar g_K(\hat\alpha_{\mathrm{MLR}}))\right],$$

where

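$$\psi_{\mathrm{AIPW}}(O; \beta, \bar\pi_K(\gamma), \bar g_K(\alpha)) = \psi_{\mathrm{IPW}}(O; \beta, \bar\pi_K(\gamma)) - A(O; \beta, \bar\pi_K(\gamma), \bar g_K(\alpha)),$$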

and {g_j(α)} = {g_j(Ā_j, L̅_j₋₁; α)}. The AIPW estimator has the double robustness property: that is, $\hat\beta_{\mathrm{USUAL}}$ is consistent if either the treatment selection probabilities {π_j} or the regression functions {g_j} are modeled correctly. If both sets of functions are modeled correctly, the AIPW estimator is semi-parametric efficient [16, 22].
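For intuition, consider the single-stage case K = 1. The following simplification is a worked example under that assumption, derived here from (4) and (5) rather than quoted from the original. Write W = I{A₁ = d₁(L₀)} / π₁₀(A₁ | L₀). Because W is nonzero only when A₁ = d₁(L₀), the first sum in (5) equals $W \dot\mu(\beta, \mathcal{G}) \{g_{10}(d_1(L_0), L_0) - \mu(\beta, \mathcal{G})\}$, and the second sum equals $\dot\mu(\beta, \mathcal{G}) \{g_{10}(d_1(L_0), L_0) - \mu(\beta, \mathcal{G})\}$ because the probabilities cancel. Hence

$$A = \dot\mu(\beta, \mathcal{G})\, (W - 1)\, \{g_{10}(d_1(L_0), L_0) - \mu(\beta, \mathcal{G})\},$$

$$\psi_{\mathrm{AIPW}} = \psi_{\mathrm{IPW}} - A = \dot\mu(\beta, \mathcal{G}) \left[ W \{Y - g_{10}(d_1(L_0), L_0)\} + g_{10}(d_1(L_0), L_0) - \mu(\beta, \mathcal{G}) \right].$$

This is the familiar form of a regression prediction under the regime plus an inverse-probability-weighted residual: the residual term has mean zero when the PS model is correct, and the prediction term is unbiased when the OR model is correct.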

3 New methods

When only one of the two sets of functions, {π_j} or {g_j}, is modeled correctly, the incorrect model can have an adverse effect on the precision of $\hat\beta_{\mathrm{USUAL}}$ unless steps are taken to mitigate the ill-effects of model misspecification. Without loss of generality, suppose that the PS model {π_j} is correctly modeled but {g_j} may or may not be. Under the true {g_j₀}, EU_α(O; α₀) = 0 and $\hat\alpha_{\mathrm{MLR}} \xrightarrow{p} \alpha_0$. Under incorrectly modeled {g_j}, however, $\hat\alpha_{\mathrm{MLR}} \xrightarrow{p} \alpha_*$, for some α_* ≠ α₀. Even though the usual estimator $\hat\beta_{\mathrm{USUAL}}$ is consistent regardless of whether {g_j} is correctly modeled or not, the variance of $\hat\beta_{\mathrm{USUAL}}$ is only optimal when {g_j} is modeled correctly. Thus, we aim to construct an estimator of β that will:

(P1) be doubly robust;
(P2) have smallest asymptotic variance among the class of AIPW estimators when the PS is modeled correctly, regardless of whether $\{g_j\}_{j=1}^{K}$ are modeled correctly.

3.1 New Estimation with {π_j} as Known Functions

In this subsection, we consider constrained doubly robust estimation of β when {π_j} are known functions of (Ā_j₋₁, L̅_j₋₁), i.e., when γ = γ₀ is known. To begin, consider the coefficient estimator $\tilde\beta$ defined as the solution to

$$0 = \mathbb{P}_n\left[\psi_{\mathrm{AIPW}}(O; \beta_0, \bar\pi_{K0}, \bar g_K(\alpha_*))\right], \tag{8}$$

where α_* is the limiting value of the estimator $\hat\alpha$ regardless of whether {g_j} are modeled correctly or not. The asymptotic variance of $n^{1/2}(\tilde\beta - \beta_0)$ is

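$$\tilde V(\alpha_*) = \Gamma^{-1} \left\{ \mathrm{var} \lim_{n \to \infty} n^{1/2}\, \mathbb{P}_n\left[\psi_{\mathrm{AIPW}}(O; \beta_0, \bar\pi_{K0}, \bar g_K(\alpha_*))\right] \right\} (\Gamma^{-1})^\top,$$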

where Γ is the asymptotic slope matrix of the right-hand side of (8) with respect to β₀, say,

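$$\Gamma = E\left\{ \frac{\partial}{\partial\beta} \psi_{\mathrm{AIPW}}(O; \beta, \bar\pi_{K0}, \bar g_K(\alpha)) \,\Big|_{\beta = \beta_0,\, \alpha = \alpha_*} \right\}.$$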

The asymptotic slope matrix is invariant to the choice of α_* either directly or in the limit. Therefore, when the {π_j} are modeled correctly, minimizing the asymptotic variance $\tilde V(\alpha_*)$ reduces to a problem of minimizing $\mathrm{var}[\psi_{\mathrm{AIPW}}(O; \beta_0, \bar\pi_{K0}, \bar g_K(\alpha_*))]$ as a function in α_*. (For a related problem, see expression (4) on p. 8 of [23].)

Now, because $\psi_{\mathrm{AIPW}}(O; \beta_0, \bar\pi_{K0}, \bar g_K(\alpha_*))$ has mean zero when evaluated at the true parameters, minimizing $\mathrm{var}[\psi_{\mathrm{AIPW}}(O; \beta_0, \bar\pi_{K0}, \bar g_K(\alpha_*))]$ is equivalent to minimizing

$$E\left[\psi_{\mathrm{AIPW}}(O; \beta_0, \bar\pi_{K0}, \bar g_K(\alpha_*))^{2}\right], \tag{9}$$

where α_* is the probabilistic limit of some estimator $\hat\alpha$. Define α_opt as the minimizer of (9) as a function in α_*. Note that α_opt also satisfies the following system of equations,

$$0 = E\left[\dot A(O; \beta_0, \bar\pi_{K0}, \bar g_K(\alpha_*))\, \psi_{\mathrm{AIPW}}(O; \beta_0, \bar\pi_{K0}, \bar g_K(\alpha_*))\right], \tag{10}$$

where

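$$\dot A(O; \beta, \bar\pi_K, \bar g_K(\alpha_*)) = \sum_{j=1}^{K} \sum_{a_j} \{I(A_j = a_j) - \pi_j(a_j \mid \bar A_{j-1}, \bar L_{j-1})\} \times W_{\bar d_j}(a_j, \bar A_{j-1}, \bar L_{j-1}; \bar\pi_j)\, \dot\mu(\beta, \mathcal{G})\, \dot g_j(\bar A_j, \bar L_{j-1}; \alpha_*).$$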

When {g_j} are correctly specified, α_opt = α₀; otherwise, α_opt is some other value in the parameter space. To satisfy (P2), we wish to define an estimator $\hat\alpha_{\mathrm{opt}} \xrightarrow{p} \alpha_{\mathrm{opt}}$ while, at the same time, $\hat\alpha_{\mathrm{opt}} \xrightarrow{p} \alpha_{\mathrm{opt}} \equiv \alpha_0$ when {g_j} are correctly specified, so that (P1) is satisfied.

To proceed with construction of the improved DR estimator via constrained augmentation, it is convenient to re-express (10). Using Lemmas 1–3 in Appendix A, we show that the estimating function on the right-hand side of (10) can be written as

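$$E\Bigg[ \{Y - \mu(\beta_0, \mathcal{G})\} \times \sum_{j=1}^{K} \frac{1 - \pi_{j0}(d_j(\bar L_{j-1}) \mid \bar A_{j-1}, \bar L_{j-1})}{\prod_{m=1}^{j} \pi_{m0}(A_m \mid \bar A_{m-1}, \bar L_{m-1})}\, \dot\mu^{2}(\beta_0, \mathcal{G})\, \dot g_j(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \alpha_*) - \sum_{j=1}^{K} \Bigg[ \{g_j(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \alpha_*) - \mu(\beta_0, \mathcal{G})\} \times \Bigg\{ \frac{1 - \pi_{j0}(d_j(\bar L_{j-1}) \mid \bar A_{j-1}, \bar L_{j-1})}{\prod_{m=1}^{j} \pi_{m0}(A_m \mid \bar A_{m-1}, \bar L_{m-1})}\, \dot\mu^{2}(\beta_0, \mathcal{G})\, \dot g_j(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \alpha_*) \Bigg\} \Bigg] \Bigg]. \tag{11}$$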

We can then use (11) to propose an estimator for α. We propose to estimate α by solving the system of estimating equations,

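$$0 = \mathbb{P}_n\Bigg[ \sum_{j=1}^{K-1} W_{\bar d_j}(\bar A_j, \bar L_{j-1}; \bar\pi_{j0})\, \dot\mu^{2}(\beta_0, \mathcal{G})\, q_j(\bar L_{j-1}; \bar\pi_{j0}, \alpha) \times \{g_{j+1}(d_{j+1}(\bar L_j), \bar A_j, \bar L_j; \alpha) - g_j(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \alpha)\} + W_{\bar d_K}(\bar A_K, \bar L_{K-1}; \bar\pi_{K0})\, \dot\mu^{2}(\beta_0, \mathcal{G})\, q_K(\bar L_{K-1}; \bar\pi_{K0}, \alpha) \times \{Y - g_K(d_K(\bar L_{K-1}), \bar A_{K-1}, \bar L_{K-1}; \alpha)\} \Bigg], \tag{12}$$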

where, for r = 1, . . . , K,

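$$q_r(\bar L_{r-1}; \bar\pi_r, \alpha) = \sum_{j=1}^{r} \frac{1 - \pi_j(d_j(\bar L_{j-1}) \mid \bar A_{j-1}, \bar L_{j-1})}{\prod_{m=1}^{j} \pi_m(A_m \mid \bar A_{m-1}, \bar L_{m-1})}\, \dot g_j(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \alpha). \tag{13}$$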

In Appendix A, we show that the estimating function on the right-hand side of (12) has mean zero when either {π_j} or {g_j} are correctly specified and, hence, the double robustness property (P1) is satisfied. Furthermore, when the PS model is correctly specified and (12) is evaluated with {π_j₀}, we show that (12) has mean zero at α = α_opt. Therefore, the constrained augmentation improved doubly robust (IDR) estimator $\hat\beta_{\mathrm{IDR}}$ has the smallest asymptotic variance among the class of AIPW estimators with correctly specified {π_j}, where $\hat\beta_{\mathrm{IDR}}$ is the solution to the AIPW estimating equations with $\hat\alpha$ solving (12). Thus, both properties (P1) and (P2) are satisfied.
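A heavily simplified special case may help fix ideas. Assume a single stage (K = 1), a saturated marginal model μ(β, 𝒢) = β, known probabilities, and a constant working regression g₁ = α whose derivative in α is one. Under these assumptions, which are made for this sketch and are not the setting analyzed in the paper, equation (12) reduces to a weighted mean of the outcomes among regime followers with weights (1 − π)/π², and the AIPW equation reduces to averaging W(Y − α) + α. The function names below are hypothetical.

```python
def alpha_opt_constant(y, follows, pi_d):
    """Solve the K = 1, constant-g version of (12) for alpha.

    y       : outcomes (used only for subjects who follow the regime)
    follows : 1 if A_1 = d_1(L_0), else 0
    pi_d    : known probability of receiving the regime-consistent action
    """
    num = den = 0.0
    for yi, fi, pi in zip(y, follows, pi_d):
        if fi == 1:
            w = (1.0 - pi) / pi ** 2  # W * q_1 for a regime follower
            num += w * yi
            den += w
    return num / den


def aipw_beta(y, follows, pi_d, alpha):
    """Solve the K = 1 AIPW equation 0 = sum{ W (Y - alpha) + alpha - beta }."""
    total = 0.0
    for yi, fi, pi in zip(y, follows, pi_d):
        residual_term = (yi - alpha) / pi if fi == 1 else 0.0
        total += residual_term + alpha
    return total / len(y)
```

A usage pattern would be `aipw_beta(y, follows, pi_d, alpha_opt_constant(y, follows, pi_d))`. Subjects with small π receive large weight in the estimate of α, because those are the subjects whose inverse-probability-weighted residuals contribute most to the variance.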

Remark 1

The right-hand side of the proposed estimating equation in (12) is a function of the unknown causal estimand β₀. In practice, when the {π_j} are known, the unknown parameters (β₀, α_*) are estimated jointly by augmenting the estimating equation in (12) with the AIPW estimating equation in (8). The root-solving method one uses in practice may depend, in part, on the complexity of the marginal mean μ(β₀, 𝒢) as a function in β₀ and 𝒢 in (12). An alternative asymptotically-equivalent practical solution would be to replace β₀ in (12) with a consistent estimator $\hat\beta$ of β₀, e.g., the usual AIPW estimator, and then estimate (β₀, α_*) together by solving the pair of estimating equations (12) and (8).

3.2 New Estimation with {π_j} as Functions of Unknown γ

We now consider the case where the probabilities {π_j} are functions of an unknown parameter vector γ. In general, $\mathrm{var}[\psi_{\mathrm{AIPW}}(O; \beta, \bar\pi_K(\gamma_0), \bar g_K(\alpha))]$ is not equal to $\mathrm{var}[\psi_{\mathrm{AIPW}}(O; \beta, \bar\pi_K(\hat\gamma), \bar g_K(\alpha))]$ in either finite or large samples. Exceptions include when {π_j} are known by design or when the estimators $\hat\gamma$ are constructed to be asymptotically orthogonal. The details given in this subsection are appropriate for a majority of cases where there is a non-negligible effect on $\mathrm{var}[\psi_{\mathrm{AIPW}}(O; \beta, \bar\pi_K(\gamma_0), \bar g_K(\alpha))]$ when γ₀ is replaced with a consistent estimator $\hat\gamma$. When the variance cost to the estimating function ψ_AIPW is zero or asymptotically negligible, methods described in Section 3.1 are appropriate exactly or in the limit.

For j = 1, . . . , K, let π_j(γ) = π_j(a_j | Ā_j₋₁, L̅_j₋₁; γ) and $\bar\pi_j(\gamma) = \{\pi_1(\gamma), \ldots, \pi_j(\gamma)\}$. Similar to the outline in Section 3.1, we aim to define an estimator $\hat\beta$ that solves the AIPW estimating equations,

$$0 = \mathbb{P}_n\left[\psi_{\mathrm{AIPW}}(O; \beta, \bar\pi_K(\hat\gamma), \bar g_K(\hat\alpha))\right], \tag{14}$$

where $\hat\gamma$ and $\hat\alpha$ are estimators of γ and α, respectively, such that the double robustness property in (P1) and optimality property in (P2) are both satisfied. From Theorem 9.1 in [22], when the PS model is correctly specified, the AIPW estimating equations in (14) are asymptotically equivalent to

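$$0 = \mathbb{P}_n\left[\psi_{\mathrm{AIPW}}(O; \beta, \bar\pi_K(\gamma_0), \bar g_K(\alpha_*)) - H_0^\top(\beta, \gamma_0, \alpha_*)\, \mathcal{I}_\gamma^{-1}\, U_\gamma(O; \gamma_0)\right], \tag{15}$$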

where α_* is the limiting value of some estimator $\hat\alpha$ and

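$$H_0(\beta, \gamma_0, \alpha_*) = E\left\{ \frac{\partial}{\partial\gamma} \psi_{\mathrm{AIPW}}(O; \beta, \bar\pi_K(\gamma), \bar g_K(\alpha)) \,\Big|_{\beta = \beta_0,\, \gamma = \gamma_0,\, \alpha = \alpha_*} \right\}, \qquad \mathcal{I}_\gamma = E\left\{ U_\gamma(O; \gamma_0)\, U_\gamma^\top(O; \gamma_0) \right\}.$$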

Let c_* be the value of c that minimizes

$$E\left[\psi_{\mathrm{AIPW}}(O; \beta_0, \bar\pi_K(\gamma_0), \bar g_K(\alpha)) - c^\top U_\gamma(O; \gamma_0)\right]^{2} \tag{16}$$

when α_* is substituted for α. Our objective is thus to find ζ_opt that minimizes (16) where ζ = (α, c), and the procedure is analogous to finding α_opt that solves (10) in Section 3.1. However, in the current scenario, we allow for the fact that {π_j₀} are functions of the unknown parameter γ that is estimated from the data. We can then use this correspondence to Section 3.1 to propose an estimator of ζ_opt, say $\hat\zeta_{\mathrm{opt}}$, that will in turn lead to the improved doubly robust (IDR) estimator $\hat\beta_{\mathrm{IDR}}$, obtained by solving (14), with smallest asymptotic variance when the PS models are correctly specified but the OR models may or may not be; see Appendix B for a detailed derivation of this section.

Building on the principles and logic laid out above, we therefore propose to estimate ζ by solving the following estimating equation,

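$$0 = \mathbb{P}_n\Bigg[ \sum_{j=1}^{K-1} W_{\bar d_j}(\bar A_j, \bar L_{j-1}; \bar\pi_j(\hat\gamma))\, \dot\mu^{2}(\beta, \mathcal{G})\, \tilde q_j(\bar L_{j-1}; \zeta, \hat\gamma) \times \{\tilde g_{j+1}(d_{j+1}(\bar L_j), \bar A_j, \bar L_j; \zeta, \hat\gamma) - \tilde g_j(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \zeta, \hat\gamma)\} + W_{\bar d_K}(\bar A_K, \bar L_{K-1}; \bar\pi_K(\hat\gamma))\, \dot\mu^{2}(\beta, \mathcal{G})\, \tilde q_K(\bar L_{K-1}; \zeta, \hat\gamma) \times \{Y - \tilde g_K(d_K(\bar L_{K-1}), \bar A_{K-1}, \bar L_{K-1}; \zeta, \hat\gamma)\} \Bigg], \tag{17}$$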

where

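$$\tilde q_r(\bar L_{r-1}; \zeta, \gamma) = \sum_{j=1}^{r} \frac{1 - \pi_j(d_j(\bar L_{j-1}) \mid \bar A_{j-1}, \bar L_{j-1}; \gamma)}{\prod_{m=1}^{j} \pi_m(A_m \mid \bar A_{m-1}, \bar L_{m-1}; \gamma)} \begin{bmatrix} \tilde g_j^{\alpha}(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \zeta, \gamma) \\ \tilde g_j^{c}(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \zeta, \gamma) \end{bmatrix},$$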

and

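$$\tilde g_j(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \zeta, \gamma) = g_j(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \alpha) + c^\top \tilde g_j^{c}(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \zeta, \gamma),$$

$$\tilde g_j^{c}(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \zeta, \gamma) = - \frac{W_{\bar d_{j-1}}^{-1}(\bar A_{j-1}, \bar L_{j-2}; \bar\pi_{j-1}(\gamma))\, \pi_j(A_j \mid \bar A_{j-1}, \bar L_{j-1}; \gamma) \sum_{a_j} I(A_j = a_j)\, u_j(a_j \mid \bar A_{j-1}, \bar L_{j-1}; \gamma)}{I(A_j = d_j(\bar L_{j-1})) - \pi_j(d_j(\bar L_{j-1}) \mid \bar A_{j-1}, \bar L_{j-1}; \gamma)},$$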

and $\tilde g_j^{\alpha}(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \zeta, \gamma) = \dot g_j(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \alpha)$, and u_j(a_j | Ā_j₋₁, L̅_j₋₁; γ) is defined in (3). We observe that $\hat\gamma$ converges to γ₀ when the PS models {π_j} are correctly specified regardless of whether the OR models {g_j} are correctly specified or not. Using arguments similar to those in Appendix A that establish the results in Section 3.1, $\hat\zeta_{\mathrm{opt}}$, the solution to the system of estimating equations in (17), converges to ζ_opt, thus satisfying (P2). On the other hand, when the PS models are misspecified but the OR models are correct, the expectation of (17) can be shown to be zero when ζ ≡ (α, c) = (α₀, 0) and therefore the double-robustness property (P1) is satisfied. Thus, our estimator $\hat\beta_{\mathrm{IDR}}$, defined as the solution to $0 = \mathbb{P}_n[\psi_{\mathrm{AIPW}}(O; \beta, \bar\pi_K(\hat\gamma), \bar g_K(\hat\alpha_{\mathrm{opt}}))]$ where $\hat\alpha_{\mathrm{opt}}$ is defined through $\hat\zeta_{\mathrm{opt}}$ via (17) and $\hat\gamma$ is an M-estimator defined in (3), is DR and has smallest asymptotic variance among all DR estimators when the PS models are correct but OR models may not be.

We may use a theory of Z-estimation to describe the asymptotic properties of $\hat\beta_{\mathrm{IDR}}$. Specifically, we combine the estimating equations for θ = (β^⊤, γ^⊤, α^⊤)^⊤ and derive the asymptotic properties for $\hat\beta$ from the asymptotic properties of $\hat\theta = (\hat\beta^\top, \hat\gamma^\top, \hat\alpha^\top)^\top$. Combining the estimating functions for β and γ with the proposed estimating function (17) with respect to ζ = (α, c), $\hat\theta$ solves the system of estimating equations,

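$$0 = \mathbb{P}_n \begin{bmatrix} \psi_{\mathrm{AIPW}}(O; \beta, \bar\pi_K(\gamma), \bar g_K(\alpha)) \\ U_\gamma(O; \gamma) \\ \sum_{j=1}^{K-1} W_{\bar d_j}(\bar A_j, \bar L_{j-1}; \bar\pi_j(\gamma))\, \dot\mu^{2}(\beta, \mathcal{G})\, \tilde q_j(\bar L_{j-1}; \zeta, \gamma) \times \{\tilde g_{j+1}(d_{j+1}(\bar L_j), \bar A_j, \bar L_j; \zeta, \gamma) - \tilde g_j(d_j(\bar L_{j-1}), \bar A_{j-1}, \bar L_{j-1}; \zeta, \gamma)\} + W_{\bar d_K}(\bar A_K, \bar L_{K-1}; \bar\pi_K(\gamma))\, \dot\mu^{2}(\beta, \mathcal{G})\, \tilde q_K(\bar L_{K-1}; \zeta, \gamma) \times \{Y - \tilde g_K(d_K(\bar L_{K-1}), \bar A_{K-1}, \bar L_{K-1}; \zeta, \gamma)\} \end{bmatrix}.$$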

Under conditions used to derive the improved DR estimator [e.g., 11, Appendix], $\hat\theta$ is a Z-estimator with an asymptotic normal distribution whose covariance follows standard formulae [e.g., 3, Ch. 7] and can be estimated in various ways, including direct evaluation of the empirical covariance matrix and resampling methods.
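The "direct evaluation" route can be illustrated in the scalar case with the standard sandwich formula for a Z-estimator, var(θ̂) ≈ B / (A² n), where A is the average derivative of the estimating function and B is its average square, both evaluated at θ̂. The sketch below is a generic illustration with a hypothetical function name; it is not the multivariate computation that the stacked system above would require.

```python
def sandwich_variance(psi_values, dpsi_values):
    """Sandwich variance estimate for a scalar Z-estimator.

    psi_values  : estimating function psi(O_i; theta_hat) for each subject
    dpsi_values : derivative of psi with respect to theta at theta_hat
    """
    n = len(psi_values)
    a = sum(dpsi_values) / n                 # "bread": average slope
    b = sum(p * p for p in psi_values) / n   # "meat": average squared psi
    return b / (a * a) / n
```

For the sample mean, with psi = y − θ and derivative −1, this reproduces the usual variance estimate of the mean with divisor n.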

3.3 Caveats

The technique described above is general and applicable for all non-random dynamic treatment regimes via the marginal model in (2). This development applies to the work by [4] and [23] who detailed the idea for a population mean in the presence of missing outcome and monotone coarsening, respectively.

Although the principles behind locally efficient doubly robust estimation are technically sound, putting the principles into practice can be a substantial challenge. First, for K-stage trials with K even moderately large and non-trivial treatment effect, there are many regression functions to specify. Such bookkeeping can be an onerous task and encourages the use of a subclass of {g_j} that restricts regression coefficients to be common across stages. Second, the regression functions {g_j} are highly constrained through their nested definition in (6). This caveat led [11] to propose that {g_j} depend on auxiliary variables X̅_j₋₁ but not on treatment Ā_j₋₁, i.e. no treatment effect. Unfortunately, if there is a treatment effect, then one is relying entirely on a correct PS model via {π_j} for consistency of the AIPW estimator. As we demonstrated above, even if the PS model is correct, incorrectly specified {g_j} could have undesirable consequences on the precision of the AIPW estimator and downstream statistical inference. Thus, efficiency of the AIPW estimator in the current context is both germane and important.

4 Examples

4.1 The Setup: Adaptive Treatment Length Studies

Our numerical and real-data examples are illustrations of our continued work in dynamic regimes for infusion studies, where a clinical goal of interest is to estimate the mean outcome if an attending physician stops the infusion at a time of their choice or at the time of an unexpected random event (i.e. a time not of their choice), whichever time comes first [8]. In these applications, as with many dynamic treatment regimes, evaluating eligibility criteria for treatment assignment is a key feature of the analysis. Eligibility for treatment assignment is evaluated sequentially at each stage or clinic visit at time t_j and depends on treatment history, covariate history, and the history of eligibility. In the infusion study, treatment assignment at t_j is synonymous with a physician's decision to stop or continue treatment by choice at time t_j provided the patient satisfies eligibility criteria. In the observational study, treatment assignment depends on treatment history, covariate history, and the history of eligibility whereas treatment assignment depends on treatment and eligibility history in the non-random dynamic regime for our infusion study application.

To define the decision rule leading to our dynamic regime, let a target treatment length t_r be given, r ∈ {1, . . . , K}. Let S_j be a binary tailoring variable. We consider the decision rule

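$$d_j(\bar L_{j-1}) = \begin{cases} \text{stop treatment}, & \text{if } t_j = t_r \text{ and } \bar S_{j-1} = 0; \\ \text{continue}, & \text{otherwise.} \end{cases} \tag{18}$$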

In the infusion study application, a subject initiates infusion treatment at t₀. According to the decision rule in (18), at t_j, j < r, the attending physician continues treatment as long as the subject satisfies the eligibility criteria, i.e. S̅_j = 0, where S_j is a random (in)eligibility event in the interval (t_j, t_j₊₁]. If Y is the clinical endpoint of interest in the infusion study, then our goal is to estimate the causal estimand, E_{d̅_K} (Y | 𝒢) = β, using the data.
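The decision rule in (18) is simple enough to state in a few lines of code. The sketch below is illustrative; the function name `decision_rule`, the string return values, and the representation of the tailoring history as a list of 0/1 indicators are assumptions made for this example.

```python
def decision_rule(j, r, s_history):
    """Decision at time t_j under the rule in (18) with target length t_r.

    j         : index of the current decision time t_j
    r         : index of the target treatment length t_r
    s_history : tailoring indicators (S_0, ..., S_{j-1}); 1 marks an
                (in)eligibility event
    """
    no_prior_event = all(s == 0 for s in s_history)
    if j == r and no_prior_event:
        return "stop"
    return "continue"
```

With target r = 2, a subject with no prior events is continued at t₁ and stopped at t₂. A subject who has already experienced an event falls into the "otherwise" branch of (18).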

The observed data for the proposed application is outlined in Table 1. In contrast to a dynamic regime whose treatment length is stopped at t_r provided S̅_r₋₁ = 0, subjects may stop at any t_j, j < K, in the observational study. In addition, the concept of eligibility for treatment assignment at t_j is broader in the observational study. In order to be eligible for treatment assignment at t_j, a subject must have not stopped treatment at any prior time point $t_{j'}$, j′ < j, as well as avoided random adverse events $\{S_{j'} = 1\}$ for all j′ < j. In our notation, A_j = 1 implies a decision to stop or “assign” treatment at t_j by choice and the probability that a provider stops treatment at t_j given a subject is eligible at t_j is

$$\lambda_j(\bar X_{j-1}) = P(A_j = 1 \mid \bar X_{j-1}, \bar A_{j-1} = 0, \bar S_{j-1} = 0).$$

Then, the treatment assignment mechanism is modeled as

$$\pi_j(a_j \mid \bar A_{j-1}, \bar L_{j-1}) = \{\lambda_j(\bar X_{j-1})\}^{a_j}\, \{1 - \lambda_j(\bar X_{j-1})\}^{I(\bar A_j = 0,\, \bar S_{j-1} = 0)}. \tag{19}$$

Adherence to the non-stochastic dynamic regime that defines the adaptive treatment length policy is described by the degenerate treatment assignment mechanism,

$$I\{a_j = d_j(\bar L_{j-1})\} = I(t_j = t_r)^{a_j} \times \{1 - I(\bar A_r = 0, \bar S_{r-1} = 0)\}^{I(\bar A_j = 0,\, \bar S_{j-1} = 0)}.$$

Combining the degenerate treatment assignment mechanism above with the mechanism in (19) from the observed data leads to the definition of the Radon-Nikodym derivative, i.e.,

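$$W_{\bar d_j}(\bar A_j, \bar L_{j-1}) = \prod_{m=1}^{j} \frac{I\{A_m = d_m(\bar L_{m-1})\}}{\pi_m(A_m \mid \bar A_{m-1}, \bar L_{m-1})} = \prod_{m=1}^{j} \left\{ \frac{I(t_m = t_r)}{\lambda_m(\bar X_{m-1})} \right\}^{A_m} \left\{ \frac{1 - I(\bar A_r = 0, \bar S_{r-1} = 0)}{1 - \lambda_m(\bar X_{m-1})} \right\}^{I(\bar A_m = 0,\, \bar S_{m-1} = 0)},$$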

that is critical for estimating the causal estimand E_{d̅_K} [Y | 𝒢] from observational data.

Table 1

The observed data in adaptive treatment length studies. Time $t_j^+$ denotes the moment just after t_j.

Variable	Innovation at t_j	Description
Treatment assignment, A_j	A_jI(Ā_j₋₁ = 0, S̅_j₋₁ = 0)	stop or continue at t_j given subject is eligible at t_j;
History of covariates and tailoring variables, L̅_j(A̅_j)	(X̅_j, S_j)I(Ā_j = 0, S̅_j₋₁ = 0)	covariate history and ineligibility indicator in (t_j, t_j₊₁] given subject eligible at $t_j^+$;
	I(Ā_j = 0, S̅_j₋₁ = 0)	indicator that subject is eligible for treatment assignment at $t_j^+$;
Endpoint, Y(Ā_K)	NA	outcome measured at end of study.

4.2 Simulation Studies

We performed simulation studies to evaluate the finite sample performance of competing estimators of the causal parameter β in the marginal model μ(β, 𝒢), including outcome regression (OR), inverse probability weighted (IPW), ordinary double robust and improved double robust estimators. First, we partition covariate history as $\bar X_j = ((V_0^{(1)})^\top, (\bar V_j^{(2)})^\top)^\top$, where $V_0^{(1)}$ is the vector of baseline (time-invariant) covariates and $\bar V_j^{(2)} = (V_0^{(2)}, V_1^{(2)}, \ldots, V_j^{(2)})$ is the history of time-varying covariate information. Next, treatment assignments are generated as Bernoulli random variables with success probability $\lambda_j(\bar X_{j-1}; \gamma) = H(\gamma_j + \gamma_X^\top \bar X_{j-1})$, j = 1, . . . , K, where H(z) = 1/{1 + exp(−z)} and $\gamma_X = (\gamma_{V^{(1)}}^\top, \gamma_{V^{(2)}}^\top)^\top$. Intermediate outcomes S_j are similarly simulated as Bernoulli random variables with success probability $\lambda_j^{\xi}(\bar X_j; \alpha_0) = P(S_j = 1 \mid \bar X_j, \bar A_{j-1} = 0, \bar S_{j-1} = 0)$, j = 0, . . . , K − 1, where $\lambda_j^{\xi}(\bar X_j; \alpha) = H(\alpha_{\xi,j} + \alpha_{\xi X}^\top \bar X_j)$ and $\alpha_{\xi X} = (\alpha_{\xi,V^{(1)}}^\top, \alpha_{\xi,V^{(2)}}^\top)^\top$. Finally, the endpoint Y is assumed to follow a linear model, Y = g_K₀(Ā_K, L̅_K₋₁; α) + ϵ, where g_K₀(Ā_K, L̅_K₋₁; α₀) = E(Y | Ā_K, L̅_K₋₁),

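$$
g_{K0}(\bar A_K, \bar L_{K-1}; \alpha) = \alpha_{y0} + \sum_{j=1}^{K} \alpha_{yj}\, I(\{A_j = 1\} \cup \{S_{j-1} = 1\}) + \sum_{j=1}^{K} \alpha_{yS_{j-1}}\, I\{S_{j-1} = 1\} + \alpha_{yX}^\top \bar X_{K-1},
$$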

$\alpha_{yX} = (\alpha_{yV^{(1)}}^\top, \alpha_{yV^{(2)}}^\top)^\top$, and ε is an independent standard normal random variable. The parameters in the conditional mean model g_K₀(Ā_K, L̅_K₋₁) have practical interpretations for our application. The parameters α_yj describe the adjusted effect of treatment length on the outcome Y regardless of the reason why treatment was stopped. Recall that treatment starts at t₀ and must end in one of the intervals (t_j₋₁, t_j], j = 1, . . . , K. The parameters α_{yS_j} allow for the possibility that the presence or absence of the intermediate events (S₀, . . . , S_K₋₁) modifies the adjusted effect of treatment length on mean outcome. The parameters α_yX describe the adjusted effect of covariate history on mean outcome. The simulation scheme used to generate data of the form described in Table 1 is outlined in Algorithm 1.

Algorithm 1 Pseudocode for simulated data
Simulate baseline covariate information: $V_0^{(1)} \sim N_2(0, I_2)$
Set j = 0
while j < K do
    Simulate time-dependent covariate information: if j = 0, $V_0^{(2)} \sim N(4, 1)$, else $V_j^{(2)} \sim N(V_{j-1}^{(2)}, 1)$
    Simulate tailoring variable: $S_j \sim \mathrm{Bern}\{\lambda_j^{\xi}(\bar X_j; \alpha)\}$, where $\lambda_j^{\xi}(\bar X_j; \alpha) = H(\alpha_{\xi,j} + \alpha_{\xi,V^{(1)}}^\top V_0^{(1)} + \alpha_{\xi,V^{(2)}} V_j^{(2)})$
    if s_j = 1 then
        Infusion stopped in (t_j, t_j₊₁] due to terminating event; Simulate outcome $Y = \alpha_{y0} + \alpha_{y(j+1)} + \alpha_{yS_j} + \alpha_{yV^{(1)}}^\top V_0^{(1)} + \alpha_{yV^{(2)}} V_j^{(2)} + \epsilon$; Stop
    if j = K − 1 then
        Set a_j₊₁ = K; Simulate outcome $Y = \alpha_{y0} + \alpha_{yK} + \alpha_{yV^{(1)}}^\top V_0^{(1)} + \alpha_{yV^{(2)}} V_j^{(2)} + \epsilon$; Stop
    else
        Simulate treatment assignment at t_j₊₁: A_j₊₁ ~ Bern{λ_j₊₁(X̅_j; γ)}, where $\lambda_j(\bar X_j; \gamma) = H(\gamma_j + \gamma_{V^{(1)}}^\top V_0^{(1)} + \gamma_{V^{(2)}} V_j^{(2)})$
        if a_j₊₁ = 1 then
            Infusion stopped at t_j; Simulate outcome $Y = \alpha_{y0} + \alpha_{y(j+1)} + \alpha_{yV^{(1)}}^\top V_0^{(1)} + \alpha_{yV^{(2)}} V_j^{(2)} + \epsilon$; Stop
        else
            Increment j = j + 1; Continue

We generated data according to the aforementioned scenario for K = 2 stages, with two baseline covariates $V_0^{(1)}$ and the history of time-varying covariate information defined as its current value, i.e., $\bar V_j^{(2)} = V_j^{(2)}$. The parameter vector for treatment assignment is $\gamma = (\gamma_1, \gamma_{V^{(1)}}^\top, \gamma_{V^{(2)}})^\top$, with γ₁ = 10/7, γ_V⁽¹⁾ = (−1/5, −1/5)^⊤, and γ_V⁽²⁾ = −4/5. The parameter vector for intermediate outcomes is $\alpha_\xi = (\alpha_{\xi,0}, \alpha_{\xi,1}, \alpha_{\xi,V^{(1)}}^\top, \alpha_{\xi,V^{(2)}})^\top$, with α_ξ,0 = 13/7, α_ξ,1 = 26/7, α_ξ,V⁽¹⁾ = (−1/10, −1/10)^⊤, and α_ξ,V⁽²⁾ = −9/10. The outcome parameter vector is $\alpha_y = (\alpha_{y0}, \alpha_{y1}, \alpha_{y2}, \alpha_{yV^{(1)}}^\top, \alpha_{yV^{(2)}})^\top$, with (α_y₀, α_y₁, α_y₂) = (3/2, 1/2, 3/10), α_V⁽¹⁾ = (1, 1/2)^⊤, and α_V⁽²⁾ = 1/2 or 1, where α_V⁽¹⁾ and α_V⁽²⁾ abbreviate $\alpha_{yV^{(1)}}$ and $\alpha_{yV^{(2)}}$. For the particular simulation results presented below, we set α_{yS_j} = 0, which assumes that the adjusted effect of treatment length on mean outcome is not modified by the reason for stopping treatment. True values of the causal estimands are computed by Monte Carlo integration.
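The data-generating steps of Algorithm 1 with the parameter values quoted above can be sketched in Python as follows. This is one reading of the pseudocode rather than the authors' code: it assumes K = 2 (so only the first treatment decision is random and only γ₁ is needed), treats the final stage as a forced stop, and uses the current value of the time-varying covariate in each linear predictor. The function and field names (`simulate_subject`, `length`, `event`) are hypothetical.

```python
import numpy as np

K = 2  # number of stages used in the reported simulations

def expit(z):
    """Logistic function H(z) = 1 / {1 + exp(-z)}."""
    return 1.0 / (1.0 + np.exp(-z))

def simulate_subject(rng, alpha_v2=0.5):
    """Draw one subject following the steps of Algorithm 1 with K = 2."""
    gamma_1, gamma_v1, gamma_v2 = 10 / 7, np.array([-0.2, -0.2]), -0.8
    a_xi = (13 / 7, 26 / 7)                      # alpha_{xi,0}, alpha_{xi,1}
    a_xi_v1, a_xi_v2 = np.array([-0.1, -0.1]), -0.9
    a_y = (1.5, 0.5, 0.3)                        # alpha_{y0}, alpha_{y1}, alpha_{y2}
    a_y_v1, a_y_s = np.array([1.0, 0.5]), 0.0    # alpha_{yS_j} = 0 as in the text

    v1 = rng.standard_normal(2)                  # V_0^(1) ~ N_2(0, I_2)
    v2_hist, s_hist, a_hist = [], [], []
    for j in range(K):
        mean = 4.0 if j == 0 else v2_hist[-1]
        v2 = rng.normal(mean, 1.0)               # V_j^(2)
        v2_hist.append(v2)
        s = int(rng.random() < expit(a_xi[j] + a_xi_v1 @ v1 + a_xi_v2 * v2))
        s_hist.append(s)
        y_mean = a_y[0] + a_y[j + 1] + a_y_v1 @ v1 + alpha_v2 * v2
        if s == 1:                               # terminating event stops infusion
            y_mean += a_y_s
            break
        if j == K - 1:                           # assumed forced stop at the last stage
            a_hist.append(1)
            break
        a = int(rng.random() < expit(gamma_1 + gamma_v1 @ v1 + gamma_v2 * v2))
        a_hist.append(a)
        if a == 1:                               # physician stops infusion
            break
    y = y_mean + rng.standard_normal()           # epsilon ~ N(0, 1)
    return {"v1": v1, "v2": v2_hist, "S": s_hist, "A": a_hist,
            "length": j + 1, "event": s, "y": float(y)}

def simulate(n, seed=0, alpha_v2=0.5):
    """Simulate n independent subjects with a reproducible generator."""
    rng = np.random.default_rng(seed)
    return [simulate_subject(rng, alpha_v2) for _ in range(n)]

data = simulate(n=300, seed=2020)
```

Each record stores the observed treatment length in stages (`length`) and whether infusion ended because of a terminating event (`event`), which is the information needed to decide whether a subject is consistent with a given treatment-length policy.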

In addition to studying the simulation scenario where all models are correctly specified, we also studied scenarios when one or both of the propensity score (PS) or outcome regression (OR) models are incorrectly specified. Specifically, we adopt the following:

(PS incorrect): In the treatment assignment model, the true success probability of the Bernoulli random variable is generated from a linear predictor that is quadratic in the time-dependent covariate V⁽²⁾, i.e., $\lambda_{j0}(\bar X_{j-1}; \gamma) = H\{\gamma_j + \gamma_{V^{(1)}}^\top V_0^{(1)} + \gamma_{V^{(2)}} (V_j^{(2)})^2\}$, where γ_V⁽²⁾ = 0.2, but the procedure incorrectly fits the model $\lambda_j(\bar X_{j-1}; \gamma) = H(\gamma_j + \gamma_{V^{(1)}}^\top V_0^{(1)} + \gamma_{V^{(2)}} V_j^{(2)})$;
(OR incorrect): We misspecify the model for Y by ignoring the time-dependent covariate $V_j^{(2)}$ and modeling the baseline covariates $V_0^{(1)}$ only.

Note that the true causal estimands remain the same under potentially misspecified PS or OR models when the regression parameter for the time-dependent covariate $V_j^{(2)}$, i.e., α_V⁽²⁾, is fixed at 1/2 or 1.

A total of five competing estimators are compared under each scenario. The inverse probability weighted (IPW) and three variations of doubly robust estimators rely on a correctly specified treatment assignment mechanism for optimal performance. Here, the treatment assignment probabilities are estimated via maximum likelihood, i.e., $\hat\gamma = \arg\min_\gamma \{-\log \mathcal{L}_{PS}(O; \gamma)\}$, where

$$
\log \mathcal{L}_{PS}(O; \gamma) = \mathbb{P}_n \Big[ \sum_{j=1}^{K} A_j \log \lambda_j(\bar X_{j-1}; \gamma) + I(\bar A_j = 0, \bar S_{j-1} = 0) \log\{1 - \lambda_j(\bar X_{j-1}; \gamma)\} \Big].
$$
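The log-likelihood above has the form of a Bernoulli (logistic) log-likelihood summed over the stages at which a subject is still eligible for assignment. A minimal sketch of maximizing such a likelihood by Newton-Raphson is given below; it assumes the eligible subject-stage records have already been stacked into a design matrix `X` (including an intercept column) and a 0/1 response `a`. The function name `fit_logistic_mle` and the toy data are illustrative assumptions, not part of the paper.

```python
import numpy as np

def fit_logistic_mle(X, a, max_iter=50, tol=1e-10):
    """Maximize sum_i [a_i log p_i + (1 - a_i) log(1 - p_i)], p_i = H(x_i' gamma)."""
    X = np.asarray(X, dtype=float)
    a = np.asarray(a, dtype=float)
    gamma = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ gamma))
        score = X.T @ (a - p)                          # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # negative Hessian
        step = np.linalg.solve(info, score)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma

# Toy check on hypothetical data: one covariate plus an intercept.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
true_gamma = np.array([-0.5, 1.0])
a = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_gamma))).astype(float)
gamma_hat = fit_logistic_mle(X, a)
```

Restricting the stacked data to eligible subject-stages plays the role of the indicator I(Ā_j = 0, S̅_j₋₁ = 0) in the displayed log-likelihood; records after a subject has stopped treatment contribute nothing.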

In addition to the ordinary doubly robust (USUAL) and improved doubly robust (IDR) estimators, we also computed a naïve improved doubly robust estimator that ignores the variation in the estimator $\hat\gamma$ when optimizing α, i.e., $\hat\beta_{IDR}^{naive}$ estimates α by minimizing the L₂-norm ℙ_n{ψ_AIPW(O; β, γ₀, α)}². An outcome regression (OR) estimator is computed as

$$
\hat\beta_{OR} = \mathbb{P}_n \Big[ \sum_{a_1} I\{a_1 = d_1(L_0)\}\, g_1(a_1, L_0; \hat\alpha) \Big],
$$

where $\hat\alpha = (\hat\alpha_y^\top, \hat\alpha_\xi^\top)^\top$. These estimators are compared in Monte Carlo studies through the following statistics: Monte Carlo bias and standard deviation (SSE), the Monte Carlo average of the sample standard errors (SEE), and empirical coverage probability (ECP) based on a Wald-type 95% confidence interval. Simulation results for a sample size of n = 300 are presented in Table 2.
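The four summary statistics just described can be computed from a set of Monte Carlo replicates along the following lines. The function name `summarize` and its arguments are assumptions for illustration; the replicate values in the usage example are made up and are not taken from Table 2.

```python
import numpy as np

def summarize(estimates, std_errors, truth, z=1.959964):
    """Monte Carlo bias, SSE, SEE and ECP for a Wald-type 95% interval."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    covered = (est - z * se <= truth) & (truth <= est + z * se)
    return {
        "bias": est.mean() - truth,   # Monte Carlo bias
        "sse": est.std(ddof=1),       # standard deviation of the estimates
        "see": se.mean(),             # average estimated standard error
        "ecp": covered.mean(),        # empirical coverage probability
    }

# Four hypothetical replicates with true value 1.0.
stats = summarize([0.9, 1.1, 1.0, 1.6], [0.1, 0.1, 0.1, 0.1], truth=1.0)
```

When SEE tracks SSE closely and ECP is near 0.95, the standard error estimator is capturing the actual sampling variation of the point estimator.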

Table 2

Simulation results for estimating the mean potential outcome in a dynamic treatment regime. Table entries include Monte Carlo bias, simulation standard error (SSE), standard error estimate (SEE), and empirical coverage probability (ECP) for a 95% confidence interval.

α_V⁽²⁾		Bias	SSE	SEE	ECP	Bias	SSE	SEE	ECP
0.5		PS correct, OR correct				PS incorrect, OR correct
	ipw	−0.013	0.247	0.238	0.920	−0.103	0.306	0.253	0.842
	or	−0.003	0.114	0.110	0.934	−0.014	0.114	0.111	0.946
	usual	0.003	0.192	0.193	0.962	−0.005	0.206	0.209	0.946
	idr_naive	0.005	0.194	0.194	0.956	0.013	0.204	0.214	0.956
	idr	0.006	0.193	0.196	0.954	0.017	0.207	0.220	0.960
		PS correct, OR incorrect				PS incorrect, OR incorrect
	ipw	−0.012	0.247	0.238	0.920	−0.102	0.306	0.253	0.842
	or	−0.263	0.111	0.110	0.360	−0.295	0.114	0.110	0.248
	usual	−0.001	0.207	0.216	0.936	−0.096	0.264	0.241	0.850
	idr_naive	0.002	0.195	0.198	0.950	−0.003	0.209	0.229	0.960
	idr	0.005	0.196	0.200	0.956	0.035	0.217	0.237	0.950
1.0		PS correct, OR correct				PS incorrect, OR correct
	ipw	−0.014	0.299	0.276	0.920	−0.183	0.340	0.282	0.770
	or	−0.008	0.125	0.122	0.954	−0.009	0.126	0.124	0.942
	usual	0.018	0.203	0.202	0.950	0.021	0.211	0.204	0.944
	idr_naive	0.017	0.202	0.202	0.944	0.038	0.219	0.217	0.936
	idr	0.019	0.202	0.206	0.948	0.040	0.220	0.226	0.954
		PS correct, OR incorrect				PS incorrect, OR incorrect
	ipw	−0.018	0.299	0.276	0.920	−0.187	0.340	0.282	0.770
	or	−0.530	0.133	0.132	0.014	−0.573	0.135	0.132	0.008
	usual	−0.001	0.288	0.293	0.926	−0.172	0.379	0.297	0.750
	idr_naive	0.006	0.226	0.224	0.962	0.003	0.236	0.248	0.956
	idr	0.013	0.210	0.218	0.954	0.071	0.241	0.259	0.932

When both PS and OR models are correctly specified, all estimators have relatively small bias and approximately correct coverage probabilities. The OR estimator has smaller finite-sample variance than the IPW and doubly robust (DR) estimators, while the IPW estimator is the least precise, with a Monte Carlo standard deviation at least 20% larger than that of any of the DR estimators. When the PS model is incorrect but the OR model is correct, the IPW estimator exhibits substantial finite-sample bias while the other estimators do not. When the PS model is correct but the OR model is incorrect, the OR estimator exhibits substantial finite-sample bias while the other estimators do not. These finite-sample results agree with theoretical results and are the primary motivation for doubly robust estimators.

The usual and improved doubly robust (IDR) estimators have similar performance when the OR model is correct. The main advantage of the IDR estimator over the usual DR estimator occurs when the PS model is correct but the OR model is incorrect. In this case, the Monte Carlo standard deviation of the usual DR estimator is 6% larger than that of the IDR estimator when α_V⁽²⁾ = 0.5 and 37% larger when α_V⁽²⁾ = 1. Moreover, the simulation results when α_V⁽²⁾ = 1 show clearly that the naïve IDR estimator yields suboptimal estimates of the causal estimand, even though its performance seems very similar to that of IDR when α_V⁽²⁾ = 0.5. The rationale for suboptimal performance, in general, is that choosing $\hat\alpha$ to minimize the variance of the AIPW estimator when the PS is correct and γ₀ is known is not the same as minimizing the variance of the AIPW estimator when the PS is correct but γ₀ is unknown and must be estimated [4, 19, 21, 23, 27]. Thus, because α is not estimated optimally in the naïve IDR procedure, i.e., $\hat\alpha$ does not converge in probability to α_opt, the naïve IDR estimator of the causal estimand is not locally efficient.
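The percentage comparisons quoted above can be reproduced directly from the SSE column of Table 2 for the scenario with the PS model correct and the OR model incorrect. The short sketch below only re-does that arithmetic; the helper name `pct_larger` is an assumption for illustration.

```python
# SSE values copied from Table 2, "PS correct, OR incorrect" panels.
sse = {
    0.5: {"usual": 0.207, "idr_naive": 0.195, "idr": 0.196},
    1.0: {"usual": 0.288, "idr_naive": 0.226, "idr": 0.210},
}

def pct_larger(a, b):
    """Percentage by which a exceeds b."""
    return 100.0 * (a / b - 1.0)

usual_vs_idr = {k: pct_larger(v["usual"], v["idr"]) for k, v in sse.items()}
naive_vs_idr = {k: pct_larger(v["idr_naive"], v["idr"]) for k, v in sse.items()}
```

Under these table values, the naïve IDR standard deviation is within about 1% of the IDR value at α_V⁽²⁾ = 0.5 but roughly 8% larger at α_V⁽²⁾ = 1, which is the pattern the paragraph describes.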

Additionally, in our simulation studies, the DR estimators had considerably better performance than the OR estimator even when both the PS and OR models were incorrect. Similar results have been shown by other authors [4, 23] and illustrate sensitivity of the DR estimators and scale of the bias when these estimators are not guaranteed to be consistent. In short, OR estimators are superior when the OR model is specified correctly. But if one is interested in guarding against possible misspecification of the OR model, doubly robust estimators are worth the effort.

As discussed in Section 3, the asymptotic covariance of $\hat\beta_{IDR}$ can be estimated by the empirical sandwich matrix. Alternatively, one can use resampling or perturbation methods to estimate the sampling variation of the estimator, an approach that can be justified along the lines of Lu and Johnson [10, Appendix]. As we demonstrate in Table 2, the empirical variance of the bootstrap resamples approximates the sampling variation of the estimator well and results in confidence intervals with approximately nominal coverage.
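A generic nonparametric bootstrap standard error, of the kind referred to above, can be sketched as follows. This is a simplified illustration under the assumption that subjects are independent and resampled with replacement as whole units; the function name `bootstrap_se` is hypothetical, and the sample mean stands in for the much more involved IDR estimator.

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot=500, seed=0):
    """Standard deviation of an estimator over nonparametric bootstrap resamples."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample subjects with replacement
        reps[b] = estimator(data[idx])
    return reps.std(ddof=1)

# Toy check: for the sample mean, the bootstrap SE should be close to s / sqrt(n).
rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=400)
se_boot = bootstrap_se(y, np.mean)
se_formula = y.std(ddof=1) / np.sqrt(len(y))
```

For an estimator with estimated nuisance parameters, the whole pipeline (propensity fit, outcome regressions, and the final estimating equation) would need to be re-run inside `estimator` on each resample so that the variability of the nuisance estimates is reflected.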

4.3 Data Analysis

We also demonstrate the utility of the methods through an analysis of data from the Enhanced Suppression of the Platelet IIb/IIIa Receptor with Integrilin Therapy (ESPRIT) trial [12]. Briefly, ESPRIT was designed to investigate a novel dosing regimen of eptifibatide infusion in patients undergoing elective stent percutaneous coronary intervention (PCI). For those patients randomized to the experimental infusion arm, eptifibatide was delivered as two boluses plus an infusion, with infusion starting immediately after the first bolus and continuing until it was stopped by physician discretion or a treatment-competing event. Although the protocol provided general recommendations on when providers ought to stop infusion, the decision to stop infusion was ultimately left to physician choice. The protocol also articulated when infusion must be discontinued due to adverse events. Thus, the adaptive treatment length policy treats to the target time t_j or an infusion-terminating event, whichever comes first. In the ESPRIT study, however, infusion continues until an infusion-terminating event or until the attending physician chooses to discontinue it, whichever comes first. The study endpoint is a composite endpoint of death, myocardial infarction (MI), or urgent target revascularization in 30 days. The techniques described above in Section 2, coupled with the details in Section 4.1, allow us to estimate the mean composite endpoint in the adaptive treatment length policy using the observed ESPRIT trial data.

Results from our data analysis on 1040 patients receiving the experimental infusion regimen are presented in Table 3. All estimators of causal estimands adjust for potential confounders including angina and weight at baseline as well as time-dependent enzyme level measured during post-surgery observation. Note that the composite endpoint is a Bernoulli outcome but that the IPW estimator works the same regardless of whether the outcome is continuous or discrete; the other estimators rely on outcome regression (OR) models via {g_j}. To exemplify potential advantages of the improved doubly robust estimator, we adopt linear regression models in {g_j}, thus choosing a potentially poor link function in a generalized linear model for a binary outcome. We computed proportions of the composite endpoint under adaptive treatment length policies at t₁ = 16 and t₂ = 22 hours. We first observe that the range of estimates across methods is modest: 6.2–6.7% at 16 hours and 8.6–8.8% at 22 hours. However, the range in standard error estimates at 16 hours is huge by comparison. There, the standard error of the usual doubly robust estimator is 75–77% larger than those of the other estimators, including the inefficient IPW estimator. We conjecture that this large standard error of the usual doubly robust estimator is due to our choice of link function in {g_j}. The improved doubly robust estimators counterbalance, in part, the deficiencies in the outcome regression models and yield a precise estimate of the estimand, as the theory suggests ought to happen.

Table 3

Results from the infusion trial. Table entries are multiplied by 10⁴.

	ipw	usual	idr_naive	idr
t₁	624 (92)	667 (163)	630 (93)	626 (93)
t₂	866 (128)	864 (128)	870 (128)	877 (128)
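To read Table 3 on the probability scale, the entries can be rescaled by 10⁻⁴ and turned into Wald-type intervals. The sketch below assumes that each cell reports an estimate followed by its standard error in parentheses, which is consistent with the discussion in the text but is not stated in the caption; the helper name `wald_interval` is hypothetical.

```python
SCALE = 1e-4  # table entries are multiplied by 10^4

# (estimate, standard error) pairs copied from Table 3; the pairing is an assumption.
table3 = {
    "t1": {"ipw": (624, 92), "usual": (667, 163), "idr_naive": (630, 93), "idr": (626, 93)},
    "t2": {"ipw": (866, 128), "usual": (864, 128), "idr_naive": (870, 128), "idr": (877, 128)},
}

def wald_interval(est, se, z=1.96):
    """Wald-type 95% interval on the probability scale."""
    est, se = est * SCALE, se * SCALE
    return est - z * se, est + z * se

idr_t1 = wald_interval(*table3["t1"]["idr"])
usual_t1 = wald_interval(*table3["t1"]["usual"])

# Relative size of the usual DR standard error at t1 compared with IPW.
se_ratio_pct = 100.0 * (table3["t1"]["usual"][1] / table3["t1"]["ipw"][1] - 1.0)
```

Under this reading, the IDR interval at t₁ is roughly 4.4% to 8.1%, while the usual DR interval is visibly wider because of its larger standard error.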

5 Conclusions

In this paper, we proposed an improved doubly robust estimator of causal estimands in marginal models for dynamic treatment regimes [11] through a technique first proposed by [4]. The primary objective of the technique is to improve the precision of the doubly robust estimator when the propensity score (PS) via {π_j} is correctly specified but the outcome regression (OR) models via {g_j}, g_j = g_j(Ā_j, L̅_j₋₁; α), may be incorrectly specified. This is achieved by defining a new estimator for the regression parameter β to minimize the asymptotic variance of the modified AIPW estimator via (17).

Our numerical and real data examples in Section 4 illustrate potentially unnerving aspects of AIPW and ordinary doubly robust estimation [19, 20, 27]. While the augmentation in AIPW estimation is intended to improve the precision of the IPW estimator, specifying the OR models may not be straightforward for complex data and incorrectly specifying OR models can have unfortunate consequences. The {g_j} for dynamic regimes are examples of complicated OR regression functions and, as shown in Section 4.3, can have a dramatic adverse impact on statistical inference if modeled incorrectly. In our data example in Section 4.3, poor choice of models for {g_j} increased standard error estimates for one of the causal estimands by more than 75% compared to other inverse-weighted estimators. Improved doubly robust estimators do not suffer from these deficiencies but require more effort on the part of the data analyst.

While the proposed method aims to improve the efficiency of DR estimators in marginal models for dynamic treatment regimes, other methods [e.g., 17, 19, 20, 27] also aim to improve the efficiency and robustness of ordinary doubly robust (DR) estimators. It would be interesting to compare some of these competing methods in examples of dynamic treatment regimes, such as the treatment length problem. Such an investigation would take considerable programming effort and resources and is beyond the scope of the current manuscript.

In this paper, we have imposed finite-dimensional parametric models on the nuisance parameters. Alternatively, one could relax this modeling assumption and use data-adaptive techniques to estimate the nuisance parameters. However, in this case, the asymptotic linearity of the resulting estimator will rely on certain rate assumptions for all the nuisance parameter estimators. Targeted learning and undersmoothing of regularization parameters offer a possible remedy to this issue [e.g., 1, 24, 25]. Generalizations of these methods to treatment length policy settings merit further research.

References

[1] Benkeser, D., M. Carone, M. J. van der Laan, and P. Gilbert (2017): “Doubly robust nonparametric inference on the average treatment effect,” Biometrika, 104, 863–880, doi:10.1093/biomet/asx053.

[2] Bickel, P., C. Klaassen, Y. Ritov, and J. Wellner (1993): “Efficient and adaptive estimation for semiparametric models,” Baltimore: Johns Hopkins University Press.

[3] Boos, D. D. and L. A. Stefanski (2013): Essential Statistical Inference, Springer: New York, doi:10.1007/978-1-4614-4818-1.

[4] Cao, W., A. A. Tsiatis, and M. Davidian (2009): “Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data,” Biometrika, 96, 723–734, doi:10.1093/biomet/asp033.

[5] Frangakis, C. E., T. Qian, Z. Wu, and I. Diaz (2015): “Deductive derivation and Turing-computerization of semiparametric efficient estimation,” Biometrics, 71, 867–874, doi:10.1111/biom.12362.

[6] Godambe, V. (1960): “An optimum property of regular maximum likelihood estimation,” Annals of Mathematical Statistics, 31, 1208–1211, doi:10.1214/aoms/1177705693.

[7] Horvitz, D. G. and D. J. Thompson (1952): “A generalization of sampling without replacement from a finite universe,” Journal of the American Statistical Association, 47, 663–685, doi:10.1080/01621459.1952.10483446.

[8] Johnson, B. A. and A. A. Tsiatis (2004): “Estimating mean response as a function of treatment duration in an observational study, where duration may be informatively censored,” Biometrics, 60, 315–323, doi:10.1111/j.0006-341X.2004.00175.x.

[9] Kang, J. D. Y. and J. L. Schafer (2007): “Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data (with discussion and rejoinder),” Statistical Science, 22, 523–539, doi:10.1214/07-STS227.

[10] Lu, X. and B. A. Johnson (2015): “Direct estimation of the mean outcome on treatment when treatment assignment and discontinuation compete,” Biometrika, 102, 797–807, doi:10.1093/biomet/asv043.

[11] Murphy, S. A., M. J. van der Laan, and J. M. Robins (2001): “Marginal mean models for dynamic regimes,” Journal of the American Statistical Association, 96, 1410–1423, doi:10.1198/016214501753382327.

[12] O’Shea, J. C., M. Madan, W. J. Cantor, C. M. Pacchiana, S. Greenberg, D. M. Joseph, M. M. Kitt, T. J. Lorenz, and J. E. Tcheng (2000): “Design and methodology of the ESPRIT trial: evaluating a novel dosing regimen of eptifibatide in percutaneous coronary intervention,” American Heart Journal, 140, 834–839, doi:10.1067/mhj.2000.110458.

[13] Qin, J., B. Zhang, and D. H. Y. Leung (2017): “Efficient augmented inverse probability weighted estimation in missing data problems,” Journal of Business & Economic Statistics, 35, 86–97, doi:10.1080/07350015.2015.1058266.

[14] Robins, J. M. (1997): “Causal inference from complex longitudinal data,” in M. Berkane, ed., Latent Variable Modeling and Applications to Causality, number 120 in Lecture Notes in Statistics, New York: Springer Verlag, 69–117, doi:10.1007/978-1-4612-1842-5_4.

[15] Robins, J. M. (1999): “Testing and estimation of direct effects by reparameterizing directed acyclic graphs with structural nested models,” Computation, Causation, and Discovery, 349–405.

[16] Robins, J. M., A. Rotnitzky, and L. P. Zhao (1994): “Estimation of regression coefficients when some regressors are not always observed,” Journal of the American Statistical Association, 89, 846–866, doi:10.1080/01621459.1994.10476818.

[17] Rotnitzky, A., Q. Lei, M. Sued, and J. M. Robins (2012): “Improved double-robust estimation in missing data and causal inference models,” Biometrika, 99, 439–456, doi:10.1093/biomet/ass013.

[18] Rubin, D. B. (1986): “Comment: Which ifs have causal answers,” Journal of the American Statistical Association, 81, 961–962, doi:10.1080/01621459.1986.10478355.

[19] Tan, Z. (2006): “A distributional approach for causal inference using propensity scores,” Journal of the American Statistical Association, 101, 1619–1637, doi:10.1198/016214506000000023.

[20] Tan, Z. (2007): “Understanding OR, PS and DR,” Statistical Science, 22, 560–568, doi:10.1214/07-STS227A.

[21] Tan, Z. (2010): “Bounded, efficient, and doubly robust estimation with inverse weighting,” Biometrika, 97, 661–682, doi:10.1093/biomet/asq035.

[22] Tsiatis, A. A. (2006): Semiparametric Theory and Missing Data, Springer.

[23] Tsiatis, A. A., M. Davidian, and W. Cao (2011): “Improved doubly robust estimation when data are monotonely coarsened, with application to longitudinal studies with dropout,” Biometrics, 67, 536–545, doi:10.1111/j.1541-0420.2010.01476.x.

[24] van der Laan, M. J. (2014): “Targeted estimation of nuisance parameters to obtain valid statistical inference,” International Journal of Biostatistics, 10, 29–57, doi:10.1515/ijb-2012-0038.

[25] van der Laan, M. J., D. Benkeser, and W. Cai (2019): “Efficient estimation of pathwise differentiable target parameters with the undersmoothed highly adaptive lasso,” arXiv preprint arXiv:1908.05607.

[26] van der Laan, M. J. and J. M. Robins (2003): Unified methods for censored longitudinal data and causality, Springer Science & Business Media, doi:10.1007/978-0-387-21700-0.

[27] van der Laan, M. J. and D. Rubin (2006): “Targeted maximum likelihood learning,” International Journal of Biostatistics, 2, doi:10.2202/1557-4679.1043.

Received: 2019-09-24

Accepted: 2020-11-03

Published Online: 2020-12-31

This work is licensed under the Creative Commons Attribution 4.0 International License.
