Skip to content
BY-NC-ND 3.0 license Open Access Published by De Gruyter June 23, 2017

Longitudinal Mediation Analysis with Time-varying Mediators and Exposures, with Application to Survival Outcomes

  • Wenjing Zheng EMAIL logo and Mark van der Laan

Abstract:

1 In this paper, we study the effect of a time-varying exposure mediated by a time-varying intermediate variable. We consider general longitudinal settings, including survival outcomes. At a given time point, the exposure and mediator of interest are influenced by past covariates, mediators and exposures, and affect future covariates, mediators and exposures. Right censoring, if present, occurs in response to past history. To address the challenges in mediation analysis that are unique to these settings, we propose a formulation in terms of random interventions based on conditional distributions for the mediator. This formulation, in particular, allows for well-defined natural direct and indirect effects in the survival setting, and natural decomposition of the standard total effect. Upon establishing identifiability and the corresponding statistical estimands, we derive the efficient influence curves and establish their robustness properties. Applying Targeted Maximum Likelihood Estimation, we use these efficient influence curves to construct multiply robust and efficient estimators. We also present an inverse probability weighted estimator and a nested non-targeted substitution estimator for these parameters.

1 Introduction

An exposure often acts on an outcome of interest directly, and/or indirectly through the mediation of some intermediate variables. Identifying and quantifying these two types of effects contribute to further understanding of the underlying causal mechanism. Modern developments in formal nonparametric causal inference have produced many advances in causal mediation analysis in non-longitudinal settings. (e.g. Robins and Greenland [1], Pearl [2], Robins [3], Petersen et al. [4], van der Laan and Petersen [5], VanderWeele [6], Hafeman and VanderWeele [7], Imai et al. [8, 9], Pearl [10], Tchetgen Tchetgen and Shpitser [11, 12], Zheng and van der Laan [13], Lendle and van der Laan [14]). Causal mediation in a longitudinal setting, by contrast, has received relatively little attention. One option is the controlled direct effect (e.g. Pearl [2]), which compares the outcomes under different exposure regimens while the mediators are fixed to some common pre-specified values. Its analysis is very similar to that of a time-varying exposure in a non-mediation setting; we refer the reader to existing literature on this topic (e.g. Robins [15], Hernan et al. [16], Stitelman et al. [17], Petersen et al. [18]). Controlled direct effects are of interest if the exposure effect at a particular mediator value constitutes a meaningful scientific question. But in many cases, one wishes to ask a different direct effect question: what would be the effect of exposure on the outcome if the mediator acts as if exposure was absent? This question is formalized using the so-called natural direct effect parameter by Robins and Greenland [1] and Pearl [2] in a non-longitudinal setting. The natural direct effect has a complementary natural indirect effect; together they provide a decomposition of the overall effect of the exposure on the outcome. The challenges in extending the above mediation formulation to the longitudinal setting have been studied in Avin et al. [19], which established that the corresponding natural direct effect and indirect effect parameters would not be identifiable in the presence of confounders of the mediator-outcome relationship that are affected by the exposure. Such confounders, however, are ubiquitous in longitudinal applications.

As an alternative, random interventions (RI) based formulation to causal mediation was proposed in Didelez et al. [20]. Under this formulation, a mediator is regarded as an intervention variable onto which a given distribution is enforced, as opposed to a counterfactual variable resulting from a different intervention. The corresponding natural direct effect and indirect effect parameters have different interpretation than those under the formulations in Robins and Greenland [1] and Pearl [2], but their identifiability is at hand even in the presence of exposure-induced mediator-outcome confounders (e.g. VanderWeele et al. [21]). Zheng and van der Laan [22] proposed an RI formulation to longitudinal causal mediation, through conditional mediator distributions, in a survival setting with point exposure and time-varying mediators and confounders. VanderWeele and Tchetgen Tchetgen [23] proposed an RI formulation, through marginal mediator distributions, in a time-varying exposure and mediator setting. In this paper, we extend the work in Zheng and van der Laan [22] to formulate causal mediation through conditional mediator distributions in general time-varying exposure and mediator settings with survival or non-survival outcomes.

The challenges in longitudinal mediation analysis are exemplified in the different effects captured (and corresponding identifiability conditions) under the marginal distribution intervention in VanderWeele and Tchetgen Tchetgen [23] and the conditional distribution intervention proposed here. We will illustrate these differences with examples and further discussions in Section 2.4.

The natural direct and indirect effects proposed here can all be defined in terms of the corresponding version of the mediation formula. We develop a general semiparametric inference framework for this conditional mediation formula. More specifically, in Section 3, we will derive the efficient influence curves under a locally saturated semiparametric model, and establish their robustness properties. In Section 4, we present three estimators for the conditional mediation formula: a nested non-targeted substitution estimator, which uses a regression-based representation of the identifying expression, an Inverse Probability Weighted (IPW) estimator, and an efficient and multiply robust Targeted Maximum Likelihood Estimator (TMLE). We study the empirical performance of these estimators in a simulation study in Section 5.

2 The mediation formula, natural direct and indirect effects

2.1 Data structure, causal model and likelihood

Consider the data structure

O=L0,A1,R1,Z1,L1,,At,Rt,Zt,Lt,,Aτ,Rτ,Zτ,LτYτP0,

where L0 encodes baseline covariates, and for t1, At encodes the time-varying exposure (and may also include a censoring indicator), Zt denotes the time-varying mediators. Rt and Lt denote the time-varying covariates. This representation of the data structure encodes a time-ordering of the variables within each time t: At precedes other variables, Rt are covariates that may precede Zt and Lt, and Lt are covariates preceded by At,Rt,Zt. The variables in Lt may include the outcome process YtLt. In particular, Lτ will include the final outcome of interest Yτ. This data structure allows for confounders of the exposure-outcome relation and exposure-induced confounders of the mediator-outcome relation, both within time and across time. In a survival setting with right censoring, the outcome indicator Yt indicates whether one has died by time t, i.e. Yt=I(T˜<=t,TC) with survival time T, censoring time C and T˜=min(T,C). We would encode the intervention variables as At=(AtC,AtE), AtC is the indicator of remaining uncensored by time t and AtE is the exposure of interest at time t. In the case of a survival outcome or if censoring exists, Zt,Rt,Lt are encoded with a default value after censoring or death. After a linear transformation, one may assume that Yt is bounded between 0 and 1. The data consists of n i.i.d. copies of O.

We consider an example from diabetic care. Suppose within a large primary healthcare network, all diabetic patients are to receive ongoing education sessions and regular nutrition counseling (meetings with counselor), in addition to their routine care with their family physician. Complications are referred to higher level specialists. Consider a pilot program that integrates simplified referral procedures, enhanced curriculum for education sessions and more streamlined operations for nutrition counseling (e.g. less wait time, easier scheduling). Each year, patients may opt in and out of this program. Suppose we wish to evaluate how much the effect of the program on long-term control of type 2 diabetes is mediated by changes in attendance of nutrition counseling. The final outcome of interest is blood glucose levels at five years after diagnosis (Yτ). Observations are taken annually (t=0 is time of diagnosis, τ=5). The exposure of interest At is whether patient is in the program at year t; the mediator of interests Zt is whether the patient has utilized his/her alloted nutrition counseling meetings this year. The time-varying covariates Rt denote attendance in education sessions and patient knowledge around disease and self-care (as assessed by survey). The time-varying covariates Lt denote health-related covariates such as nutritional status, disease progression, comorbidities, glucose tests results, etc. The covariates Rt are affected by program participation, and may affect utilization of nutrition counseling (better educated patients may have higher utilization of nutrition counseling) as well as subsequent disease progression and nutritional status. The covariates Lt are affected by current and previous program participation and patient engagement (some captured in Rt and Zt), and will affect subsequent program participation and patient engagement. Therefore, there are time-varying confounders of the exposure-outcome relation, as well as time-varying exposure-induced confounders of the mediator-outcome relation.

From here on, for any 1tτ and a time-dependent variable V, we will use the boldface Vt to denote the vector V1,,Vt, and use Vs,t to denote the vector Vs,,Vt. When referring to the entire vector Vτ, we will also use the shorthand V. Degenerate indices such as V1 signify the empty set. We encode the time-ordering of the variables using the following Non-Parametric Structural Equations Model [24]:

L0=f(UL0),At=fAt(At1,Rt1,Zt1,Lt1,UAt),Rt=fRt(At,Rt1,Zt1,Lt1,URt),Zt=fZt(At,Rt,Zt1,Lt1,UZt),Lt=fLt(At,Rt,Zt,,Lt1,ULt).

In words, within each time point, we assume the temporal relation between the measured variables is exposure At, then covariates Rt, then mediator Zt, then covariates Lt, then outcome. For each variable V, the model posits that V is an unknown deterministic function of all variables preceding it and some unmeasured exogenous factors. It is important to note that our formulation, identifying expression, and proposed estimators can be adapted to other choices of temporal ordering.

The observed data structure is generated from the above structural equations model without any intervention, and the likelihood of OP0 can be factored into the following conditional probabilities according to that time-ordering:

(1)p0(O)=p0,L0(L0)×t=1τ(p0,A(AtAt1,Rt1,Zt1,Lt1)p0,R(RtAt,Rt1,Zt1,Lt1)×p0,Z(ZtAt,Rt,Zt1,Lt1)p0,L(LtAt,Rt,Zt,Lt1).

In the case of a survival outcome or if censoring exists, subsequent At,Zt,Rt,Lt are assigned a default value with probability 1 after censoring or death, and therefore do not contribute to the likelihood.

2.2 The counterfactual outcome under conditional mediator distribution

Let a(a1,,aτ) and a(a1,,aτ) be two possible exposure regimens. Let

Γtaztrt,zt1,lt1pZt(a)=ztRt(a)=rt,Zt1(a)=zt1,Lt1(a)=lt1

denote the conditional probabilities of the mediators at t1, if the exposure had been set to A=a in the population. At each time t, within each stratum rt,zt1,lt1, this provides a random draw of ZtΓta. For convenience, we will denote Γˉa=Γ1a,,Γτa.

Consider an intervention on the structural equations model to statically set At=at and randomly draw ZtΓta. For Xt{Rt,Lt}, t1, we will use Xt(a,Γˉa) to denote the corresponding covariates resulting from this intervention. In terms of an ideal experiment, the data would be generated as follows. At baseline, we measure the covariates L0 (say L0=l0). At t=1, intervene to set A1=a1, and measure the resulting covariates R1(a,Γˉa) (say it’s measured to be r1). Then, intervene to draw Z1 according Γ1al0,r1. Suppose we have drawn value z1, then we measure the resulting covariates L1(a,Γˉa) (say it’s measured to be l1). At time t=2, intervene to set A2=a2; measure the resulting covariates R2(a,Γˉa) (say it’s measured to be r2); intervene to draw Z2 according to Γ2al0,r1,z1,l1,r2; measure the resulting covariates L2(a,Γˉa). We continue in this manner through the time points. At the end of the experiment, we denote the final outcome as Yτ(a,Γˉa) – this is the outcome if exposures were set to fixed values a and the mediators were set to have conditional distributions Γta. In contrast to the traditional non-random intervention formulation, Γta is not the would-be mediator on the same person had she been under a different exposure, but simply a random variable whose distribution is specified by a given conditional probability function Γta and the person’s accruing history.

Let’s illustrate this ideal experiment with the motivating example. Suppose a=1 denotes program participation throughout the study, and a=0 denotes non-participation. Under the intervention described in the above experiment, at time t=1, a person with covariates L0=l0 is assigned to the program at year 1, i.e. A1=1. His education session attendance and knowledge attainment, R1(a,Γˉa), is a consequence of this participation (say it is likely to be high as a result of program). The distribution of his nutrition counseling utilization Z1 would be Γ10, i.e. that of a person with his same baseline characteristics l0 and his high diabetes education, but who did not participate in the program, say we drew this value to be z1. His disease progress, nutritional status and comorbidities L1(a,Γˉa) would be a consequence of his baseline l0, his program participation A1=1, his high diabetic education r1, and his counseling utilization pattern Γ10=z1. At year t=2, this person remains in the program i.e. A2=1. His education session attendance and knowledge attainment, R2(a,Γˉa) is a consequence of having been in the program for two years, had high attendance r1 and had nutrition counseling utilization z1 at year 1. The distribution of his nutrition counseling utilization Z2 would be Γ20, i.e. that of a person with his same baseline characteristics l0, who did not participate in the program so far, but had his same diabetes education history r1,r2 in the two years, nutrition counseling utilization z1 at year 1, and disease progression at year 1. Say we drew Γ20=z2. His year 2 disease progression, nutritional status and comorbidities L2(a,Γˉa) would be a consequence of his baseline l0, his program participation A2=1, his diabetic education r1,r2, his counseling utilization patterns z1,z2, and his disease progression at year 1. We continue the ideal experiment in this manner, the final counterfactual Yτ(1,Γˉ0) is the outcome of a person participating in the program, but has the nutrition counseling utilization pattern of an individual sharing his baseline characteristics, disease progression and comorbidities development, and diabetic education, but who otherwise did not participate in the program.

2.3 Causal parameters and identifiability

From the above formulation, we define as the conditional mediation formula:

(2)EYτ(a,Γˉa).

To contrast the effects of two exposure regimens on an outcome at time τ, the corresponding natural indirect effect is defined as

(3)EYτ(1,Γˉ1)EYτ(1,Γˉ0),

and the natural direct effect is

(4)EYτ(1,Γˉ0)EYτ(0,Γˉ0).

These two effects provide a decomposition of the total effect EYτ(1,Γˉ1)EYτ(0,Γˉ0). As we will see in the next section, this total effect is the same mathematical quantity as the traditional total effect measure EYτ(1)EYτ(0), where Yτ(a) is the would-be outcome under intervention to set exposure A=a and no intervention on Z.

In our example, the pilot program can impact disease progression as a result of increased patient education and increased utilization of nutrition counseling; the latter can be due to more streamlined counseling operations and/or due to better educated patients. Suppose compared to other factors, program participation has large impact on increased R1 (better diabetic-educated patients). Then the variable R1(1,Γˉ0)=r1 would be relatively high, and the nutrition counseling utilization Z1 would have distribution Γ10(l0,r1) i.e. that of an individual who did not participate in the program, but shares same baseline characteristics L0=l0 and has the same high diabetes education. Suppose the program’s more streamlined nutrition counseling operations has large impact on increased utilization of this service, then Γ10(l0,r1) would be heavily distributed around lower utilization and Γ11(l0,r1) would be distributed around higher utilization for the same individual characterized by (l0,r1). Subsequently, L1(1,Γˉ0) vs L1(1,Γˉ1) would be the nutritional status and disease progression of individuals with the same baseline l0, diabetes education level r1, but with lower vs higher nutrition counseling utilization patterns as a result of programmatic improvements. Then, the indirect effect EYτ(1,Γˉ1)EYτ(1,Γˉ0) would capture the indirect effect (on disease progression) of differential nutrition counseling utilization due to program’s streamlined operation of this service, but not due to program’s effect on differential demand for nutrition counseling as a result of better educated patients. So this indirect effect compares only the paths from exposures (program) into mediators (nutrition counseling utilization), but not from exposures into covariates (patient education) into mediators. Dual to this, the direct effect EYτ(1,Γˉ0)EYτ(0,Γˉ0) capture the paths from exposure into final outcome (diabetes control), from exposure into covariates into final outcome, as well as the paths from exposure into covariates into mediator into final outcome. In our example, this last set of paths would be the differential effect on diabetes progression due to increased demand for nutrition counseling as a result of the program producing better educated patients.

To proceed with the identifiability result, let us denote Lt(a,z) the counterfactual covariate at time t under an intervention to deterministically set A=a and Z=z. The identifiability of the corresponding causal parameters only rely on the Sequential Randomization Assumptions of Robins [25] and positivity assumptions.

Lemma 1

Suppose the following assumptions hold

  1. Rst(a),Zst(a),Lst(a)AtAt1,Rt1,Zt1,Lt1.

  2. Rst(a,z),Lst(a,z)AtAt1,Rt1,Zt1,Lt1. In words, A1 and A2 require that conditional on observed history, there are no unmeasured confounders of the relationship between each exposure At and all its subsequent covariates and mediators, i.e. At is randomized conditional on observed history

  3. Rs>t(a,z),Lst(a,z)ZtAt,Rt,Zt1,Lt1. Conditional on observed history, there are no unmeasured confounders of the relationship between each mediator Zt and all its subsequent covariates, i.e. Zt is randomized conditional on observed past.

  4. Positivity: for all t1 and all r,l,z, (i) if p0(at1,rt1,zt1,lt1)>0, then p0(atat1,rt1,zt1,lt1)>0; (ii) if p0(at1,rt1,zt1,lt1)>0, then p0(atat1,rt1,zt1,lt1)>0; (iii) If p0(rtat,rt1,zt1,lt1)>0, then p0(rtat,rt1,zt1,lt1)>0; (iv) If p0(ltat,rt,zt,lt1)>0, then p0(ltat,rt,zt,lt1)>0; (v) if p0(at,rt,zt1,lt1)>0 and p0(ztat,rt,zt1,lt1)>0, then p0(ztat,rt,zt1,lt1)>0. Conditions (i) and (ii) require that the exposures of interest are observed within each supported covariate and mediator stratum; (iii) and (iv) require that covariate values supported under A=a are also supported under A=a, and (v) requires that the mediator values supported under A=a are also supported under A=a.

Then the conditional mediation formula in eq. (2) identifies to

(5)EYτ(a,Γˉa)=Ψa,a(P0)r,l,z{yτp0,L0(l0)×t=1τp0,R(rtat,rt1,zt1,lt1)p0,Z(ztat,rt,zt1,lt1)p0,L(ltat,rt,zt,lt1)}.

Consequently, the natural indirect and direct effects are respectively identified to

(6)EYτ(1,Γˉ1)EYτ(1,Γˉ0)=Ψ1,1(P0)Ψ1,0(P0)
(7)EYτ(1,Γˉ0)EYτ(0,Γˉ0)=Ψ1,0(P0)Ψ0,0(P0).

proof.

In appendix.

As we alluded to in our earlier discussion, when a=a and the identifiability assumptions hold, then, EYτ(a,Γˉa)=Ψa,a(P0)=EYτ(a). Therefore, the proposed natural direct and indirect effects provide a decomposition of the total effect

EYτ(1)EYτ(0)=EYτ(1,Γˉ1)EYτ(1,Γˉ0)+EYτ(1,Γˉ0)EYτ(0,Γˉ0).

The positivity assumptions (i) and (ii) are typical in the study of total exposure effects, whereas (iii) – (v) are unique to the proposed conditional random intervention. Assumptions A1–A3 are the so-called strong sequential randomization assumptions.

In our illustrative example, at a period t, assumptions A1 and A2 requires that the history of program participation, nutrition counseling and covariates (diabetes education, disease progression, comorbidities, nutritional status, lifestyle factors, etc.) account for all the confounders of the relationship between current program participation and current and future covariates and nutrition counseling utilization. Assumption A3 requires that current program participation, and previous history of program participation, nutritional service utilization and other covariates, account for all the confounders of the relationship between current nutrition counseling utilization and current and future covariates. These assumptions would be violated if, for example, there was an unrecorded event that would affect one’s current program participation as well as current and future nutrition counseling utilization and covariates.

2.4 Longitudinal mediation analysis with marginal vs. conditional random interventions

An alternative formulation of longitudinal mediation analysis has been proposed in VanderWeele and Tchetgen Tchetgen [23] using random interventions with marginal mediator distributions (conditioning only on baseline covariates). Specifically, let Gta be the marginal distribution of Zt (conditioning on baseline L0) under an ideal experiment setting A=a, i.e. Gta(ztL0)=pZt(a=ztL0. Let Ga(G1a,,Gτa). In our illustrative example, to generate Yτ(1,G0), at time t=1, a person with baseline characteristic L0=l0 is set to participate in the program, and his diabetes education level R1 is high as a result. But his nutritional service utilization Z1 would have distribution G10, which corresponds to that of a person with the same baseline L0=l0 but who does not participate in the program and has a resulting low diabetes education level. His disease progression L1(1,G0)=l1 would be a consequence of his L0, his program participation, his resulting high diabetes education level, and his counseling service utilization pattern G10. The difference in the formulation of Yτ(a,Ga) and the proposed Yτ(a,Γˉa) lies in that at each time t, Gta is drawn as that of a random person with A=a and sharing the same baseline (i.e. marginalizing over all time-varying covariate histories under the influence of A=a), whereas Γta is drawn as that of a random person with A=a, and sharing the same baseline and same time-varying covariate history (which are under the influence of A=a). This difference in formulation has implications for applicability, interpretation and identifiability. We discuss each in turn.

In a survival setting, the marginal-intervention counterfactual Yτ(a,Ga) is not well-defined since a person who is still alive under A=a would be allowed to draw the mediator value of someone who has died under A=a . On the other hand, by conditioning on the person’s own time-varying history, the conditional-intervention counterfactual Yτ(a,Γˉa) circumvents this problem and thus is well-defined in the survival setting. Beyond formal definition, such time-varying covariate histories still need to be well-supported under both exposure regimens, as is apparent in the identifiability conditions for Yτ(a,Γˉa).

Consider now a non-survival setting. In our illustrative example, suppose program participation increases patients’ diabetic education through its enhanced curriculum, and increases use of nutrition counseling utilization through better patient education and more streamlined counseling services, and both education and nutrition counseling have equal direct contribution to changing disease progression. Then, at time t=1, under both interventions, the patient education R1(1,Γˉ0) and R1(1,G0) would be high due to program participation A1=1. But the nutritional service utilization G10 would be that of a non-participant who has a low diabetic education due to his non-participation in the program (e.g. less demand for nutrition counseling due to limited patient knowledge), whereas the nutritional service utilization Γ10 would be that of a non-participant who has a high diabetic education but who otherwise did not participate in the program. Therefore, the indirect effect EYτ(1,G1)EYτ(1,G0) would capture the effect due to differential nutritional service utilization as a result of both the program’s streamlined nutrition counseling services and the increased demand for these services due to program’s impact on increasing patient education. It would capture effect due to the paths from program into nutrition service utilization as well as the paths from program into patient education into nutritional service utilization. Contrast this with the indirect effect EYτ(1,Γˉ1)EYτ(1,Γˉ0), which captures the effect due to differential nutritional service utilization as a result of the program’s streamlined nutrition counseling services (but not of increased demand due to better education). Dual to the indirect effect, the direct effect EYτ(1,G0)EYτ(0,G0) capture the effects due to direct paths from program into disease progression and the paths from program into patient education into disease. Contrasting these with the definitions and motivating example in Section 2.3, we see that the different mediation formulations in this longitudinal setting allows one to ask different mediation questions, as the increased complexity of the data structure also offers more options for pathway analyses of interest.

We saw in eq. (5) that EYτ(a,Γˉa)=EYτ(a) as mathematical quantities, though under different ideal experiment formulation. On the other hand, as derived in VanderWeele and Tchetgen Tchetgen [23],

EYτ(a,Ga)=r,l,z{yτp0,L0(l0)t=1τp0,R(rtat,rt1,zt1,lt1)p0,L(ltat,rt,zt,lt1)×[r,lτ1t=1τp0,R(rtat,rt1,zt1,lt1)p0,Z(ztat,rt,zt1,lt1)×t=1τ1p0,L(ltat,rt,zt,lt1)]}.

Therefore, EYτ(a,Ga) does not equal EYτ(a), in the presence of time-varying confounding. Hence, the total effect measure EYτ(1,G1)EYτ(0,G0) is an alternative quantification of the total effect to the traditional EYτ(1)EYτ(0). Hence, the choice of formulation also depends on which total effect decomposition one wishes to study.

While the proposed conditional distribution intervention provides more flexibility by allowing application in survival setting and decomposition of the standard total effect, this, however, comes at the expense of stronger identifiability conditions. To identify the marginal intervention parameter EYτ(a,Ga), one can use the weaker versions of assumptions A2–A3 which only require no unmeasured confounding with respect to the final outcome of interest (as opposed to all subsequent covariates). For instance, in our example, if there was an unrecorded short-term event (e.g. short-term unemployment) that would affect one’s current program participation as well as current disease status, lifestyle, nutritional service utilization, but not the final outcome, then EYτ(a,Ga) would still be identified, whereas EYτ(a,Γˉa) would not. Therefore, in non-survival settings, the tradeoff of identifiability versus effect interpretation and total effect decomposition would need to be carefully weighted.

The rest of this paper is devoted to the statistical inference of Ψa,a(P0) in eq. (5) and the corresponding natural indirect and direct effects in eqs (6) and (7).

3 Efficient influence curve

In this section, we establish a general semiparametric inference framework for these parameters. In particular, we derive the Efficient Influence Curves (EIC) of eqs (5), (7) and (6) under a (locally saturated) semiparametric model, and establish their robustness properties. For a given pathwise-differentiable parameter Ψ, under certain regularity conditions, the variance of the EIC of Ψ is a generalized Cramer-Rao lower bound for the variances of the influence curves of asymptotically linear estimators of Ψ. Therefore, the variance of the EIC provides an efficiency bound for the regular and asymptotically linear (RAL) estimators of Ψ. Moreover, under a locally saturated model, the influence curve of any RAL estimator is in fact the EIC. We refer the reader to Bickel et al. [26] for general theory on efficient semiparametric inference.

3.1 Nested expectation representation of the conditional mediation formula

Let M denote a locally saturated semiparametric model containing the true data generating distribution P0. Following an important observation by Bang and Robins [27], we define recursively the following functionals for t=τ,,1, at PM.

(8)QˉRτ+1a,aYτ.\;\;\; Then, for each t=τ,,1:QˉLta,aRt,Zt,Lt1ltQˉRt+1a,aRt,Zt,Lt1,ltpltAt=at,Rt,Zt,Lt1QˉZta,aRt,Zt1,Lt1ztQˉLta,aRt,Zt1,zt,Lt1pztAt=at,Rt,Zt1,Lt1QˉRta,aRt1,Zt1,Lt1rtQˉZta,aRt1,rt,Zt1,Lt1prtAt1=at1,Rt1,Zt1,Lt1.

Evaluating these functionals at the data generating P0, we obtain a nested expectation-based representation of the identifying expression (5):

(9)Ψa,a(P0)=EP0QˉR1a,a(P0)L0.

3.2 Notations

We will use Qˉa,a to denote the nested expectations QˉLta,a,QˉZta,a,QˉRta,a:t1.

We use Pn to denote the empirical distribution of n i.i.d. copies of OP0. Given a function Of(O), Pnf denotes the empirical mean Pnf1ni=1nf(Oi).

3.3 Efficient influence curves for the mediation formula and the direct and indirect effects

The mediation formula in eq. (5) can be considered as the value at P0 of the map PΨa,a(P)EPQˉR1a,a(P)L0 on M. In particular, this map depends on P through Qˉa,a, i.e. Ψa,a(P)=Ψa,a(Qˉa,a). Similarly, the natural direct effect in eq. (7) and the natural indirect effect in eq. (6) are, respectively, the values at P0 of the maps PΨNDE(P)=Ψ1,0(P)Ψ0,0(P) and PΨNIE(P)=Ψ1,1(P)Ψ1,0(P).

Theorem 1

[Efficient Influence Curve] Let Ψa,a:MR be defined as above. Suppose at PM the conditional probabilities of At, Rt, Zt, Lt under the likelihood decomposition eq. (1) are all bounded away from 0 and 1. The efficient influence curve of Ψa,a at P is given by D,a,a(P)D,a,a(P,Ψa,a(Qˉa,a)), with

(10)D,a,a(P,ψ)t1τDLta,a(P)+DZta,a(P)+DRta,a(P)+DL0a,a(P,ψ),

where

DLta,a(P)HLta,aQˉRt+1a,aRt,Zt,LtQˉLta,aRt,Zt,Lt1DZta,a(P)HZta,aQˉLta,aRt,Zt,Lt1QˉZta,aRt,Zt1,Lt1DRta,a(P)HRta,aQˉZta,aRt,Zt1,Lt1QˉRta,aRt1,Zt1,Lt1DL0a,a(P,ψ)QˉR1a,a(L0)ψ,

with

(11)HLta,aI(Atat)j1tpAajaj1,Rj1,Zj1,Lj1j1tpZ(Zjaj,Rj,Zj1,Lj1)pZ(Zjaj,Rj,Zj1,Lj1),
(12)HZta,aI(Atat)j1tpA(ajaj1,Rj1,Zj1,Lj1)×j1t1pL(Ljaj,Rj,Zj,Lj1)pL(Ljaj,Rj,Zj,Lj1)j1tpR(Rjaj,Rj1,Zj1,Lj1)pR(Rjaj,Rj1,Zj1,Lj1)
(13)HRta,aI(Atat)j1tpAajaj1,Rj1,Zj1,Lj1j1t1pZ(Zjaj,Rj,Zj1,Lj1)pZ(Zjaj,Rj,Zj1,Lj1),

Moreover, D,a,a(P) is a multiply robust estimating function of Ψa,a(P) in the sense that if one of the following holds:

  1. The conditional probabilities pR, pL and pZ correctly specified;

  2. The conditional probabilities pA, pR and pL are correctly specified;

  3. The conditional probabilities pA and pZ are correctly specified;

then EP0D,a,a(P)=0 implies Ψa,a(P)=Ψa,a(P0).

Proof.

See appendix.

It is easy to note that if a=a, then eq. (10) equals the efficient influence curve for the overall treatment effect of a time varying exposure (see e.g. van der Laan and Gruber [28]). The EICs of both the NDE and NIE can be derived from eq. (10) by a simple application of the delta method. We state them in a corollary without proof.

Suppose the conditions in theorem 1 hold for a,a{0,1}. The efficient influence curve of the natural direct effect is given by

D,NDE(P)(O)=D,1,0(P)D,0,0(P),

and the efficient influence curve of the natural indirect effect is given by

D,NIE(P)(O)=D,1,1(P)D,1,0(P).

Moreover, D,NDE and D,NIE satisfy the same robustness condition in theorem 1 for a=0,1 and a=0,1.

The variances VarP0(D,a,a(P0)), VarP0(D,NDE(P0)), and VarP0(D,NIE(P0)) are generalized Cramer-Rao lower bounds for the asymptotic variances of the RAL estimators of Ψa,a(P0), ΨNDE(P0), and ΨNIE(P0), respectively.

Estimators that satisfy the EIC equations will also inherit their robustness properties. We will present three estimators in the next section, one of which is robust and locally efficient.

3.3.1 Notes on estimating components of the Efficient Influence Curve

The parameter of interest eq. (5) and the corresponding EIC eq. (10) are represented in terms of conditional probabilities pR,pL,pZ,pA. In applications where the covariates or the mediator are high-dimensional, estimating these conditional densities may be difficult. To proceed with the estimation in these situations, firstly we note that due to the law of iterated expectations,

(14)QˉLta,aRt,Zt,Lt1=EPQˉRt+1a,aRt,Zt,LtAt=at,Rt,Zt,Lt1QˉZta,aRt,Zt1,Lt1=EPQˉLta,aRt,Zt,Lt1At=at,Rt,Zt1,Lt1QˉRta,aRt1,Zt1,Lt1=EPQˉZta,aRt,Zt1,Lt1At1=at1,Rt1,Zt1.Lt1.

Therefore, one may directly estimate the expectation QˉLta,aRt,Zt,Lt1 by regressing QˉRt+1a,aRt,Zt,Lt on the covariates (At,Rt,Zt,Lt1) using a parametric or data-adaptive algorithm, without estimating the conditional probabilities of Lt. Similarly for the expectations corresponding to Zt and Rt.

Secondly, to replace the estimation of densities for Zt, Lt and Rt in eqs (11)– (13) with estimation of conditional probabilities of A, we define

γ1,s,j(AsAs1,Rj,Zj,Lj1)p(AsAs1,Rj,Zj,Lj1)

and

γ2,s,j(AsAs1,Rj,Zj1,Lj1)p(AsAs1,Rj,Zj1,Lj1).

Note that these conditional probabilities of As differ from the conditional probabilities encoded by pA in that the γ’s do not condition on the time-ordered parents of As. However, as we shall see in the following lemma, they offer an alternative to obtain robust estimators that are more suitable to real life settings where Lt,Rt,Zt may be high dimensional.

After a few algebraic derivations (Appendix), we can rewrite the expressions in the EIC as

HLta,a=I(Atat)j1tpAajaj1,Rj1,Zj1,Lj1×j1ts=1jγ1,s,j(asas1,RjZj,Lj1)γ2,s,j(asas1,Rj,Zj1,Lj1)γ2,s,j(asas1,Rj,Zj1,Lj1γ1,s,j(asas1,RjZj,Lj1)HRta,a=I(Atat)j1tpAajaj1,Rj1,Zj1,Lj1×j1t1s=1jγ1,s,j(asas1,RjZj,Lj1)γ2,s,j(asas1,Rj,Zj1,Lj1)γ2,s,j(asas1,Rj,Zj1,Lj1γ1,s,j(asas1,RjZj,Lj1)

and

HZta,a=I(Atat)j1tpA(ajaj1,Rj1,Zj1,Lj1)×j1t1s=1jγ1,s,j(asas1,RjZj,Lj1)γ1,s,j(asas1,RjZj,Lj1)×j1tpA(ajaj1,Rj1,Zj1,Lj1)pA(ajaj1,Rj1,Zj1,Lj1)s=1jγ2,s,j(asas1,Rj,Zj1,Lj1γ2,s,j(asas1,Rj,Zj1,Lj1)

We write γ=γ1,s,t,γ2,s,t:1tτ,st. Based on this representation, the robustness conditions in theorem 1 can be generalized.

Corollary 2

Let γ be defined as above. If one of the following holds,

  1. The nested regressions Qˉa,a, as represented in eq. (14), are correctly specified;

  2. pA, QˉRta,a, QˉLta,a, and either (pL,pR) or γ are correctly specified;

  3. pA, QˉZta,a, and either pZ or γ are correctly specified,

then EP0D,a,a(P)=0 implies Ψa,a(P)=Ψa,a(P0).

4 Estimators

In this section, we develop a nested non-targeted substitution estimator, an IPW estimator and a TMLE for the mediation formula (5). The estimators for the natural direct and indirect effects can be obtained by taking the corresponding differences. The first two estimators are consistent if the estimates of all the relevant components of P0 are consistent. On the other hand, the TMLE satisfies the efficient influence curve equation, and hence remains unbiased under the model mis-specifications described in theorem 1. Under appropriate regularity conditions, if all the nuisance parameters are consistently estimated, then TMLE will be asymptotically efficient (e.g. Bickel et al. [26], van der Laan and Robins [29], van der Laan and Rose [30]).

Let pn,A, pn,L, pn,R and pn,Z denote the estimators of the conditional probabilities. We will use shorthand pˉn to denote these estimators. Let

Qˉna,aQˉn,Lta,a,Qˉn,Zta,a,Qˉn,Rta,a:t

denote the estimators of the nested expectations. These may be density-based estimators that are obtained by plugging in the density estimates pn,L, pn,R and pn,Z into the definition of the expectations in eq. (8), or they may be regression-based estimators that are obtained using the relations in eq. (14).

4.1 Non-targeted substitution estimator

The identification formula in eq. (5) which defines that statistical estimand is generally known as the G-computation formula [25]. Readily, it delivers a non-targeted substitution estimator, which is generally known as the G-computation estimator. To avoid estimation of densities, one can recast it in terms of Qˉa,a, as they are represented in eqs (9) and (14), and obtain a non-targeted substitution estimator Ψa,a(Qˉna,a) of Ψa,a(P0), through non-targeted estimates of the regressions Qˉa,a.

To estimate the series of nested conditional expectations Qˉa,a(P0), we can use the following algorithm, which exploits the relations in eq. (14) to make use of available regression techniques in the literature.

  1. Initiate QˉRτ+1a,aYτ. In our example this is the final glucose level, one may have dummy values for those that were lost to follow up.

  2. At each t=τ,1, in decreasing order, we would have obtained estimators Qˉn,Rt+1a,a from the previous step. We now obtain Qˉn,Lta,a, Qˉn,Zta,a, and Qˉn,Rta,a, in that order, as follows:

    1. Regress Qˉn,Rt+1a,a(Rt,Zt,Lt) on observed values (At,Rt,Zt,Lt1) among observations that remained uncensored at time t. In our example, the independent variables in this regression would be histories of program participation, diabetic education, and nutrition counseling attendance up to time t and disease progression and other health-related variables up to t1. We then evaluate the fitted function at the observed mediator and covariates histories Rt,Zt,Lt1 and the intervened exposure At=at for these uncensored observations. This results in the estimates Qˉn,Lta,a(Rt,Zt,Lt1)=EQˉn,Rt+1a,a(Rt,Zt,Lt)At=at,Rt,Zt,Lt1 for those uncensored individuals.

    2. Regress the newly minted Qˉn,Lta,a(Rt,Zt,Lt1) on At,Rt,Zt1,Lt1 among observations that remained uncensored at time t. In our example, the independent variables in this regression would be histories of program participation and diabetic education up to time t, and nutrition counseling attendance, disease progression and other health-related variables up to t1. We then evaluate the fitted function at the observed mediator and covariate histories Rt,Zt1,Lt1 and the intervened exposure levels At=at for these uncensored observations. This results in the estimates Qˉn,Zta,a(Rt,Lt1,Zt1).

    3. Regress the newly obtained Qˉn,Zta,a(Rt,Lt1,Zt1) on At,Rt1,Zt1,Lt1 among observations that remained uncensored at time t. In our example, the independent variables in this regression would be histories of program participation up to time t, and diabetic education, nutrition counseling attendance, disease progression and other health-related variables up to t1. We then evaluate the fitted function at the observed mediator and covariate histories Rt1,Zt1,Lt1 and the intervened exposure levels At1=at1 for observations that were uncensored at time t1. This results in the estimates Qˉn,Rta,a(Rt1,Lt1,Zt1).

  3. After running the algorithm in step (2) sequentially from t=τ down to t=1, we now have Qˉn,R1a,a(L0) for each of the n observations.

The Non-Targeted Substitution Estimator is given by

(15)ψnNTsubΨa,a(Qˉna,a)=1ni=1nQˉn,R1a,a(L0,i)

Consistency of ψnNTsub relies on consistency of Qˉna,a. Correct specification of Qˉa,a(P0) under a finite dimensional parametric model is possible only in limited applications. Alternatively, we may use machine learning algorithms, such as Super Learner. This option is more enticing, especially when used with the regression-based approach, since there are more data-adaptive techniques available to estimate the conditional mean via regression. Variance estimates of this estimator can be obtained by bootstrap. Theoretical results on the asymptotic behavior, such as a central limit theorem, of the resulting estimator Ψa,a(Qˉna,a) are not available. Moreover, a non-targeted estimator Qˉna,a of Qˉa,a(P0) is obtained by minimizing a global loss function for Qˉa,a(P0), not for Ψa,a(P0). This means, in particular, that the bias-variance tradeoff in Qˉna,a is optimized for the high-dimensional nuisance parameter Qˉa,a(P0), instead of a much lower-dimensional parameter of interest Ψa,a(P0). The proposed targeted estimator in Section 4.3 aims to address these two issues by providing a substitution estimator that is asymptotically linear (under appropriate regularity conditions), and optimizes the bias-variance tradeoff of Qˉna,a towards Ψa,a(P0) via an updating step.

4.2 Inverse probability weighted estimator

Instead of estimating the condition expectations Qˉa,a(P0), one may wish to employ the researcher’s knowledge about the treatment assignment and mediator densities. To this end, consider the following function:

(16)DIPW,a,a(pA,pZ)(O)YτI(Aτ=aτ)j=1τpA(ajaj1,Rj1,Zj1,Lj1)j=1τpZ(Zjaj,Rj,Zj1,Lj1)pZ(Zjaj,Rj,Zj1,Lj1).

Note that

EP0DIPW,a,a(P0)=r,z,l{yτ1j=1τp0,A(ajaj1,rj1,zj1,lj1)j=1τp0,Z(zjaj,rj,zj1,lj1)p0,Z(zjaj,rj,zj1,lj1)×p0,L0(l0)t=1τ[p0,A(atat1,rt1,zt1,lt1)p0,R(rtat,rt1,zt1,lt1)×p0,Z(ztat,rt,zt1,lt1)p0,L(ltat,rt,zt,lt1)]}=Ψa,a(P0)=0.

Therefore, given estimators pn,A and pn,Z, the Inverse Probability Weighted (IPW) Estimator of Ψa,a(P0) is given by

(17)ψnIPW1ni=1nDIPW,a,a(pn,A,pn,Z)(Oi).

In our example, this estimate can be obtained by taking the weighted average of the final outcome of all those with observed exposure level Aτ=aτ, using weights 1j=1τpA(ajaj1,Rj1,Zj1,Lj1)j=1τpZ(Zjaj,Rj,Zj1,Lj1)pZ(Zjaj,Rj,Zj1,Lj1). The factors pA would be the probabilities of having program participation Aj=aj at each time, under the individual’s covariate (diabetes education and health status) and mediator history, and the factors pZ are the conditional probabilities of nutritional service utilization at each time, given the individual’s observed covariates and utilization history, and under the two program exposures considered.

Consistency of ψnIPW relies on consistency of pA and pZ. As noted in Section 3.3.1, if Z is high dimensional, we may replace estimation of the densities pZ with estimation of the conditional probabilities γ,s,t. These can be estimated by regressing As onto the corresponding independent variables, for every pair (s,t). This way, using eq. (11), we can rewrite

(18)DIPW,a,a(pA,pZ)=DIPW,a,a(pA,γ)=YτI(Aτaτ)j1τpAajaj1,Rj1,Zj1,Lj1×j1τs=1jγ1,s,j(asas1,RjZj,Lj1)γ1,s,j(asas1,RjZj,Lj1)γ2,s,j(asas1,Rj,Zj1,Lj1γ2,s,j(asas1,Rj,Zj1,Lj1).

This IPW estimator is an asymptotically linear estimator; its influence curve involves DIPW,a.a plus a first-order residue due to estimation of pZ, which is part of the parameter definition. If a parametric model is used to estimate p0,Z, the influence curve for this residual term can be derived using the Delta method; otherwise, more sophisticated techniques [31] must be used. The corresponding variance estimation, while important, is beyond the scope of this paper. We will approximate nψnIPWΨa,a(P0) by the sample variance VarˆDIPW,a.a(pn,A,pn,Z).

Due to its inverse weighting by treatment and censoring probabilities, this estimator could be sensitive to near positivity violations. In particular, if the outcome of interest has a bounded range, the IPW estimator is not guaranteed to stay within this range when the inverse weights become large. Substitution estimators like the non-targeted estimator in eq. (15) and the next estimator, TMLE, can slightly mitigate this problem by incorporating global information in the parameter map. However, the effect of near positivity violations can still take form of poor smoothing in these estimators.

4.3 Targeted maximum likelihood estimator

To maximize finite sample gain and provide more stable estimates in the presence of near positivity violations, one can make use of the substitution principle. The targeted maximum likelihood estimation (TMLE, [32]) provides a substitution-based estimator which also satisfies the EIC equation, thereby remaining unbiased under specific types of model mis-specifications.

In a glimpse, our strategy consists of updating in a targeted manner the initial estimators Qˉna,a of Qˉa,a(P0) by minimizing a pre-specified loss along a least favorable sub-model through Qˉna,a, and then obtaining a substitution estimator of the parameter by evaluating Ψa,a at the updated estimator Qˉn,a,a. As a result of this updating procedure, pˉn and Qˉn,a,a satisfy PnD,a,aQˉn,a,a,pˉn=0, and hence the estimator Ψa,a(Qˉn,a,a) is multiply robust, as specified in theorem 1 and corollary 2.

From the nested relationships noted in eq. (14), to update the estimators of QˉLta,a, QˉZta,a and QˉRta,a, we will use loss functions

(19)L(QˉLta,a)QˉRt+1a,alogQˉLta,a+(1QˉRt+1a,a)log1QˉLta,a,L(QˉZta,a)QˉLta,alogQˉZta,a+(1QˉLta,a)log1QˉZta,a,L(QˉRta,a)QˉZta,alogQˉRta,a+(1QˉZta,a)log1QˉRta,a.

Recall that upon linear transformation, our outcome is bounded between 0 and 1, and hence these loss functions are well-defined. We define the corresponding least favorable submodels through QˉLta,a, QˉZta,a and QˉRta,a, respectively, to be

(20)QˉLta,a(ϵ)=expitlogitQˉLta,a+ϵ,QˉZta,a(ϵ)=expitlogitQˉZta,a+ϵ,QˉRta,a(ϵ)=expitlogitQˉRta,a+ϵ,

and note that

ϵHLta,a(pA,pZ)LQˉLta,a(ϵ)ϵ=0=DLta,a(Qˉa,a,pA,pZ),
ϵHZta,a(pA,pL,pR)LQˉZta,a(ϵ)ϵ=0=DZta,a(Qˉa,a,pA,pL,pR),
ϵHRta,a(pA,pZ)LQˉRta,a(ϵ)ϵ=0=DRta,a(Qˉa,a,pA,pZ).

We are now ready to describe the TMLE algorithm, which will estimate, in a targeted manner, QˉLta,a, QˉZta,a and QˉRta,a sequentially in order of decreasing t.

  1. Obtain estimators pn,A,pn,Z,pn,L,pn,R (if high dimensional settings, estimation of pn,Z,pn,L,pn,R can be replaced with the estimators γn for γ defined in Section 3.3.1). These estimators will be used to obtain estimates Hn,Lta,a, Hn,Zta,a and Hn,Rta,a, see eqs (11), (12) and (13).

  2. Initiate QˉRτ+1,a,aYτ.

  3. At each t=τ,1, in decreasing order, we have obtained targeted estimator Qˉn,Rt+1,a,a from a previous step. We now obtain targeted estimator Qˉn,Lt,a,a, Qˉn,Zt,a,a, and Qˉn,Rt,a,a, in that order, as follows:

    1. Regress Qˉn,Rt+1,a,a(Rt,Zt,Lt) on observed values (At,Rt,Zt,Lt1) among observations that remained uncensored at time t. We then evaluate the fitted function at the observed mediator and covariates histories Rt,Zt,Lt1 and the intervened exposure At=at for these uncensored observations. This results in the estimates Qˉn,Lta,a(Rt,Zt,Lt1) for those uncensored individuals. Update this estimate using Qˉn,Lt,a,aQˉn,Lta,a(ϵn,Lt), where

      ϵn,LtargminϵPnHLta,a(pˉn)LQˉn,Lta,a(ϵ).

      This ϵn,Lt is the coefficient of a weighted logistic regression of Qˉn,Rt+1,a,a(Rt,Zt,Lt) onto the intercept model with an offset logitQˉn,Lta,a(Rt,Zt,Lt1), and weights HLta,a(pˉn)At,Rt,Zt,Lt1.

    2. Next, to obtain an initial estimator Qˉn,Zta,a, we regress the targeted estimate Qˉn,Lt,a,a(Rt,Zt,Lt1) constructed above on At,Rt,Zt1,Lt1 among observations that remained uncensored at time t. We then evaluate the fitted function at the observed mediator and covariate histories Rt,Zt1,Lt1 and the intervened exposure levels At=at for these uncensored observations. This results in the initial estimates Qˉn,Zta,a(Rt,Lt1,Zt1). Update this estimate using Qˉn,Zt,a,aQˉn,Zta,a(ϵn,Zt), where{

      ϵn,ZtargminϵPnHZta,a(pˉn)LQˉn,Zta,a(ϵ).
    3. Finally, to obtain an initial estimator Qˉn,Rta,a, we regress the targeted estimate Qˉn,Zt,a,a(Rt,Zt1,Lt1) constructed above on At,Rt1,Zt1,Lt1 among observations that remained uncensored at time t. We then evaluate the fitted function at the observed mediator and covariate histories Rt1,Zt1,Lt1 and the intervened exposure levels At1=at1 for observations that are uncensored at time t1. This results in the initial estimates Qˉn,Rta,a(Rt1,Lt1,Zt1). Update this estimate using Qˉn,Rt,a,aQˉn,Rta,a(ϵn,Rt), where{

      ϵn,RtargminϵPnHRta,a(pˉn)LQˉn,Rta,a(ϵ).

      }

  4. After running the algorithm in step (3) sequentially from t=τ down to t=1, we have targeted estimates Qˉn,R1,a,a(L0) for each of the n observations.

The TMLE for Ψa,a is given by

(21)ψnTMLEΨa,a(Qˉn,a,a)=1ni=1nQˉn,R1,a,a(L0,i)

By construction, this estimator satisfies PnD,a,a(Qˉn,a,a,pˉn)=0. Consequently, it inherits robustness of EIC described in Section 3.3. Under certain regularity conditions, it is efficient at the data-generating P0, with influence curve D,a,a(P0). We could estimate the asymptotic variance of nψnTMLEΨa,a(P0) can be estimated using the sample variance VarˆD,a,aQˉn,a,a,pˉn.

5 Simulation study

In this section, we conduct a simulation study to evaluate the comparative performance of these three estimators in estimating the mediation formula (5) for survival outcome Yτ.

5.1 Data generating distribution

Consider the data structure O=L0,A1,R1,Z1,L1,,Aτ,Rτ,Zτ,Lτ, with τ=2. L0 encodes two baseline covariates L01 and L02, At encodes a censoring indicator AtC of whether patient remained in the study by time t and a binary exposure AtE, Rt encodes covariates at time t that are directly affected by At and may influence Zt and Lt, Zt is a binary mediator of interest, Lt includes a time varying covariate Lt1 and a death indicator Yt of whether patient had died by time t. These variables are distributed according to the following data generating distribution

L01Bern(0.4);L02Bern(0.6);AtCBern(expit(1.50.8L020.4I(t>1)×Lt11+I(t=1)L01+0.5I(t>1)At1E))AtEBern(expit(0.1+1.2L02+0.7I(t>1)×Lt11+I(t=1)L010.1I(t>1)At1E))RtBern(expit(0.8+AtE+0.1I(t>1)×Lt11+I(t=1)L01+0.3I(t>1)×Rt1+I(t=1)L02))ZtBern(expit(0.5+0.8L02+0.8AtE+Rt))Lt1Bern(expit(1+0.3L02+AtE+0.7Zt0.2I(t>1)Lt11))YtBern(expit(0.2+1.5L02+Rt+0.2Lt10.3AtE0.3Zt0.2AtE×Zt0.1I(t>1)Rt1)).

After either censoring or death, all subsequent variables take a default value. The target parameter of interest is Ψ1,0(P0)0.912. To obtain this approximate, we first generated a large sample (1,000,000 observations) using the above distributions by setting AtE=0 in the equation for Zt and AtE=1 elsewhere, as well as assigning the indicator AtC=1 (i.e. all remain in study), and then take the sample mean outcome Yτ in this large sample.

5.2 Estimators

Correctly specified conditional probabilities pˉ=pA,pZ,pR,pL are obtained using logistic regressions as specified in the data-generating distributions. We estimate QˉLta,a, QˉZta,a and QˉRta,a using the regression-based approach: the so-called correctly specified estimators are obtained using Super Learner to regress the expectant on all the parent exposure, mediator and covariate history up to point t, as described in steps in Sections 4.1 and 4.3. The mis-specified counterparts of pˉ and Qˉa,a only regress on At, in the case of nuisance parameters related to Zt and Rt; or only regress on At and Zt, in the case of nuisance parameters related to Lt; or uses the marginal distribution, in the case of pA. We note here that due to the nature of nested regressions, the so-called “correct” Qˉa,a do not actually correctly specify the functional form of these expectations, they only adjust for all the relevant terms; this abuse of terminology is meant to contrast with the estimators which omit important covariates. The Super Learner will be implemented using the default library of candidate algorithms, which include glm,stepAIC, bayesglm, each coupled with a correlation-based variable screening method, as well as a version with no variable screening.

We will implement the Non-Targeted Substitution Estimator (NTsub, Section 4.1), the Inverse Probability Weighted Estimator (IPW, Section 4.2) and the Targeted Maximum Likelihood Estimator (TMLE, Section 4.3) using these nuisance parameter specifications.

6 Results

We considered sample sizes n=400 and n=4000. Bias, variance and mean squared error (MSE) for each sample size were estimated over the 1,000 datasets. In the Table 1 below, legend for model specifications were as follows:

Table 1

Bias, variance and MSE over 1,000 simulations.

{@{}llrrrr}BiasVar
n40040004004000
All correctTMLE6.57e-041.87e-051.32e-031.01e-04
IPW1.12e-034.63e-051.24e-031.05e-04
NTsub8.42e-047.53e-049.81e-047.93e-05
Qˉa,a correctTMLE1.02e-032.61e-041.35e-031.15e-04
IPW4.20e-036.33e-031.86e-031.78e-04
L misspec.TMLE4.63e-043.11e-051.22e-031.02e-04
IPW1.12e-034.63e-051.24e-031.05e-04
NTsub5.28e-035.85e-036.64e-046.30e-05
Z misspec.TMLE1.41e-025.61e-033.02e-032.51e-04
IPW1.22e-021.46e-022.37e-032.12e-04
NTsub7.66e-027.73e-023.53e-032.93e-04
Qˉa,a misspec.TMLE1.90e-047.73e-051.25e-031.02e-04
NTsub6.86e-036.95e-037.33e-047.05e-05

As predicted by general robustness conditions in corollary 2, when the nested regressions Qˉa,a are correct and the intervention node probabilities are mis-specified (“Qˉa,a correct”), TMLE provides bias reduction over a mis-specified IPW. Similarly, when only the covariate-related nuisance parameters (pR,pL and QˉRta,a,QˉLta,a) are mis-specified (“L misspec.”), TMLE also provides substantial bias reduction over the mis-specified NTsub. When only the Zt-related nuisance parameters (pZ and QˉZta,a) are mis-specified (“Z misspec.”), TMLE provides bias reduction over the mis-specified NTsub estimator across sample sizes, but its bias reduction over IPW is only apparent after large sample sizes. When Qˉa,a are all misspecified, but correct conditional probabilities pA,pZ,pL,pR are used in the weights Ha,a of the TMLE updating step, we still observe bias reduction of TMLE over NTsub.

We recall that the correctly specified Qˉa,a in this implementation are only correct up to specification of key terms, but not functional form, as that is difficult to implement. This may have resulted in certain loss of finite sample gain for TMLE under “all correct” specifications, as it was expected to be more efficient than the IPW, and this efficiency gain was not apparent until larger sample sizes. This may also have contributed to speed of convergence of the corresponding correct NTsub estimator.

7 Summary

In this paper, we proposed a random interventions approach to formulate parameters of interest in longitudinal mediation analysis with time varying mediator and exposures. Specifically, we defined the random interventions based on conditional distributions for the mediator. In comparison to an alternative random interventions formulation based on marginal distributions of the mediator [23], the proposed formulation capture different pathways of mediated effects, allows for a natural decomposition of the total exposure effect, and for application in survival settings, but it also trades-off stronger sequential randomization assumptions.

Under the RI formulation, the treatment of interest as well as the mediator variables are regarded as intervention variables. Under the proposed formulation, one can obtain a total effect decomposition and the subsequent definition of natural direct and indirect effects that are analogous to those in Pearl [2]. By regarding the mediator variables as intervention variables, the RI formulation requires external specification of a conditional mediator distribution. This aims to capture, say, a natural direct effect that would be measured if one could first run a trial on a random sample of the population where all receive exposure A=0 to ascertain the conditional distributions of mediator under null exposure Γt0, and then run a second trial on a separate sample where one would randomly assign exposures to A=1 vs. A=0 and intervene on mediators to have conditional distributions Γt0. It is also important to note that causal mediation, under either RI or non-RI approaches, presupposes that the mediator of interest is amenable to external manipulation. In applications where the mediator variables are not amenable to intervention in practice, we should be cautious that causal mediation could only offer answers to purely mechanistic questions defined under hypothetical experiments.

The second contribution of this paper is a general semiparametric inference framework for the resulting effect parameters. More specifically, efficient influence curves under a locally saturated semiparametric model are derived, and their robustness properties are established. In many applications where the mediator densities are difficult to estimate, regression-based estimators of these iterated expectations are viable alternatives to substitution-based estimators that rely on consistent estimation of the mediator densities. We also developed the non-targeted substitution estimator, IPW estimator and TMLE for the mediational formula.

Funding statement: This work was supported by NIH grant 2R01 A1074345-07.

Acknowledgements

We would also like to thank Ivan Diaz and Ilya Shpitser for the literature references and helpful discussions. We are very grateful to two anonymous reviewers for their generous and constructive comments and insights that helped improve this paper.

Appendix

8

8.1 Proof of Lemma 1 identifiability result

For Xt(Rt,Lt), we also denote Xta,Γˉ1,sa,zs,t the counterfactual covariate generated at time t by intervening to set At=at,Z1,sΓˉ1,sa,Zs,t=zs,t.

E(Yτ(a,Γˉa))=r,l,zyτpL0=l0×t{pRt(a,Γˉa)=rtL0=l0,Rt1(a,Γˉa)=rt1,Γˉt1a=zt1,Lt1(a,Γˉa)=lt1×pΓta=ztL0=l0,Rt(a,Γˉa)=rt,Γˉt1a=zt1,Lt1(a,Γˉa)=lt1×pLt(a,Γˉa)=ltL0=l0,Rt(a,Γˉa)=rt,Γˉt(a)=zt,Lt1(a,Γˉa)=lt1}.

By definition of Γta, pΓta=ztRt(a,Γˉa)=rt,Γˉt1a=zt1,Lt1(a,Γˉa)=lt1pZt(a)=ztRt(a)=rt,Zt1(a)=zt1,Lt1(a)=lt1. This quantity arises in traditional longitudinal total causal effect problems where one sets A=a, and measure Z and L (Robins [25]). Therefore, under A1, it is identifiable as the conditional probability p0Zt=ztAt=at,Zt1=zt1,Lt1=lt1 from the observed data distribution.

To identify the conditional probabilities of Rt(a,Γˉa) and Lt(a,Γˉa)we demonstrate the steps for the first two, the results for the subsequent covariates can be induced thereafter. Firstly

pR1(a,Γˉa)=r1L0=l0=pR1(a1)=r1L0=l0=pR1(a1)=l1A1=a1,L0=l0=p0R1=r1A1=a1,L0=l0.

The first equality is by definition of the counterfactuals R1(a,Γˉa). The second equality is due to the assumption A2 that given L0, A1 is independent of R1(a). The last equality follows from consistency. Next,

pL1(a,Γˉa)=l1L0=l0,R1(a,Γˉa)=r1,Γ1a=z1=pL1(a1,z1)=l1L0=l0,R1(a,Γˉa)=r1,Γ1a=z1=pL1(a1,z1)=l1L0=l0,R1(a,Γˉa)=r1=pL1(a1,z1)=l1L0=l0,R1(a1)=r1=p0L1=l1L0=l0,A1=a1,R1=r1,Z1=z1

The first equality is by definition of the counterfactuals L1(a,Γˉa) and L1(a,z). The second equality is due to the fact that in our ideal experiment conditional on L0=l0 and R1(a,Γa)=R1(a1)=r1, Z1 is a random draw from the distribution Γ1a(l0,r1), and does not affect the covariates L1(a1,z1), whose value only depend on R1(a1)=r1 and l0. The last equality follows from the usual argument of sequential randomization under static interventions on (A,Z) by applying assumptions A2, A3.

The positivity assumptions in A4 assure that the conditional probabilities in the identifying expression (5) are well-defined.

8.2 Proof of Theorem 1

For any PM, we recall the likelihood decomposition in eq. (1):

p(O)=p(L0)×t=1τ(p(AtAt1,Rt1,Zt1,Lt1)p(RtAt,Rt1,Zt1,Lt1)×p(ZtAt,Rt,Zt1,Lt1)p(LtAt,Rt,Zt,Lt1).

For Oj{L0,At,Rt,Zt,Lt:t=1,,τ}, let POj denote the conditional probability of POjOjPa(Oj).

Let L2(P) denote the Hilbert space of mean zero functions of O, endowed with the covariance operator. Consider a rich class of one-dimensional parametric submodels P(ϵ) that are generated by only fluctuating POj. Under our model, no restrictions are imposed on the conditional probabilities POj. As a result, given any function SOjL2(P) of (Oj,Pa(Oj)) with finite variance and EP(SOj(Oj,Pa(Oj))Pa(Oj))=0, the fluctuation POj(ϵ)=(1+ϵSOj(Oj,Pa(Oj)))POj is a valid one-dimensional submodel with score SOj. Therefore, the tangent subspaces corresponding to fluctuations of each POj are given by

T(PL0)={SL0(L0):EP(SL0)=0}T(PAt)={SAt(At,At1,Rt1,Zt1,Lt1):EP(SAtAt1,Rt1,Zt1,Lt1)=0}T(PRt)={SRt(Rt,At,Rt1,Zt1,Lt1):EP(SRtAt,Rt1,Zt1,Lt1)=0}T(PZt)={SZt(Zt,At,Rt,Zt1,Lt1):EP(SZtAt,Rt,Zt1,Lt1)=0}T(PLt)={SLt(Lt,At,Rt,Zt,Lt1):EP(SLtAt,Zt,Rt,Lt1)=0}.

Due to the factorization in eq. (1), T(POi) is orthogonal to T(POj) for OiOj. Moreover, the tangent space T(P), corresponding to fluctuations of the entire likelihood, is given by the orthogonal sum of these tangent subspaces, i.e. T(P)=jT(POj), and any score S(O)T(P) can be decomposed as jSOj(O).

Under this generous definition of the tangent subspaces, any function S(O) that has zero mean and finite variance under P is contained in T(P). This implies in particular that any gradient for the pathwise derivative of Ψa,a() is contained in T(P), and is thus in fact the canonical gradient. Therefore, it suffices to show that D,a,a() in eq. (10) is a gradient for the pathwise derivative of Ψa,a().

Indeed, for any S(O)=jSOj(O)T(P), let PS(ϵ) denote the fluctuation of P with score S. Under appropriate regularity conditions, the pathwise derivative at P can be expressed as

(22)ddϵΨa,a(PS(ϵ))ϵ=0=ddϵϵ=0r,z,ly{(1+ϵSL0)pL0(l0)×t=1τ((1+ϵSRt)pRtrtat,rt1zt1,lt1×(1+ϵSZt)pZtztat,rt,zt1,lt1(1+ϵSLt)pLtltat,rt,zt,lt1)}=r,z,lypL0(l0)PR(ra,z,l)PZ(za,r,l)PL(la,r,z)t=1τSLt
(23)+r,z,lypL0(l0)PR(ra,z,l)PZ(za,r,l)PL(la,r,z)t=1τSZt
(24)+r,z,lypL0(l0)PR(ra,z,l)PZ(za,r,l)PL(la,r,z)t=1τSRt
(25)+r,z,lypL0(l0)PR(ra,z,l)PZ(za,r,l)PL(la,r,z)SL0,

where PR(ra,z,l)t=1τpR(rtAt=at,rt1,zt1,lt1), similarly for PL. Analogously, PZ(za,r,l)t=1τpZ(ztAt=at,rt,zt1,lt1).

Note firstly that by definition of DtL and SLt, for every t=1,,t0,

EPDLta,a(Rt,Zt,Lt)SLt(At,Rt,Zt,Lt)=r,z,lypL0(l0)PR(ra,z,l)PZ(za,r,l)PL(la,r,z)SLt.

Moreover, DLta,a(P)(Rt,Zt,Lt) satisfies EPDLta,a(P)(Rt,Zt,Lt)Rt,Zt,Lt1=0. Therefore DLta,a(P)T(PLtPa(Lt)) by the definition of these tangent subspaces. It thus follows from the orthogonal decomposition of T(P) that

EPDLta,a(P)×SLt=EPDLta,a(P)SL0+t=1τSAt+SZt+SLt.

By similar arguments, eq. (23) can be written as

EPDZta,a(P)×SZt=EPDZta,a(P)SL0+t=1τSAt+SZt+SLt,

equation (24) can be written as

EPDRta,a(P)×SRt=EPDRta,a(P)SL0+t=1τSAt+SZt+SLt.

and eq. (25) can be written as

EPDL0a,a(P)×SL0=EPDL0a,a(P)SL0+t=1τSAt+SZt+SLt.

Combining these results, one concludes that

ddϵΨa,a(PS(ϵ))ϵ=0=EPDL0a,a(P)+t=1τDRta,a(P)+DZta,a(P)+DLta,a(P)S

Therefore, D,a,a(P)DL0a,a(P)+t=1τDRta,a(P)+DZta,a(P)+DLta,a(P) is a gradient for the pathwise derivative of Ψa,a at P. As discussed above, under the nonparametric model, D,a,a(P) is in fact the canonical gradient.

The first case of the robustness condition is trivial. In the second case, correct pL and pR yield that EP0DRta,a(P)=0 and EP0DLta,a(P)=0; correct pL, pR and pA produce a telescopic sum over t of DZta,a. Specifically

EP0tDZta,a(P)=lτ1,r,zQˉLτ(P0)a,a(r,z,lτ1)p0,L0(l0)P0,R(ra,z,lτ1)P0,Z(za,r,lτ1)P0,L(lτ1a,r,z)EP0QˉZ1a,a(R1,L0)=Ψa,a(P0)EP0QˉZ1a,a(R1,L0)

On the other hand, EP0DL0a,a(P)=EP0r1p0,R(r1l0,a1)QˉZ1a,a(r1,L0)Ψa,a(P). Therefore, EP0D,a,a(P)=0 implies that

0=EP0D,a,a(P)=EP0tDZta,a(P)+EP0DL0a,a(P)=Ψa,a(P0)EP0QˉZ1a,a(R1,L0)+EP0r1p0,R(r1l0,a1)QˉZ1a,a(r1,L0)Ψa,a(P)=Ψa,a(P0)Ψa,a(P).

Similarly, in the third case, correct pZ yields P0DZta,a(P)=0; the correct pA and correct pZ will produce the desired telescopic sums over t of EP0DLta,a(P)+EP0DRta,a(P), leaving

0=EP0D,a,a(P)=tEP0DLta,a(P)+EP0DRta,a(P)+EP0DL0a,a(P)=Ψa,a(P0)EP0QˉR1a,a(L0)+EP0QˉR1a,a(L0)Ψa,a(P)=Ψa,a(P0)Ψa,a(P).

8.3 Derivations of γ in Section 3.3.1

Firstly for j=1,,t, we note that

pZZjaj,Rj,Zj1,Lj1=pZj,aj,Rj,Zj1,Lj1paj,Rj,Zj1,Lj1=paj,Rj,Zj,Lj1paj,Rj,Zj1,Lj1=s=1jpas,as1,Rj,Zj,Lj1p(Rj,Zj,Lj1)s=1jpas,as1,Rj,Zj1,Lj1p(Rj,Zj1,Lj1)s=1jγ1,s,j(asas1,RjZj,Lj1)γ2,s,j(asas1,Rj,Zj1,Lj1)p(Rj,Zj,Lj1)p(Rj,Zj1,Lj1).

The last equality is the definition of γ. Consequently,

pZZjaj,Rj,Zj1,Lj1pZZjaj,Rj,Zj1,Lj1=paj,Rj,Zj,Lj1paj,Rj,Zj1,Lj1paj,Rj,Zj1,Lj1paj,Rj,Zj,Lj1=s=1jγ1,s,j(asas1,RjZj,Lj1)γ2,s,j(asas1,Rj,Zj1,Lj1)p(Rj,Zj,Lj1)p(Rj,Zj1,Lj1)×s=1jγ2,s,j(asas1,Rj,Zj1,Lj1)γ1,s,j(asas1,RjZj,Lj1)p(Rj,Zj1,Lj1)p(Rj,Zj,Lj1)=s=1jγ1,s,j(asas1,RjZj,Lj1)γ2,s,j(asas1,Rj,Zj1,Lj1)γ2,s,j(asas1,Rj,Zj1,Lj1)γ1,s,j(asas1,RjZj,Lj1)

This derives the alternative expression of HLta,a and HRta,a in Section 3.3.1. Next, for HZta,a, we note that

pLLjaj,Rj,Zj,Lj1×pRRjaj,Rj1,Zj1,Lj1=paj,Rj,Zj,Ljpaj,Rj,Zj,Lj1×paj,Rj,Zj1,Lj1paj,Rj1,Zj1,Lj1=paj,Rj,Zj1,Lj1paj,Rj,Zj,Lj1×paj,Rj,Zj,Ljpajaj1,Rj1,Zj1,Lj1paj1,Rj1,Zj1,Lj1=s=1jγ2,s,j(asas1,RjZj1,Lj1)s=1jγ1,s,j(asas1,Rj,Zj,Lj1)p(Rj,Zj1,Lj1)p(Rj,Zj,Lj1)×paj,Rj,Zj,Ljpajaj1,Rj1,Zj1,Lj1paj1,Rj1,Zj1,Lj1.

When we take the product for a in HZta,a, we obtain

j=1t1pLLjaj,Rj,Zj,Lj1×j=1tpRRjaj,Rj1,Zj1,Lj1=j=1tpaj,Rj,Zj1,Lj1j=1t1paj,Rj,Zj,Lj1×j=1t1paj,Rj,Zj,Ljj=1tpajaj1,Rj1,Zj1,Lj1paj1,Rj1,Zj1,Lj1
=j=1ts=1jγ2,s,j(asas1,RjZj1,Lj1)j=1t1s=1jγ1,s,j(asas1,Rj,Zj,Lj1)p(Rj,Zj1,Lj1)p(Rj,Zj,Lj1)×1j=1tpajaj1,Rj1,Zj1,Lj1p(L0).

Consequently

j=1t1pLLjaj,Rj,Zj,Lj1pLLjaj,Rj,Zj,Lj1×j=1tpRRjaj,Rj1,Zj1,Lj1pRRjaj,Rj1,Zj1,Lj1=j=1t1s=1jγ1,s,j(asas1,Rj,Zj,Lj1)γ1,s,j(asas1,Rj,Zj,Lj1)j=1ts=1jγ2,s,j(asas1,RjZj1,Lj1)γ2,s,j(asas1,RjZj1,Lj1)×j=1tpajaj1,Rj1,Zj1,Lj1pajaj1,Rj1,Zj1,Lj1.

This derives the alternative expression of HZta,a in Section 3.3.1.

References

1. Robins JM, Greenland, S. Identifiability and exchangeability for direct and indirect effects. Epidemiology 1992;3:143–55.10.1097/00001648-199203000-00013Search in Google Scholar PubMed

2. Pearl J. Direct and indirect effects. Proceedings of the seventeenth conference on uncertainty in artificial intelligence, Citeseer, 2001:411–20.Search in Google Scholar

3. Robins JM. Semantics of causal dag models and the identification of direct and indirect effects. In Hjort N, Green P, Richardson S, editors. Highly structured stochastic systems. Oxford: Oxford University Press, 2003:70–81.Search in Google Scholar

4. Petersen ML, Sinisi SE, van der Laan MJ. Estimation of direct causal effects. Epidemiology 2006;17(3):276–84.10.1097/01.ede.0000208475.99429.2dSearch in Google Scholar PubMed

5. van der Laan MJ, Petersen M. Direct effect models. Int J Biostat 2008;4(1):1–27.10.2202/1557-4679.1064Search in Google Scholar PubMed

6. VanderWeele TJ. Marginal structural models for the estimation of direct and indirect effects. Epidemiology 2009;20:18–26.10.1097/EDE.0b013e31818f69ceSearch in Google Scholar PubMed

7. Hafeman DM, VanderWeele TJ. Alternative assumptions for the identification of direct and indirect effects. Epidemiology 2011;22(6):753–6410.1097/EDE.0b013e3181c311b2Search in Google Scholar PubMed

8. Imai K, Keele L, Identification Yamamoto T., inference and sensitivity analysis for causal mediation effects. Stat Sci 2010;25(1):51–71.10.1214/10-STS321Search in Google Scholar

9. Imai K, Keele L, Tingley D. A general approach to causal mediation analysis 2010.10.1037/a0020761Search in Google Scholar PubMed

10. Pearl J. The mediation formula: A guide to the assessment of causal pathways in nonlinear models. In Berzuini C, Dawid P, Bernardinelli L, editors. Causality: statistical perspectives and applications. Chichester, UK: John Wiley and Sons, 2012:151–179,Search in Google Scholar

11. Tchetgen Tchetgen EJ, Shpitser I. Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis. Technical Report 130, Biostatistics, Harvard University, June 2011. Available at: http://www.bepress.com/harvardbiostat/paper130.10.1214/12-AOS990Search in Google Scholar PubMed PubMed Central

12. Tchetgen Tchetgen EJ, Shpitser I. Semiparametric estimation of models for natural direct and indirect effects. Technical Report 129, Biostatistics, Harvard University, June 2011. Available at: http://www.bepress.com/harvardbiostat/paper129.Search in Google Scholar

13. Zheng W, van der Laan MJ. Targeted maximum likelihood estimation of natural direct effects. Int J Biostat 2012;8(1):1–40.10.2202/1557-4679.1361Search in Google Scholar PubMed PubMed Central

14. Lendle SD, van der Laan MJ. Identification and efficient estimation of the natural direct effect among the untreated. UC Berkeley Division of Biostatistics Working Paper Series, 2011:293.Search in Google Scholar

15. Robins JM. Causal inference from complex longitudinal data. In Berkane M, editor. Latent variable modeling and applications to causality. New York: Springer Verlag, 1997:69–117.10.1007/978-1-4612-1842-5_4Search in Google Scholar

16. Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 2000;11(5):561–70.10.1097/00001648-200009000-00012Search in Google Scholar

17. Stitelman OM, De Gruttola V, Wester CW, van der Laan MJ. RCTs with time-to-event outcomes and effect modification parameters. In van der Laan MJ, Rose S, editors. Targeted learning: causal inference for observational and experimental data. NY: Springer, 2011:271–98.10.1007/978-1-4419-9782-1_18Search in Google Scholar

18. Petersen M, Schwab J, Gruber S, Blaser N, Schomaker M, van der Laan MJ. Targeted maximum likelihood estimation for dynamic and static longitudinal marginal structural working models. J Causal Inference 2014;2(2):147–85.10.1515/jci-2013-0007Search in Google Scholar

19. Avin C, Shpitser I, Pearl J. Identifiability of path-specific effects. In Proc. of the International Joint Conf. on Artificial Intelligence, Edinburgh, Scotland, 2005:357–363.Search in Google Scholar

20. Didelez V, Dawid AP, Geneletti S. Direct and indirect effects of sequential treatments. Proceedings of the 22nd Annual Conference on Uncertainty in Artifical Intelligence, 2006:138–46.Search in Google Scholar

21. VanderWeele TJ, Vansteelandt S, Robins JM. Effect decomposition in the presence of an exposure-induced mediator-outcome confounder. Epidemiology 2014;25(2):300–6.10.1097/EDE.0000000000000034Search in Google Scholar

22. Zheng W, van der Laan MJ. Causal mediation in a survival setting with time-dependent mediators. Technical Report 295, Division of Biostatistics, University of California, Berkeley, June 2012. Available at: http://biostats. bepress.com/ucbbiostat/paper295.Search in Google Scholar

23. VanderWeele TJ, Tchetgen Tchetgen EJ. Mediation analysis with time-varying exposures and mediators. J R Stat Soc Ser B 2017. (To appear)10.1111/rssb.12194Search in Google Scholar

24. Pearl J. Causality: models, reasoning and inference, 2nd ed. New York: Cambridge University Press, 2009.10.1017/CBO9780511803161Search in Google Scholar

25. Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods – application to control of the healthy worker survivor effect. Math Modell 1986;7:1393–512.10.1016/0270-0255(86)90088-6Search in Google Scholar

26. Bickel PJ, Klaassen CA, Ritov Y, Wellner J. Efficient and adaptive estimation for semiparametric models. NY: Springer-Verlag, 1997.Search in Google Scholar

27. Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics 2005;61:962–72.10.1111/j.1541-0420.2005.00377.xSearch in Google Scholar PubMed

28. van der Laan MJ, Gruber S. Targeted minimum loss based estimation of causal effects of multiple time point interventions. Int J Biostat 2012;8(1): Article 9.10.1515/1557-4679.1370Search in Google Scholar PubMed

29. van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. New York: Springer, 2003.10.1007/978-0-387-21700-0Search in Google Scholar

30. van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data, 1st ed. Springer Series in Statistics. NY: Springer, 2011.10.1007/978-1-4419-9782-1Search in Google Scholar

31. van der Laan MJ. Targeted estimation of nuisance parameters to obtain valid statistical inference. Int J Biostat 2014;10(1): 29–57.10.1515/ijb-2012-0038Search in Google Scholar PubMed

32. van der Laan MJ, Rubin DB. Targeted maximum likelihood learning. Int J Biostat 2006;2(1): Article 2.10.2202/1557-4679.1043Search in Google Scholar

Published Online: 2017-6-23

© 2017 Walter de Gruyter GmbH, Berlin/Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded on 28.3.2024 from https://www.degruyter.com/document/doi/10.1515/jci-2016-0006/html
Scroll to top button