Abstract
Marginal structural models (MSMs) allow for causal analysis of longitudinal data. The standard MSM is based on discrete-time models, but the continuous-time MSM is a conceptually appealing alternative for survival analysis. Applied analyses often assume that the theoretical treatment weights are known, yet these weights are usually unknown and must be estimated from the data. Here we provide a sufficient condition for the continuous-time MSM to be consistent even when the weights are estimated, and we show how additive hazard models can be used to estimate such weights. Our results suggest that continuous-time weights perform better than inverse probability of treatment weights (IPTW) when the underlying process is continuous. Furthermore, we may wish to transform effect estimates of hazards to other scales that are easier to interpret causally. We show that a general transformation strategy can be applied to weighted cumulative hazard estimates to obtain a range of other parameters in survival analysis, and we explain how this strategy can be applied to data using our R packages ahw and transform.hazards.
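To make the additive hazard machinery in the abstract concrete, the sketch below implements a minimal, unweighted version of Aalen's additive hazard estimator on simulated two-group data. This is not the ahw package: the function `aalen_increments` and all simulation parameters are our own illustrative choices, and the continuous-time weights the paper estimates are omitted here.

```python
import numpy as np

def aalen_increments(times, events, X):
    """Aalen's additive hazard estimator: cumulative regression
    coefficients B(t), built from least-squares solutions to the
    counting-process increments at each observed event time."""
    order = np.argsort(times)
    times, events, X = times[order], events[order], X[order]
    n, p = X.shape
    jump_times, B, cum = [], [], np.zeros(p)
    for j in range(n):
        if not events[j]:
            continue
        Xr = X[j:]              # risk set: subjects with T_i >= t_j (sorted data)
        dN = np.zeros(n - j)
        dN[0] = 1.0             # the event at t_j belongs to subject j
        # least-squares increment (X' X)^{-1} X' dN
        cum = cum + np.linalg.lstsq(Xr, dN, rcond=None)[0]
        jump_times.append(times[j])
        B.append(cum.copy())
    return np.array(jump_times), np.array(B)

# simulated data: baseline hazard 1.0, additive treatment effect 0.5
rng = np.random.default_rng(1)
n = 2000
Z = rng.binomial(1, 0.5, n).astype(float)
T = rng.exponential(1.0 / (1.0 + 0.5 * Z))
t, B = aalen_increments(T, np.ones(n, dtype=bool), np.column_stack([np.ones(n), Z]))
```

Near t = 0.5 the cumulative coefficients should be close to (0.5, 0.25); in the paper's setting each row of the design would additionally carry an estimated continuous-time weight.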
References
Aalen O, Cook R, Røysland K (2015) Does Cox analysis of a randomized survival study yield a causal treatment effect? Lifetime Data Anal 21(4):579–593
Andersen P, Borgan Ø, Gill R, Keiding N (1993) Statistical models based on counting processes. Springer series in statistics. Springer, New York. ISBN 0-387-97872-0
Cole S, Hernán M (2008) Constructing inverse probability weights for marginal structural models. Am J Epidemiol 168(6):656–664
Havercroft W, Didelez V (2012) Simulating from marginal structural models with time-dependent confounding. Stat Med 31(30):4190–4206
Hernán M (2010) The hazards of hazard ratios. Epidemiology 21(1):13
Hernán M, Brumback B, Robins J (2000a) Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 11(5):561–570 ISSN 10443983. http://www.jstor.org/stable/3703998
Hernán M, Brumback B, Robins J (2000b) Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 11(5):561–570
Huffer F, McKeague I (1991) Weighted least squares estimation for Aalen’s additive risk model. J Am Stat Assoc 86(413):114–129 ISSN 01621459. http://www.jstor.org/stable/2289721
Jacod J (1975) Multivariate point processes: predictable projection, Radon–Nikodym derivatives, representation of martingales. Probab Theory Relat Fields 31:235–253
Jacod J, Shiryaev A (2003) Limit theorems for stochastic processes. In: Grundlehren der Mathematischen Wissenschaften [Fundamental principles of mathematical sciences], vol 288, 2nd edn. Springer, Berlin, ISBN 3-540-43932-3
Joffe M, Ten Have T, Feldman H, Kimmel S (2004) Model selection, confounder control, and marginal structural models: review and new applications. Am Stat 58(4):272–279
McKeague I (1987) Asymptotic theory for weighted least squares estimators in Aalen’s additive risk model
Pearl J (2000) Causality: models, reasoning and inference, 2nd edn. Cambridge University Press, Cambridge
Robins J (2014) Structural nested failure time models. Wiley StatsRef: Statistics Reference Online, New York
Robins J, Greenland S (1989) The probability of causation under a stochastic model for individual risk. Biometrics 45(4):1125–1138
Robins J, Hernán M, Brumback B (2000) Marginal structural models and causal inference in epidemiology. Epidemiology 11(5):550–560
Røysland K (2011) A martingale approach to continuous-time marginal structural models. Bernoulli 17:895–915
Ryalen P, Stensrud M, Fosså S, Røysland K (2018a) Causal inference in continuous time: an example on prostate cancer therapy. Biostatistics. https://doi.org/10.1093/biostatistics/kxy036
Ryalen P, Stensrud M, Røysland K (2018b) Transforming cumulative hazard estimates. Biometrika. https://doi.org/10.1093/biomet/asy035
Stensrud M, Valberg M, Røysland K, Aalen O (2017) Exploring selection bias by causal frailty models: the magnitude matters. Epidemiology 28(3):379–386
Stensrud M, Røysland K, Ryalen P (2018) On null hypotheses in survival analysis. ArXiv e-prints, July
Vansteelandt S, Sjolander A (2016) Revisiting g-estimation of the effect of a time-varying exposure subject to time-varying confounding. Epidemiol Methods 5(1):37–56
Funding
The authors were all supported by The Research Council of Norway, Grant NFR239956/F20—Analyzing clinical health registries: Improved software and mathematics of identifiability.
Appendices
Appendix: proofs
We need some lemmas to prove Theorem 1.
Lemma 1
Suppose that \(\{V^i\}_i\) are processes on [0, T] such that \(\sup _ i E\big [ \sup _{s} | V^i_s | \big ] < \infty \), then
Proof
By Markov’s inequality, we have for every \(a > 0\) that
which proves the claim. \(\square \)
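Both the statement of Lemma 1 and the displayed inequality in its proof were lost from the source. A reconstruction consistent with the stated moment bound and the appeal to Markov's inequality — our reading, not a verbatim restoration — is:

```latex
% presumed claim: \lim_{a \to \infty} \sup_i P\big( \sup_s |V_s^i| \ge a \big) = 0
P\Big( \sup_{s \le T} |V_s^i| \ge a \Big)
  \le \frac{1}{a}\, E\Big[ \sup_{s \le T} |V_s^i| \Big]
  \le \frac{1}{a}\, \sup_i E\Big[ \sup_{s \le T} |V_s^i| \Big]
  \xrightarrow{\, a \to \infty \,} 0,
\quad \text{uniformly in } i.
```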
Lemma 2
(A perturbed law of large numbers) Suppose
(I) \(p^{-1} + q^{-1} = 1\), \(p < \infty \),
(II) \(\{V_i\}_i \subset L^p(P)\) and \(\{ S_i\}_i \subset L^q(P)\) such that \(\{ (V_{i},S_{i})\}_i\) is i.i.d., and \(V_i\), \(S_i\) are measurable with respect to a \(\sigma \)-algebra \({\mathcal {F}}_i\),
(III) there is a triangular array \(\{S_{(i,n)}\}_{n,i \le n }\) such that
$$\begin{aligned} \lim \limits _{n\longrightarrow \infty } P \big ( |S_{(1,n)}- S_1| \ge \epsilon \big ) = 0 \end{aligned}$$(26)
for every \(\epsilon > 0\), and there exists \({{\tilde{S}}} \in L^q(P)\) such that \({{\tilde{S}}} \ge |S_{(1,n)}|\) for every n,
(IV) the conditional density of \(S_{(i,n)}\) given \({\mathcal {F}}_i\) does not depend on i.
This implies that
Proof
From the triangle inequality and condition (IV) we have that
The dominated convergence theorem implies that the last term converges to 0. Finally, the weak law of large numbers and the triangle inequality yields
\(\square \)
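Lemma 2's displayed conclusion was lost from the source; from its name and conditions it presumably asserts that the perturbed empirical mean \((1/n)\sum_i V_i S_{(i,n)}\) converges in probability to \(E[V_1 S_1]\). The toy simulation below, with all distributions our own illustrative choices, checks that presumed conclusion numerically.

```python
import numpy as np

# Toy "perturbed law of large numbers": the perturbed products V_i * S_(i,n)
# still average out to E[V_1 * S_1] when S_(i,n) -> S_i in probability.
rng = np.random.default_rng(0)

def perturbed_mean(n):
    V = rng.exponential(1.0, n)        # V_i in L^p, E[V_1] = 1
    S = rng.normal(2.0, 1.0, n)        # S_i in L^q, independent, E[S_1] = 2
    # perturbation of size O(n^{-1/2}), vanishing in probability
    S_n = S + rng.normal(0.0, 1.0, n) / np.sqrt(n)
    return np.mean(V * S_n)

# E[V_1 S_1] = E[V_1] E[S_1] = 2; the estimates should concentrate around 2
estimates = [perturbed_mean(n) for n in (100, 10_000, 1_000_000)]
```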
Lemma 3
Suppose that \(\{V_i\}_i\) are i.i.d. non-negative variables in \(L^2(P)\). Then
for every \(\epsilon > 0\).
Proof
Note that
If \(n > \Vert V_1 \Vert _2\epsilon ^{-1}\), we therefore have by Chebyshev’s inequality that
where the last term converges to 0 when \(n \rightarrow \infty \) since \(\lim \nolimits _{n\longrightarrow \infty } n \log \big ( 1 - \frac{E[V_1^2]}{n^2 \epsilon ^2} \big ) = 0\) for every \(\epsilon > 0\). \(\square \)
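Lemma 3's displayed conclusion was also lost; the Chebyshev argument in the proof indicates it asserts that \(n^{-1}\max_{i \le n} V_i \to 0\) in probability for i.i.d. non-negative \(V_i \in L^2(P)\). The simulation below, with an illustrative square-integrable heavy-tailed distribution of our choosing, checks that reading empirically.

```python
import numpy as np

# Presumed claim of Lemma 3: for i.i.d. non-negative V_i in L^2,
# P( max_{i<=n} V_i / n >= eps ) -> 0 as n -> infinity.
rng = np.random.default_rng(42)

def exceedance_rate(n, reps=200, eps=0.5):
    # fraction of replications where max_i V_i / n exceeds eps
    # Pareto with tail index 3 (classical form, minimum 1): finite 2nd moment
    V = rng.pareto(3.0, size=(reps, n)) + 1.0
    return np.mean(V.max(axis=1) / n >= eps)

# exceedance probability decays roughly like 8 / n^2 here
rates = [exceedance_rate(n) for n in (10, 100, 10_000)]
```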
Lemma 4
Define \(\gamma _s^i := Y_s^{i,D} \pmb X^i_s \pmb b_s\), where \(\pmb X^i_{s}\) is the i’th row of \(\pmb X^{(n)}_{s}\). If the assumptions of Theorem 1 are satisfied, then
for every \(\delta > 0\).
Proof
Assumption (III) from Theorem 1 and Lemma 1 imply that
Moreover, Lemma 2 implies that
converges in probability to
However, from the innovation theorem we have that this equals
since \(\pmb X_{t-}^1\) and \(\gamma _t^1\) are \({\mathcal {F}}_{t-}^{1,{\mathcal {V}}_0}\)-measurable. This and (30) enable us to apply Andersen et al. (1993, Lemma II.5.3) to obtain (29). \(\square \)
Lemma 5
Suppose that (II) and (III) from Theorem 1 are satisfied and let \(\pmb M^{(n)}_t := \begin{pmatrix}N_t^{1,D} - \int _0^t \lambda _s^{1,D} ds , \dots , N_t^{n,D} - \int _0^t \lambda _s^{n,D} ds \end{pmatrix}^\intercal \). Then
defines a square integrable local martingale with respect to the filtration \({\mathcal {F}}_{s}^{1,{\mathcal {V}}_0 \cup {\mathcal {L}}} \otimes \dots \otimes \mathcal F_{s}^{n,{\mathcal {V}}_0 \cup {\mathcal {L}}} \) and
for every \(\delta >0\).
Proof
Writing \(\pmb \lambda ^{(n)}\) for the diagonal matrix with i’th diagonal element equal to \(\lambda ^{i,D}\), we have that
Moreover,
Now, (III), (37) and Lemma 1 imply that
On the other hand, Lemma 3, (36) and (III) give us that
for every s and \(\delta >0\), so Andersen et al. (1993, Proposition II.5.3) implies that (31) also holds. \(\square \)
Proof of Theorem 1
We have the following decomposition:
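The displayed decomposition did not survive extraction; judging from the three terms handled in the remainder of the proof, it presumably reads (our reconstruction):

```latex
\pmb B^{(n)}_t
  = \pmb B_t + \pmb\varXi^{(n)}_t
  + \int_0^t \pmb\varGamma_s^{(n)-1} \frac{1}{n} \sum_{i=1}^n
      R^{(i,n)}_{s-}\, \pmb X_{s-}^{i\intercal}
      \big( \lambda_s^{i,D} - \gamma_s^i \big)\, ds .
```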
Lenglart's inequality (Jacod and Shiryaev 2003, Lemma I.3.30) together with Lemma 5 implies that \(\pmb \varXi ^{(n)}\) converges uniformly in probability to 0. Moreover, Lemma 4 implies that \( \int _0^\cdot \pmb \varGamma ^{(n)-1} \frac{1}{n} \sum _{i=1}^n R^{(i,n)}_{s-}\pmb X_{s-}^{i \intercal } (\lambda ^{i,D}_s - \gamma ^i_s)ds\) converges in the same sense to 0, which proves the consistency.
To see that \(\pmb B^{(n)}\) is P-UT, note that it coincides with the sum of \(\pmb B_t\), \(\pmb \varXi ^{(n)}\) and \(\int _0^\cdot \pmb \varGamma _s^{(n)-1} \frac{1}{n} \sum _{i=1}^n R^{(i,n)}_{s-}\pmb X_{s-}^{i \intercal } ( \lambda ^i_s - \gamma ^i_s)ds\). According to Ryalen et al. (2018b, Lemma 1), the latter is P-UT since (III) and Lemma 1 imply (7). Moreover, \(\pmb B_t = \int _0^\cdot \pmb b_s ds\) is clearly P-UT, since \(\pmb b_t\) is uniformly bounded. \(\pmb \varXi ^{(n)}\) is also P-UT since Lemma 5 implies that (8) is satisfied. Finally, as \(\pmb B^{(n)}\) is a sum of three processes that are P-UT, it is necessarily P-UT itself. \(\square \)
Proof of Theorem 2
Lemma 6
Suppose that c. and d. from Theorem 2 are satisfied, and that
(I)
$$\begin{aligned} \lim _{a \rightarrow \infty } \sup _n P \bigg ( \sup _t \big | \theta ^{(i,n)}_t \big | \ge a \bigg ) = 0, \end{aligned}$$
(II) \(\theta _{t-}^{(i,n)}\) converges to \(\theta ^i_t\) in probability for each i and t.
Then we have that \(K^{(i,n)}\) is predictably uniformly tight (P-UT) and
for every i and \(\epsilon > 0\).
Proof
Note that
where \(\pmb W_t^{(n)} := n^{1/2}(\pmb H^{(n)}_t - \pmb H_t)\) and \(\pmb {{{\tilde{W}}}}_t^{(n)} :=n^{1/2}( \pmb {{{\tilde{H}}}}^{(n)}_t - \pmb {{{\tilde{H}}}}_t)\) are square-integrable martingales with respect to \({\mathcal {F}}_{t}^{1,{\mathcal {V}}_0 \cup {\mathcal {L}} } \otimes \cdots \otimes {\mathcal {F}}_{t}^{n,{\mathcal {V}}_0 \cup {\mathcal {L}} }\) and \({\mathcal {F}}_{t}^{1,{\mathcal {V}}_0 } \otimes \cdots \otimes {\mathcal {F}}_{t}^{n,{\mathcal {V}}_0 }\) respectively.
Let \(\tau \) be an optional stopping time and note that
so by Lenglart's inequality (Jacod and Shiryaev 2003, I.3.30), we see that
for every \(\epsilon > 0\) if
for every \(\epsilon > 0\). The latter property holds due to (I), (II) and Andersen et al. (1993, Proposition II.5.3).
Since \(\{ \int _0^t Y_s^{i,A} \pmb Z_{s-}^{i\intercal } d\pmb W^{(n)}_s \}_n\) converges in the Skorokhod topology, we have that \(\{ \sup _{t \le T}|\int _0^t Y_s^{i,A} \pmb Z_{s-}^{i\intercal } d\pmb W^{(n)}_s | \}_n\) is tight (Jacod and Shiryaev 2003, Theorem VI.3.21). Therefore, we also get that
for every \(\epsilon > 0\). For the same reason we also have
By combining (42), (43) and (40), we obtain that
for every \(\epsilon > 0\).
To see that \(K^{(i,n)}\) is P-UT, note that the compensator of \(\int _0^\cdot (\theta ^{(i,n)}_{s-} - 1) dN_s^{i,A}\) equals \(\int _0^\cdot (\theta ^{(i,n)}_{s-} - 1) \lambda _s^{i,A} ds\) and
Assumption (I) of this lemma and c., together with Ryalen et al. (2018b, Lemma 1), therefore imply that \(\int _0^\cdot (\theta ^{(i,n)}_{s-} - 1) dN_s^{i,A}\) is P-UT.
To see that \(\int _0^\cdot Y_s ^i \pmb {{{\tilde{Z}}}}_{s-}^{i\intercal } d \pmb {{{\tilde{H}}}}^{(n)}_s \) is P-UT, note that
An analogous decomposition yields that \(\int _0^\cdot Y_s ^i \pmb Z_{s-}^{i\intercal } d \pmb H^{(n)}_s \) is P-UT. This means that \(K^{(i,n)}\) is a sum of three processes that are P-UT, and must therefore be P-UT itself. \(\square \)
Lemma 7
Suppose that
(I) \(\{\kappa _n\}_n\) is an increasing sequence of positive numbers such that
$$\begin{aligned} \lim \limits _{n\longrightarrow \infty } \kappa _n = \infty \text { and } \sup _n \frac{\kappa _n }{\sqrt{n}} < \infty , \end{aligned}$$
(II) \(\pmb h_t\) is a bounded and continuous vector-valued function,
(III) \(\pmb Z^i \) is caglad with \(E[\sup _{t\le T} |\pmb Z_t^i|^3_3 ] < \infty \),
(IV)
$$\begin{aligned} \lim _{J \rightarrow \infty } \sup _n P \bigg ( {{\,\mathrm{Tr}\,}}\Big ( \big ( \frac{1}{n} \pmb Z^{(n)\intercal }_{t-} \pmb Y^{(n),A}_t \pmb Z^{(n)}_{t-} \big )^{-1} \Big ) \ge J \bigg ) = 0, \end{aligned}$$(46)
(V) \(Y^{i,A} \pmb Z_{\cdot -}^{i\intercal } \pmb h\) defines the intensity for \(N^{i,A}\) with respect to P and \({\mathcal {F}}_\cdot ^{i,{\mathcal {V}}_0}\). Then
$$\begin{aligned} \lim \limits _{n\longrightarrow \infty } P \bigg ( \sup _{ 1/\kappa _n \le t \le T} \Big | \kappa _n \int _{t - 1/\kappa _n } ^t Y_s^{i,A} \pmb Z_{s-}^{i\intercal } d\pmb H_s^{(n)} - Y_t^{i,A} \pmb Z_{t-}^{i \intercal } \pmb h_t \Big | \ge \epsilon \bigg ) = 0.\qquad \end{aligned}$$(47)
Proof
Note that
The martingale central limit theorem implies that \(\{\pmb W^{(n)} \}\) is a sequence of martingales that converges in law to a continuous Gaussian process with independent increments, see Andersen et al. (1993). Moreover, Ryalen et al. (2018b, Proposition 1) says that \(\{\pmb W^{(n)}\}_n\) is P-UT.
Therefore Jacod and Shiryaev (2003, Theorem VI 6.22) implies that \( \int _0 ^\cdot Y_s^{i,A} \pmb Z_{s-}^{i\intercal } d \pmb W^{(n)}_s \) converges in law to a continuous process, so it is C-tight. Moreover, from Jacod and Shiryaev (2003, Proposition VI.3.26) we have that
for every \(\epsilon > 0\). The mean value theorem of elementary calculus implies that
P a.s. Combining (51) and (52) yields the claim. \(\square \)
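The smoothing step behind (47) — rescaling the increment of a cumulative quantity over a shrinking window \(1/\kappa_n\) to recover the integrand — can be checked numerically. The sketch below is a hypothetical stand-in, not the paper's estimator: a smooth cumulative function \(H(t) = t^2\) plus an estimation error shrinking like \(n^{-1/2}\), with the bandwidth/sample-size coupling our own illustrative choice.

```python
import numpy as np

# Window-scaled increment: kappa * (H(t) - H(t - 1/kappa)) should recover
# the integrand h(t) = 2t as kappa grows, provided the per-evaluation noise
# (of size n^{-1/2}) is beaten by taking n large relative to kappa.
rng = np.random.default_rng(7)

def window_estimate(t, kappa, n):
    noise = rng.normal(0.0, 1.0 / np.sqrt(n), 2)   # error in each H evaluation
    H = lambda s, e: s**2 + e                      # perturbed cumulative H^{(n)}
    return kappa * (H(t, noise[0]) - H(t - 1.0 / kappa, noise[1]))

# h(1) = 2, so the estimates should approach 2 as kappa (and n) grow
vals = [window_estimate(1.0, kappa=k, n=k**3) for k in (4, 16, 256)]
```

The deterministic bias is \(1/\kappa\) and the noise contributes \(O(\kappa\, n^{-1/2})\), mirroring the bandwidth condition \(\sup_n \kappa_n/\sqrt{n} < \infty\) in (I).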
Proof of Theorem 2
Combining (16) and the decomposition in the proof of Lemma 7, we see that
Combining (16) and a. we also have
Whenever \(t \ge 1/ \kappa _n \), we have by the continuous mapping theorem that
Since \(\theta ^i\) is right-continuous at \(t= 0\), we have that
Finally, Jacod and Shiryaev (2003, Corollary VI 3.33) implies that \(\{( R^{(i,n)}_0,K^{(i,n)})\}_n\) converges to \((R_0^i, K^i)\) in probability. Since \(K^{(i,n)}\) is P-UT,
and
Jacod and Shiryaev (2003, Theorem IX 6.9) implies that \(R^{(i,n)}\) converges to \(R^{i}\) in probability. \(\square \)
Cite this article
Ryalen, P.C., Stensrud, M.J. & Røysland, K. The additive hazard estimator is consistent for continuous-time marginal structural models. Lifetime Data Anal 25, 611–638 (2019). https://doi.org/10.1007/s10985-019-09468-y