Abstract
The Highly Adaptive least absolute shrinkage and selection operator (LASSO) Targeted Minimum Loss Estimator (HAL-TMLE) is an efficient plug-in estimator of a pathwise differentiable parameter in a statistical model that at minimum (and possibly only) assumes that the sectional variation norm of the true nuisance functions (i.e., the relevant parts of the data distribution) is finite. It relies on an initial estimator (HAL-MLE) of the nuisance functions obtained by minimizing the empirical risk over the parameter space under the constraint that the sectional variation norm of the candidate functions is bounded by a constant, where this constant can be selected with cross-validation. In this article we establish that the nonparametric bootstrap for the HAL-TMLE, fixing the value of the sectional variation norm at a value larger than or equal to the cross-validation selector, provides a consistent method for estimating the normal limit distribution of the HAL-TMLE. In order to optimize the finite sample coverage of the nonparametric bootstrap confidence intervals, we propose a selection method for this sectional variation norm that is based on running the nonparametric bootstrap for all values of the sectional variation norm larger than the one selected by cross-validation, and subsequently determining the value at which the width of the resulting confidence intervals reaches a plateau. We demonstrate our method for 1) nonparametric estimation of the average treatment effect when observing a covariate vector, binary treatment, and outcome, and for 2) nonparametric estimation of the integral of the square of the multivariate density of the data distribution. We also present simulation results for these two examples demonstrating the excellent finite sample coverage of the bootstrap-based confidence intervals.
1 Introduction
We consider estimation of a pathwise differentiable real valued target estimand based on observing n independent and identically distributed observations
The target parameter
We consider a targeted minimum loss-based (substitution) estimator
In this article we propose the nonparametric bootstrap to obtain a better estimate of the finite sample distribution of the HAL-TMLE than the normal limit distribution. The bootstrap fixes the sectional variation norm at the values used for the HAL-MLEs
1.1 Organization
In Section 2 we formulate the estimation problem and motivate the challenge for statistical inference. In Section 3 we present the nonparametric bootstrap estimator of the actual sampling distribution of the HAL-TMLE which thus incorporates estimation of its higher order stochastic behavior, and can thereby be expected to outperform the Wald-type confidence intervals. We prove that this nonparametric bootstrap is asymptotically consistent for the optimal normal limit distribution. Our results also prove that the nonparametric bootstrap preserves the asymptotic behavior of the HAL-MLEs of our nuisance parameters Q and G, providing further evidence for good performance of the nonparametric bootstrap. Importantly, our results demonstrate that the approximation error of the nonparametric bootstrap estimate of the true finite sample distribution of the HAL-TMLE is mainly driven by the approximation error of the nonparametric bootstrap for estimating the finite sample distribution of a well behaved empirical process. In Section 4 we present a plateau selection method for selecting the fixed sectional variation norm in the nonparametric bootstrap and a bias-correction in order to obtain improved finite sample coverage for the resulting confidence intervals.
In Section 5 we demonstrate our methods for two examples involving a nonparametric model and a specified target parameter: the average treatment effect and the integral of the square of the data density. In Section 6 we carry out a simulation study to demonstrate the practical performance of our proposed nonparametric bootstrap based confidence intervals w.r.t. their finite sample coverage. We conclude with a discussion in Section 7. Proofs of our lemmas and theorems are deferred to the Appendix. We refer to our accompanying technical report for additional bootstrap methods and results based on applying the nonparametric bootstrap to an exact second order expansion of the HAL-TMLE, and to various upper bounds of this exact second order expansion.
2 General formulation of statistical estimation problem and motivation for finite sample inference
2.1 Statistical model and target parameter
Let
where
Example 1:
(Treatment-specific mean)
Let
The second-order remainder
Let
For each
Similarly, for each
We refer to
This condition holds for most common bounded loss functions (such as mean-squared error loss and cross entropy loss), and it guarantees that the loss-based dissimilarities
Example 2:
(Treatment-specific mean)
For the treatment-specific mean parameter, the
2.1.1 Donsker class condition
Our formal theorems need to assume that
Our formal results will refer to a rate of convergence of the HAL-MLEs w.r.t. loss based dissimilarity given by
2.1.2 Loss functions and canonical gradient have a uniformly bounded sectional variation norm
We assume that the loss functions and canonical gradient are cadlag functions with a universal bound on the sectional variation norm; this class of functions is indeed a uniform Donsker class. We will assume this throughout the sequel, but remark that this class of cadlag functions with a universally bounded sectional variation norm could be replaced by any other uniform Donsker class satisfying (3) above. Below we will present a particular class of models
We will formalize this condition now. Suppose that
Thus, we define
Example 3:
(Treatment-specific mean)
Under the previously stated assumptions, the sectional variation norm of
For a given function
where the sum is over all subsets s of
As utilized in [7] to define the HAL-MLE, since
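Concretely, since a cadlag function with finite sectional variation norm can be represented as a linear combination of indicator (zero-order spline) basis functions whose coefficients have L1 norm equal to the sectional variation norm, the HAL-MLE reduces to a LASSO regression [7, 8]. The following is a minimal sketch of this reduction, assuming squared-error loss and using sklearn's Lagrangian penalty in place of the hard norm bound; all names are illustrative.

```python
import itertools
import numpy as np
from sklearn.linear_model import Lasso

def hal_basis(X, knots):
    """Indicator basis: one column per pair of a coordinate subset s and a
    knot point, with phi(x) = prod_{j in s} 1{x_j >= knot_j}."""
    n, d = X.shape
    cols = []
    for r in range(1, d + 1):
        for s in itertools.combinations(range(d), r):
            js = list(s)
            cols.append(np.all(X[:, None, js] >= knots[None, :, js], axis=2))
    return np.concatenate(cols, axis=1).astype(float)

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = np.sin(4 * X[:, 0]) + X[:, 1] + rng.normal(scale=0.2, size=200)

Phi = hal_basis(X, knots=X)  # knots placed at the observed support points
# The penalty `alpha` plays the role of the sectional variation norm bound;
# in practice it would be selected by cross-validation.
fit = Lasso(alpha=0.01, max_iter=100_000).fit(Phi, y)
# Sectional variation norm of the fit: |F(0)| plus the L1 norm of the coefficients.
var_norm = abs(fit.intercept_) + np.abs(fit.coef_).sum()
```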
We could slightly enlarge this class as follows. Define the sectional variation norm where
We can enlarge the functional class
For discrete measures
2.1.3 General class of models for which parameter spaces for Q and G are Cartesian products of sets of cadlag functions with bounds on sectional variation norm
Although the above bounds
For that purpose, a model may assume that
denote the parameter spaces for
For example, the parameter space
for some set
The subset
In order to allow modeling of monotonicity (e.g., nuisance parameter
For the parameter space (7) of monotone functions we allow that the sectional variation norm is known by setting
For the analysis of our proposed nonparametric bootstrap sampling distributions we do not assume this extra model structure that
A typical statistical model assuming the extra structure (6) would be of the form
Remark 2.1
(Creating parameter spaces of type (6) or (7)) In our first example we have a nuisance parameter
2.1.4 Bounding the exact second-order remainder in terms of loss-based dissimilarities
Let
for some mapping
for some function
2.1.5 Continuity of efficient influence curve as function of P at $P_0$
We also assume that if the rates of convergence of
for some function
2.2 HAL-MLEs of nuisance parameters
We estimate
For example,
Lemma 1
(Lemma 3 from van der Laan [7]) Let
If
where α is defined as in (3) for class
Application of this general lemma proves that
One can add restrictions to the parameter space
2.3 HAL-TMLE
Consider a finite dimensional local least favorable model
Since
Lemma 2 in Appendix A proves that
Assuming extra model structure (6), since we apply the least favorable submodel to an HAL-MLE
Example 4:
(Treatment-specific mean)
Condition (8) holds by applying the Cauchy–Schwarz inequality, and using
where
2.4 Asymptotic efficiency theorem for HAL-TMLE and CV-HAL-TMLE
Lemma 1 establishes that
We have the following identity for the HAL-TMLE:
The second term on the right-hand side is
Theorem 1
Consider the statistical model
Then the HAL-TMLE
We remind the reader that the condition (4), stating that the loss functions and canonical gradient are contained in class of cadlag functions with a universal bound on the sectional variation norm, can be replaced by a general Donsker class condition (3). We also remark that this Theorem 1 trivially generalizes to any rate of convergence for
2.4.1 Wald type confidence interval
A first order asymptotic 0.95-level confidence interval is given by $\psi_n^* \pm 1.96\,\sigma_n/\sqrt{n}$, where $\sigma_n^2 = P_n \{D^*(Q_n^*, G_n)\}^2$ is the empirical variance of the estimated efficient influence curve.
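A minimal sketch of this Wald-type interval, assuming `ic` holds the estimated efficient influence curve evaluated at the n observations and `psi_n` is the HAL-TMLE point estimate (names are illustrative):

```python
import numpy as np

def wald_ci_95(psi_n, ic):
    """ic: values D*(Q_n*, G_n)(O_i), i = 1, ..., n."""
    n = len(ic)
    se = np.sqrt(np.mean(ic ** 2) / n)   # sigma_n / sqrt(n)
    return psi_n - 1.96 * se, psi_n + 1.96 * se
```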
Let us now consider the extra model structure (6). The above asymptotic efficiency proof for the HAL-TMLE
For simplicity, in the next theorem we focus on data adaptive selection of
Theorem 2
Consider the setting of Theorem 1, but with the extra model structure (6). Let
Then, under the same assumptions as in Theorem 1, the TMLE
In general, when the model
3 The nonparametric bootstrap for the HAL-TMLE
Let
3.1 Definition of bootstrapped HAL-MLEs for model with extra structure (6)
In this subsection, we will assume the extra structure (6) so that our parameter spaces for Q and G consist of cadlag functions with a universal bound
Definition 1
Recall the representation (5) for a multivariate real valued cadlag function F in terms of its sections
In practice, the HAL-MLE
Let
be the corresponding HAL-MLEs of
The above bootstrap distribution depends on the bounds
3.2 Definition of bootstrapped HAL-MLE in general
In general,
3.3 Bootstrapped HAL-TMLEs
Let
conditional on
where
We now want to prove that
In the next subsection we show that the nonparametric bootstrap works for the HAL-MLEs
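For concreteness, here is a minimal sketch of the resampling-and-refitting step defined in Sections 3.1 and 3.2, using the LASSO form of the HAL-MLE; the penalty level stands in for the fixed sectional variation norm bound, the basis (knot set) is kept fixed at the original observations for simplicity, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bootstrap_hal_fits(Phi, y, alpha_fixed, B=1000, seed=0):
    """Refit the HAL lasso on B bootstrap resamples, holding the penalty
    (our stand-in for the sectional variation norm bound) fixed at the
    value used on the original sample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    fits = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)   # draw n indices with replacement
        fits.append(Lasso(alpha=alpha_fixed, max_iter=100_000)
                    .fit(Phi[idx], y[idx]))
    return fits
```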
3.4 Nonparametric bootstrap for HAL-MLE
The following theorem establishes that the bootstrap HAL-MLE
Theorem 3
Assume (2) and (4).
Definitions: Let
Conclusion: Then,
We also have
Bootstrapping HAL-MLE
The proof of Theorem 3 is presented in Appendix B, where we first establish that
Note that if
3.5 Preservation of rate of convergence for the targeted bootstrap estimator
In Appendix C we prove that
3.6 The nonparametric bootstrap for the HAL-TMLE
We can now imitate the efficiency proof for the HAL-TMLE to obtain the desired result for the bootstrapped HAL-TMLE of
Theorem 4
Assumptions: Consider the statistical model
TMLE is efficient: The standardized TMLE is asymptotically efficient:
Bootstrapped HAL-MLE:
Bootstrapped HAL-TMLE: Conditional on
As a consequence, conditional on
Consistency of the nonparametric bootstrap for HAL-TMLE at data adaptive selector
The proof of this theorem is presented in Appendix D.
4 Finite sample modifications of the nonparametric bootstrap distribution for model with extra structure (6)
In this section we focus on the case that the model
Ideally, we want to set
Since the oracle choice
So a similar intuition holds for our estimator. If we set variation norm
We choose a log-uniform grid of pre-specified λ values to simplify the finite-difference estimation of the derivative, and we leave as important future work the implementation of a potentially better estimator with a more flexible choice of λ grid.
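A minimal sketch of this plateau selector, assuming a hypothetical helper `ci_width(lam)` that runs the bootstrap with the norm bound scaled by `lam` and returns the resulting interval width; the grid endpoints and the plateau threshold are illustrative tuning choices, not the paper's exact criterion.

```python
import numpy as np

def plateau_lambda(ci_width, lam_min=1.0, lam_max=8.0, num=10, tol=0.05):
    # Log-uniform grid of scaling factors for the CV-selected norm bound.
    lams = np.exp(np.linspace(np.log(lam_min), np.log(lam_max), num))
    widths = np.array([ci_width(l) for l in lams])
    # Finite-difference slope of the width curve across the grid; the
    # plateau is taken as the first point where the width stops growing
    # appreciably (relative-slope criterion, illustrative only).
    slopes = np.diff(widths) / np.diff(lams)
    flat = np.where(np.abs(slopes) < tol * widths[:-1])[0]
    return lams[flat[0] + 1] if len(flat) else lams[-1]
```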
Figure 1 illustrates a simulated example of the curve
Increasing the scaling
The motivation is that, in general, the nonparametric bootstrap will also inherit the bias of the sampling distribution of
where (using short-hand notation)
is the estimated RMSE of the bootstrap estimator
The full modified HAL-TMLE bootstrap procedure we propose in this article can be summarized in the following pseudo-algorithm:
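In outline (a sketch assembling the steps described in this section; the exact plateau and bias-correction criteria are as discussed above):

1. Compute the HAL-MLEs with the cross-validated sectional variation norm bound and the corresponding HAL-TMLE on the original sample.
2. Fix a log-uniform grid of scaling factors λ_1 < ... < λ_K, with λ_1 = 1, for the cross-validated bound.
3. For each λ_k: draw B bootstrap samples; on each, refit the HAL-MLEs with the sectional variation norm bound fixed at λ_k times the cross-validated bound, compute the bootstrapped HAL-TMLE, and form the quantile-based confidence interval.
4. Select λ* as the value at which the width of the bootstrap confidence interval reaches a plateau, estimated by finite differences across the grid.
5. Apply the bias correction based on the estimated RMSE of the bootstrap estimator, and report the resulting bias-corrected bootstrap confidence interval at λ*.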
5 Examples
In this section we apply our general theorem, verifying its conditions, to establish asymptotic consistency of the nonparametric bootstrap of the HAL-TMLE in two examples involving a nonparametric model. In the next section we implement our nonparametric bootstrap based confidence intervals for these two examples, carry out a simulation study, and evaluate their practical performance w.r.t. finite sample coverage.
5.1 Nonparametric estimation of average treatment effect
Let
Statistical model: Since
Thus,
Notice that indeed our parameter space for
Target parameter: Let
Loss functions for Q and G: Let
be the corresponding loss-based dissimilarity. Let
Canonical gradient and corresponding exact second order expansion:
The canonical gradient of
The exact second-order remainder
Bounding the second order remainder: By using the Cauchy–Schwarz inequality, we obtain the following bound on
where
By van der Vaart [16] we have
The right-hand side represents the function
This verifies (8). We note that this bound is very conservative due to the arguments we provided in general in the previous section for double robust estimation problems.
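To make the targeting step concrete, the following is a minimal sketch of one TMLE update for the ATE, assuming arrays W, A (binary), Y and fitted initial estimates `Qbar` (a function of (a, W)) and `g1` for $G_n(1 \mid W)$. For simplicity it uses a linear fluctuation with squared-error loss rather than a logistic fluctuation for a bounded outcome; all names are illustrative.

```python
import numpy as np

def tmle_ate_step(W, A, Y, Qbar, g1):
    QA = Qbar(A, W)                                   # initial fit at (A_i, W_i)
    Q1, Q0 = Qbar(np.ones_like(A), W), Qbar(np.zeros_like(A), W)
    H = (2 * A - 1) / np.where(A == 1, g1, 1 - g1)    # clever covariate
    eps = np.sum(H * (Y - QA)) / np.sum(H ** 2)       # least-squares fluctuation
    Q1s, Q0s = Q1 + eps / g1, Q0 - eps / (1 - g1)     # targeted counterfactual fits
    psi = np.mean(Q1s - Q0s)                          # plug-in ATE estimate
    ic = H * (Y - (QA + eps * H)) + Q1s - Q0s - psi   # estimated efficient IC
    return psi, ic
```

The returned influence curve values can be fed directly into the Wald interval of Section 2.4.1 or compared with the bootstrap interval of Section 3.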
Continuity of canonical gradient: Regarding the continuity assumption (9), we note that
Uniform model bounds on sectional variation norm: It also follows immediately that the sectional variation norm model bounds
HAL-MLEs: Let
Note that
CV-HAL-MLEs: The above HAL-MLEs are determined by
where
HAL-TMLE: Let
Preservation of rate for HAL-TMLE: Lemma 2 in Appendix A shows
Asymptotic efficiency of HAL-TMLE and CV-HAL-TMLE: Application of Theorem 1 shows that
Asymptotic validity of the nonparametric bootstrap for the HAL-MLEs: Firstly, note that the bootstrapped HAL-MLEs
and
are easily computed as a standard LASSO regression using
Behavior of HAL-MLE under sampling from
Preservation of rate of TMLE under sampling from
Consistency of nonparametric bootstrap for HAL-TMLE: This verifies all conditions of Theorem 4 which establishes the asymptotic efficiency and asymptotic consistency of the nonparametric bootstrap.
Theorem 5
Consider statistical model
5.2 Nonparametric estimation of integral of square of density
Statistical model, target parameter, canonical gradient: Let
An alternative formulation that avoids a normalizing constant
The target parameter
where
Exact second order remainder: It implies the following exact second-order expansion:
where
Loss function: As loss function for Q we could consider the log-likelihood loss
Alternatively, we could consider the loss function
Note that this is indeed a valid loss function with loss-based dissimilarity given by
Bounding second order remainder: Thus, if we select this loss function, then we have
In terms of our general notation, we now have
HAL-MLE and CV-HAL-MLE: Let
Let’s denote this
TMLE using Local least favorable submodel and log-likelihood loss: Let
TMLE using Universal least favorable submodel and log-likelihood loss: One can also define a universal least favorable submodel [16] by recursively applying the above local least favorable submodel:
where
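A minimal numerical sketch of this recursive targeting scheme, for a one-dimensional density represented as a piecewise-constant function on a grid. The initial fit `p0` would be the HAL-MLE; the step size and stopping rule are illustrative choices, not the paper's exact specification.

```python
import numpy as np

def tmle_density_square(O, edges, p0, step=1e-3, max_iter=10_000):
    """TMLE for Psi(p) = int p(x)^2 dx, with p piecewise constant on the
    cells defined by `edges` (length m+1), p0 of length m."""
    dx = np.diff(edges)
    p = p0.copy()
    n = len(O)
    bins = np.clip(np.searchsorted(edges, O, side="right") - 1, 0, len(p) - 1)
    for _ in range(max_iter):
        psi = np.sum(p ** 2 * dx)          # plug-in Psi(p)
        D = 2 * (p - psi)                  # canonical gradient D*(p)(o) = 2(p(o) - Psi(p))
        score = np.mean(D[bins])           # P_n D*(p)
        if abs(score) < 1.0 / n:           # solve the EIC equation up to o(1/sqrt(n))
            break
        # One small move along the local least favorable submodel
        # p_eps = (1 + eps D*(p)) p, in the likelihood-increasing direction;
        # iterating this implements the recursive (universal) submodel.
        p = np.maximum((1 + np.sign(score) * step * D) * p, 1e-12)
        p /= np.sum(p * dx)                # renormalize for numerical drift
    psi = np.sum(p ** 2 * dx)
    return psi, 2 * (p[bins] - psi)

rng = np.random.default_rng(1)
O = rng.beta(2, 5, size=500)
edges = np.linspace(0, 1, 51)
p0 = np.histogram(O, bins=edges, density=True)[0] + 1e-6   # crude initial fit
psi_star, ic = tmle_density_square(O, edges, p0)
```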
HAL-TMLE: The TMLE of
Efficiency of HAL-TMLE and CV-HAL-TMLE: Theorem 1 shows that
Asymptotic validity of the nonparametric bootstrap for the HAL-MLE: Let C be given. As remarked in the previous example, computation of the HAL-MLE
This shows that the empirical dissimilarity also equals the square of an
Preservation of rate for HAL-TMLE under sampling from
Asymptotic consistency of the bootstrap for the HAL-TMLE: This verifies all conditions of Theorem 4 which establishes the asymptotic efficiency and asymptotic consistency of the nonparametric bootstrap.
Theorem 6
Consider the model
We have that
In addition, conditional on
This theorem can also be applied to the setting in which
6 Simulation study evaluating performance of bootstrap method
6.1 Average treatment effect
To illustrate the finite sample performance of the proposed bootstrap method, we simulate a continuous outcome Y, a binary treatment A, and a continuous covariate W that confounds Y and A. The random variables are drawn from a family of distributions indexed by
where
To analyze the above simulated data, we compute the coverage and width of the Wald-type confidence interval where the nuisance functions
The simulation results reflect what is expected based on theory. In particular, as the sectional variation norm of the
6.2 Average density value
As we demonstrated, this problem has a non-forgiving second-order remainder term that is proportional to the
where
For a given K,
We parametrized the density in terms of its hazard, discretized the hazard making it piecewise constant across a large number of bins (like histogram density estimation), parametrized this piecewise constant hazard with a logistic regression for the probability of falling in bin h, given it exceeded bin
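A minimal sketch of the map from this discrete hazard to the implied piecewise-constant density (the hazards themselves would be fitted by the penalized logistic regression described above; names are illustrative):

```python
import numpy as np

def hazard_to_density(lam, bin_width):
    """Map discrete hazards lam[0..H-1] to bin probabilities and density
    heights: P(bin h) = lam_h * prod_{j < h} (1 - lam_j)."""
    surv = np.concatenate([[1.0], np.cumprod(1.0 - lam[:-1])])
    prob = surv * lam
    return prob / bin_width          # piecewise-constant density heights

lam = np.array([0.1, 0.2, 0.3, 0.5, 1.0])   # last hazard 1 => probs sum to 1
dens = hazard_to_density(lam, bin_width=0.2)
```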
The simulations reflect what is expected based on theory: the bootstrap confidence interval has superior coverage relative to the Wald-type confidence interval, uniformly across different sample sizes and data distributions (Figure 5). In particular, as the true sectional variation norm increases (with the number of modes in the density), the second-order remainder term increases so that the Wald-type interval coverage declines. On the other hand, the bootstrap confidence intervals reflect the behavior of the second order remainder and thereby increase in width as the performance of the HAL-MLE deteriorates (due to increased complexity of the true density). The bootstrap confidence interval controls the coverage close to the nominal rate, and its coverage is not very sensitive to the true sectional variation norm of the density function. When the sample size increases to 1000, the Wald-type interval coverage increases, and in simple cases where the true sectional variation norm is small, it reaches the desired nominal level.
7 Discussion
On one hand, in parametric models and, more generally, in models small enough so that the MLE is still well behaved, one can use the nonparametric bootstrap to estimate the sampling distribution of the MLE. It is generally understood that in these small models the nonparametric bootstrap outperforms estimation of the sampling distribution by a normal distribution (e.g., with variance estimated as the sample variance of the influence curve of the MLE), by picking up the higher order behavior of the MLE when the asymptotics have not yet set in. In such small models, however, the normal approximation is already accurate at reasonable sample sizes, in which case Wald-type confidence intervals perform well. Generally speaking, the nonparametric bootstrap is a valid method when the estimator is a compactly differentiable function of the empirical measure, such as the Kaplan–Meier estimator (i.e., one can apply the functional delta-method to analyze such estimators) [21] (Theorem 3.9.11 in [15]). These are estimators that essentially do not use smoothing of any sort.
On the other hand, efficient estimation of a pathwise differentiable target parameter in large realistic models generally requires estimation of the data density, and thereby machine learning such as super-learning to estimate the relevant parts of the data distribution. Therefore, efficient one-step estimators or TMLEs are not compactly differentiable functions of the data distribution. For this reason, we moved away from using the nonparametric bootstrap to estimate its sampling distribution, since it represents a generally inconsistent method (e.g., a cross-validation selector behaves very differently under sampling from the empirical distribution than under sampling from the true data distribution) [22]. Instead we estimated the normal limit distribution by estimating the variance of the influence curve of the estimator.
Such an influence curve based method is asymptotically consistent and therefore results in asymptotically valid 0.95-level confidence intervals. However, in such large models the nuisance parameter estimators will converge at slow rates (like
One might argue that one should use a model based bootstrap instead by sampling from an estimator of the density of the data distribution. General results show that such a model based bootstrap method will be asymptotically valid as long as the density estimator is consistent [23], [24], [25]. This is like carrying out a simulation study for the estimator in question using an estimator of the true data distribution as sampling distribution. However, estimation of the actual density of the data distribution is itself a very hard problem, with bias heavily affected by the curse of dimensionality, and, in addition, it can be immensely burdensome to construct such a density estimator and sample from it when the data is complex and high dimensional.
As demonstrated in this article, the HAL-MLE provides a solution to this bottleneck. The HAL-MLE(
As a consequence of this robust behavior of the HAL-MLE, for models in which the nuisance parameters of interest are cadlag functions with a universally bounded sectional variation norm (beyond possible other assumptions), we presented asymptotically consistent estimators of the sampling distribution of the HAL-TMLE of the target parameter of interest using the nonparametric bootstrap.
Our estimators of the sampling distribution are highly sensitive to the curse of dimensionality, just as the sampling distribution of the HAL-TMLE itself: specifically, the HAL-MLE on a bootstrap sample converges just as slowly to its truth as it does under sampling from the true distribution. Because the bootstrap distribution thereby picks up this slow higher order behavior, which the normal limit distribution ignores, we expect highly significant gains in valid inference in high dimensional estimation problems relative to Wald-type confidence intervals that are purely based on the normal limit distribution of the HAL-TMLE.
In general, the user will typically not know how to select the upper bound
Even though, for this cross-validation selector
There are a number of important future directions to this research. One direction is to derive finite-sample bounds on our bootstrap interval coverage probability, which will give additional guarantees for applications.
Funding source: National Institute of Allergy and Infectious Diseases
Award Identifier / Grant number: 5R01AI074345-07
Acknowledgment
We thank the reviewers for the suggestion of the enlargement of
- Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
- Research funding: This research is funded by NIH-grant 5R01AI074345-07.
- Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
References
1. Bickel, PJ, Klaassen, CAJ, Ritov, Y, Wellner, J. Efficient and adaptive estimation for semiparametric models. Berlin Heidelberg New York: Springer; 1997.
2. Gill, RD, van der Laan, MJ, Wellner, JA. Inefficient estimators of the bivariate survival function for three models. Annales de l'Institut Henri Poincaré 1995;31:545–97.
3. van der Laan, MJ, Rubin, DB. Targeted maximum likelihood learning. Int J Biostat 2006;2:Article 11. https://doi.org/10.2202/1557-4679.1043.
4. van der Laan, MJ. Estimation based on case-control designs with known prevalence probability. Int J Biostat 2008;4:Article 17. https://doi.org/10.2202/1557-4679.1114.
5. van der Laan, MJ, Rose, S. Targeted learning: causal inference for observational and experimental data. Berlin Heidelberg New York: Springer; 2011. https://doi.org/10.1007/978-1-4419-9782-1.
6. van der Laan, MJ, Rose, S. Targeted learning in data science: causal inference for complex longitudinal studies. Berlin Heidelberg New York: Springer; 2017. https://doi.org/10.1007/978-3-319-65304-4.
7. van der Laan, MJ. A generally efficient targeted minimum loss-based estimator. UC Berkeley; 2015. Technical Report 300. http://biostats.bepress.com/ucbbiostat/paper343.
8. Benkeser, D, van der Laan, MJ. The highly adaptive lasso estimator. In: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA); 2016:689–96. https://doi.org/10.1109/DSAA.2016.93.
9. van der Laan, MJ, Dudoit, S. Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: finite sample oracle inequalities and examples. Berkeley: Division of Biostatistics, University of California; 2003. Technical Report 130.
10. van der Vaart, AW, Dudoit, S, van der Laan, MJ. Oracle inequalities for multi-fold cross-validation. Stat Decis 2006;24:351–71. https://doi.org/10.1524/stnd.2006.24.3.351.
11. van der Laan, MJ, Dudoit, S, van der Vaart, AW. The cross-validated adaptive epsilon-net estimator. Stat Decis 2006;24:373–95. https://doi.org/10.1524/stnd.2006.24.3.373.
12. van der Laan, MJ, Polley, EC, Hubbard, AE. Super learner. Stat Appl Genet Mol 2007;6:Article 25. https://doi.org/10.2202/1544-6115.1309.
13. Polley, EC, Rose, S, van der Laan, MJ. Super learner. In: van der Laan, MJ, Rose, S, editors. Targeted learning: causal inference for observational and experimental data. New York Dordrecht Heidelberg London: Springer; 2011. https://doi.org/10.1007/978-1-4419-9782-1_3.
14. Neuhaus, G. On weak convergence of stochastic processes with multidimensional time parameter. Ann Math Stat 1971;42:1285–95. https://doi.org/10.1214/aoms/1177693241.
15. van der Vaart, AW, Wellner, JA. Weak convergence and empirical processes. Berlin Heidelberg New York: Springer; 1996. https://doi.org/10.1007/978-1-4757-2545-2.
16. van der Laan, MJ, Gruber, S. One-step targeted minimum loss-based estimation based on universal least favorable one-dimensional submodels. Int J Biostat 2016;12:351–78. https://doi.org/10.1515/ijb-2015-0054.
17. van der Laan, MJ. A generally efficient targeted minimum loss based estimator. Int J Biostat 2017;13:20150097. https://doi.org/10.1515/ijb-2015-0097.
18. Bibaut, A, van der Laan, MJ. Fast rates for empirical risk minimization over cadlag functions with bounded sectional variation norm. Berkeley: Division of Biostatistics, University of California; 2019. Technical report.
19. Davies, M, van der Laan, MJ. Sieve plateau variance estimators: a new approach to confidence interval estimation for dependent data. Berkeley: Division of Biostatistics, University of California; Working Paper Series; 2014. Technical report. http://biostats.bepress.com/ucbbiostat/paper322/.
20. Cai, W, van der Laan, M. TMLEbootstrap: HAL-TMLE bootstrap in R; 2018. https://github.com/wilsoncai1992/TMLEbootstrap.
21. Gill, RD. Non- and semiparametric maximum likelihood estimators and the von Mises method (part 1). Scand J Stat 1989;16:97–128.
22. Coyle, J, van der Laan, MJ. Targeted bootstrap. In: Targeted learning in data science. Springer International Publishing; 2018:523–39. https://doi.org/10.1007/978-3-319-65304-4_28.
23. Arcones, MA, Giné, E. The bootstrap of the mean with arbitrary bootstrap sample size. Annales de l'IHP Probabilités et statistiques 1989;25:457–81.
24. Giné, E, Zinn, J. Necessary conditions for the bootstrap of the mean. Ann Stat 1989;17:684–91. https://doi.org/10.1214/aos/1176347134.
25. Arcones, MA, Giné, E. On the bootstrap of M-estimators and other statistical functionals. In: Exploring the limits of bootstrap. New York: Wiley; 1992:13–47.
26. Tran, L, Petersen, M, Schwab, J, van der Laan, MJ. Robust variance estimation and inference for causal effect estimation. Berkeley: Division of Biostatistics, University of California; 2018. Technical report. arXiv:1810.03030.
27. van der Vaart, AW, Wellner, JA. A local maximal inequality under uniform entropy. Electron J Stat 2011;5:192–203. https://doi.org/10.1214/11-EJS605.
The HAL-MLEs on the original sample and bootstrap sample will be defined below as
A Proof that the one-step TMLE $Q_n^{*}$ preserves rate of convergence of $Q_n$
The following lemma establishes that the one-step TMLE
Lemma 2
Let
Then,
Specifically, we have
This also proves that the K-th step TMLE using a finite K (uniform in n) number of iterations satisfies
Proof of Lemma 2: We have
Since
B Asymptotic convergence of bootstrapped HAL-MLE: proof of Theorem 3
Theorem 7 below shows that both
Theorem 7
Consider a statistical model
Then,
Proof of Theorem 7: We have
As a consequence, by empirical process theory [27], we have
Thus, it also follows that
Lemma 3
Suppose that
Proof: We have
We have
C Proof that the one-step TMLE $Q_n^{\#,*}$ preserves rate of convergence of $Q_n^{\#}$
The following lemma establishes that the one-step TMLE
Lemma 4
Let
Then,
Proof of Lemma 4: Firstly, we note that
Using that
Plugging this bound for
By assumption, we have that
Consider now the first empirical process term
D Proof of Theorem 4
Firstly, by definition of the remainder
where we ignored
Under the conditions of Theorem 1, we already established that
It remains to analyze the two leading empirical process terms in (26). By our continuity assumption (9) on the efficient influence curve as function in
By our continuity condition (9) we also have that
at this rate. Again, by [27] this shows
Thus, we have now shown, conditional on
This completes the proof of the Theorem for the HAL-TMLE. For a model
E Understanding why $d_{n,1}(Q_n^{\#}, Q_n)$ is a quadratic dissimilarity
Lemma 5
Assume extra model structure (6) on
where
We have
In order to provide the reader a concrete example of what this empirical dissimilarity
Corollary 1 Consider the definitions of Lemma 5 and apply it to loss function
Since
Proof of Corollary: We will prove
Note that the first term corresponds with
Proof of Lemma 5: We need to prove that the linear approximation
The extra model structure (6) allows the explicit calculation of score equations for the HAL-MLE and its bootstrap analog, which provides us then with the desired inequality.
Consider the h-specific path
for
while if
by assumption that
We also have that
Thus, this path generates a direction
Let
This derivative is given by
Suppose that
Then, we have
Combined with the stated second-order Taylor expansion of
Thus it remains to show (27).
In order to prove (27), let’s solve explicitly for h so that
where we used that
For this choice
since
This proves (27) and thereby completes the proof of Lemma 5.
F Number of non-zero HAL coefficients as a function of sample size
G Extending the class $\mathcal{F}$ to allow shifts by an unbounded constant
The result and proof below were provided to us by the reviewer.
Lemma 6
Let
Proof. Fix Q and let
Taking a supremum in Q on both sides completes the proof.
Lemma 7
Consider the setting of Lemma 6. Fix
It holds that
Proof. Clearly
We now show the other direction. Fix
By the bounds on f,
Now, as
As