A Generally Efficient Targeted Minimum Loss Based Estimator based on the Highly Adaptive Lasso.
International Journal of Biostatistics ( IF 1.2 ) Pub Date : 2017-10-13 , DOI: 10.1515/ijb-2015-0097
Mark van der Laan

Suppose we observe $n$ independent and identically distributed observations of a finite-dimensional bounded random variable. This article is concerned with the construction of an efficient targeted minimum loss-based estimator (TMLE) of a pathwise differentiable target parameter of the data distribution based on a realistic statistical model. The only smoothness condition we enforce on the statistical model is that the nuisance parameters of the data distribution needed to evaluate the canonical gradient of the pathwise derivative of the target parameter are multivariate real-valued cadlag functions (right-continuous with left-hand limits; G. Neuhaus. On weak convergence of stochastic processes with multidimensional time parameter. Ann Stat 1971;42:1285-1295) with finite supremum and (sectional) variation norm. Each nuisance parameter is defined as a minimizer of the expectation of a loss function over all functions in its parameter space. For each nuisance parameter, we propose a new minimum loss-based estimator that minimizes the loss-specific empirical risk over the functions in its parameter space under the additional constraint that the variation norm of the function is bounded by a specified constant. The constant is selected with cross-validation. We show that such an MLE can be represented as the minimizer of the empirical risk over linear combinations of indicator basis functions under the constraint that the sum of the absolute values of the coefficients is bounded by the constant: that is, the variation norm corresponds to the $L_1$-norm of the vector of coefficients. We refer to this estimator as the highly adaptive lasso (HAL) estimator. We prove that for all models the HAL estimator converges to the true nuisance parameter value at a rate faster than $n^{-1/4}$ with respect to the square root of the loss-based dissimilarity.
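To make the lasso representation concrete, here is a minimal one-dimensional sketch. It illustrates the idea only, not the paper's full construction (which uses tensor products of section indicators over subsets of coordinates): the regression function is approximated by a linear combination of indicator basis functions with knots at the observed data points, and the cross-validated lasso penalty plays the role of the variation-norm bound (the penalized and $L_1$-constrained problems are Lagrangian duals). The data-generating function and all variable names below are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Simulated data (illustrative): noisy observations of a smooth function.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, n)
y = np.sin(3 * x) + rng.normal(0, 0.2, n)

# Indicator basis: column j of the design matrix is 1(x_i >= x_j),
# with knots at the observed data points.
basis = (x[:, None] >= x[None, :]).astype(float)

# Lasso with a cross-validated penalty: the penalized analogue of
# minimizing empirical risk subject to an L1 bound on the coefficients
# (i.e., a bound on the variation norm of the fitted step function).
hal = LassoCV(cv=5).fit(basis, y)

def predict(x_new):
    """Evaluate the fitted step function at new points."""
    return hal.intercept_ + (x_new[:, None] >= x[None, :]).astype(float) @ hal.coef_

grid = np.linspace(-0.9, 0.9, 50)
fit = predict(grid)
```

The fitted function is piecewise constant (cadlag by construction), and the L1 penalty keeps its total variation, and hence the number of active jumps, small.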
We also show that if this HAL estimator is included in the library of an ensemble super-learner, then the super-learner will at a minimum achieve the rate of convergence of the HAL estimator; by previous results, it will in fact be asymptotically equivalent to the oracle (i.e., in some sense best) estimator in the library. Subsequently, we establish that a one-step TMLE using such a super-learner as initial estimator for each of the nuisance parameters is asymptotically efficient at any data-generating distribution in the model, under weak structural conditions on the target parameter mapping and model and a strong positivity assumption (e.g., the canonical gradient is uniformly bounded). We demonstrate our general theorem by constructing such a one-step TMLE of the average causal effect in a nonparametric model and establishing that it is asymptotically efficient.
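As a hedged illustration of the final example, the following sketch computes a one-step TMLE of the average treatment effect in a simulated point-treatment data set. The simple least-squares and logistic fits stand in for the super-learner initial estimators of the paper, and a linear fluctuation is used in place of a fully general least-favorable submodel; the simulation and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated point-treatment data (W, A, Y); the true average effect is 1.0.
rng = np.random.default_rng(1)
n = 5000
W = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-0.5 * W)))  # treatment, logistic propensity
Y = A + 0.5 * W + rng.normal(0, 1, n)            # outcome

# Initial estimators (stand-ins for the super-learner fits):
# outcome regression Qbar(a, w) by OLS, propensity g(w) by logistic regression.
X = np.column_stack([np.ones(n), A, W])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
Q1 = beta[0] + beta[1] + beta[2] * W             # Qbar(1, W)
Q0 = beta[0] + beta[2] * W                       # Qbar(0, W)
QA = np.where(A == 1, Q1, Q0)
g = LogisticRegression().fit(W[:, None], A).predict_proba(W[:, None])[:, 1]

# One targeting step: fluctuate the outcome regression along the
# "clever covariate" H, the direction of the canonical gradient.
H = A / g - (1 - A) / (1 - g)
eps = np.sum(H * (Y - QA)) / np.sum(H ** 2)      # linear-fluctuation MLE

# Targeted plug-in estimate of the average treatment effect.
psi = np.mean((Q1 + eps / g) - (Q0 - eps / (1 - g)))
```

The strong positivity assumption of the theorem shows up concretely here: `g` must stay bounded away from 0 and 1, since `H` (and hence the canonical gradient) has `g` and `1 - g` in its denominators.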

Updated: 2019-11-01