The method of envelopes to concisely calculate semiparametric efficient scores under parametric restrictions

Constantine E. Frangakis

doi:10.1515/ijb-2019-0043

Published by De Gruyter, December 24, 2020


From the journal The International Journal of Biostatistics


Abstract

When addressing semiparametric problems with parametric restrictions (assumptions on the distribution), the efficient score (ES) of a parameter is often important for generating useful estimators. However, the usual derivation of the ES, although conceptually simple, is often lengthy, with many steps that do not help in understanding why its final form arises. This drawback often casts onto semiparametric estimation a mantle that can turn away otherwise able doctoral students or researchers. Here we show that many ESs can be obtained in a one-step derivation after we characterize those features (envelopes) of the unrestricted problem that are constrained in the restricted problem. We demonstrate our arguments in three problems with known ES but whose usual derivations are lengthy. We show that the envelope-based derivation is highly explanatory and compact, needing essentially two lines where the standard approach needs 10 or more pages. This suggests that the envelope method can add useful intuition and exegesis to both teaching and research in semiparametric estimation.

Keywords: efficiency; envelope; score; semiparametric

1 The problem

Semiparametric problems are useful when their “restrictions” (i.e., assumptions about the distribution) reflect existing knowledge or a simplified expression of a complex reality. When addressing semiparametric problems, the efficient score (ES) of a parameter is often important for generating useful estimators (e.g., [1], [2], [3], [4]).

One important example is the “restricted conditional moment” problem, where we measure a random sample of predictors and a response, (X_i, Y_i), i = 1, …, n. In such a problem, we model the regression $E(Y_i \mid X_i) = g(X_i, \beta_0)$ and wish to estimate β₀; the rest of the distribution is unspecified and indexed by λ₀ (e.g., [3]). Another important example is the “treatment effect regression” problem, where each patient in a random sample, with covariates X_i, is assigned one of two treatments, T_i = 0 or 1, based on X_i, and an outcome Y_i is measured. In such a problem, we often model the difference $E(Y_i \mid T_i = 1, X_i) - E(Y_i \mid T_i = 0, X_i) = g(X_i, \beta_0)$ and wish to estimate β₀; the rest of the distribution is, again, unspecified and indexed by λ₀ (e.g., [5], [6]).

The above examples are cases of a more general problem in which we measure a random sample of data D_i, i = 1, …, n, from an unknown true distribution pr₀(D_i); we assume pr₀(D_i) is partly characterized by a finite-dimensional vector β₀ that we wish to estimate (say $\beta_0 \in \text{Set}_\beta$), and we leave the rest of the distribution unspecified and indexed by λ₀ (say $\lambda_0 \in \text{Set}_\lambda$). Such a semiparametric model is summarized (e.g., [1]) as:

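$$(\beta_0, \lambda_0) \in \mathcal{M}_{\text{sem}} := \{(\beta, \lambda) \in \text{Set}_\beta \times \text{Set}_\lambda\}. \qquad (1)$$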

Derivation of the ES, say $S_\beta^{\text{eff}}(D_i, \beta, \lambda)$, is useful because it can lead to efficient estimation of β. Specifically, under some regularity conditions, the estimator of β₀ that sets the empirical sum $\sum_i S_\beta^{\text{eff}}(D_i, \beta, \lambda)$ over individuals to zero is consistent and efficient, provided that λ, when needed, is replaced by a consistent estimator that converges at a rate faster than $n^{1/4}$ [4].
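To make the estimating-equation step concrete, here is a minimal Python sketch, assuming a scalar β and treating any nuisance quantities as known (plugged in). The helper `solve_score_equation` is hypothetical, not taken from any library; it finds the root of the summed score by bisection. The usage example takes the efficient score (y − β)/σ² for a mean in the nonparametric model, with σ² treated as known, so the root is the sample mean.

```python
def solve_score_equation(score, data, lo, hi, tol=1e-10, max_iter=200):
    """Find scalar beta with sum_i score(d_i, beta) = 0 by bisection on [lo, hi].

    Assumes the summed score is continuous in beta and changes sign on [lo, hi].
    Any nuisance quantities are treated as known and plugged in inside `score`.
    """
    def total(beta):
        return sum(score(d, beta) for d in data)

    f_lo, f_hi = total(lo), total(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("summed score must change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = total(mid)
        if f_mid == 0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid     # root lies in [mid, hi]
    return 0.5 * (lo + hi)


# Usage: efficient score (y - beta) / sigma2 for a mean, sigma2 treated as known.
ys = [1.0, 2.0, 4.0]
sigma2 = 2.0
beta_hat = solve_score_equation(lambda y, b: (y - b) / sigma2, ys, lo=0.0, hi=10.0)
```

Bisection is used here only because it needs no derivatives; a Newton step or a library root finder would usually be preferred in practice, and a vector β would need a multivariate solver.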

In what follows, by notation such as $S_{\beta,i}^{\text{eff}}$ we mean the random variable $S_\beta^{\text{eff}}(D_i, \beta, \lambda)$, with the understanding that it arises from a deterministic function taking a data value and the parameters β, λ as arguments. The usual derivation of the ES is summarized in the following steps (e.g., [2]):

(i) consider a parametric submodel, say $\mathcal{M}^{s}_{\text{sem}}$, indexed by β and a finite-dimensional parameter λ_s;
(ii) find the scores $S_{\lambda_s}$;
(iii) find, across all submodels s, the set of all linear combinations of the scores $S_{\lambda_s}$ and the closure of this set, known as the nuisance tangent space and denoted Λ;
(iv) find the efficient score $S_\beta^{\text{eff}}$ for β as the residual from the score S_β after projecting it on the space Λ:

$$S_\beta^{\text{eff}} := S_\beta - \text{proj}(S_\beta \mid \Lambda). \qquad (2)$$
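When the data have finite support (the device adopted in Section 2), every function of the data is a vector over the K support points and Λ is a finite-dimensional subspace. The following minimal sketch of step (iv) and (2) assumes Λ is supplied through a basis of mean-zero functions; the probabilities, score, and basis below are made up for illustration, and the helper names are hypothetical. The projection solves the normal equations for the inner product E(hs).

```python
import numpy as np


def project(s, basis, p):
    """L2(P) projection of s onto span(basis) on a finite support.

    s: (K,) values of a function at K support points.
    basis: (K, m) values of m functions spanning Lambda.
    p: (K,) probabilities of the support points.
    """
    gram = basis.T @ (p[:, None] * basis)   # E(h h'), shape (m, m)
    cross = basis.T @ (p * s)               # E(h s), shape (m,)
    coef = np.linalg.solve(gram, cross)     # normal equations
    return basis @ coef


def efficient_score(s_beta, basis, p):
    """Equation (2): residual of S_beta after projection on Lambda."""
    return s_beta - project(s_beta, basis, p)


p = np.array([0.1, 0.2, 0.3, 0.4])
s_beta = np.array([1.0, -2.0, 0.5, 0.375])            # mean zero under p
basis = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
basis = basis - p @ basis                             # centre each basis function
s_eff = efficient_score(s_beta, basis, p)
```

The residual `s_eff` is orthogonal to every basis function under p, which is the defining property of the efficient score in (2).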

2 The method of envelopes

The key idea here is that, instead of constructing the efficient score for β with a “bottom-up” argument (using smaller parametric submodels $\mathcal{M}^{s}_{\text{sem}}$), it can be easier to construct it with a “top-down” argument, using a model larger than $\mathcal{M}_{\text{sem}}$. To proceed, we assume that the support of the data D_i is discrete, even though its dimension can be large. This is often reasonable on its own, e.g., for arguments that take the sample size to a limit but keep measurement instruments with a fixed, finite number of values. Here, discreteness is used mostly as a device to uncover the form of the efficient score when this is understood to exist.

Consider the larger model, say $\mathcal{M}_{\text{env}}$, that is as $\mathcal{M}_{\text{sem}}$ except that it imposes no restrictions through β. Consider then all those features of the distribution pr₀(D_i) in $\mathcal{M}_{\text{env}}$ that do get constrained by β in the semiparametric model $\mathcal{M}_{\text{sem}}$. These features, which we denote by “env”, can be seen as envelopes, in the larger model $\mathcal{M}_{\text{env}}$, of the consequences of placing the restrictions β in the semiparametric model $\mathcal{M}_{\text{sem}}$. Denote the larger model as:

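$$(\text{env}_0, \lambda_0^{\text{env}}) \in \mathcal{M}_{\text{env}} := \{(\text{env}, \lambda^{\text{env}}) \in \text{Set}_{\text{env}} \times \text{Set}_{\lambda^{\text{env}}}\}. \qquad (3)$$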

We then have the following useful observations:

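the parameter λ and the set $\text{Set}_\lambda$ in model $\mathcal{M}_{\text{sem}}$ have the same meaning as the parameter $\lambda^{\text{env}}$ and the set $\text{Set}_{\lambda^{\text{env}}}$ in model $\mathcal{M}_{\text{env}}$, so the two models share the same nuisance tangent space Λ; therefore, the efficient score for the envelope in $\mathcal{M}_{\text{env}}$ is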

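$$S_{\text{env}}^{\text{eff}} = S_{\text{env}} - \text{proj}(S_{\text{env}} \mid \Lambda), \quad \text{for the same } \Lambda \text{ as in (2)}; \qquad (4)$$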

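from the semiparametric model, where the distribution depends on β only through env(β), the chain rule gives $S_\beta = \frac{\partial \text{env}}{\partial \beta} \times S_{\text{env}}$; here, $\frac{\partial \text{env}}{\partial \beta}$ is a deterministic dim(β) × dim(env) matrix of partial derivatives of the dim(env) envelopes with respect to the dim(β) parameters, evaluated at the true values;
substituting, using the last expression, $\frac{\partial \text{env}}{\partial \beta} \times S_{\text{env}}$ for S_β in (2), we obtain the following connection between the two efficient scores: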

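Result. The efficient score $S_\beta^{\text{eff}}$ in the smaller model can be obtained from the efficient score $S_{\text{env}}^{\text{eff}}$ in the larger model as

$$\underbrace{S_\beta^{\text{eff}}}_{\dim(\beta)\times 1} = \underbrace{\frac{\partial \text{env}}{\partial \beta}}_{\dim(\beta)\times \dim(\text{env})} \times \underbrace{S_{\text{env}}^{\text{eff}}}_{\dim(\text{env})\times 1}$$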

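Proof:

$$
\begin{aligned}
S_\beta^{\text{eff}} &= \frac{\partial \text{env}}{\partial \beta} S_{\text{env}} - \text{proj}\Big(\frac{\partial \text{env}}{\partial \beta} S_{\text{env}} \,\Big|\, \Lambda\Big) && \{\text{here } \Lambda \text{ acts as the nuisance tangent space for the smaller model}\}\\
&= \frac{\partial \text{env}}{\partial \beta} \big\{S_{\text{env}} - \text{proj}(S_{\text{env}} \mid \Lambda)\big\} && \{\text{projection is linear and } \tfrac{\partial \text{env}}{\partial \beta} \text{ is deterministic; now } \Lambda \text{ acts as the nuisance tangent space for the larger model}\}\\
&\overset{(4)}{=} \frac{\partial \text{env}}{\partial \beta} \times S_{\text{env}}^{\text{eff}}. && (5)
\end{aligned}
$$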

Expression (5) is important because it gives the efficient score $S_\beta^{\text{eff}}$ as a simple function of the efficient score for the envelopes, $S_{\text{env}}^{\text{eff}}$, in the larger model, and bypasses long calculations within model $\mathcal{M}_{\text{sem}}$. The efficient scores $S_{\text{env}}^{\text{eff}}$ are often known, either directly or through their relation to the efficient influence functions (EIFs) $\phi_{\text{env}}^{\text{eff}}$ of the envelopes:

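$$S_{\text{env}}^{\text{eff}} = E\big(\phi_{\text{env}}^{\text{eff}} \phi_{\text{env}}^{\text{eff}\,\prime}\big)^{-1} \cdot \phi_{\text{env}}^{\text{eff}}, \qquad (6)$$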

where the EIFs are easily derived as Gateaux derivatives [7]. Once the efficient score is derived, the efficient influence function for β, say $\phi_\beta^{\text{eff}}$, can be derived in the usual way (e.g., [3], p. 66) as

$$\phi_\beta^{\text{eff}} = E\big(S_\beta^{\text{eff}} S_\beta^{\text{eff}\,\prime}\big)^{-1} \cdot S_\beta^{\text{eff}}.$$
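Equation (6), its inverse in the last display, and the Result can be chained numerically on a finite support. The sketch below is illustrative only: the helper names are hypothetical, the EIF values and the matrix standing in for ∂env/∂β are random, and each EIF is centred so that it has mean zero.

```python
import numpy as np


def second_moment(f, p):
    """E(f f') for the rows of f, given values at K support points with probabilities p."""
    return (f * p) @ f.T


def score_from_eif(phi, p):
    """Equation (6): S_eff = E(phi phi')^{-1} phi."""
    return np.linalg.solve(second_moment(phi, p), phi)


def eif_from_score(s, p):
    """Inverse map (last display of Section 2): phi = E(S S')^{-1} S."""
    return np.linalg.solve(second_moment(s, p), s)


def beta_eff(phi_env, jac, p):
    """Chain (6) and (5): return S_beta_eff and phi_beta_eff (hypothetical helper)."""
    s_env = score_from_eif(phi_env, p)
    s_beta = jac @ s_env              # the Result, equation (5)
    return s_beta, eif_from_score(s_beta, p)


# Illustration with random, made-up inputs on K = 5 support points.
rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(5))
phi_env = rng.normal(size=(3, 5))
phi_env -= (phi_env @ p)[:, None]     # centre: each EIF has mean zero
jac = rng.normal(size=(2, 3))         # stands in for d env / d beta
s_beta, phi_beta = beta_eff(phi_env, jac, p)
```

Two identities make useful checks: mapping an EIF to a score and back returns the original EIF, and E(S φ′) is the identity matrix for each matched pair of score and EIF.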

3 Examples

3.1 Restricted conditional moment problem

The first problem of Section 1 is to find the efficient score for β in the model $E(Y_i \mid X_i) = \mu(X_i, \beta_0)$, with no other assumptions. The standard derivation of this efficient score uses steps (i)–(iv) of Section 1 (e.g., [3]). These typically take long arguments of 10 or more pages of text, and they also involve conjectures (followed by verifications), which leave open the more general question of how one first arrives at such conjectures.

To see the practicality of the ideas in Section 2, first observe that, since β restricts only the conditional means $E(Y_i \mid X_i = x) = \mu(x)$, the vector of all these means, $\text{env} := [\mu(x) : \text{all } x]$, is an envelope parameter. Therefore, in the right-hand side of (6), the EIF $\phi_{\text{env},i}^{\text{eff}}$ is the vector of the nonparametric EIFs for each conditional mean, $\phi_{\mu(x),i}^{\text{eff}}$, which we know is the residual $\{Y_i - \mu(x)\}\,1(X_i = x)/p(x)$, where p(x) is the mass function at x. Also, the variance matrix in (6) is diagonal (the indicators 1(X_i = x) for different x never overlap), with variances $\text{var}(Y_i \mid X_i = x)/p(x)$ for each x. Substituting these in (6) and then in (5), and replacing $\partial \text{env}(\beta)/\partial \beta$ by $[\partial \mu(x, \beta)/\partial \beta : \text{all } x]$, we get

$$S_{\beta,i}^{\text{eff}} = \frac{\partial \mu(X_i, \beta)}{\partial \beta} \, \frac{Y_i - \mu(X_i, \beta)}{\text{var}(Y_i \mid X_i)}$$

which is the well-known efficient score, derived in essentially one argument. This short derivation is paralleled by an equally short intuition: the EIF elements $[\phi_{\mu(x),i}^{\text{eff}} : \text{all } x]$ of the envelope in the larger model contain the sufficient information also for the smaller model’s parameter β; these elements are then combined through the derivative of the envelope, with weights inversely proportional to their variances.
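The one-argument derivation can be checked numerically on a small, made-up discrete distribution. The sketch below assumes a hypothetical model μ(x, β) = exp(βx) with X ∈ {0, 1, 2}, and a two-point conditional distribution of Y around μ(x) so that var(Y | X = x) is known exactly; it builds the nonparametric EIFs, applies (6) and then (5), and compares the result with the closed form.

```python
import numpy as np

# Hypothetical discrete example: X in {0, 1, 2} and mu(x, beta) = exp(beta * x).
beta0 = 0.5
xs = np.array([0.0, 1.0, 2.0])
px = np.array([0.2, 0.3, 0.5])          # p(x)
sd = np.array([1.0, 0.5, 2.0])          # sqrt(var(Y | X = x))
mu = np.exp(beta0 * xs)                 # mu(x, beta0)
dmu = xs * np.exp(beta0 * xs)           # d mu(x, beta) / d beta at beta0

# Finite support: given X = x, Y = mu(x) -/+ sd(x) with probability 1/2 each.
X = np.repeat(np.arange(3), 2)
Y = np.concatenate([[m - s, m + s] for m, s in zip(mu, sd)])
P = np.repeat(px / 2, 2)

# Nonparametric EIF of each mu(x): {Y - mu(x)} 1(X = x) / p(x); shape (3, 6).
ind = (X[None, :] == np.arange(3)[:, None]).astype(float)
phi = (Y[None, :] - mu[:, None]) * ind / px[:, None]

V = (phi * P) @ phi.T                   # E(phi phi'); diagonal here
s_env = np.linalg.solve(V, phi)         # equation (6)
s_beta = dmu @ s_env                    # equation (5), with dim(beta) = 1

closed_form = dmu[X] * (Y - mu[X]) / sd[X] ** 2
```

Both `V` against diag(var/p(x)) and `s_beta` against `closed_form` should agree up to floating-point error.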

3.2 Treatment effect regression problem

The second problem of Section 1 is to find the efficient score for β in the model $E(Y_i \mid T_i = 1, X_i) - E(Y_i \mid T_i = 0, X_i) = \delta(X_i, \beta_0)$, with no other assumptions. As in the first problem, the standard derivation of the efficient score uses steps (i)–(iv) of Section 1, needs conjectures, and is analogously long.

The use of envelopes is again straightforward. Since β restricts only $\delta(X_i) := E(Y_i \mid T_i = 1, X_i) - E(Y_i \mid T_i = 0, X_i)$, the set of differences $\text{env} := [\delta(x) : \text{all } x]$ forms an envelope for β. We know that for each conditional mean $\mu(t, x) := E(Y_i \mid T_i = t, X_i = x)$, the nonparametric (in $\mathcal{M}_{\text{env}}$) EIF $\phi_{\mu(t,x),i}^{\text{eff}}$ is the residual $\{Y_i - \mu(t, x)\}\,1(T_i = t, X_i = x)/p(t, x)$, where p(t, x) is the mass function at (t, x). Therefore, in the right-hand side of (6), the EIF $\phi_{\text{env},i}^{\text{eff}}$ is the vector of the nonparametric EIFs for each difference,

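$$[\phi_{\delta(x),i}^{\text{eff}} : \text{all } x], \quad \text{where} \quad \phi_{\delta(x),i}^{\text{eff}} = \left\{\frac{\{Y_i - \mu(1, x)\}\,1(T_i = 1)}{e(x)} - \frac{\{Y_i - \mu(0, x)\}\,1(T_i = 0)}{1 - e(x)}\right\} \frac{1(X_i = x)}{p(x)},$$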

and e(x) is the propensity score pr(T_i = 1 | X_i = x). Also, the variance matrix in (6) is again diagonal, with variances v(x)/p(x), where

$$v(x) = \frac{\text{var}(Y_i \mid T_i = 1, X_i = x)}{e(x)} + \frac{\text{var}(Y_i \mid T_i = 0, X_i = x)}{1 - e(x)}.$$

Substituting these in (6) and then in (5), and replacing $\partial \text{env}(\beta)/\partial \beta$ by $[\partial \delta(x, \beta)/\partial \beta : \text{all } x]$, we get

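$$S_{\beta,i}^{\text{eff}} = \frac{\partial \delta(X_i, \beta)}{\partial \beta} \, \frac{1}{v(X_i)} \left\{\frac{\{Y_i - \mu(0, X_i) - \delta(X_i, \beta)\}\,1(T_i = 1)}{e(X_i)} - \frac{\{Y_i - \mu(0, X_i)\}\,1(T_i = 0)}{1 - e(X_i)}\right\}.$$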

As with the first problem, the intuition that parallels the derivation is that the EIF elements of the envelope in the larger model contain the sufficient information also for the smaller model’s parameter β.
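As in the first problem, the algebra can be checked on a small, made-up discrete distribution. The sketch below assumes a hypothetical model δ(x, β) = β(1 + x) with X ∈ {0, 1}, hypothetical values for p(x), e(x), μ(0, x), and the conditional standard deviations, and a two-point distribution of Y given (T, X); it applies (6) and (5) to the EIFs of δ(x) and compares with the closed form, using μ(1, x) = μ(0, x) + δ(x, β₀).

```python
import numpy as np

# Hypothetical discrete example: X in {0, 1}, delta(x, beta) = beta * (1 + x).
beta0 = 0.7
px = np.array([0.4, 0.6])               # p(x)
e = np.array([0.3, 0.5])                # propensity score e(x)
mu0 = np.array([1.0, -0.5])             # mu(0, x)
ddelta = 1.0 + np.arange(2)             # d delta(x, beta) / d beta
delta = beta0 * ddelta                  # delta(x, beta0)
sd = np.array([[1.0, 0.8],              # sd[t, x] = sqrt(var(Y | T = t, X = x))
               [0.6, 1.5]])

# Finite support: given (T, X) = (t, x), Y = mu(t, x) -/+ sd[t, x] w.p. 1/2 each.
pts = []
for x in range(2):
    for t in range(2):
        m = mu0[x] + t * delta[x]
        pt = e[x] if t == 1 else 1.0 - e[x]
        for y in (m - sd[t, x], m + sd[t, x]):
            pts.append((x, t, y, px[x] * pt / 2))
X, T, Y, P = (np.array(col) for col in zip(*pts))

# Nonparametric EIF of each delta(x), as in the display for phi_delta(x).
r1 = (Y - mu0[X] - delta[X]) * (T == 1) / e[X]
r0 = (Y - mu0[X]) * (T == 0) / (1.0 - e[X])
ind = (X[None, :] == np.arange(2)[:, None]).astype(float)
phi = (r1 - r0)[None, :] * ind / px[:, None]

V = (phi * P) @ phi.T                   # should be diag(v(x) / p(x))
v = sd[1] ** 2 / e + sd[0] ** 2 / (1.0 - e)
s_env = np.linalg.solve(V, phi)         # equation (6)
s_beta = ddelta @ s_env                 # equation (5)

closed_form = ddelta[X] / v[X] * (
    (Y - mu0[X] - delta[X]) * (T == 1) / e[X]
    - (Y - mu0[X]) * (T == 0) / (1.0 - e[X]))
```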

3.3 Restricted conditional median problem

To show how easily this idea also applies to other functionals, consider a variation of problem 1 where we are now interested in β that models the conditional medians of Y_i given X_i = x as, say, $m(X_i, \beta_0)$, with no other assumptions.

Since β restricts only the conditional medians m(x) of Y_i given X_i = x, the vector of medians $\text{env} := [m(x) : \text{all } x]$ is an envelope parameter. Therefore, in the right-hand side of (6), the EIF $\phi_{\text{env},i}^{\text{eff}}$ is the vector of the nonparametric EIFs for each conditional median, $\phi_{m(x),i}^{\text{eff}}$, which we know is $[1\{Y_i > m(x)\} - 0.5] \cdot 1(X_i = x)/\{f(0; x)\,p(x)\}$, where f(0; x) is the conditional density of Y_i − m(x) at 0 given X_i = x, and p(x) is the mass function at x. The variance matrix in (6) is diagonal, with variances $1/\{4 f^2(0; x)\, p(x)\}$ for each x. Substituting these in (6) and then in (5), and replacing $\partial \text{env}(\beta)/\partial \beta$ by $[\partial m(x, \beta)/\partial \beta : \text{all } x]$, we get

$$S_{\beta,i}^{\text{eff}} = \frac{\partial m(X_i, \beta)}{\partial \beta} \, 4 f(0; X_i) \, [1\{Y_i > m(X_i, \beta)\} - 0.5]$$

which is the known efficient score ([8]; p. 106).
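Because each median EIF depends on Y_i only through 1{Y_i > m(x)}, which is Bernoulli(1/2) given X_i = x, the algebra of (6) and (5) can be checked on the cells (x, above or below the median). The sketch below does only that; it does not simulate a continuous Y, and the values of p(x), f(0; x), and the model m(x, β) = β(1 + x)² are hypothetical.

```python
import numpy as np

# Hypothetical values: X in {0, 1}, m(x, beta) = beta * (1 + x) ** 2.
px = np.array([0.25, 0.75])             # p(x)
f0 = np.array([0.4, 1.2])               # f(0; x), density of Y - m(x) at 0
dm = (1.0 + np.arange(2)) ** 2          # d m(x, beta) / d beta

# The cells (x, above) carry all the information the median EIFs use.
X = np.repeat(np.arange(2), 2)
above = np.tile([0.0, 1.0], 2)          # 1{Y > m(x)}
P = np.repeat(px / 2, 2)

ind = (X[None, :] == np.arange(2)[:, None]).astype(float)
phi = (above - 0.5)[None, :] * ind / (f0 * px)[:, None]

V = (phi * P) @ phi.T                   # should be diag(1 / (4 f0^2 p(x)))
s_env = np.linalg.solve(V, phi)         # equation (6)
s_beta = dm @ s_env                     # equation (5)

closed_form = dm[X] * 4.0 * f0[X] * (above - 0.5)
```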

4 Comments

Conceptually, the usual method and the envelope method for deriving a semiparametric efficient score compare as follows. The usual method relates the restricted semiparametric model to smaller submodels with more restrictions. The envelope method relates the restricted model to a larger model that relaxes those restrictions. Here, we showed how this latter relation can provide a much shorter derivation, and a clearer understanding, of efficient scores for teaching and research.

Restrictions in semiparametric models can take forms other than a finite-dimensional parameter, for example conditional independence or symmetry. If the functional of interest in the smaller model depends only on easily estimable features of the larger, unrestricted model, then the envelope method can still be useful.

An alternative approach to calculating efficient scores is through Targeted Maximum Likelihood Estimation (TMLE; e.g., [9]). Ways to use TMLE while avoiding knowledge of the efficient influence curve have been explored in some cases, although their generality is still under investigation.

Corresponding author: Constantine E. Frangakis, Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA, E-mail: cfrangak@jhsph.edu

Funding source: National Institutes of Health

Acknowledgments

We thank the editor, and Dan Scharfstein, Betsy Ogburn, Mark van der Laan, and Stijn Vansteelandt for useful comments.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This article was funded by National Institutes of Health.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

References

1. Newey, WK. Semiparametric efficiency bounds. J Appl Econom 1990;5:99–135. https://doi.org/10.1002/jae.3950050202.

2. Robins, JM, Rotnitzky, A, Zhao, LP. Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc 1994;89:846–66. https://doi.org/10.1080/01621459.1994.10476818.

3. Tsiatis, AA. Semiparametric theory and missing data. New York: Springer; 2007.

4. van der Vaart, AW. Asymptotic statistics. New York: Cambridge University Press; 2000.

5. Yu, Z, van der Laan, MJ. Measuring treatment effects using semiparametric models. U.C. Berkeley Division of Biostatistics Working Paper Series, paper 136; 2003.

6. Vansteelandt, S, Joffe, M. Structural nested models and g-estimation: the partially realized promise. Stat Sci 2014;29:707–31. https://doi.org/10.1214/14-sts493.

7. Hampel, FR. The influence curve and its role in robust estimation. J Am Stat Assoc 1974;69:383–93. https://doi.org/10.1080/01621459.1974.10482962.

8. van der Laan, M, Robins, J. Unified methods for censored longitudinal data and causality. New York: Springer; 2003. https://doi.org/10.1007/978-0-387-21700-0.

9. van der Laan, M, Rose, S. Targeted learning: causal inference for observational and experimental data. New York: Springer; 2011. https://doi.org/10.1007/978-1-4419-9782-1.

Received: 2019-04-17

Accepted: 2020-12-01

Published Online: 2020-12-24
