Endogenous treatment effect estimation using high-dimensional instruments and double selection

https://doi.org/10.1016/j.spl.2020.108967

Abstract

We propose a double selection instrumental variable estimator for the endogenous treatment effects using both high-dimensional control variables and instrumental variables. It deals with the endogeneity of the treatment variable and reduces omitted variable bias due to imperfect model selection.

Introduction

Estimating the causal effect of a treatment variable is of fundamental importance in observational studies. Because subjects are usually not randomly assigned, it is necessary to assume that the treatment variable can be considered as randomly assigned after controlling for a large set of confounding covariates (Imbens, 2004, Imbens and Rubin, 2015). In empirical research, however, there is often no guidance on how to choose control variables (Donohue III and Levitt, 2001). Belloni et al. (2014) proposed a double selection (DS) method to identify the important control variables for an exogenous treatment variable. However, the treatment is often endogenous, either because important control variables are unavailable or because of sample selection, and this makes the DS estimator inconsistent. To deal with endogeneity, the instrumental variable (IV) technique has been widely used. The optimal instrument is the conditional expectation of the endogenous variable given the IVs (Amemiya, 1974). Belloni et al. (2012) proposed a post-LASSO method to select important IVs and estimate optimal instruments. Lin et al. (2015), Farrell (2015), Kang et al. (2016) and Fan and Zhong (2018) also studied high-dimensional IV models using LASSO-related methods. Zhong et al. (2020) further considered a penalized logistic regression based IV estimator for a dummy treatment variable. In the high-dimensional data analysis literature, regularization methods such as LASSO (Tibshirani, 1996) and SCAD (Fan and Li, 2001) have been intensively studied for variable selection. However, as noted by Belloni et al. (2014), traditional post-single-selection methods fail to control the omitted variable bias due to imperfect model selection. This motivates us to develop a double selection procedure for estimating the endogenous treatment effect using both high-dimensional control variables and instrumental variables.

In this paper, we propose a double selection instrumental variable (DS-IV) estimator using a three-step algorithm. In the first step, we select the significant control variables for the outcome using regularization methods. In the second step, we select the control variables and instrumental variables that are important for predicting the endogenous variable, and obtain the predicted value of the endogenous treatment variable. In the third step, we obtain the DS-IV estimator of the endogenous treatment effect based on the predicted treatment variable and the union of the control variables selected in the first two steps. The control variable selection alleviates the intrinsic difficulty of finding valid instruments. The procedure is easily implemented using our R package naivereg (Fan et al., 2020). A closely related work by Chernozhukov et al. (2015) also offers an approach to estimating structural parameters of endogenous variables in the presence of many instruments and controls. The main difference is that they used the orthogonal moment functions in Belloni et al. (2014), whereas we focus on the selection of instrumental variables.
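The three steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the naivereg implementation: the function names (`lasso_cd`, `ols`, `ds_iv`, `simulate`), the fixed penalty `lam`, the choice of plain LASSO as the regularization method, the OLS refit on the selected variables in step 2, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """LASSO by cyclic coordinate descent on centered data.

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    Xc = X - X.mean(axis=0)
    r = y - y.mean()                      # residual; equals centered y while b = 0
    n, p = Xc.shape
    col_ms = (Xc ** 2).sum(axis=0) / n    # mean square of each column
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r += Xc[:, j] * b[j]          # take coordinate j out of the fit
            rho = Xc[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ms[j]
            r -= Xc[:, j] * b[j]          # put the updated coordinate back
    return b

def ols(A, y):
    """Least-squares coefficients of y on the columns of A."""
    return np.linalg.lstsq(A, y, rcond=None)[0]

def ds_iv(y, d, X, Z, lam=0.1):
    """Hypothetical three-step double selection IV sketch (fixed penalty)."""
    n, p = X.shape
    ones = np.ones((n, 1))
    # Step 1: controls selected as predictors of the outcome.
    I1 = np.flatnonzero(lasso_cd(X, y, lam))
    # Step 2: controls and instruments selected as predictors of the
    # treatment, followed by an OLS refit to get the predicted treatment.
    W = np.hstack([X, Z])
    sel = np.flatnonzero(lasso_cd(W, d, lam))
    W_sel = np.hstack([ones, W[:, sel]])
    d_hat = W_sel @ ols(W_sel, d)
    I2 = sel[sel < p]                     # selected columns that are controls
    # Step 3: regress the outcome on the predicted treatment and the
    # union of the controls selected in steps 1 and 2.
    union = np.union1d(I1, I2)
    R = np.hstack([d_hat[:, None], ones, X[:, union]])
    return ols(R, y)[0], union

def simulate(n=1000, p=20, q=20, alpha0=1.0, seed=0):
    """Assumed toy design: u enters both d and y, so d is endogenous."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    Z = rng.standard_normal((n, q))
    u = rng.standard_normal(n)
    d = Z[:, 0] + Z[:, 1] + X[:, 0] + u + 0.5 * rng.standard_normal(n)
    y = alpha0 * d + X[:, 0] + X[:, 1] + u + rng.standard_normal(n)
    return y, d, X, Z
```

Calling `ds_iv(*simulate())` returns the estimated treatment effect together with the indices of the controls kept in the final regression. In practice the penalty would be chosen by a data-driven rule rather than fixed at `0.1` as assumed here.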

The rest of the paper is organized as follows: Section 2 presents the DS-IV estimator. Section 3 investigates its theoretical properties. Section 4 is a real data application. To save space, the Monte Carlo simulations and all detailed proofs are contained in the Supplementary material. Throughout the paper, we let $\|\cdot\|_0$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ represent the 0-norm, 2-norm and the infinity norm, respectively. We also write $m \vee n = \max\{m, n\}$ and $E_n[f] = E_n[f(\omega_i)] = \sum_{i=1}^{n} f(\omega_i)/n$.
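As a quick numerical check of the notation, the 0-norm, 2-norm, infinity norm and the empirical average can be computed directly. The helper names below (`norm0`, `norm2`, `norm_inf`, `empirical_mean`) and the example vector are hypothetical and serve only to illustrate the definitions.

```python
import math

def norm0(v):
    """0-norm: number of nonzero entries of v."""
    return sum(1 for x in v if x != 0)

def norm2(v):
    """2-norm: Euclidean length of v."""
    return math.sqrt(sum(x * x for x in v))

def norm_inf(v):
    """Infinity norm: largest absolute entry of v."""
    return max(abs(x) for x in v)

def empirical_mean(f, omegas):
    """Empirical average: sum of f(omega_i) over i, divided by n."""
    return sum(f(w) for w in omegas) / len(omegas)

v = [3.0, 0.0, -4.0]
print(norm0(v), norm2(v), norm_inf(v))   # prints: 2 5.0 4.0
print(max(2, 5))                         # maximum of m and n; prints: 5
```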

Section snippets

Methodology

Consider a structural equation with an endogenous treatment variable and many control variables, $y_i = d_i \alpha_0 + x_i' \beta_0 + \varepsilon_i$, where $y_i$ is the outcome variable for individual $i$, $d_i$ is the endogenous treatment variable, $\alpha_0$ denotes the true coefficient of $d_i$, $x_i$ is a $p \times 1$ vector of exogenous control variables, $\beta_0$ is a $p \times 1$ vector of the true parameters associated with $x_i$, $\varepsilon_i$ is the $i$th random error term for $i = 1, 2, \ldots, n$, and $n$ is the sample size. To estimate the treatment effect accurately, we include as many as

Theoretical properties

Assume $\|\beta_0\|_0 \le s$ and $\|\gamma_0\|_0 \le s$, where the number of nonzero true regression coefficients in $\beta_0$ and $\gamma_0$ cannot exceed $s \ll n$, and its estimator is given by $\hat{s} = \|\hat{I}\|_0$. The following regularity conditions are imposed for the theoretical properties of the proposed DS-IV estimator.

  • (A)

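    Define $\phi_{\min}(m)[M] = \min_{1 \le \|\delta\|_0 \le m} \frac{\delta' M \delta}{\|\delta\|_2^2}$ and $\phi_{\max}(m)[M] = \max_{1 \le \|\delta\|_0 \le m} \frac{\delta' M \delta}{\|\delta\|_2^2}$ for a semi-definite matrix $M$. There is an absolute sequence $a_n$ such that, with probability at least $1 - \Delta_n$, $k' \le \phi_{\min}(a_n s)[E_n(z_i z_i')] \le \phi_{\max}(a_n s)[E_n(z_i z_i')] \le k''$, where En(zizi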

The treatment effect of teacher’s attentiveness on student’s achievement

Home visits could facilitate parent involvement, reduce discipline problems and improve students’ overall attitudes toward school (Dahl and Lochner, 2012, Castro et al., 2015). Using comprehensive survey data from the China Education Panel Survey (CEPS), we investigate the treatment effects of home visits on students’ performance as measured by standardized exam grades. We define the treatment variable $d_i$ to be 1 if the class adviser goes to the $i$th student’s home to talk with the parents

Acknowledgments

The authors thank Laura M. Sangalli (the editor), the associate editor, and the anonymous referees for their helpful comments that improved the article significantly. Zhong’s work is supported by National Natural Science Foundation of China (11671334 and 11922117) and Fujian Provincial Natural Science Fund for Distinguished Young Scholars (2019J06004). Fan’s research, in part, was supported by the National Natural Science Foundation of China Grants 71671149, 71801183 and 71631004 (Key Project)

References (20)

  • Amemiya, T. (1974). The non-linear two-stage least squares estimator. J. Econometrics.
  • Castro, M. et al. (2015). Parental involvement on student academic achievement: A meta-analysis. Educ. Res. Rev.
  • Farrell, M.H. (2015). Robust inference on average treatment effects with possibly more covariates than observations. J. Econometrics.
  • Belloni, A. et al. (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica.
  • Belloni, A. et al. (2014). Inference on treatment effects after selection amongst high-dimensional controls. Rev. Econom. Stud.
  • Chernozhukov, V. et al. (2015). Post-selection and post-regularization inference in linear models with many controls and instruments. Amer. Econ. Rev.: Pap. Proc.
  • Dahl, G. et al. (2012). The impact of family income on child achievement: Evidence from the earned income tax credit. Amer. Econ. Rev.
  • Donohue III, J.J. et al. (2001). The impact of legalized abortion on crime. Q. J. Econ.
  • Fan, Q. et al. (2020). R package naivereg: Nonparametric additive instrumental variable estimator and related IV methods.
  • Fan, J. et al. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc.
There are more references available in the full text version of this article.