Endogenous treatment effect estimation using high-dimensional instruments and double selection
Introduction
Estimating the causal effect of a treatment variable is of fundamental importance in observational studies. Because subjects are usually not randomly assigned, it is necessary to assume that the treatment variable can be considered as randomly assigned after controlling for a large set of other confounding covariates (Imbens, 2004, Imbens and Rubin, 2015). In empirical research, however, there is often no guidance on how to choose the control variables (Donohue III and Levitt, 2001). Belloni et al. (2014) proposed a double selection (DS) method to identify the important control variables for an exogenous treatment variable. The treatment is often endogenous, however, because important control variables are unavailable or because of sample selection, and this renders the DS estimator inconsistent. To deal with endogeneity, the instrumental variable (IV) technique has been widely used. The optimal instrument is the conditional expectation of the endogenous variable given the IVs (Amemiya, 1974). Belloni et al. (2012) proposed a post-LASSO method to select important IVs and estimate the optimal instruments. Lin et al. (2015), Farrell (2015), Kang et al. (2016) and Fan and Zhong (2018) also studied high-dimensional IV models using LASSO-related methods. Zhong et al. (2020) further considered a penalized logistic regression based IV estimator for a dummy treatment variable. In the high-dimensional data analysis literature, regularization methods such as the LASSO (Tibshirani, 1996) and SCAD (Fan and Li, 2001) have been studied intensively for variable selection. However, as noted by Belloni et al. (2014), traditional post-single-selection methods fail to control the omitted variable bias caused by imperfect model selection. This motivates us to develop a double selection procedure for estimating the endogenous treatment effect using both high-dimensional control variables and instrumental variables.
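To make the endogeneity problem concrete, the following numpy-only sketch simulates a hypothetical linear model in which the treatment $d_i$ shares an unobserved shock with the outcome, so naive OLS is inconsistent. An IV estimate that uses the fitted first-stage value as an estimate of $E[d_i \mid \mathbf{z}_i]$ recovers the true effect. The data-generating process, its coefficients, and the linear first stage (which makes the conditional expectation linear in the instruments) are our assumptions for illustration; with homoskedastic errors, as in this design, the conditional expectation plays the role of the optimal instrument.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Hypothetical DGP: the treatment d shares the unobserved shock u with the outcome.
rng = np.random.default_rng(0)
n, alpha = 5000, 1.0
z = rng.standard_normal((n, 3))                   # instruments
u = rng.standard_normal(n)                        # unobserved confounder
d = z @ np.array([1.0, 0.5, 0.5]) + u             # endogenous treatment
eps = 0.8 * u + 0.6 * rng.standard_normal(n)      # error correlated with d
y = alpha * d + eps

ones = np.ones((n, 1))

# Naive OLS of y on (1, d): inconsistent because Cov(d, eps) != 0.
alpha_ols = ols(np.hstack([ones, d[:, None]]), y)[1]

# First stage: estimate E[d | z] by the linear fit of d on (1, z).
Z1 = np.hstack([ones, z])
d_hat = Z1 @ ols(Z1, d)

# IV step: d_hat instruments d (just-identified; the intercept instruments itself).
W = np.hstack([ones, d_hat[:, None]])             # instruments
X = np.hstack([ones, d[:, None]])                 # regressors
alpha_iv = np.linalg.solve(W.T @ X, W.T @ y)[1]

print(f"OLS: {alpha_ols:.3f}  IV: {alpha_iv:.3f}  (true alpha = {alpha})")
```

In this design the OLS estimate converges to $1 + 0.8/2.5 = 1.32$ rather than 1, because $\mathrm{Cov}(d_i, \varepsilon_i) = 0.8$ and $\mathrm{Var}(d_i) = 1 + 0.25 + 0.25 + 1 = 2.5$.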
In this paper, we propose a double selection instrumental variable (DS-IV) estimator computed by a three-step algorithm. In the first step, we use regularization methods to select the control variables that are significant for the outcome. In the second step, we select the control variables and instrumental variables that are important for predicting the endogenous variable, and we obtain the predicted value of the endogenous treatment variable. In the third step, we obtain the DS-IV estimator of the endogenous treatment effect from the predicted treatment variable and the union of the control variables selected in the first two steps. Selecting control variables alleviates the intrinsic difficulty of finding valid instruments. The DS-IV estimator is easily implemented using our R package naivereg (Fan et al., 2020). A closely related work by Chernozhukov et al. (2015) also offers an approach to estimating structural parameters of endogenous variables in the presence of many instruments and controls. The main difference is that they used the orthogonal moment functions of Belloni et al. (2014), while we focus on the selection of instrumental variables.
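The three steps above can be sketched in Python with numpy alone. This is not the naivereg implementation, and several choices are our assumptions: a plain coordinate-descent LASSO with a hypothetical rule-of-thumb penalty stands in for the paper's regularization and tuning; the first step regresses the outcome on the controls only, mirroring the first step of Belloni et al. (2014); the selected first stage is refit by OLS (post-LASSO); and the third step is taken to be a linear IV regression in which the predicted treatment instruments the actual treatment. The names `lasso_select` and `ds_iv`, and the simulated data, are hypothetical.

```python
import numpy as np

def lasso_select(A, v, c=1.0, n_iter=200):
    """Coordinate-descent LASSO of v on A; returns indices of nonzero coefficients.

    Minimizes (1/2n)||v_c - A_s b||^2 + lam*||b||_1 on centered v and
    standardized A. The penalty lam = c * sd(v) * sqrt(2 log p / n) is a
    hypothetical rule of thumb, not the tuning used in the paper.
    """
    n, p = A.shape
    As = (A - A.mean(axis=0)) / A.std(axis=0)   # each column: mean 0, x'x/n = 1
    vc = v - v.mean()
    lam = c * vc.std() * np.sqrt(2 * np.log(p) / n)
    b = np.zeros(p)
    r = vc.copy()                               # residual vc - As @ b
    for _ in range(n_iter):
        for j in range(p):
            r += As[:, j] * b[j]                # remove feature j from the fit
            rho = As[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-thresholding
            r -= As[:, j] * b[j]
    return np.flatnonzero(b)

def ds_iv(y, d, X, Z, c=1.0):
    """Three-step DS-IV sketch; returns (alpha_hat, selected control indices)."""
    n, px = X.shape
    ones = np.ones((n, 1))
    # Step 1: controls that predict the outcome (LASSO of y on X).
    s1 = lasso_select(X, y, c)
    # Step 2: controls and instruments that predict the treatment (LASSO of d on [X, Z]).
    XZ = np.hstack([X, Z])
    s2 = lasso_select(XZ, d, c)
    F = np.hstack([ones, XZ[:, s2]])
    d_hat = F @ np.linalg.lstsq(F, d, rcond=None)[0]   # post-LASSO prediction of d
    # Step 3: IV regression of y on d with d_hat as the instrument and
    # the union of the controls selected in steps 1 and 2 as exogenous regressors.
    ctrl = np.union1d(s1, s2[s2 < px]).astype(int)
    C = np.hstack([ones, X[:, ctrl]])
    W = np.column_stack([d_hat, C])     # instruments
    R = np.column_stack([d, C])         # regressors
    coef = np.linalg.solve(W.T @ R, W.T @ y)
    return coef[0], ctrl

# Hypothetical simulated data: 50 controls, 50 candidate instruments, true alpha = 1.
rng = np.random.default_rng(1)
n = 1000
X = rng.standard_normal((n, 50))
Z = rng.standard_normal((n, 50))
u = rng.standard_normal(n)                      # unobserved confounder
d = X[:, 0] + Z[:, 0] + 0.8 * Z[:, 1] + u
y = 1.0 * d + X[:, 0] + X[:, 1] + 0.7 * u + 0.7 * rng.standard_normal(n)

alpha_hat, ctrl = ds_iv(y, d, X, Z)
print(round(alpha_hat, 3), ctrl)
```

Taking the union in the third step means that a confounder missed in one step, for example because its coefficient in the outcome equation is small, is still controlled for if the other step picks it up.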
The rest of the paper is organized as follows. Section 2 presents the DS-IV estimator. Section 3 investigates its theoretical properties. Section 4 presents a real data application. To save space, the Monte Carlo simulations and all detailed proofs are provided in the Supplementary Material. Throughout the paper, we let $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ denote the $\ell_1$-norm, the $\ell_2$-norm and the infinity norm, respectively.
Methodology
Consider a structural equation with an endogenous treatment variable and many control variables,
$$y_i = d_i\alpha_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}_0 + \varepsilon_i,$$
where $y_i$ is the outcome variable for individual $i$, $d_i$ is the endogenous treatment variable, $\alpha_0$ denotes the true coefficient of $d_i$, $\mathbf{x}_i$ is a vector of exogenous control variables, $\boldsymbol{\beta}_0$ is a vector of the true parameters associated with $\mathbf{x}_i$, $\varepsilon_i$ is the $i$th random error term for $i = 1, \ldots, n$, and $n$ is the sample size. To estimate the treatment effect accurately, we include as many as
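For the structural equation above, the third step of the DS-IV procedure can be written in closed form under one assumption of ours that this excerpt does not state: the DS-IV estimate is a just-identified IV estimate in which $\hat d_i$, the predicted treatment from the second step, instruments $d_i$, and the selected controls enter as exogenous regressors. Let $\mathbf{X}_{\hat I}$ (our notation) be the $n \times |\hat I|$ matrix of the controls in the union $\hat I$ of the two selected sets, with a column of ones added if the model has an intercept, and stack $\mathbf{y}$, $\mathbf{d}$ and $\hat{\mathbf{d}}$ over $i = 1, \ldots, n$. Partialling out the controls, in the spirit of the Frisch-Waugh-Lovell theorem, gives

```latex
\hat{\alpha}
  = \frac{\hat{\mathbf{d}}^{\top}\mathbf{M}_{\hat I}\,\mathbf{y}}
         {\hat{\mathbf{d}}^{\top}\mathbf{M}_{\hat I}\,\mathbf{d}}
  = \frac{\sum_{i=1}^{n} \tilde{d}_i\, y_i}{\sum_{i=1}^{n} \tilde{d}_i\, d_i},
\qquad
\mathbf{M}_{\hat I} = \mathbf{I}_n - \mathbf{X}_{\hat I}\bigl(\mathbf{X}_{\hat I}^{\top}\mathbf{X}_{\hat I}\bigr)^{-1}\mathbf{X}_{\hat I}^{\top},
\qquad
\tilde{\mathbf{d}} = \mathbf{M}_{\hat I}\,\hat{\mathbf{d}}.
```

Here $\tilde d_i$ is the residual from regressing $\hat d_i$ on the selected controls. Replacing $\hat{\mathbf{d}}$ by $\mathbf{d}$ in the formula would give OLS on the selected controls, which is inconsistent when $d_i$ is correlated with $\varepsilon_i$.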
Theoretical properties
Assume and , where the number of true regression coefficients and cannot exceed , and its estimator is given by . The following regularity conditions are imposed for the theoretical properties of the proposed DS-IV estimator.
- (A)
Define and for a semi-definite matrix . There is an absolute sequence such that with probability at least , where
The treatment effect of teacher’s attentiveness on student’s achievement
Home visits could facilitate parental involvement, reduce discipline problems and increase students’ overall positive attitudes toward school (Dahl and Lochner, 2012, Castro et al., 2015). Using comprehensive survey data from the China Education Panel Survey (CEPS), we investigate the treatment effect of home visits on students’ performance, measured by standardized exam grades. We define the treatment variable $d_i$ to be 1 if the class adviser goes to the $i$th student’s home to talk with the parents
Acknowledgments
The authors thank Laura M. Sangalli (the editor), the associate editor, and the anonymous referees for their helpful comments that improved the article significantly. Zhong’s work is supported by the National Natural Science Foundation of China (11671334 and 11922117) and the Fujian Provincial Natural Science Fund for Distinguished Young Scholars (2019J06004). Fan’s research, in part, was supported by the National Natural Science Foundation of China Grants 71671149, 71801183 and 71631004 (Key Project).
References (20)
Amemiya (1974). The non-linear two-stage least squares estimator. J. Econometrics.
Castro et al. (2015). Parental involvement on student academic achievement: A meta-analysis. Educ. Res. Rev.
Farrell (2015). Robust inference on average treatment effects with possibly more covariates than observations. J. Econometrics.
Belloni et al. (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica.
Belloni et al. (2014). Inference on treatment effects after selection amongst high-dimensional controls. Rev. Econom. Stud.
Chernozhukov et al. (2015). Post-selection and post-regularization inference in linear models with many controls and instruments. Amer. Econ. Rev.: Pap. Proc.
Dahl and Lochner (2012). The impact of family income on child achievement: Evidence from the earned income tax credit. Amer. Econ. Rev.
Donohue III and Levitt (2001). The impact of legalized abortion on crime. Q. J. Econ.
Fan et al. (2020). R package naivereg: Nonparametric additive instrumental variable estimator and related IV methods.
Fan and Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc.