Elsevier

Economics Letters

Volume 206, September 2021, 109993
On the instrument functional form with a binary endogenous explanatory variable

https://doi.org/10.1016/j.econlet.2021.109993

Highlights

  • I study a linear regression model with a binary endogenous regressor.

  • A linear first-stage equation could artificially lead to a weak instrument issue.

  • Nonlinear fitted probabilities are alternative instruments for the binary regressor.

  • These alternative instruments could avoid the weak identification problem.

Abstract

I demonstrate that, in a linear regression model with a binary endogenous explanatory variable, a two-step IV estimation procedure could avoid the weak identification issue caused by a first-stage linear projection onto the linear instruments.

Introduction

When two-stage least squares (TSLS) is applied in empirical research, the first stage is typically specified as linear in a single instrument and the other exogenous variables. When the structural error term is assumed to satisfy a zero conditional mean assumption, a linear first stage may neglect additional instruments such as higher-order terms of the exogenous variables. Antoine and Lavergne (2020) also construct examples illustrating how an inadequate functional form in the first-stage equation may artificially create a weak identification issue. In this study, I evaluate the case where linear instruments in the first stage do not exploit the functional link between a binary endogenous explanatory variable (EEV) and the excluded exogenous variables, resulting in a potentially avoidable weak instruments problem.

An alternative approach is based on the two-step IV method proposed in Wooldridge (2010): in the first step, I estimate a binary response model by maximum likelihood; in the second step, I use the obtained fitted probability as an instrument in the IV estimation. In the context of a binary EEV, when the correct specification of its conditional mean and homoskedasticity of the structural error term are assumed, the fitted probability from maximum likelihood estimation (MLE) is the exact feasible optimal instrument.
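The second step can be written out as a short formula. The following is a sketch in notation that is assumed here for illustration and is not taken from the paper: $r_i = (1, w_i, x_i)$ collects the regressors of the outcome equation, $\theta$ is the corresponding coefficient vector, and $\hat{F}_i$ is the fitted probability from the first step. Because $w$ is binary, $E(w \mid x, z) = P(w = 1 \mid x, z)$, which is why a fitted probability is a natural candidate instrument for $w$.

```latex
\hat{h}_i = \bigl(1,\ \hat{F}_i,\ x_i\bigr), \qquad
\hat{\theta}_{IV} = \Bigl( \sum_{i=1}^{N} \hat{h}_i' \, r_i \Bigr)^{-1}
                    \sum_{i=1}^{N} \hat{h}_i' \, y_i
```

The coefficient on the EEV is the element of $\hat{\theta}_{IV}$ that corresponds to $w_i$. The system is just identified: the fitted probability replaces $w_i$ in the instrument vector, while the constant and $x_i$ serve as their own instruments.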

Further, I show that when the partial correlation between the excluded exogenous variables and the binary EEV is weak, the IV estimator obtained using a nonlinear fitted probability outperforms the linear TSLS estimator. Specifically, using the alternative instrument can generate a consistent and more efficient IV estimator when the included exogenous variables are strongly correlated with the binary EEV.

Econometric framework and assumptions

I study a linear regression model with a binary EEV and weak linear instruments.

$$y = \alpha_1 + \beta w + x\gamma + u \tag{1}$$

With a sample size of $N$, the econometricians observe the dataset $\{y_i, w_i, x_i, z_i\}_{i=1}^{N}$, where $y$ is the continuous outcome variable, $w$ is the single EEV, $x$ is a $1 \times k_1$ vector of exogenous variables included in Eq. (1), and $z$ is a $1 \times k_2$ vector of instruments excluded from Eq. (1). The parameter of interest is the coefficient on the EEV, $\beta$.

$$w = \alpha_2 + x\eta + z\pi + v \tag{2}$$

Eq. (2) is a first-stage linear projection defined for any

Asymptotic theory

The estimation procedure is the same as Procedure 21.1 in Wooldridge (2010). In the first step, estimate a binary response model by quasi-maximum likelihood (QMLE) and obtain the fitted probability $F(\hat{\alpha}_3 + x\hat{\eta} + z\hat{\pi})$, where $(\hat{\alpha}_3, \hat{\eta}, \hat{\pi})$ is the estimator of $(\tilde{\alpha}_3, \tilde{\eta}, \tilde{\pi})$. In the second step, the fitted probability is used as an instrument for IV estimation. As noted by Wooldridge (2010), this procedure has a desirable robustness property: the model for $P(w=1|x,z)$ does not need to be correctly specified.
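The two steps can be illustrated with a minimal NumPy sketch. It assumes a probit link for F and fits the first step by Fisher scoring. The function names (`fit_probit`, `two_step_iv`) are hypothetical, and the code is not the paper's implementation. Standard errors are omitted.

```python
import math
import numpy as np

def norm_cdf(t):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(t / math.sqrt(2.0)))

def norm_pdf(t):
    """Standard normal density, applied elementwise."""
    return np.exp(-0.5 * t**2) / math.sqrt(2.0 * math.pi)

def fit_probit(w, R, n_iter=50, tol=1e-10):
    """Step 1: probit (Q)MLE of binary w on R via Fisher scoring.

    R must already contain a constant column. The probit link is an
    assumption of this sketch; another binary response model could be used.
    """
    theta = np.zeros(R.shape[1])
    for _ in range(n_iter):
        index = R @ theta
        p = np.clip(norm_cdf(index), 1e-10, 1.0 - 1e-10)
        d = norm_pdf(index)
        score = R.T @ (d * (w - p) / (p * (1.0 - p)))
        info = (R * (d**2 / (p * (1.0 - p)))[:, None]).T @ R
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

def two_step_iv(y, w, x, z):
    """Step 2: just-identified IV using the fitted probability for w.

    Returns the coefficients on (constant, w, x) and the fitted probabilities.
    """
    const = np.ones(len(y))
    first_stage = np.column_stack([const, x, z])
    p_hat = norm_cdf(first_stage @ fit_probit(w, first_stage))
    regressors = np.column_stack([const, w, x])       # (1, w, x)
    instruments = np.column_stack([const, p_hat, x])  # (1, F_hat, x)
    coef = np.linalg.solve(instruments.T @ regressors, instruments.T @ y)
    return coef, p_hat
```

Calling `coef, p_hat = two_step_iv(y, w, x, z)` returns the estimate of the coefficient on the EEV as `coef[1]`. The fitted probability enters only as an instrument and is never substituted for `w` as a regressor.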

Let $\hat{\beta}_{IV,F}$ and

Simulation results

I compare the performance of the IV estimators obtained using various fitted probabilities with that of the linear IV estimator in finite samples. In the main design, $x$ and $z$ are standard normal and independent of each other. $u$ and $v$ are standard normal with a correlation of 0.5. The simulations are done with 1,000 replications and a sample size of 500.

$$y = x + w + u$$

$$w = 1[x\eta + 0.05z + v > 0]$$
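The main design can be reproduced in outline with the sketch below. It draws data from the two equations, then records the linear first-stage F statistic for z, the OLS estimate, and the linear TSLS estimate of the coefficient on w. The value `eta=1.0` is a placeholder assumption, since the value of η is not stated in this passage, and all function names are hypothetical.

```python
import numpy as np

def simulate(n=500, eta=1.0, pi=0.05, rho=0.5, rng=None):
    """Draw one sample: y = x + w + u, w = 1[x*eta + pi*z + v > 0].

    eta=1.0 is an assumed value; pi, rho, and n follow the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(n)
    z = rng.standard_normal(n)
    u = rng.standard_normal(n)
    # v is standard normal with corr(u, v) = rho
    v = rho * u + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    w = (x * eta + pi * z + v > 0).astype(float)
    y = x + w + u
    return y, w, x, z

def ols(y, X):
    """Least-squares coefficients of y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def first_stage_F(w, x, z):
    """F statistic for excluding z from the linear first stage of w on (1, x, z)."""
    n = len(w)
    X_r = np.column_stack([np.ones(n), x])
    X_u = np.column_stack([X_r, z])
    ssr_r = np.sum((w - X_r @ ols(w, X_r)) ** 2)
    ssr_u = np.sum((w - X_u @ ols(w, X_u)) ** 2)
    return (ssr_r - ssr_u) / (ssr_u / (n - X_u.shape[1]))

def monte_carlo(reps=1000, seed=0, **design):
    """Repeat the design; collect first-stage F, OLS, and linear TSLS estimates."""
    rng = np.random.default_rng(seed)
    F = np.empty(reps)
    b_ols = np.empty(reps)
    b_tsls = np.empty(reps)
    for r in range(reps):
        y, w, x, z = simulate(rng=rng, **design)
        const = np.ones(len(y))
        X = np.column_stack([const, w, x])   # regressors (1, w, x)
        Z = np.column_stack([const, z, x])   # linear instruments (1, z, x)
        F[r] = first_stage_F(w, x, z)
        b_ols[r] = ols(y, X)[1]
        b_tsls[r] = np.linalg.solve(Z.T @ X, Z.T @ y)[1]
    return F, b_ols, b_tsls
```

After `F, b_ols, b_tsls = monte_carlo()`, summaries such as `np.mean(F)` and `np.median(b_tsls) - 1.0` show how weak the linear first stage is in this design. The median is used for TSLS because the just-identified estimator can produce extreme draws when the instrument is weak.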

I focus on the ratio of the bias of the IV estimator relative to the bias of the OLS estimator and the coverage rate

Application

I apply the two-step IV method to Dinkelman (2011), who evaluates the effects of rural electrification on employment growth in South Africa. The dependent variables are the community-level growth of female and male employment rates from 1996 to 2001. Electricity project placement is the binary EEV and the average land gradient is the instrument.

Table 2 mimics Tables 4 and 5 in Dinkelman (2011) with columns (1) and (6) reproducing the main results. In columns (2) and (7), fitted probit

Conclusion

In this study, I show that a more appropriate first-stage parameterization can help with the weak instruments problem when the EEV is binary. A similar argument can be generalized to other limited EEVs, such as count variables and fractional responses, where nonlinear instruments can be constructed.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

I am grateful to Jeffrey Wooldridge, Todd Elder, Kyoo il Kim, Jinyong Hahn, and Tetsuya Kaji for their helpful comments and suggestions. An earlier version of this paper has been circulated under the title “Weak Instruments with a Binary Endogenous Explanatory Variable”. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
