Abstract
We define heterogeneous endogeneity as the case where a potentially endogenous regressor is endogenous for some subgroups of the data but exogenous for others. We derive an estimator and a test procedure based on the control function approach to deal with this phenomenon. We show that accounting for heterogeneous endogeneity can greatly increase the power of endogeneity tests and the precision of our estimator relative to traditional IV. While the gains grow as the instrument gets weaker and as the relative size of the non-endogenous subgroup gets larger, we find efficiency gains even when the underlying instrument is very strong. We illustrate our approach with an example using data from Abadie et al. (Econometrica 70:91–117, 2002).
Notes
One can argue that the federal minimum wage is an endogenous regressor because it is most likely correlated with some states' unobserved labor market conditions, especially for states that account for relatively large shares of national income.
In Sect. 2, we explain the control function method in detail.
There is a large literature on models in which endogenous variables have heterogeneous effects on an outcome variable in the population. In this literature, many important works such as Abadie (1991) and Angrist et al. (1996) have proposed methods for estimating the average effect of receiving or not receiving a binary treatment. More recently, Florens et al. (2008) provided a non-parametric way to analyze heterogeneous effects of a continuous treatment. For heterogeneous response models, see Heckman et al. (1997), Card (1999), Card (2001), Heckman and Vytlacil (2005), Heckman and Vytlacil (2007a), Heckman and Vytlacil (2007b) and references therein.
For notational simplicity, we do not include a vector of exogenous covariates in Eq. (1). We assume that we have a set of valid instruments \(z_{ij}\) in Eq. (2), so that the parameter \(\beta \) in Eq. (1) can be consistently estimated by 2SLS. This assumption also implies that our instrument is exogenous for all subgroups of the population. Some specific examples of such instruments can be found in Graddy (1995) and Acemoglu et al. (2013).
The group assignments in terms of heterogeneous endogeneity would be viable if economic theories suggest that observed individual characteristics decide the group structure.
We note that by definition W is a \(K \times K\) matrix. In a special case, when we have only one endogenous variable, W is a scalar and \(0 \le W \le 1\).
Note that when there exists heteroskedasticity in \(u_{ij}\), we should consistently estimate the heteroskedasticity first and then employ normalized data for this test.
Through Monte-Carlo simulations, we confirm that the power of the test for heterogeneous endogeneity is well controlled when the concentration parameter is higher than 10.
To perform this endogeneity test, we implicitly assume that the instrumental variables \(z_{ij}\) are completely exogenous. Therefore, \(\hat{\nu }_{ij}\) contains all the contaminated part of \(x_{ij}\) because in the first stage we project \(x_{ij}\) on the instrumental variable space.
In the simulation study, we assume that the exogenous group is known a priori. When the exogenous group is not known, a straightforward test can be performed to identify the group.
A continuous variable can be used to create a categorical variable with proper threshold values.
Given that our potentially endogenous variable is a dummy, we avoid the forbidden regression problem in the first stage by following the procedure outlined on pages 189–191 of Angrist and Pischke (2009). Results are very similar to those reported in the text if we simply use a linear probability model in the first stage.
References
Abadie A, Angrist J, Imbens G (2002) Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica 70:91–117
Abadie J (1991) Instrumental variables estimation of average treatment effects in econometrics and epidemiology, NBER Working Paper No. 115
Acemoglu D, Finkelstein A, Notowidigdo M (2013) Income and health spending: evidence from oil price shocks. Rev Econ Stat 95(4):1079–1095
Altonji J, Blank R (1999) Race and gender in the labor market. Handb Labor Econ 49:3143–3259
Angrist J, Imbens G, Rubin D (1996) Identification of causal effects using instrumental variables. J Am Stat Assoc 91:444–455
Angrist J, Pischke J (2009) Mostly harmless econometrics: an empiricist's companion. Princeton University Press, Princeton, NJ
Autor D, Dorn D, Hanson G (2013) The China syndrome: local labor market effects of import competition in the United States. Am Econ Rev 103(6):2121–2168
Autor D, Dorn D, Hanson G, Majlesi K (2017) Importing political polarization? The electoral consequences of rising trade exposure. NBER Working Paper No. 22637
Barnow B, Cain G, Goldberger A (1981) Selection on observables. Eval Stud Rev Annu 5(1):43–59
Blundell R, Chen X, Christensen D (2007) Semi-nonparametric IV estimation of shape-invariant Engel curves. Econometrica 75:1613–1669
Blundell R, Powell J (2003) Identification of unconditional partial effects in nonseparable models. In: Dewatripont M, Hansen L, Turnovsky S (eds) Advances in economics and econometrics, vol II. Cambridge University Press, Cambridge, pp 312–357
Bonhomme S, Manresa E (2015) Grouped patterns of heterogeneity in panel data. Econometrica 83(3):1147–1184
Bound J, Holzer H (1993) Industrial shifts, skill levels and the labor market for white and black males. Rev Econ Stat 75(3):387–396
Card D (1999) The causal effect of education on earnings. Handb Labor Econ 3:1801–1863
Card D (2001) Estimating the return to schooling: progress on some persistent econometric problems. Econometrica 69(5):1127–1160
Chesher A (2003) Identification in nonseparable models. Econometrica 71:1405–1441
Chesher A (2005) Nonparametric identification under discrete variation. Econometrica 73:1525–1550
Darolles S, Florens J, Renault E (2003) Nonparametric estimation of triangular simultaneous equations models, Working Paper. Toulouse University
Florens J, Heckman J, Meghir C, Vytlacil E (2008) Identification of treatment effects using control functions in models with continuous, endogenous treatment and heterogeneous effects. Econometrica 76(5):1191–1206
Forgy E (1965) Cluster analysis of multivariate data: efficiency vs. interpretability of classification. Biometrics 21(3):768–769
Goldberger A (1972) Selection bias in evaluating treatment effects: some formal illustrations. Discussion Paper, University of Wisconsin–Madison
Graddy K (1995) Testing for imperfect competition at the Fulton fish market. RAND J Econ 26(1):75–92
Hall P, Horowitz J (2005) Nonparametric methods for inference in the presence of instrumental variables. Ann Stat 33:2904–2929
Hashimoto M (1987) The minimum wage law and youth crimes: time-series evidence. J Law Econ 30(2):443–464
Heckman J, Robb R (1985) Alternative methods for evaluating the impact of interventions: an overview. J Econom 30(1–2):239–267
Heckman J, Smith J, Clements N (1997) Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts. Rev Econ Stud 64(4):487–535
Heckman J, Vytlacil E (1998) Instrumental variables methods for the correlated random coefficient model: estimating the average rate of return to schooling when the return is correlated with schooling. J Hum Resour 33(4):974–987
Heckman J, Vytlacil E (2005) Structural equations, treatment effects, and econometric policy evaluation. Econometrica 73(3):669–738
Heckman J, Vytlacil E (2007a) Econometric evaluation of social programs, part I: causal models, structural models and econometric policy evaluation. Handb Econom 6:4779–4874
Heckman J, Vytlacil E (2007b) Econometric evaluation of social programs, part II: using the marginal treatment effect to organize alternative econometric estimators to evaluate social programs, and to forecast their effects in new environments. Handb Econom 6:4875–5143
Holzer H (1987) Informal job search and black youth unemployment. Am Econ Rev 77(3):446–452
Imbens G, Newey W (2009) Identification and estimation of triangular simultaneous equations models without additivity. Econometrica 77:1481–1512
Imbens G, Wooldridge J (2009) Recent developments in the econometrics of program evaluation. J Econ Lit 47(1):5–86
Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50
Lee S (2007) Endogeneity in quantile regression models: a control function approach. J Econom 141:1131–1158
Ma L, Koenker R (2006) Quantile regression methods for recursive structural equation models. J Econom 134:471–506
Nelson C, Startz R (1990a) The distribution of the instrumental variables estimator and its t-ratio when the instrument is a poor one. J Bus 63(1):S125–S140
Nelson C, Startz R (1990b) Some further results on the exact small sample properties of the instrumental variable estimator. Econometrica 58(4):967–976
Newey W, Powell J, Vella F (1999) Nonparametric estimation of triangular simultaneous equations models. Econometrica 67:563–603
Newey W, Powell J, Vella F (2003) Nonparametric estimation of triangular simultaneous equations models. Econometrica 71:1565–1578
Pinkse J (2000) Nonparametric two-step regression functions when regressors and error are dependent. Can J Stat 28:289–300
Rivers D, Vuong Q (1988) Limited information estimators and exogeneity tests for simultaneous probit models. J Econom 39(3):347–366
Smith J (1993) Affirmative action and racial wage gap. Am Econ Rev 83(2):79–84
Smith R, Blundell R (1986) An exogeneity test for a simultaneous equation tobit model with an application to labor supply. Econometrica 54(3):679–685
Stock J, Wright J, Yogo M (2002) A survey of weak instruments and weak identification in generalized method of moments. J Bus Econ Stat 20(4):518–529
Vytlacil E, Yildiz N (2007) Dummy endogenous variables in weakly separable models. Econometrica 75:757–779
Wooldridge J (2015) Control function methods in applied econometrics. J Hum Resour 50(2):420–445
Acknowledgements
We are thankful to Christoph Rothe, Richard Startz, Lutz Kilian, and Le Wang for their helpful comments and suggestions. Any errors or omissions are our own.
Appendices
Appendix A
We first show the assumptions required to derive theoretical results in the paper.
Assumption 1
The data are generated as \(y_{ij} = x_{ij}'\beta + u_{ij}\), \(x_{ij} = Z_{ij}'\delta + \varSigma _{\nu }\nu _{ij}^*\) with the covariance matrix of the error terms in Eq. (3).
Assumption 2
The following limits hold for \(j = 1,2,\ldots ,J,\) when the sample size converges to infinity:
a. (exogeneity) \(Z_j'u_j/n_j \overset{p}{\rightarrow } 0\)
b. (well-behaved data) \(Z_j'Z_j/n_j\overset{p}{\rightarrow } Q_{Z_jZ_j}\) where \(Q_{Z_jZ_j}\) is a finite, positive definite \(L\times L\) matrix.
c. (relevance) \(Z_j'x_j/n_j\overset{p}{\rightarrow } Q_{Z_jX_j}\) where \(Q_{Z_jX_j}\) is a finite \(L\times K\) matrix with rank K.
Assumption 3
As \(N \rightarrow \infty \), the total number of observations in the jth group, \(N_{j}\), converges to infinity such that \(\lim _{N \rightarrow \infty } \frac{N_{j}}{N} = a_{j}\) for \(j = 1,2,\ldots ,J\).
Notice that the convergences in Assumption 2 hold under the weak law of large numbers, and the asymptotic moments are assumed to be the same across groups under Assumption 1.
Proof of Lemma 1
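Because \({\hat{V}}\) is a block-diagonal matrix, \(M_{{\hat{V}}}\) is also block diagonal, of the following form:
\[ M_{{\hat{V}}} = \mathrm{diag}\big (M_{{\hat{V}}_1}, M_{{\hat{V}}_2}, \ldots , M_{{\hat{V}}_J}\big ) \]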
where \(M_{{\hat{V}}_j} = I_{N_j} - {\hat{V}}_j({\hat{V}}_j^{'}\hat{V}_j)^{-1}{\hat{V}}_j^{'}\). Therefore, we have:
where \(P_{Z_j} = Z_j(Z_j'Z_j)^{-1}Z_j'\); \(Z_j = [z_{1j} \ z_{2j} \ \ldots \ z_{N_j j}]'\); \(E_{j} = [\epsilon _{1j} \ \epsilon _{2j} \ \ldots \ \epsilon _{N_j j}]'\). The step from the third line to the fourth line in equation (A.1) holds because \(M_{{\hat{V}}_j} x_j = P_{Z_j} x_j\). Since \(N_j \rightarrow \infty \) as \(N \rightarrow \infty \) and \(a_{j} = \lim _{N \rightarrow \infty }\frac{N_{j}}{N}\),
by the law of large numbers and the central limit theorem under Assumption 1. \(\square \)
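The identity \(M_{{\hat{V}}_j} x_j = P_{Z_j} x_j\) used in this proof can be checked numerically for a single group. The following is a minimal sketch, not part of the original derivation: the data are simulated, and the function names, dimensions, and seed are illustrative assumptions.

```python
import numpy as np


def projection(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)


def lemma1_identity_gap(n=200, L=3, K=2, seed=0):
    """Largest absolute entry of M_Vhat x - P_Z x for one simulated group.

    The data generating process is hypothetical and serves only to
    illustrate the algebraic identity.
    """
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, L))                      # instruments
    x = Z @ rng.normal(size=(L, K)) + rng.normal(size=(n, K))
    P_Z = projection(Z)
    v_hat = x - P_Z @ x                              # first-stage residuals
    M_v = np.eye(n) - projection(v_hat)              # residual maker of v_hat
    return float(np.max(np.abs(M_v @ x - P_Z @ x)))


print(lemma1_identity_gap())
```

The gap is zero up to floating-point error because \(x_j = P_{Z_j}x_j + {\hat{V}}_j\) and the first-stage residuals are orthogonal to the fitted values.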
Proof of Proposition 1
As shown in Eq. (10),
where the residual maker matrix \(M_{\tilde{V}} = I - P_{\tilde{V}}\) and \(P_{\tilde{V}}\) is the usual projection matrix. We denote \(\tilde{V} = \begin{bmatrix} \hat{\nu }_{e} \\ 0 \end{bmatrix}\). Thus we get
where \(P_{\hat{\nu }_{e}} = \hat{\nu }_{e}(\hat{\nu }_{e}'\hat{\nu }_{e})^{-1}\hat{\nu }_{e}'\). Therefore,
where \(M_{\hat{\nu }_{e}} = I - P_{\hat{\nu }_{e}}\). Now, we use the above form of \(M_{\tilde{V}}\) to obtain the expressions \(X'M_{\tilde{V}}X\) and \(X'M_{\tilde{V}}Y\):
Similarly for \(X'M_{\tilde{V}}Y\), we get
From the standard OLS estimator, we get \(x_{ne}'y_{ne} = (x_{ne}'x_{ne}) \hat{\beta }_{ne,OLS}\). For group e, we get \(\hat{\beta }_{e,CF} = (x_{e}'M_{\hat{\nu }_{e}} x_{e})^{-1} x_{e}'M_{\hat{\nu }_{e}}y_{e}\). As \(M_{\hat{\nu }_{e}}x_{e} = P_{Z_{e}}x_{e}\), \(\hat{\beta }_{e,CF} = \hat{\beta }_{e,2SLS}\). Therefore, we have \(x_{e}'M_{\hat{\nu }_{e}}y_{e} = (x_{e}'M_{\hat{\nu }_{e}}x_{e}) \hat{\beta }_{e,2SLS}\). Substituting \(x_{e}'M_{\hat{\nu }_{e}}y_{e}\) and \(x_{ne}'y_{ne}\) in \(X'M_{\tilde{V}}Y\), we get
Hence we get,
where
Under Assumption 1, it is straightforward to show that \(\widehat{Avar}[\hat{\beta }_{e,2SLS}] = \hat{\sigma }_u^2(x_e'M_{\hat{\nu }_{e}}x_e)^{-1} = \hat{\sigma }_u^2(x_e'P_{Z_{e}}x_e)^{-1}\) and \(\widehat{Avar}[\hat{\beta }_{ne,OLS}] = \hat{\sigma }_u^2(x_{ne}'x_{ne})^{-1}\). Therefore,
and
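As a numerical illustration of the algebra in this proof, the pooled control function regression of \(Y\) on \(X\) and \(\tilde{V}\) can be compared with a weighted combination of the group-e 2SLS estimate and the group-ne OLS estimate. The sketch below assumes a single regressor, a single instrument, no intercept, and a simulated data generating process that is purely illustrative. The scalar weight \(w = x_e'P_{Z_e}x_e/(x_e'P_{Z_e}x_e + x_{ne}'x_{ne})\) is what the substitution step above implies; it is written here as an assumption, not as a quotation of the lost display equation.

```python
import numpy as np


def proposition1_check(n_e=300, n_ne=200, beta=0.5, rho=-0.9, seed=1):
    """Return (pooled CF estimate, weighted 2SLS/OLS combination, weight w).

    Hypothetical simulated data: group e is endogenous (corr(u, nu) = rho),
    group ne is exogenous.
    """
    rng = np.random.default_rng(seed)

    # Endogenous group e
    z_e = rng.normal(size=n_e)
    nu_e = rng.normal(size=n_e)
    u_e = rho * nu_e + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n_e)
    x_e = 0.5 * z_e + nu_e
    y_e = beta * x_e + u_e

    # Exogenous group ne
    z_ne = rng.normal(size=n_ne)
    x_ne = 0.5 * z_ne + rng.normal(size=n_ne)
    y_ne = beta * x_ne + rng.normal(size=n_ne)

    # First stage within group e: fitted values P_Z x_e and residuals
    x_hat_e = z_e * (z_e @ x_e) / (z_e @ z_e)
    nu_hat_e = x_e - x_hat_e

    # Pooled CF regression of Y on X and V_tilde = [nu_hat_e; 0]
    X = np.concatenate([x_e, x_ne])
    Y = np.concatenate([y_e, y_ne])
    V = np.concatenate([nu_hat_e, np.zeros(n_ne)])
    coef = np.linalg.lstsq(np.column_stack([X, V]), Y, rcond=None)[0]
    beta_cf = coef[0]

    # Group-wise estimators and the implied weight
    a = x_hat_e @ x_hat_e              # equals x_e' P_Z x_e
    b = x_ne @ x_ne
    beta_2sls_e = (x_hat_e @ y_e) / a
    beta_ols_ne = (x_ne @ y_ne) / b
    w = a / (a + b)
    return beta_cf, w * beta_2sls_e + (1.0 - w) * beta_ols_ne, w


print(proposition1_check())
```

In this scalar case the weight lies strictly between 0 and 1, which matches the note on W for a single endogenous variable.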
Proof of Lemma 2
Similar to Lemma 1,
where \(P_{Z_{e}} = Z_{e}(Z_e'Z_e)^{-1}Z_e'\) and \(Z_e = [z_{1e} \ z_{2e} \ \ldots \ z_{N_e e}]'\); \(N_e\) is the number of individuals in group e. Since \(N_{e} \rightarrow \infty \) and \(N_{ne} \rightarrow \infty \) as \(N \rightarrow \infty \),
by the law of large numbers and the central limit theorem under Assumption 1.
Proof of Proposition 2
The difference between the asymptotic variances of \(\widehat{\beta }_{CF}^*\) and the conventional 2SLS estimator under homogeneous endogeneity is given by:
To compare the two matrices in the brackets, we can compare their inverses:
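The matrix \(M_{Z} = I - P_{Z}\) is symmetric and idempotent, hence positive semidefinite, so \(X'M_{Z}X\) is positive semidefinite, and it is positive definite whenever no nonzero linear combination of the columns of \(X\) lies in the column space of \(Z\). Since \((w X' P_{Z}X + (1-w) X'X) - X' P_{Z}X = (1-w) X'M_{Z}X\), the matrix \((w X' P_{Z}X + (1-w) X'X)\) is larger in the matrix sense than \(X' P_{Z}X\) for \(w < 1\), and thus its inverse is smaller. Therefore, \(Avar[\widehat{\beta }_{2SLS}] - Avar[\widehat{\beta }_{CF}^*]\) is a positive definite matrix.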
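The matrix comparison in this proof can be illustrated numerically. The sketch below is an assumption-laden check on simulated data: the first-stage coefficients, the value of \(w\), and the function name are hypothetical. It verifies that the gap between the two bracketed matrices equals \((1-w)X'M_ZX\) and that both this gap and the difference of the inverses have strictly positive eigenvalues.

```python
import numpy as np


def proposition2_check(n=500, w=0.4, seed=2):
    """Return (identity error, min eigenvalue of gap, min eigenvalue of inverse gap)."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, 3))
    # Fixed, well-conditioned first-stage coefficients (illustrative only)
    Pi = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    X = Z @ Pi + rng.normal(size=(n, 2))

    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    M_Z = np.eye(n) - P_Z

    A = X.T @ P_Z @ X                       # 2SLS information matrix
    B = w * A + (1.0 - w) * (X.T @ X)       # mixed matrix from the proof
    gap = B - A
    target = (1.0 - w) * (X.T @ M_Z @ X)

    inv_gap = np.linalg.inv(A) - np.linalg.inv(B)
    sym = lambda S: (S + S.T) / 2.0         # guard against rounding asymmetry
    return (
        float(np.max(np.abs(gap - target))),
        float(np.linalg.eigvalsh(sym(gap)).min()),
        float(np.linalg.eigvalsh(sym(inv_gap)).min()),
    )


print(proposition2_check())
```

A positive smallest eigenvalue of the inverse gap corresponds to the 2SLS variance exceeding the CF variance in the matrix sense, up to the common factor \(\sigma _u^2\).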
Proof of Lemma 3
Using the residual matrix estimated from the first stage regression, we obtain the following OLS estimator:
where \({\hat{x}}_{n}^{*} = [x_n' \ \ {\hat{V}}_n']'\); \({\hat{V}}_n = (D_n \otimes {\hat{\nu }}_n)\); \(m_{n} = [\mathbf{0}' \ \ (V_n-{\hat{V}}_n)']'\). Note that \(x_{n}^{*} = {\hat{x}}_{n}^{*} + m_n\). The first term in the second bracket reduces to:
where \(\mathbf{Z}_{\mathbf{n}} = (D_n\otimes Z_n')\). As \({\hat{\pi }} \xrightarrow{p} \pi \) and \({\hat{x}}_n \xrightarrow{p} x_n\), we have:
The second term in the second bracket also converges to zero as:
Because \(\sum \nolimits _{n=1}^{N}{\hat{x}}_{n}^* \hat{x}_n^{*'} \xrightarrow{p} E\big [x_{n}^* x_n^{*'}\big ]\), \(\hat{\delta } \xrightarrow{p} \delta \). \(\square \)
Proof of Proposition 3
From Lemma 3, we have:
The first term in the second bracket reduces to:
where \(\mathbf{Z}_{\mathbf{n}} = (D_n\otimes Z_n')\). Thus, we have:
Next, the second term in the second bracket is decomposed into two parts as:
And the second term in the above equation can be rewritten as:
Therefore, we obtain the following result:
because \(\sum \nolimits _{n=1}^{N} Z_n'\eta _{n} \xrightarrow{p} E\big [Z_n'\eta _{n} \big ] = 0\) and \(\sqrt{N}(\hat{\pi }- \pi ) = O_p(1)\). This leads to:
Using \(\sum \nolimits _{n=1}^{N}{\hat{x}}_{n}^* {\hat{x}}_n^{*'} \xrightarrow{p} E[ x_{n}^* x_n^{*'}]\), we obtain:
where \(H = E\big [x_n \varGamma ' \mathbf{Z}_{\mathbf{n}} \big ]\). Thus the asymptotic distribution of \({\hat{\delta }}\) is given by:
where \(\sqrt{N} ({\hat{\pi }} - \pi ) \xrightarrow{d} N(0,M)\) because \(E[\eta _{n} \vert x_n,Z_n ] = 0\). \(\square \)
Appendix B
In Appendix B, we report Monte-Carlo simulation results on the finite-sample performance of the new method that combines the CF approach with the k-means clustering algorithm, which is explained in Sect. 3.1. For the simulation, we employ the same data generating process and the same true values of the model parameters as in Sect. 2.3, except for \(\rho _2\) and \(\mu ^2\). The true value of \(\rho _2\) is set to \(-0.9\) (a strong endogeneity case). Sample sizes vary from case to case. In the first simulation set, we consider \(\{N_1 = 10,N_2=10 \}\) and \(T = 1,5,10\), and 20. In the second simulation set, we consider \(\{N_1 = 100,N_2=100 \}\) and \(T = 1,5,10\), and 20. The ratio of the two population variances that determines the concentration parameter \(\mu ^2\) is set to \(\frac{Var( \pi _1 z_{i,j})}{Var(\nu _{i,j})} = \frac{1}{10} = 0.1\). Thus, \(\mu ^2 = T(N_1+N_2)\times 0.1\) is \(\{2,10,20,40\}\) for the first simulation set and \(\{20,100,200,400\}\) for the second simulation set. We generate 10,000 data sets for each case and apply four estimators: OLS, 2SLS, the CF approach based on the true group membership, and the CF approach based on the modified k-means clustering algorithm. Note that the first and second cases of the first simulation set are weak and marginally weak instrument cases, in which standard 2SLS estimation does not perform well.
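The concentration parameters quoted above follow directly from \(\mu ^2 = T(N_1+N_2)\times 0.1\). A small sketch of that arithmetic is given below; the function name and the flag for weak or marginally weak cases (taken here as \(\mu ^2 \le 10\), in line with the cases singled out in the text) are illustrative assumptions.

```python
def concentration_parameters(n1, n2, periods, variance_ratio=0.1):
    """Concentration parameter mu^2 = T * (N1 + N2) * variance_ratio for each T."""
    return [t * (n1 + n2) * variance_ratio for t in periods]


periods = [1, 5, 10, 20]
for n1, n2 in [(10, 10), (100, 100)]:
    for t, mu2 in zip(periods, concentration_parameters(n1, n2, periods)):
        # Assumed cutoff: the text treats mu^2 = 2 and mu^2 = 10 as weak cases.
        label = "weak or marginally weak" if mu2 <= 10 + 1e-9 else "stronger"
        print(f"N1={n1}, N2={n2}, T={t}: mu^2 = {mu2:g} ({label})")
```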
Figure 3 shows the results for the first simulation set, which assumes \(N_1=10\) and \(N_2 = 10\). As is well known in the literature, the sampling distribution of the 2SLS estimator is extremely dispersed when \(\mu ^2 =2\). This implies that 2SLS provides little useful information for statistical inference. In the same case, by contrast, the sampling distributions of the two proposed CF estimators are concentrated near the true value 0.5, even though there is a slight bias in the CF estimator when the group membership is estimated by the modified k-means clustering algorithm. Both proposed CF estimators perform substantially better because they extract information from the exogenous group in the data. In the second case (\(T = 5, \mu ^2 = 10\)), the performance of the 2SLS estimator is not much better than in the first case. However, the performance of the CF estimator based on the estimated groups improves significantly, and the magnitude of its bias becomes almost negligible. In the third case (\(T = 10, \mu ^2 = 20\)) and the fourth case (\(T = 20, \mu ^2 = 40\)), the CF estimator that estimates the group membership produces results almost identical to those of the CF estimator that employs the true group membership.
Figure 4 shows the results for the second simulation set, which assumes \(N_1=100\) and \(N_2 = 100\). The important difference from the previous simulation set is that when \(N = N_1 + N_2\) is large and T is small, the CF estimator that directly estimates the group membership has a larger bias. Once again, however, the bias quickly converges to 0 as T increases. When T is larger than 10 in this case, the bias is already very close to 0.
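The qualitative comparison between 2SLS and the CF estimator with known group membership can be sketched with a small simulation. The code below is not the design of Sect. 2.3, which is not reproduced here: it assumes a cross-section (T = 1), one instrument, no intercept, standard normal errors, an exogenous group with zero error correlation, and an endogenous group with correlation \(-0.9\). It also omits the k-means step, and all names and sample sizes are hypothetical.

```python
import numpy as np


def simulate_once(rng, n_exo, n_endo, beta=0.5, rho=-0.9, ratio=0.1):
    """One draw of (pooled 2SLS, CF with known groups) under an assumed DGP."""
    pi = np.sqrt(ratio)  # gives Var(pi * z) / Var(nu) = ratio

    def draw(n, r):
        z = rng.normal(size=n)
        nu = rng.normal(size=n)
        u = r * nu + np.sqrt(1.0 - r ** 2) * rng.normal(size=n)
        x = pi * z + nu
        return z, x, beta * x + u

    z1, x1, y1 = draw(n_exo, 0.0)    # exogenous group
    z2, x2, y2 = draw(n_endo, rho)   # endogenous group
    z = np.concatenate([z1, z2])
    x = np.concatenate([x1, x2])
    y = np.concatenate([y1, y2])

    # Conventional 2SLS on the pooled sample (just-identified IV ratio)
    beta_2sls = (z @ y) / (z @ x)

    # CF: first-stage residuals enter only for the endogenous group
    nu_hat2 = x2 - z2 * (z2 @ x2) / (z2 @ z2)
    v_tilde = np.concatenate([np.zeros(n_exo), nu_hat2])
    design = np.column_stack([x, v_tilde])
    beta_cf = np.linalg.lstsq(design, y, rcond=None)[0][0]
    return beta_2sls, beta_cf


def monte_carlo(reps=500, n_exo=100, n_endo=100, seed=0):
    """Arrays of 2SLS and CF estimates across replications."""
    rng = np.random.default_rng(seed)
    draws = np.array([simulate_once(rng, n_exo, n_endo) for _ in range(reps)])
    return draws[:, 0], draws[:, 1]


def iqr(a):
    """Interquartile range, a dispersion measure robust to heavy tails."""
    q75, q25 = np.percentile(a, [75, 25])
    return float(q75 - q25)


b_2sls, b_cf = monte_carlo()
print("IQR 2SLS:", iqr(b_2sls), "IQR CF:", iqr(b_cf))
print("median CF:", float(np.median(b_cf)))
```

The interquartile range is used instead of the standard deviation because the 2SLS sampling distribution can have very heavy tails when the instrument is weak.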
Ghosh, P., Grier, K. & Kim, J. Heterogeneous endogeneity. Stat Papers 62, 847–886 (2021). https://doi.org/10.1007/s00362-019-01116-9