Heterogeneous endogeneity

  • Regular Article
  • Statistical Papers

Abstract

We define heterogeneous endogeneity as the case where a potentially endogenous regressor is endogenous for some subgroups of the data but exogenous for other subgroups. We derive an estimator and test procedure based on the control function approach to deal with the phenomenon. We show that accounting for heterogeneous endogeneity can greatly increase the power of endogeneity tests and increase the precision of our estimator over traditional IV. While the gains get larger as the instrument gets weaker and as the relative size of the non-endogenous subgroup gets larger, we find efficiency gains even when the underlying instrument is very strong. We illustrate our approach with an example using data from Abadie et al. (Econometrica 70:91–117, 2002).

Notes

  1. One can argue that the federal minimum wage is an endogenous regressor because it is most likely correlated with some states’ unobserved labor market conditions, especially for states that have relatively large shares of national income.

  2. Though the strength of the instrument is a factor we vary in our simulations, none of the cases we consider suffers from a weak instrument problem in the sense of Nelson and Startz (1990b) or Stock et al. (2002).

  3. In Sect. 2, we explain the control function method in detail.

  4. Pinkse (2000), Blundell and Powell (2003), Darolles et al. (2003), Hall and Horowitz (2005), Vytlacil and Yildiz (2007), Blundell et al. (2007), Florens et al. (2008), and Imbens and Newey (2009) among others have extended the literature.

  5. There is a large literature on models where the effects of endogenous variables on an outcome variable are heterogeneous in the population. In this literature, many important works such as Abadie (1991) and Angrist et al. (1996) have proposed methods for estimating the average effect of receiving or not receiving a binary treatment. Recently, Florens et al. (2008) provided a non-parametric way to analyze heterogeneous effects of a continuous treatment. For heterogeneous response models, see Heckman et al. (1997), Card (1999), Card (2001), Heckman and Vytlacil (2005), Heckman and Vytlacil (2007a), Heckman and Vytlacil (2007b) and references therein.

  6. For notational simplicity, we do not include a vector of exogenous covariates in Eq. (1). We assume that we have a set of valid instruments \(z_{ij}\) in Eq. (2) and thus the parameter \(\beta \) in Eq. (1) can be consistently estimated by 2SLS. This assumption also implies that our instrument is exogenous for all subgroups of the population. Some specific examples of such instruments can be found in Graddy (1995) and Acemoglu et al. (2013).

  7. Group assignments in terms of heterogeneous endogeneity are viable if economic theory suggests that observed individual characteristics determine the group structure.

  8. We note that by definition W is a \(K \times K\) matrix. In a special case, when we have only one endogenous variable, W is a scalar and \(0 \le W \le 1\).

  9. Note that when there exists heteroskedasticity in \(u_{ij}\), we should consistently estimate the heteroskedasticity first and then employ normalized data for this test.

  10. Through Monte Carlo simulations, we confirm that the power of the test for heterogeneous endogeneity is well controlled when the concentration parameter is higher than 10.

  11. To perform this endogeneity test, we implicitly assume that the instrumental variables \(z_{ij}\) are completely exogenous. Therefore, \(\hat{\nu }_{ij}\) contains all the contaminated part of \(x_{ij}\) because in the first stage we project \(x_{ij}\) on the instrumental variable space.

  12. In the simulation study, we assume that the exogenous group is known a priori. In the case where the exogenous group is not known, a straightforward test can be performed to identify the group.

  13. A continuous variable can be used to create a categorical variable with proper threshold values.

  14. Given that we have a dummy potentially endogenous variable, we avoid the forbidden regression problem in the first stage by following the procedure outlined on pages 189–191 of Angrist and Pischke (2009). Results are very similar to those reported in the text if we simply use a linear probability model in the first stage.

References

  • Abadie A, Angrist J, Imbens G (2002) Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica 70:91–117

  • Abadie J (1991) Instrumental variables estimation of average treatment effects in econometrics and epidemiology, NBER Working Paper No. 115

  • Acemoglu D, Finkelstein A, Notowidigdo M (2013) Income and health spending: evidence from oil price shocks. Rev Econ Stat 95(4):1079–1095

  • Altonji J, Blank R (1999) Race and gender in the labor market. Handb Labor Econ 49:3143–3259

  • Angrist J, Imbens G, Rubin D (1996) Identification of causal effects using instrumental variables. J Am Stat Assoc 91:444–455

  • Angrist J, Pischke J (2009) Mostly harmless econometrics: an empiricist's companion. Princeton University Press, Princeton, NJ

  • Autor D, Dorn D, Gordon H (2013) The China syndrome: local labor market effects of import competition in the United States. Am Econ Rev 103(6):2121–2168

  • Autor D, Dorn D, Gordon H, Majlesi K (2017) Importing political polarization? the electoral consequences of rising trade exposure, NBER Working Paper No. 22637

  • Barnow B, Cain G, Goldberger A (1981) Selection on observables. Eval Stud Rev Annu 5(1):43–59

  • Blundell R, Chen X, Christensen D (2007) Semi-nonparametric IV estimation of shape-invariant Engel curves. Econometrica 75:1613–1669

  • Blundell R, Powell J (2003) Identification of unconditional partial effects in nonseparable models. In: Dewatripont M, Hansen L, Turnovsky S (eds) Advances in economics and econometrics, vol II. Cambridge University Press, Cambridge, pp 312–357

  • Bonhomme S, Manresa E (2015) Grouped patterns of heterogeneity in panel data. Econometrica 83(3):1147–1184

  • Bound J, Holzer H (1993) Industrial shifts, skill levels and the labor market for white and black males. Rev Econ Stat 75(3):387–396

  • Card D (1999) The causal effect of education on earnings. Handb Labor Econ 3:1801–1863

  • Card D (2001) Estimating the return to schooling: progress on some persistent econometric problems. Econometrica 69(5):1127–1160

  • Chesher A (2003) Identification in nonseparable models. Econometrica 71:1405–1441

  • Chesher A (2005) Nonparametric identification under discrete variation. Econometrica 73:1525–1550

  • Darolles S, Florens J, Renault E (2003) Nonparametric estimation of triangular simultaneous equations models, Working Paper. Toulouse University

  • Florens J, Heckman J, Meghir C, Vytlacil E (2008) Identification of treatment effects using control functions in models with continuous, endogenous treatment and heterogeneous effects. Econometrica 76(5):1191–1206

  • Forgy E (1965) Cluster analysis of multivariate data: efficiency vs. interpretability of classification. Biometrics 21(3):768–769

  • Goldberger A (1972) Selection bias in evaluating treatment effects: some formal illustrations. Discussion Paper, University of Wisconsin–Madison

  • Graddy K (1995) Testing for imperfect competition at the Fulton fish market. RAND J Econ 26(1):75–92

  • Hall P, Horowitz J (2005) Nonparametric methods for inference in the presence of instrumental variables. Ann Stat 33:2904–2929

  • Hashimoto M (1987) The minimum wage law and youth crimes: time-series evidence. J Law Econ 30(2):443–464

  • Heckman J, Robb R (1985) Alternative methods for evaluating the impact of interventions: an overview. J Econom 30(1–2):239–267

  • Heckman J, Smith J, Clements N (1997) Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts. Rev Econ Stud 64(4):487–535

  • Heckman J, Vytlacil E (1998) Instrumental variables methods for the correlated random coefficient model: estimating the average rate of return to schooling when the return is correlated with schooling. J Hum Resour 33(4):974–987

  • Heckman J, Vytlacil E (2005) Structural equations, treatment effects, and econometric policy evaluation. Econometrica 73(3):669–738

  • Heckman J, Vytlacil E (2007a) Econometric evaluation of social programs, part I: causal models, structural models and econometric policy evaluation. Handb Econom 6:4779–4874

  • Heckman J, Vytlacil E (2007b) Econometric evaluation of social programs, part II: using the marginal treatment effect to organize alternative econometric estimators to evaluate social programs, and to forecast their effects in new environments. Handb Econom 6:4875–5143

  • Holzer H (1987) Informal job search and black youth unemployment. Am Econ Rev 77(3):446–452

  • Imbens G, Newey W (2009) Identification and estimation of triangular simultaneous equations models without additivity. Econometrica 77:1481–1512

  • Imbens G, Wooldridge W (2009) Recent developments in the econometrics of program evaluation. J Econ Lit 47(1):5–86

  • Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50

  • Lee S (2007) Endogeneity in quantile regression models: a control function approach. J Econom 141:1131–1158

  • Ma L, Koenker R (2006) Quantile regression methods for recursive structural equation models. J Econom 134:471–506

  • Nelson C, Startz R (1990a) The distribution of the instrumental variables estimator and its t-ratio when the instrument is a poor one. J Bus 63(1):S125–S140

  • Nelson C, Startz R (1990b) Some further results on the exact small sample properties of the instrumental variable estimator. Econometrica 58(4):967–976

  • Newey W, Powell J, Vella F (1999) Nonparametric estimation of triangular simultaneous equations models. Econometrica 67:563–603

  • Newey W, Powell J, Vella F (2003) Nonparametric estimation of triangular simultaneous equations models. Econometrica 71:1565–1578

  • Pinkse J (2000) Nonparametric two-step regression functions when regressors and error are dependent. Can J Stat 28:289–300

  • Rivers D, Vuong Q (1988) Limited information estimators and exogeneity tests for simultaneous probit models. J Econom 39(3):347–366

  • Smith J (1993) Affirmative action and racial wage gap. Am Econ Rev 83(2):79–84

  • Smith R, Blundell R (1986) An exogeneity test for a simultaneous equation tobit model with an application to labor supply. Econometrica 54(3):679–685

  • Stock J, Wright J, Yogo M (2002) A survey of weak instruments and weak identification in generalized method of moments. J Bus Econ Stat 20(4):518–529

  • Vytlacil E, Yildiz N (2007) Dummy endogenous variables in weakly separable models. Econometrica 75:757–779

  • Wooldridge J (2015) Control function methods in applied econometrics. J Hum Resour 50(2):420–445

Acknowledgements

We are thankful to Christoph Rothe, Richard Startz, Lutz Kilian, and Le Wang for their helpful comments and suggestions. Any errors or omissions are our own.

Author information

Corresponding author

Correspondence to Jaeho Kim.

Appendices

Appendix A

We first show the assumptions required to derive theoretical results in the paper.

Assumption 1

The data are generated as \(y_{ij} = x_{ij}'\beta + u_{ij}\), \(x_{ij} = Z_{ij}'\delta + \varSigma _{\nu }\nu _{ij}^*\) with the covariance matrix of the error terms in Eq. (3).

Assumption 2

The following limits hold for \(j = 1,2,\ldots ,J,\) when the sample size converges to infinity:

a. (exogeneity) \(Z_j'u_j/N_j \overset{p}{\rightarrow } 0\).

b. (well-behaved data) \(Z_j'Z_j/N_j\overset{p}{\rightarrow } Q_{Z_jZ_j}\), where \(Q_{Z_jZ_j}\) is a finite, positive definite \(L\times L\) matrix.

c. (relevance) \(Z_j'x_j/N_j\overset{p}{\rightarrow } Q_{Z_jX_j}\), where \(Q_{Z_jX_j}\) is a finite \(L\times K\) matrix with rank K.

Assumption 3

As \(N \rightarrow \infty \), the total number of observations in the jth group, \(N_{j}\), converges to infinity such that \(\lim _{N \rightarrow \infty } N_{j}/N = a_{j}\) for \(j = 1,2,\ldots ,J\).

Notice that the convergences in Assumption 2 hold under the weak law of large numbers, and that the asymptotic moments are assumed to be the same across groups under Assumption 1.

Proof of Lemma 1

Because \({\hat{V}}\) is a block-diagonal matrix, \(M_{{\hat{V}}}\) is also block-diagonal, with the following form:

$$\begin{aligned} M_{{\hat{V}}} = diag[M_{{\hat{V}}_1},M_{{\hat{V}}_2},\ldots ,M_{{\hat{V}}_J}] \end{aligned}$$

where \(M_{{\hat{V}}_j} = I_{N_j} - {\hat{V}}_j({\hat{V}}_j^{'}\hat{V}_j)^{-1}{\hat{V}}_j^{'}\). Therefore, we have:

$$\begin{aligned} \begin{aligned} {\hat{\beta }}_{CF}&= \left( X'M_{{\hat{V}}} X \right) ^{-1}\left( X'M_{{\hat{V}}} Y \right) \\&= \left( \sum _{j=1}^{J} x_j' M_{{\hat{V}}_j} x_j \right) ^{-1}\left( \sum _{j=1}^{J} x_j' M_{{\hat{V}}_j} y_j \right) \\&= \left( \sum _{j=1}^{J} x_j' P_{Z_j} x_j \right) ^{-1}\left( \sum _{j=1}^{J} x_j' P_{Z_j} y_j \right) \\&= \left( \sum _{j=1}^{J} \frac{N_{j}}{N}{1 \over N_j}x_j' P_{Z_j} x_j \right) ^{-1}\left( \sum _{j=1}^{J} \frac{N_{j}}{N} {1 \over N_j} x_j' P_{Z_j} y_j \right) \\&= \beta + \left( {1 \over N} \sum _{j=1}^{J} x_j' P_{Z_j} x_j \right) ^{-1}\left( {1 \over N} \sum _{j=1}^{J} x_j' P_{Z_j} E_j \right) \end{aligned} \end{aligned} \tag{A.1}$$

where \(P_{Z_j} = Z_j(Z_j'Z_j)^{-1}Z_j'\); \(Z_j = [z_{1j} \ z_{2j} \ldots \ z_{N_j j} ]'\); \(E_{j} = [\epsilon _{1j} \ \epsilon _{2j} \ \ldots \ \epsilon _{N_j j} ]'\). The step from the second to the third line of equation (A.1) holds because \(M_{{\hat{V}}_j} x_j = P_{Z_j} x_j\). Since \(N_j \rightarrow \infty \) as \(N \rightarrow \infty \) and \(a_{j} = \lim _{N \rightarrow \infty } N_{j}/N\),

$$\begin{aligned} \sqrt{N}\Big ({\widehat{\beta }}_{CF} - \beta \Big ) \overset{d}{\rightarrow } N\bigg (\mathbf{0}, \sigma _u^2\Big (\sum _{j=1}^{J} a_j Q_{Z_jX_j}'Q_{Z_jZ_j}^{-1}Q_{Z_jX_j}\Big )^{-1} \bigg ) \end{aligned}$$

by the law of large numbers and the central limit theorem under Assumption 1. \(\square \)
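As a numerical illustration of the step \(M_{{\hat{V}}_j} x_j = P_{Z_j} x_j\) used in (A.1), the following Python sketch checks the identity for one simulated group. It is not part of the paper: the design (one regressor, two instruments, the coefficient values) and all variable names are hypothetical assumptions chosen only for the check.

```python
import numpy as np

rng = np.random.default_rng(0)

def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

# Hypothetical group j: N_j observations, L = 2 instruments, K = 1 regressor.
N_j = 50
Z = rng.normal(size=(N_j, 2))
u = rng.normal(size=(N_j, 1))
v = 0.8 * u + rng.normal(size=(N_j, 1))   # first-stage error, correlated with u
x = Z @ np.array([[1.0], [-0.5]]) + v     # endogenous regressor
y = 0.5 * x + u                           # assumed true slope of 0.5

P_Z = proj(Z)
v_hat = x - P_Z @ x                       # first-stage residual (control function)
M_v = np.eye(N_j) - proj(v_hat)           # residual maker of v_hat

lhs = M_v @ x                             # M_{V_hat} x
rhs = P_Z @ x                             # P_Z x
beta_cf = (x.T @ M_v @ y).item() / (x.T @ M_v @ x).item()
beta_2sls = (x.T @ P_Z @ y).item() / (x.T @ P_Z @ x).item()
print(np.allclose(lhs, rhs), beta_cf, beta_2sls)
```

The identity holds because \({\hat{V}}_j' x_j = {\hat{V}}_j'{\hat{V}}_j\), so projecting \(x_j\) off \({\hat{V}}_j\) removes exactly the first-stage residual; the control function and 2SLS slopes then coincide up to floating-point error.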

Proof of Proposition 1

As shown in Eq. (10),

$$\begin{aligned} \widehat{\beta }_{CF}^{*} = \left( {X}^{\prime }M_{\tilde{V}}{X}\right) ^{-1}\left( {X}^{\prime }M_{\tilde{V}}{Y}\right) \end{aligned}$$

where the residual maker matrix \(M_{\tilde{V}} = I - P_{\tilde{V}}\) and \(P_{\tilde{V}}\) is the usual projection matrix. We denote \(\tilde{V} = \begin{bmatrix} \hat{\nu }_{e} \\ 0\\ \end{bmatrix}\). Thus we get

$$\begin{aligned} \begin{aligned} P_{\tilde{V}}&= \tilde{V} (\tilde{V}^{'}\tilde{V})^{-1}\tilde{V}^{'} = \begin{bmatrix} \hat{\nu }_{e} \\ 0 \\ \end{bmatrix} \left( \begin{bmatrix} \hat{\nu }_{e}^{}&0 \end{bmatrix} \begin{bmatrix} \hat{\nu }_{e}^{} \\ 0 \\ \end{bmatrix} \right) ^{-1} \begin{bmatrix} \hat{\nu }_{e}^{}&0 \end{bmatrix} = \begin{bmatrix} P_{\hat{\nu }_{e}^{}}&0 \\ 0&0 \end{bmatrix} \\ \end{aligned} \end{aligned}$$

where \(P_{\hat{\nu }_{e}^{}} = \hat{\nu }_{e}^{}(\hat{\nu }_{e}^{'}\hat{\nu }_{e}^{})^{-1}\hat{\nu }_{e}^{'}\). Therefore,

$$\begin{aligned} M_{\tilde{V}^{}} = I_N - P_{\tilde{V}^{}} = \begin{bmatrix} M_{\hat{\nu }_{e}^{}}&0 \\ 0&I \\ \end{bmatrix} \end{aligned}$$

where \(M_{\hat{\nu }_{e}} = I - P_{\hat{\nu }_{e}}\). Now we use the above \(M_{\tilde{V}}\) matrix to obtain the expressions \({X}^{\prime }M_{\tilde{V}}{X}\) and \({X}^{\prime }M_{\tilde{V}}{Y}\):

$$\begin{aligned} \begin{aligned} {X}^{\prime }M_{\tilde{V}}{X}&= \begin{bmatrix} x_{e}^{\prime }&x_{ne}^{\prime } \end{bmatrix} \begin{bmatrix} M_{\hat{\nu }_{e}}&0 \\ 0&I \end{bmatrix} \begin{bmatrix} x_{e}\\ x_{ne} \end{bmatrix} = x_{e}^{\prime }M_{\hat{\nu }_{e}} x_{e} + x_{ne}^{\prime }x_{ne} \end{aligned} \end{aligned}$$

Similarly, for \({X}^{\prime }M_{\tilde{V}}{Y}\), we get

$$\begin{aligned} \begin{aligned} {X}^{\prime }M_{\tilde{V}^{}}{Y}&= \begin{bmatrix} x_{e}^{\prime }&x_{ne}^{\prime } \end{bmatrix} \begin{bmatrix} M_{\hat{\nu }_{e}^{}}&0 \\ 0&I \end{bmatrix} \begin{bmatrix} y_{e}\\ y_{ne} \end{bmatrix} = x_{e}^{\prime }M_{\hat{\nu }_{e}^{}} y_{e} + x_{ne}^{\prime }y_{ne} \\ \end{aligned} \end{aligned}$$

From the standard OLS estimator, we get \(x_{ne}^{\prime }y_{ne} = (x_{ne}^{\prime }x_{ne}) \widehat{\beta }_{ne,OLS}\). For group e, we get \(\hat{\beta }_{e,CF} = (x_{e}^{\prime }M_{\hat{\nu }_{e}} x_{e})^{-1} x_{e}^{\prime }M_{\hat{\nu }_{e}}y_{e}\). As \(M_{\hat{\nu }_{e}} x_{e} = P_{Z_{e}} x_{e}\), \(\hat{\beta }_{e,CF} = \hat{\beta }_{e,2SLS}\). Therefore, we have \( x_{e}^{\prime }M_{\hat{\nu }_{e}}y_{e} = (x_{e}^{\prime }M_{\hat{\nu }_{e}}x_{e}) \widehat{\beta }_{e,2SLS} \). Substituting \(x_{e}^{\prime }M_{\hat{\nu }_{e}}y_{e}\) and \(x_{ne}^{\prime }y_{ne}\) in \({X}^{\prime }M_{{\tilde{V}}}{Y}\), we get

$$\begin{aligned} {X}^{\prime }M_{{\tilde{V}}^{}}{Y} = (x_{e}^{\prime }M_{\hat{\nu }_{e}^{}}x_{e})\widehat{\beta }_{e,2SLS} + (x_{ne}^{\prime }x_{ne}) \widehat{\beta }_{ne,OLS} \end{aligned}$$

Hence we get,

$$\begin{aligned}&\widehat{\beta }_{CF}^{*} = ({X}^{\prime }M_{{\tilde{V}}}{X})^{-1} {X}^{\prime }M_{{\tilde{V}}}{Y} \\&\quad = \left( x_{e}^{\prime }M_{\hat{\nu }_{e}} x_{e} + x_{ne}^{\prime }x_{ne} \right) ^{-1} \left( (x_{e}^{\prime }M_{\hat{\nu }_{e}}x_{e})\widehat{\beta }_{e,2SLS} + (x_{ne}^{\prime }x_{ne}) \widehat{\beta }_{ne,OLS} \right) \\&\quad = W \widehat{\beta }_{e,2SLS} + (I-W) \widehat{\beta }_{ne,OLS} \end{aligned}$$

where

$$\begin{aligned} W = \Big (x_{e}^{\prime }M_{\hat{\nu }_{e}^{}} x_{e} + x_{ne}^{\prime }x_{ne} \Big )^{-1}\Big (x_{e}^{\prime }M_{\hat{\nu }_{e}^{}} x_{e}\Big ) \end{aligned}$$

Under Assumption 1, it is straightforward to show that \(\widehat{Avar}[\hat{\beta }_{e,2SLS}] = \hat{\sigma }_u^2 (x_e'M_{\hat{\nu }_{e}}x_e)^{-1} = \hat{\sigma }_u^2(x_e'P_{Z_{e}}x_e)^{-1}\) and \(\widehat{Avar}[\hat{\beta }_{ne,OLS}] = \hat{\sigma }_u^2(x_{ne}'x_{ne})^{-1}\). Therefore,

$$\begin{aligned} W = \Big (\widehat{Avar}[\widehat{\beta }_{e,2SLS}]^{-1} + \widehat{Avar}[\widehat{\beta }_{ne,OLS}]^{-1} \Big )^{-1}\Big (\widehat{Avar}[\widehat{\beta }_{e,2SLS}]^{-1} \Big ) \end{aligned}$$

and

$$\begin{aligned} I - W = \Big (\widehat{Avar}[\widehat{\beta }_{e,2SLS}]^{-1} + \widehat{Avar}[\widehat{\beta }_{ne,OLS}]^{-1} \Big )^{-1}\Big (\widehat{Avar}[\widehat{\beta }_{ne,OLS}]^{-1} \Big ). \end{aligned}$$
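The decomposition \(\widehat{\beta }_{CF}^{*} = W \widehat{\beta }_{e,2SLS} + (I-W) \widehat{\beta }_{ne,OLS}\) can also be verified numerically. The Python sketch below does so for a scalar regressor; the simulated design, sample sizes, coefficient values, and variable names are illustrative assumptions and are not taken from the paper's simulations.

```python
import numpy as np

rng = np.random.default_rng(1)

def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

N_e, N_ne, beta = 60, 40, 0.5             # hypothetical group sizes and slope

# Endogenous group e: x_e is correlated with the structural error u_e.
Z_e = rng.normal(size=(N_e, 1))
u_e = rng.normal(size=(N_e, 1))
x_e = Z_e + 0.7 * u_e + rng.normal(size=(N_e, 1))
y_e = beta * x_e + u_e

# Exogenous group ne: x_ne is independent of the error.
x_ne = rng.normal(size=(N_ne, 1))
y_ne = beta * x_ne + rng.normal(size=(N_ne, 1))

# Control function: first-stage residual for group e, zeros for group ne.
nu_e = x_e - proj(Z_e) @ x_e
V_tilde = np.vstack([nu_e, np.zeros((N_ne, 1))])
X = np.vstack([x_e, x_ne])
Y = np.vstack([y_e, y_ne])

M_V = np.eye(N_e + N_ne) - proj(V_tilde)
beta_cf_star = (X.T @ M_V @ Y).item() / (X.T @ M_V @ X).item()

# Weighted combination of group-wise 2SLS and OLS.
A = (x_e.T @ proj(Z_e) @ x_e).item()      # x_e' P_Z x_e
B = (x_ne.T @ x_ne).item()                # x_ne' x_ne
W = A / (A + B)
beta_e_2sls = (x_e.T @ proj(Z_e) @ y_e).item() / A
beta_ne_ols = (x_ne.T @ y_ne).item() / B
combo = W * beta_e_2sls + (1 - W) * beta_ne_ols
print(beta_cf_star, combo, W)
```

In this scalar case the weight \(W\) lies between 0 and 1 and grows with the amount of instrument-driven variation in group e relative to the total variation in group ne.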

Proof of Lemma 2

Similar to Lemma 1,

$$\begin{aligned} \begin{aligned} {\widehat{\beta }}_{CF}^*&= ( X'M_{{\tilde{V}}} X)^{-1}( X'M_{{\tilde{V}}} Y) \\&= \beta + ( x_e' P_{Z_e} x_e + x_{ne}' x_{ne} )^{-1}(x_{e}' P_{Z_{e}} u_e + x_{ne}' u_{ne} ) \\&= \beta + \left( \frac{N_{e}}{N} {1 \over N_e} x_{e}' P_{Z_{e}} x_{e} + \left( 1- \frac{N_{e}}{N}\right) {1 \over N_{ne}} x_{ne}' x_{ne} \right) ^{-1} \\&\times \left( \frac{N_{e}}{N}{1 \over N_{e}} x_{e}' P_{Z_{e}} u_{e} + \left( 1- \frac{N_{e}}{N}\right) {1 \over N_{ne}} x_{ne}' u_{ne} \right) \\\end{aligned} \end{aligned}$$

where \(P_{Z_{e}} = Z_{e}(Z_e'Z_e)^{-1}Z_e'\) and \(Z_e = [z_{1e} \ z_{2e} \ldots z_{N_e e} ]'\); \(N_e\) is the number of individuals in group e. Since \(N_{e} \rightarrow \infty \) and \(N_{ne} \rightarrow \infty \) as \(N \rightarrow \infty \),

$$\begin{aligned} \sqrt{N}\Big ({\hat{\beta }}_{CF}^* - \beta \Big ) \overset{d}{\rightarrow } N\bigg (\mathbf{0}, \sigma _u^2\Big ( w \ Q_{Z_{e}X_{e}}'Q_{Z_{e}Z_{e}}^{-1}Q_{Z_{e}X_{e}} + (1-w) Q_{X_{ne}X_{ne}}\Big )^{-1} \bigg ) \end{aligned}$$

by the law of large numbers and the central limit theorem under Assumption 1, where \(w = \lim _{N \rightarrow \infty } N_e/N\).

Proof of Proposition 2

The difference between the asymptotic variances of the conventional 2SLS estimator under homogeneous endogeneity and \(\widehat{\beta }_{CF}^*\) is given by:

$$\begin{aligned} \begin{aligned}&Avar\big [\widehat{\beta }_{2SLS}\big ] - Avar\big [\widehat{\beta }_{CF}^*\big ] = \sigma ^2_u {1\over N} plim \left( \left( {1 \over N} X' P_{Z}X \right) ^{-1} \right) \\&\qquad - \sigma ^2_u {1\over N} plim\left( \left( {1 \over N} x_{e}' P_{Z_{e}} x_{e} + {1 \over N} x_{ne}' x_{ne} \right) ^{-1} \right) \\&\quad = \sigma ^2_u {1\over N} plim\left( \left( {1 \over N} X' P_{Z}X \right) ^{-1}\right) \\&\qquad - \sigma ^2_u {1\over N} plim\left( \left( w {1 \over N_e} x_e' P_{Z_e} x_e + (1-w) {1 \over N_{ne}} x_{ne}' x_{ne} \right) ^{-1} \right) \\&\quad = \sigma ^2_u {1\over N} plim\left( \left( {1 \over N} X' P_{Z}X \right) ^{-1}\right) \\&- \sigma ^2_u {1\over N} plim\left( \left( w {1 \over N} X' P_{Z} X + (1-w) {1 \over N} X' X \right) ^{-1} \right) \\&\quad = \sigma ^2_u {1\over N} plim\Big ( N\big ((X' P_{Z}X)^{-1} - ( w\ X' P_{Z}X + (1-w)X' X)^{-1} \big ) \Big )\\ \end{aligned} \end{aligned}$$

To sign the difference of the two inverses in the brackets, we can compare the matrices being inverted:

$$\begin{aligned} \begin{aligned}&\big (w\ X' P_{Z}X + (1-w) X'X\big ) - X' P_{Z}X \\&\quad = w\ X' P_{Z}X + (1-w) X'X -w X' P_{Z}X - (1-w) X' P_{Z}X \\&\quad = (1-w)(X'X - X' P_{Z}X) \\&\quad = (1-w)(X'X + X' (M_{Z}-I)X) \\&\quad = (1-w)(X'M_{Z}X) \end{aligned} \end{aligned}$$

Because \(M_{Z}\) is symmetric and idempotent, it is positive semidefinite, and so is \(X'M_{Z}X\); the latter is positive definite when \(M_{Z}X\) has full column rank. The matrix \((w X' P_{Z}X + (1-w) X'X)\) is therefore at least as large in the matrix sense as \( X' P_{Z}X\), and thus its inverse is no larger. Therefore, \(Avar[\widehat{\beta }_{2SLS}] - Avar[\widehat{\beta }_{CF}^*]\) is positive semidefinite, and it is positive definite when \(w < 1\) and \(M_{Z}X\) has full column rank.
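The matrix comparison in this proof can be illustrated numerically. The Python sketch below uses an arbitrary simulated \(X\) and \(Z\) and an assumed weight \(w = 0.6\) (all hypothetical choices, not the paper's design) to confirm the identity \((wX'P_ZX + (1-w)X'X) - X'P_ZX = (1-w)X'M_ZX\) and the resulting ordering of the inverses.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, L, w = 200, 2, 3, 0.6               # hypothetical dimensions and weight

Z = rng.normal(size=(N, L))
X = Z @ rng.normal(size=(L, K)) + rng.normal(size=(N, K))

P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # projection onto the instruments
M_Z = np.eye(N) - P_Z                     # residual maker of Z

A = X.T @ P_Z @ X                         # matrix inverted for 2SLS
B = w * A + (1 - w) * (X.T @ X)           # matrix inverted for the CF estimator
gap = B - A
target = (1 - w) * (X.T @ M_Z @ X)

# Eigenvalues of the (symmetrized) differences: both should be nonnegative.
eig_gap = np.linalg.eigvalsh((gap + gap.T) / 2)
diff_inv = np.linalg.inv(A) - np.linalg.inv(B)
eig_inv = np.linalg.eigvalsh((diff_inv + diff_inv.T) / 2)
print(np.allclose(gap, target), eig_gap.min(), eig_inv.min())
```

Nonnegative eigenvalues of `diff_inv` correspond to the 2SLS variance being at least as large as the control function variance in the matrix sense, up to the common factor \(\sigma _u^2\).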

Proof of Lemma 3

Using the residual matrix estimated from the first stage regression, we obtain the following OLS estimator:

$$\begin{aligned} \begin{aligned} {\hat{\delta }}&= \left( \sum _{n=1}^N{\hat{x}}_{n}^* {\hat{x}}_n^{*'} \right) ^{-1} \left( \sum _{n=1}^N{\hat{x}}_{n}^* y_n \right) \\&= \left( \sum _{n=1}^N{\hat{x}}_{n}^* {\hat{x}}_n^{*'} \right) ^{-1} \left\{ \sum _{n=1}^N{\hat{x}}_{n}^* x_{n}^{*'} \delta + \sum _{n=1}^N {\hat{x}}_{n}^* \eta _{n} \right\} \\&= \left( \sum _{n=1}^N{\hat{x}}_{n}^* {\hat{x}}_n^{*'}\right) ^{-1} \left\{ \sum _{n=1}^N{\hat{x}}_{n}^*{\hat{x}}_{n}^{*'}\delta + \sum _{n=1}^N\hat{x}_{n}^* m_{n}^{'} \delta + \sum _{n=1}^N {\hat{x}}_{n}^* \eta _{n} \right\} \\&= \delta + \left( {1\over N}\sum _{n=1}^N{\hat{x}}_{n}^* \hat{x}_n^{*'}\right) ^{-1} \left\{ {1\over N} \sum _{n=1}^N{\hat{x}}_{n}^* m_{n}^{'} \delta + {1\over N} \sum _{n=1}^N {\hat{x}}_{n}^* \eta _{n} \right\} \\ \end{aligned} \end{aligned}$$

where \({\hat{x}}_{n}^{*} = [x_n' \ \ {\hat{V}}_n']'\); \({\hat{V}}_n = (D_n \otimes {\hat{\nu }}_n)\); \(m_{n} = [\mathbf{0}' \ \ (V_n-{\hat{V}}_n)']'\). Note \(x_{n}^{*} = {\hat{x}}_{n}^{*} + m_n\). The first term in the second bracket is reduced to:

$$\begin{aligned} \begin{aligned} {1\over N} \sum _{n=1}^N{\hat{x}}_{n}^* m_{n}^{'} \delta&= {1\over N} \sum _{n=1}^N \begin{bmatrix} x_n \\ {\hat{V}}_n \end{bmatrix} \begin{bmatrix} \mathbf{0'}&(V_n- {\hat{V}}_n)' \end{bmatrix} \begin{bmatrix} \beta \\ \varGamma \end{bmatrix} \\&= {1\over N} \sum _{n=1}^N {\hat{x}}_n^* \Big (V_n- {\hat{V}}_n \Big )'\varGamma \\&= {1\over N} \sum _{n=1}^N {\hat{x}}_n^* \varGamma '\Big (V_n- {\hat{V}}_n \Big )\\&= \left( {1\over N} \sum _{n=1}^N {\hat{x}}_n^* \varGamma '\mathbf{Z}_{\mathbf{n}} \right) \Big ({\hat{\pi }}- \pi \Big ) \end{aligned} \end{aligned}$$

where \( \mathbf{Z}_{\mathbf{n}} = (D_n\otimes Z_n')\). As \({\hat{\pi }} \ \xrightarrow []{p} \pi \) and \({\hat{x}}_n^* \ \xrightarrow []{p} x_n^*\), we have:

$$\begin{aligned} \begin{aligned} {1\over N} \sum _{n=1}^N{\hat{x}}_{n}^* m_{n}^{'} \delta \ \xrightarrow []{p} 0. \end{aligned} \end{aligned}$$

The second term in the second bracket also converges to zero as:

$$\begin{aligned} \begin{aligned} {1\over N} \sum _{n=1}^N {\hat{x}}_{n}^* \eta _{n} \ \xrightarrow []{p} E\big [x_{n}^* \eta _n \big ] = E\Big [x_{n}^* E\big [\eta _n \vert x_n,Z_n \big ] \Big ] = 0. \end{aligned} \end{aligned}$$

Because \( {1\over N}\sum \nolimits _{n=1}^{N}{\hat{x}}_{n}^* \hat{x}_n^{*'} \ \xrightarrow []{p} E\Big [ x_{n}^* x_n^{*'}\Big ]\), \(\hat{\delta }\ \xrightarrow []{p} \delta \). \(\square \)
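To make the objects in this proof concrete, the Python sketch below builds \({\hat{x}}_n^* = [x_n' \ \ {\hat{V}}_n']'\) with \({\hat{V}}_n = D_n \otimes {\hat{\nu }}_n\) for two groups and a scalar regressor, and checks that \(V_n - {\hat{V}}_n = (D_n \otimes Z_n')({\hat{\pi }} - \pi )\). The simulated design, the parameter values, and the variable names are assumptions made for illustration; they are not the paper's data generating process.

```python
import numpy as np

rng = np.random.default_rng(3)
N, J = 400, 2
group = rng.integers(0, J, size=N)        # group index of each observation
D = np.eye(J)[group]                      # N x J matrix whose rows are the dummies D_n
z = rng.normal(size=N)                    # single instrument
pi, beta = 1.0, 0.5                       # assumed first-stage and structural slopes
Gamma = np.array([0.8, 0.0])              # assumed: group 0 endogenous, group 1 exogenous

nu = rng.normal(size=N)                   # first-stage error
eta = rng.normal(size=N)                  # structural error net of the control function
x = pi * z + nu
y = beta * x + Gamma[group] * nu + eta

# First stage: pooled OLS of x on z (no intercept), then residuals nu_hat.
pi_hat = (z @ x) / (z @ z)
nu_hat = x - pi_hat * z

# Second stage: OLS of y on x_hat_star = [x, D_n (x) nu_hat_n].
V_hat = D * nu_hat[:, None]
X_star_hat = np.column_stack([x, V_hat])
delta_hat, *_ = np.linalg.lstsq(X_star_hat, y, rcond=None)

# Generated-regressor error: V_n - V_hat_n = (D_n (x) z_n)(pi_hat - pi).
V = D * nu[:, None]
m_V = V - V_hat
Z_bold = D * z[:, None]
print(delta_hat, np.allclose(m_V, Z_bold * (pi_hat - pi)))
```

Here `delta_hat` stacks the estimate of \(\beta \) and the group-specific control function coefficients \(\varGamma \). Ordinary OLS standard errors from this second stage ignore the estimation error in \({\hat{\pi }}\), which is the term that Proposition 3 accounts for.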

Proof of Proposition 3

From Lemma 3 , we have:

$$\begin{aligned} \begin{aligned} \sqrt{N}\big ({\hat{\delta }} - \delta \big )&= \left( {1\over N}\sum _{n=1}^N{\hat{x}}_{n}^* {\hat{x}}_n^{*'} \right) ^{-1} \left\{ {1\over \sqrt{N}} \sum _{n=1}^N{\hat{x}}_{n}^* m_{n}^{'} \delta + {1\over \sqrt{N}} \sum _{n=1}^N {\hat{x}}_{n}^* \eta _{n} \right\} . \\ \end{aligned} \end{aligned}$$

The first term in the second bracket is reduced to:

$$\begin{aligned} \begin{aligned} {1\over \sqrt{N}} \sum _{n=1}^N{\hat{x}}_{n}^* m_{n}^{'} \delta&= \left( {1\over N} \sum _{n=1}^N {\hat{x}}_n^* \varGamma ' \mathbf{Z}_{\mathbf{n}} \right) \sqrt{N}({\hat{\pi }}- \pi ) \end{aligned} \end{aligned}$$

where \( \mathbf{Z}_{\mathbf{n}} = (D_n\otimes Z_n')\). Thus, we have:

$$\begin{aligned} \begin{aligned} {1\over \sqrt{N}} \sum _{n=1}^N{\hat{x}}_{n}^* m_{n}^{'} \delta&= E\big [x_n^* \varGamma ' \mathbf{Z}_{\mathbf{n}} \big ]\sqrt{N}({\hat{\pi }}- \pi ) + o_p(1). \end{aligned} \end{aligned}$$

Next, the second term in the second bracket is decomposed into two parts as:

$$\begin{aligned} \begin{aligned} {1\over \sqrt{N}} \sum _{n=1}^N {\hat{x}}_{n}^* \eta _{n}&= {1\over \sqrt{N}} \sum _{n=1}^N x_{n}^*\eta _{n} - {1\over \sqrt{N}} \sum _{n=1}^N m_{n} \eta _{n}. \end{aligned} \end{aligned}$$

And the second term in the above equation can be rewritten as:

$$\begin{aligned} \begin{aligned} {1\over \sqrt{N}} \sum _{n=1}^N m_{n}\eta _{n}&= {1\over \sqrt{N}} \sum _{n=1}^N \begin{bmatrix} \mathbf{0} \\ (D_n \otimes Z_n'({\hat{\pi }}- \pi )) \eta _{n} \end{bmatrix} \\&= \begin{bmatrix} \mathbf{0} \\ (D_n \otimes {1\over \sqrt{N}} \sum _{n=1}^N Z_n'\eta _{n}({\hat{\pi }}- \pi )) \end{bmatrix}. \end{aligned} \end{aligned}$$

Therefore, we obtain the following result:

$$\begin{aligned} \begin{aligned} {1\over \sqrt{N}} \sum _{n=1}^N Z_n'\eta _{n}({\hat{\pi }}- \pi )&= \left( {1\over N} \sum _{n=1}^N Z_n'\eta _{n} \right) \sqrt{N}(\hat{\pi }- \pi ) = o_p(1)\\ \end{aligned} \end{aligned}$$

because \({1\over N}\sum \nolimits _{n=1}^{N} Z_n'\eta _{n} \xrightarrow []{p} E\big [Z_n'\eta _{n} \big ] = 0\) and \(\sqrt{N}(\hat{\pi }- \pi ) = O_p(1)\). This leads to:

$$\begin{aligned} \begin{aligned} {1\over \sqrt{N}} \sum _{n=1}^N {\hat{x}}_{n}^* \eta _{n}&= {1\over \sqrt{N}} \sum _{n=1}^N x_{n}^*\eta _{n} + o_p(1) \end{aligned} \end{aligned}$$

Using \( {1\over N}\sum \nolimits _{n=1}^{N}{\hat{x}}_{n}^* {\hat{x}}_n^{*'} \ \xrightarrow []{p} E[ x_{n}^* x_n^{*'}]\), we obtain:

$$\begin{aligned} \begin{aligned} \sqrt{N}({\hat{\delta }} - \delta )&= \left( E[ x_{n}^* x_n^{*'}] \right) ^{-1} \{ H \sqrt{N}({\hat{\pi }}- \pi ) + {1\over \sqrt{N}} \sum _{n=1}^N x_{n}^* \eta _{n} \} + o_p(1)\\ \end{aligned} \end{aligned}$$

where \(H = E\big [x_n^* \varGamma ' \mathbf{Z}_{\mathbf{n}} \big ]\). Thus the asymptotic distribution of \({\hat{\delta }}\) is given by:

$$\begin{aligned} \sqrt{N}({\hat{\delta }} - \delta ) \overset{a}{\sim } N\left( \mathbf{0},\left( E[ x_{n}^* x_n^{*'}] \right) ^{-1} \Big (H \ M \ H' + Var[x_{n}^* \eta _{n}] \Big )\left( \big (E[ x_{n}^* x_n^{*'}] \big )^{-1} \right) ' \right) , \end{aligned}$$

where \(\sqrt{N} ({\hat{\pi }} - \pi ) \xrightarrow []{d} N(0,M)\), and the two terms in the braces are asymptotically uncorrelated because \(E[\eta _{n} \vert x_n,Z_n ] = 0\). \(\square \)

Appendix B

In Appendix B, we report Monte Carlo simulation results on the finite-sample performance of the new method that combines the CF approach and the k-means clustering algorithm, which is explained in Sect. 3.1. For the simulation, we employ the same data generating process and the same true values of the model parameters used in Sect. 2.3, except for \(\rho _2\) and \(\mu ^2\). The true value of \(\rho _2\) is set to \(-0.9\) (a strong endogeneity case). Sample sizes vary from case to case. For the first set of simulations, we consider \(\{N_1 = 10,N_2=10 \}\) and \(T = 1,5,10\), and 20. For the second set of simulations, we consider \(\{N_1 = 100,N_2=100 \}\) and \(T = 1,5,10\), and 20. The ratio of the two population variances that determines the concentration parameter \(\mu ^2\) is set to \({Var( \pi _1 z_{i,j})\over Var(\nu _{i,j})} = {1 \over 10} = 0.1\). Thus, \(\mu ^2 = T(N_1+N_2)\times 0.1\) takes the values \(\{2,10,20,40\}\) for the first simulation set and \(\{20,100,200,400\}\) for the second simulation set. We generate 10,000 data sets for each case. We then apply four estimators: OLS, 2SLS, the CF approach based on the true group membership, and the CF approach based on the modified k-means clustering algorithm. Note that the first and second cases of the first simulation set are weak and marginally weak instrument cases, in which standard 2SLS estimation does not perform well.
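The concentration parameters quoted above follow directly from \(\mu ^2 = T(N_1+N_2)\times 0.1\). The short Python snippet below reproduces the arithmetic; the dictionary keys and variable names are only illustrative labels.

```python
ratio = 0.1                               # Var(pi_1 z) / Var(nu), as stated in the text
T_values = [1, 5, 10, 20]
designs = {"first": (10, 10), "second": (100, 100)}   # (N_1, N_2) per simulation set

# mu^2 = T * (N_1 + N_2) * ratio; rounding guards against floating-point noise.
mu2 = {
    name: [round(T * (n1 + n2) * ratio, 10) for T in T_values]
    for name, (n1, n2) in designs.items()
}
print(mu2)
```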

Fig. 3 Sampling distributions of \(\beta _1\) from OLS, 2SLS, and two control function (CF) methods: a comparison between control function (CF) methods based on true and estimated groups

Fig. 4 Sampling distributions of \(\beta _1\) from OLS, 2SLS, and two control function (CF) methods: a comparison between control function (CF) methods based on true and estimated groups

Figure 3 shows the results for the first simulation set, which assumes \(N_1=10\) and \(N_2 = 10\). As is well known in the literature, the sampling distribution of the 2SLS estimator is extremely dispersed when \(\mu ^2 =2\). This implies that 2SLS provides essentially no information for statistical inference. By contrast, in the same case the sampling distributions of the two proposed CF estimators are concentrated near the true value 0.5, even though there is a slight bias in the CF estimator when the group membership is estimated by the modified k-means clustering algorithm. Both proposed CF estimators perform substantially better because they extract information from the exogenous group in the data. In the second case (\(T = 5, \mu ^2 = 10\)), the performance of the 2SLS estimator is not much better than in the first case. However, the performance of the CF estimator that is based on the estimated groups improves significantly, and the magnitude of its bias becomes almost negligible. In the third case (\(T = 10, \mu ^2 = 20\)) and the fourth case (\(T = 20, \mu ^2 = 40\)), the CF estimator that estimates the group membership produces results almost identical to those of the CF estimator that employs the true group membership.

Figure 4 shows the results of the second simulation set that assumes \(N_1=100\) and \(N_2 = 100\). The important difference from the previous simulation set is that when \(N = N_1 + N_2\) is large and T is small, the CF estimator that directly estimates the group membership has a larger bias. But once again the bias quickly converges to 0 when T increases. When T is larger than 10 in this case, the bias already becomes very close to 0.

Cite this article

Ghosh, P., Grier, K. & Kim, J. Heterogeneous endogeneity. Stat Papers 62, 847–886 (2021). https://doi.org/10.1007/s00362-019-01116-9
