Abstract
In this paper, we consider the situation in which the observations follow an isotonic generalized partly linear model. Under this model, the mean of the responses is modelled, through a link function, linearly on some covariates and nonparametrically on a univariate regressor, in such a way that the nonparametric component is assumed to be a monotone function. A class of robust estimators for the monotone nonparametric component and for the regression parameter, related to the linear one, is defined. The robust estimators are based on a spline approach combined with a score function that bounds large values of the deviance. As an application, we consider the isotonic partly linear log-Gamma regression model. Under regularity conditions, we derive consistency results for the nonparametric function estimators as well as consistency and asymptotic distribution results for the regression parameter estimators. In addition, the empirical influence function allows us to study the sensitivity of the estimators to anomalous observations. Through a Monte Carlo study, we investigate the performance of the proposed estimators under a partly linear log-Gamma regression model with increasing nonparametric component. The proposal is illustrated on a real data set.
References
Aït Sahalia Y (1995) The delta method for nonparametric kernel functionals. Ph.D. dissertation, University of Chicago
Álvarez E, Yohai V (2012) \(M\)-estimators for isotonic regression. J Stat Plan Inference 142:2241–2284
Bianco A, Boente G (2004) Robust estimators in semiparametric partly linear regression models. J Stat Plan Inference 122:229–252
Bianco A, Yohai V (1996) Robust estimation in the logistic regression model. Lecture notes in statistics, vol 109. Springer, New York, pp 17–34
Bianco A, García Ben M, Yohai V (2005) Robust estimation for linear regression with asymmetric errors. Can J Stat 33:511–528
Bianco A, Boente G, Rodrigues I (2013a) Resistant estimators in Poisson and Gamma models with missing responses and an application to outlier detection. J Multivar Anal 114:209–226
Bianco A, Boente G, Rodrigues I (2013b) Robust tests in generalized linear models with missing responses. Comput Stat Data Anal 65:80–97
Birke M, Dette H (2007) Testing strict monotonicity in nonparametric regression. Math Methods Stat 16:110–123
Boente G, Rodríguez D (2010) Robust inference in generalized partially linear models. Comput Stat Data Anal 54:2942–2966
Boente G, He X, Zhou J (2006) Robust estimates in generalized partially linear models. Ann Stat 34:2856–2878
Boente G, Rodríguez D, Vena P (2018) Robust estimators in a generalized partly linear regression model under monotony constraints. https://arxiv.org/abs/1802.07998
Cantoni E, Ronchetti E (2001) Robust inference for generalized linear models. J Am Stat Assoc 96:1022–1030
Cantoni E, Ronchetti E (2006) A robust approach for skewed and heavy tailed outcomes in the analysis of health care expenditures. J Health Econ 25:198–213
Croux C, Haesbroeck G (2002) Implementing the Bianco and Yohai estimator for logistic regression. Comput Stat Data Anal 44:273–295
Du J, Sun Z, Xie T (2013) \(M\)-estimation for the partially linear regression model under monotonic constraints. Stat Probab Lett 83:1353–1363
Gijbels I, Hall P, Jones M, Koch I (2000) Tests for monotonicity of a regression mean with guaranteed level. Biometrika 87:663–673
Härdle W, Liang H, Gao J (2000) Partially linear models. Physica-Verlag, Heidelberg
He X, Shi P (1996) Bivariate tensor-product \(B\)-spline in a partly linear model. J Multivar Anal 58:162–181
He X, Shi P (1998) Monotone B-spline smoothing. J Am Stat Assoc 93:643–650
He X, Zhu Z, Fung W (2002) Estimation in a semiparametric model for longitudinal data with unspecified dependence structure. Biometrika 89:579–590
Heritier S, Cantoni E, Copt S, Victoria-Feser MP (2009) Robust methods in biostatistics. Wiley series in probability and statistics. Wiley, New York
Huang J (2002) A note on estimating a partly linear model under monotonicity constraints. J Stat Plan Inference 107:343–351
Künsch H, Stefanski L, Carroll R (1989) Conditionally unbiased bounded influence estimation in general regression models with applications to generalized linear models. J Am Stat Assoc 84:460–466
Lu M (2010) Spline-based sieve maximum likelihood estimation in the partly linear model under monotonicity constraints. J Multivar Anal 101:2528–2542
Lu M (2015) Spline estimation of generalised monotonic regression. J Nonparametr Stat 27:19–39
Lu M, Zhang Y, Huang J (2007) Estimation of the mean function with panel count data using monotone polynomial splines. Biometrika 94:705–718
Mallows C (1974) On some topics in robustness. Memorandum Bell Laboratories, Murray Hill
Manchester L (1996) Empirical influence for robust smoothing. Aust J Stat 38:275–296
Marazzi A, Yohai V (2004) Adaptively truncated maximum likelihood regression with asymmetric errors. J Stat Plan Inference 122:271–291
Maronna R, Martin D, Yohai V (2006) Robust statistics: theory and methods. Wiley, New York
McCullagh P, Nelder J (1989) Generalized linear models, 2nd edn. Chapman and Hall, London
Ramsay J (1988) Monotone regression splines in action. Stat Sci 3:425–441
Schumaker L (1981) Spline functions: basic theory. Wiley, New York
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Shen X, Wong WH (1994) Convergence rate of sieve estimates. Ann Stat 22:580–615
Stefanski L, Carroll R, Ruppert D (1986) Bounded score functions for generalized linear models. Biometrika 73:413–424
Stone CJ (1986) The dimensionality reduction principle for generalized additive models. Ann Stat 14:590–606
Sun Z, Zhang Z, Du J (2012) Semiparametric analysis of isotonic errors-in-variables regression models with missing response. Commun Stat Theory Methods 41:2034–2060
Tamine J (2002) Smoothed influence function: another view at robust nonparametric regression. Discussion paper 62 Sonderforschungsbereich 373, Humboldt-Universität zu Berlin
van de Geer S (2000) Empirical processes in \(M\)-estimation. Cambridge University Press, Cambridge
van der Vaart A (1998) Asymptotic statistics. Cambridge series in statistical and probabilistic mathematics. Cambridge University Press, Cambridge
van der Vaart A, Wellner J (1996) Weak convergence and empirical processes. With applications to statistics. Springer, New York
Wang Y, Huang J (2002) Limiting distribution for monotone median regression. J Stat Plan Inference 108:281–287
Acknowledgements
The authors wish to thank the Associate Editor and two anonymous referees for their valuable comments, which led to an improved version of the original paper. This research was partially supported by Grants PIP 112-201101-00742 from CONICET, PICT 2014-0351 from ANPCYT and 20020170100022BA and 20020170100330BA from the Universidad de Buenos Aires, Argentina, and also by the Spanish Project MTM2016-76969P from the Ministry of Science and Innovation, Spain.
Electronic supplementary material
The supplementary material (available online) contains the proof of Theorem 3 and that of the expressions given in (21) and (22) for the empirical influence function of the proposed estimators. Some additional figures for the empirical influence function given in Section 6.1 are provided. It also contains some lemmas ensuring that the entropy assumptions C4 and C5 hold for some choices of the loss function.
Appendix
Throughout this section, we will denote as \(\Vert \rho \Vert _{\infty }=\sup _{y \in \mathbb {R}, u\in \mathbb {R}, a\in {\mathcal {V}}} \rho (y,u,a)\) and \(\Vert w\Vert _{\infty }=\sup _{{\mathbf {x}}\in \mathbb {R}^p} w({\mathbf {x}})\).
1.1 Proof of Theorem 1
Let \(V_{\varvec{\beta },g,a}=\rho \left( y,{\mathbf {x}}^{\textsc {t}}\varvec{\beta }+g(t),a\right) w({\mathbf {x}}) \) and denote as P the probability measure of \((y ,{\mathbf {x}},t )\) and as \(P_n\) its corresponding empirical measure. Then, \(L_n(\varvec{\beta },g,a)=P_n V_{\varvec{\beta },g,a}\) and \(L(\varvec{\beta },g,a)=P V_{\varvec{\beta },g,a}\).
The consistency of \(\widehat{\kappa }\) entails that given any neighbourhood \({\mathcal {V}}\) of \(\kappa _0\), there exists a null set \({\mathcal {N}}_{\mathcal {V}}\), such that for \(\omega \notin {\mathcal {N}}_{\mathcal {V}}\), there exists \(n_0\in \mathbb {N}\), such that for all \(n\ge n_0\) we have that \( \widehat{\kappa }\in {\mathcal {V}}\).
The proof follows steps similar to those used in the proof of Theorem 5.7 of van der Vaart (1998). Let us begin by showing that
Note that \(A_n=\sup _{f\in {\mathcal {F}}_n} (P_n-P)f\), where \({\mathcal {F}}_n\) is defined in (10). Furthermore, C1 entails that \(\sup _{f\in {\mathcal {F}}_n}|f|=\Vert \rho \Vert _\infty \Vert w\Vert _\infty \), while C4 and the fact that \(k_n = O(n^\nu )\) with \(\nu< 1/(2r)<1\) imply that
Hence, we get that (A.1) holds (see, for instance, Exercise 3.6 in van de Geer (2000) with \(b_n=\max (1, \Vert \rho \Vert _{\infty } \Vert w\Vert _{\infty })\)).
Since \(L(\varvec{\theta }_0, \kappa _0)=\inf _{\varvec{\beta }\in \mathbb {R}^p, g\in {\mathcal {G}}}L(\varvec{\beta },g, \kappa _0)\), where \(\varvec{\theta }_0=(\varvec{\beta }_0,\eta _0)\), we have that
with \(A_{n,1}=L(\widehat{\varvec{\theta }}, \widehat{\kappa })-L_n(\widehat{\varvec{\theta }}, \widehat{\kappa })\), \(A_{n,2}=L_n(\widehat{\varvec{\theta }}, \widehat{\kappa })-L(\varvec{\theta }_0, \kappa _0)\) and \(A_{n,3}=L(\widehat{\varvec{\theta }}, \kappa _0)-L(\widehat{\varvec{\theta }}, \widehat{\kappa })\). Noting that \(|A_{n,1}|\le A_n\), we obtain that \(A_{n,1}=o_{\text {a.s.}}(1)\). On the other hand, since \(L(\widehat{\varvec{\theta }}, a)=L^{\star } (\widehat{\varvec{\beta }}, \widehat{\varvec{\lambda }}, a)\), the equicontinuity of \(L^{\star }\) stated in C1 and the consistency of \(\widehat{\kappa }\) entail that \(A_{n,3}=o_{\text {a.s.}}(1)\).
We will now bound \(A_{n,2}\). Using Lemma A1 of Lu et al. (2007), we get that there exists \(g_n\in {\mathcal {M}}_n({\mathcal {T}}_n,\ell )\) with \(\ell \ge r+2\), such that \(\Vert g_n-\eta _0\Vert _{\infty }=O(n^{-r\nu } )\), for \(1/(2r +2)< \nu < 1/(2r)\). Denote \(\varvec{\theta }_{0,n}=(\varvec{\beta }_0, g_n)\) and let \(S_{n,1}=(P_n-P)V_{\varvec{\beta }_0,g_n, \widehat{\kappa }}\) and \(S_{n,2}=L(\varvec{\theta }_{0,n}, \widehat{\kappa })-L(\varvec{\theta }_0, \kappa _0)\). Note that \(S_{n,1}\le A_n\), so that from (A.1), we get that \(S_{n,1} \buildrel {a.s.}\over \longrightarrow 0\). On the other hand, if we write \(S_{n,2}=\sum _{j=1}^2 S_{n,2}^{(j)}\) where \(S_{n,2}^{(1)}=L(\varvec{\theta }_{0,n}, \widehat{\kappa })- L(\varvec{\theta }_{0,n}, \kappa _0)\) and \(S_{n,2}^{(2)}=L(\varvec{\theta }_{0,n}, \kappa _0)-L(\varvec{\theta }_0, \kappa _0)\), the continuity of \(\rho \) together with the fact that \(\Vert g_n-\eta _0\Vert _{\infty }\rightarrow 0\) and the dominated convergence theorem entail that \(S_{n,2}^{(2)}\rightarrow 0\), while the continuity and boundedness of \(\rho \) together with the consistency of \(\widehat{\kappa }\) lead to \(S_{n,2}^{(1)}=o_{\text {a.s.}}(1)\). Hence, \(S_{n,j}=o_{\text {a.s.}}(1)\) for \(j=1,2\).
Using that \(\widehat{\varvec{\theta }}\) minimizes \(L_n\) over \(\mathbb {R}^p\times {\mathcal {M}}_n({\mathcal {T}}_n,\ell )\) we obtain that
Hence, from (A.2) and (A.3) and using that \(A_{n,j}=o_{\text {a.s.}}(1)\), for \(j=1,3\) and \( S_{n,j}= o_{\text {a.s.}}(1)\), for \(j=1,2\), we conclude that
so \( L(\widehat{\varvec{\theta }}, \kappa _0)\rightarrow L(\varvec{\theta }_0, \kappa _0)\). The fact that \(\inf _{ {\varvec{\theta }}\in {\mathcal {A}}_\epsilon }L(\varvec{\theta },\kappa _0)>L(\varvec{\theta }_0,\kappa _0)\) entails that \(\pi (\widehat{\varvec{\theta }},\varvec{\theta }_0) \buildrel {a.s.}\over \longrightarrow 0\), concluding the proof. \(\square \)
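The proof of Theorem 1 (and of Theorem 2 below) hinges on the existence of a monotone spline \(g_n\) with \(\Vert g_n-\eta _0\Vert _{\infty }=O(n^{-r\nu })\). The following numerical sketch is not part of the paper: it takes an illustrative target \(\eta_0(t)=e^t\), cubic B-splines on uniform knots, and imposes monotonicity through the sufficient condition that the coefficient vector be nondecreasing, to show how the sup-norm approximation error shrinks as the number of knots grows.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import lsq_linear

def monotone_spline_fit(t, y, n_knots, degree=3):
    """Least-squares spline fit with nondecreasing coefficients,
    a sufficient condition for the fitted spline to be monotone."""
    interior = np.linspace(0.0, 1.0, n_knots)[1:-1]
    knots = np.r_[[0.0] * (degree + 1), interior, [1.0] * (degree + 1)]
    q = len(knots) - degree - 1                  # spline dimension
    X = BSpline.design_matrix(t, knots, degree).toarray()
    # reparametrize lambda = L @ c with c = (c_1, d_2, ..., d_q), d_j >= 0,
    # so that the coefficient vector lambda is nondecreasing
    L = np.tril(np.ones((q, q)))
    lb = np.r_[-np.inf, np.zeros(q - 1)]         # first coefficient is free
    res = lsq_linear(X @ L, y, bounds=(lb, np.inf))
    return knots, L @ res.x, degree

t = np.linspace(0.0, 1.0, 400)
eta0 = np.exp(t)                                 # a smooth increasing target
errs = []
for k in (5, 20):                                # coarse vs. fine knot sequence
    knots, lam, deg = monotone_spline_fit(t, eta0, k)
    fit = BSpline(knots, lam, deg)(t)
    errs.append(np.max(np.abs(fit - eta0)))      # sup-norm error on the grid
```

Requiring nondecreasing coefficients is only a sufficient condition for monotonicity of the spline (in the spirit of Schumaker 1981), but it turns the constrained fit into a simple bound-constrained least-squares problem.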
1.2 Proof of Theorem 2
Define the functions \(M_1(s)=L(\varvec{\beta }_0+ s \varvec{\beta },\eta _0,a)\) and \(M_2(s)=L(\varvec{\beta }_0,\eta _0+sg,a) \) and note that \(M_1^{\prime }(0)= \mathbb {E}\left[ w({\mathbf {x}})\varPsi (y ,{\mathbf {x}}^{\textsc {t}}\varvec{\beta }_0+\eta _0(t ) ,a) {\mathbf {x}}^{\textsc {t}}\varvec{\beta }\right] \) and \(M_2^{\prime }(0)= \mathbb {E}\left[ w({\mathbf {x}})\varPsi (y ,{\mathbf {x}}^{\textsc {t}}\varvec{\beta }_0+\eta _0(t ) ,a) g(t)\right] \). When C9a) holds, we have that \(M_1(s)\) and \(M_2(s)\) have a minimum at \(s=0\), for any \(\varvec{\beta }\in \mathbb {R}^p\) and \(g\in {\mathcal {G}}\). Then, \(M_1^{\prime }(0)=0\) and \(M_2^{\prime }(0)=0\) which implies that, for any \(a\in {\mathcal {V}}\),
Clearly, (A.4) and (A.5) also hold under C9b).
To prove Theorem 2 under both sets of assumptions, we state the common steps first and then continue the proof separately when C5\(^\star \) or C5\(^{\star \star }\) holds.
We denote \( \varTheta _n = \mathbb {R}^p\times {\mathcal {M}}_n({\mathcal {T}}_n,\ell )\cap \{\varvec{\theta }=(\varvec{\beta },g)\in \varTheta : \pi (\varvec{\theta },\varvec{\theta }_0)<\epsilon _0\}\), where \(\varTheta =\mathbb {R}^p\times {\mathcal {G}}\). Note that, except for a null probability set, \(\widehat{\varvec{\theta }}\in \varTheta _n\), for n large enough. As in the proof of Theorem 1, let \(g_n\in {\mathcal {M}}_n({\mathcal {T}}_n,\ell )\) with \(\ell \ge r+2\), \(g_n(t)=\varvec{\lambda }_n^{\textsc {t}}{\mathbf {B}}(t)\), be such that \(\Vert g_n-\eta _0\Vert _{\infty }=O(n^{-r\nu } )\), for \(1/(2r +2)< \nu < 1/(2r)\) and denote \(\varvec{\theta }_{0,n} = (\varvec{\beta }_0,g_n)\).
In order to get the convergence rate of our estimator \(\widehat{\varvec{\theta }}= (\widehat{\varvec{\beta }},\widehat{\eta })\), we will apply Theorem 3.4.1 of van der Vaart and Wellner (1996). For that purpose, following the notation in that Theorem, denote as \(M(\varvec{\theta })= - L(\varvec{\theta }, \widehat{\kappa })\) and \(\mathbb {M}_n(\varvec{\theta })=- L_n(\varvec{\theta }, \widehat{\kappa })\) and for \(\varvec{\theta }\in \varTheta _n\), let \(d_n(\varvec{\theta }, \varvec{\theta }_0)= \pi _{\mathbb {P}}(\varvec{\theta }, \varvec{\theta }_0)\). Note that the function M is random, due to the nuisance parameter estimator \(\widehat{\kappa }\). Let \(\delta _n=A\Vert \eta _0-g_n\Vert _{{{\mathcal {F}}}}\), where \(A=4\,\sqrt{(C_0 /\Vert w\Vert _{\infty }+A_0)/C_0}\) with \(A_0= \Vert w\Vert _{\infty } \Vert \chi \Vert _{\infty }/2\) and \(C_0\) given in C8.
Using that \(|(L_n(\varvec{\theta }, \widehat{\kappa })- L(\varvec{\theta },\widehat{\kappa }))-(L_n(\varvec{\theta }_{0,n}, \widehat{\kappa })- L(\varvec{\theta }_{0,n},\widehat{\kappa })) |= | (\mathbb {M}_n-M)(\varvec{\theta })- (\mathbb {M}_n-M)(\varvec{\theta }_{0,n})|\), to make use of Theorem 3.4.1 of van der Vaart and Wellner (1996), we have to show that there exists a function \(\phi _n\) such that \(\phi _n(\delta )/\delta ^\nu \) is decreasing on \((\delta _n, \infty )\) for some \(\nu <2\) and that for any \(\delta >\delta _n\),
where the symbol \(\lesssim \) means less or equal up to a constant, \(\mathbb {E}^{*}\) stands for the outer expectation and \(\varTheta _{n,\delta }=\{\varvec{\theta }\in \varTheta _n: \delta / 2 < d_n(\varvec{\theta },\varvec{\theta }_{0,n}) \le \delta \}\).
Assumption C8 and the fact that \(\widehat{\kappa } \buildrel {a.s.}\over \longrightarrow \kappa _0\) entail that, except for a null probability set, for any \(\varvec{\theta }\in \varTheta _n\), \(L(\varvec{\theta }, \widehat{\kappa })-L(\varvec{\theta }_0, \widehat{\kappa })\ge C_0\,\pi _{\mathbb {P}}^2(\varvec{\theta },\varvec{\theta }_0)\). Besides, (A.5) entails that \(\mathbb {E}\left[ w({\mathbf {x}})\varPsi (y ,{\mathbf {x}}^{\textsc {t}}\varvec{\beta }_0+\eta _0(t ),a) \left( g_n(t )-\eta _0(t )\right) \right] =0\), so
where \(A_0= \Vert w\Vert _{\infty } \Vert \chi \Vert _{\infty }/2\) and \(\widetilde{\eta }(t)\) is an intermediate value between \(\eta _0(t)\) and \(g_n(t)\). Thus, using that \(d_n^2(\varvec{\theta },\varvec{\theta }_{0,n})\le 2 d_n^2(\varvec{\theta },\varvec{\theta }_0)+ 2 d_n^2(\varvec{\theta }_{0,n},\varvec{\theta }_0) \le 2 d_n^2(\varvec{\theta },\varvec{\theta }_0) + 2 \Vert w\Vert _{\infty }\,\Vert g_n-\eta _0\Vert _{2}^2 \le 2 d_n^2(\varvec{\theta },\varvec{\theta }_0) + 2 \Vert w\Vert _{\infty }\,\Vert g_n-\eta _0\Vert _{{{\mathcal {F}}}}^2\) and that \(\delta / 2 < d_n(\varvec{\theta },\varvec{\theta }_{0,n}) \), we obtain that
concluding the proof of (A.6).
We have now to find \(\phi _n(\delta )\) such that \(\phi _n(\delta )/\delta \) is decreasing in \(\delta \) and (A.7) holds. Note that from the consistency of \(\widehat{\kappa }\), we have that with probability one for n large enough
Define the class of functions
with \(V_{{\varvec{\theta }}, a}=\rho \left( y,{\mathbf {x}}^{\textsc {t}}\varvec{\beta }+g(t),a\right) w({\mathbf {x}}) \), for \(\varvec{\theta }=(\varvec{\beta },g)\). Inequality (A.7) involves an empirical process indexed by \({\mathcal {F}}_{n,\delta }\), since
For any \(f\in {\mathcal {F}}_{n,\delta } \), we have that \(\Vert f\Vert _{\infty } \le A_1 = 2 \Vert \rho \Vert _{\infty } \Vert w\Vert _{\infty }\). Furthermore, if \(A_2= \Vert \psi \Vert _{\infty } \Vert w\Vert _{\infty }\) using that
and the fact that \(\pi _{\mathbb {P}}(\varvec{\theta },\varvec{\theta }_{0,n})=d_n(\varvec{\theta },\varvec{\theta }_{0,n})\le \delta \), we get that
Lemma 3.4.2 of van der Vaart and Wellner (1996) leads to
where \(J_{[\;]}(\delta , {\mathcal {F}}, L_2(P)) =\int _0^\delta \sqrt{1+ \log N_{[\;]}(\epsilon , {\mathcal {F}}, L_2(P)) } \mathrm{d}\epsilon \) is the bracketing integral.
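To fix ideas, the bracketing integral can be evaluated numerically under a covering bound of the polynomial type appearing in C5\(^{\star }\), namely \(\log N_{[\;]}(\epsilon , {\mathcal {F}}, L_2(P))\le q\log (C/\epsilon )\); the constants \(q\) and \(C\) below are illustrative and not taken from the paper. The computation reflects the behaviour \(J_{[\;]}(\delta )\lesssim q^{1/2}\delta \) (up to logarithmic factors) that drives the choice of \(\phi _n\) in the sequel.

```python
import numpy as np
from scipy.integrate import quad

def bracketing_integral(delta, q, C=2.0):
    """J_[](delta) = int_0^delta sqrt(1 + log N_[](eps)) d(eps), under the
    polynomial covering bound log N_[](eps) <= q * log(C / eps)."""
    value, _ = quad(lambda e: np.sqrt(1.0 + q * np.log(C / e)), 0.0, delta)
    return value

q = 10                       # plays the role of q_n = k_n + p + 1
ratios = []
for delta in (0.3, 0.1, 0.01):
    J = bracketing_integral(delta, q)
    envelope = delta * np.sqrt(q * np.log(2.0 / delta))   # ~ sqrt(q) * delta
    ratios.append(J / envelope)
```

Since the integrand decreases in \(\epsilon \), the ratio stays above 1, and numerically it remains below a modest constant over several orders of magnitude of \(\delta \).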
(a) Assume now that C5\(^{\star }\) holds and note that for any \(\varvec{\theta }=(\varvec{\beta },g) \in \varTheta _{n,\delta }\), g can be written as \(g=\varvec{\lambda }^{\textsc {t}}{\mathbf {B}}\) for some \(\varvec{\lambda }\in {\mathcal {L}}_{k_n}\), so
Hence, \({\mathcal {F}}_{n,\delta }\subset {\mathcal {G}}_{n,c, {\varvec{\lambda }}_n}\) with \(c= \delta \) and the bound given in C5\(^{\star }\) leads to
This implies that
If we denote \(q_n = k_n + p+1\), we obtain that for some constant \(A_3\) independent of n and \(\delta \),
Choosing
we have that \(\phi _n(\delta )/\delta \) is decreasing in \(\delta \), concluding the proof of (A.7). The fact that \(\pi (\widehat{\varvec{\theta }}, \varvec{\theta }_0) \buildrel {a.s.}\over \longrightarrow 0\) entails that \(\pi _\mathbb {P}(\widehat{\varvec{\theta }}, \varvec{\theta }_0) \buildrel {a.s.}\over \longrightarrow 0\) which together with \(\pi _\mathbb {P}(\varvec{\theta }_{0,n}, \varvec{\theta }_0)\rightarrow 0\), leads to (A.8).
Let \(\gamma _n= O(n^{\min (r\nu ,(1-\nu )/2)})\), then \(\gamma _n \lesssim \delta _n^{-1}\), where \(\delta _n=A\Vert \eta _0-g_n\Vert _{{{\mathcal {F}}}}=O(n^{-r\nu })\). We have to show that \(\gamma _n^2\phi _n \left( 1/{\gamma _n}\right) \lesssim \sqrt{n}\). Note that
where \(a_n=\gamma _n q_n^{1/2}/\sqrt{n}\). Hence, to derive that \(\gamma _n^2\phi _n \left( 1/{\gamma _n}\right) \lesssim \sqrt{n}\), it is enough to show that \(a_n=O(1)\), which follows easily since \(k_n=O(n^{\nu })\) and \(\gamma _n= O(n^\varsigma )\) with \(\varsigma =\min (r\nu ,(1-\nu )/2)\).
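The arithmetic behind \(a_n=O(1)\) can be checked directly: with \(k_n=n^{\nu }\) and \(\gamma _n=n^{\varsigma }\), \(\varsigma =\min (r\nu ,(1-\nu )/2)\), the sequence \(a_n=\gamma _n q_n^{1/2}/\sqrt{n}\) stays bounded. A small numerical check, with the illustrative choices \(r=2\), \(p=3\) and \(\nu =0.2\) (which satisfy \(1/(2r+2)<\nu <1/(2r)\); these values are not taken from the paper):

```python
import numpy as np

# illustrative constants, not from the paper: r = 2 derivatives of eta_0,
# p = 3 linear covariates, and 1/(2r+2) < nu < 1/(2r)
r, p, nu = 2, 3, 0.2
varsigma = min(r * nu, (1.0 - nu) / 2.0)    # exponent of gamma_n

n = np.logspace(1, 6, num=6)                # n = 10, 100, ..., 10^6
k_n = n**nu                                 # spline dimension k_n = O(n^nu)
q_n = k_n + p + 1
gamma_n = n**varsigma                       # rate gamma_n = O(n^varsigma)
a_n = gamma_n * np.sqrt(q_n) / np.sqrt(n)   # should remain bounded
```

With these exponents \(\gamma _n^2 q_n/n\to 1\), so \(a_n\) decreases towards 1, confirming \(a_n=O(1)\).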
Finally, the condition \(\mathbb {M}_n(\widehat{\varvec{\theta }})\ge \mathbb {M}_n(\varvec{\theta }_{0,n})-O_{\mathbb {P}}(\gamma _n^{-2})\) required by Theorem 3.4.1 of van der Vaart and Wellner (1996) is trivially fulfilled because \(\widehat{\varvec{\theta }}_n\) minimizes \(L_n(\varvec{\theta }, \widehat{\kappa })\). Hence, we get that \(\gamma _n^2 d_n^2(\varvec{\theta }_{0,n},\widehat{\varvec{\theta }}) = O_{\mathbb {P}}(1)\).
On the other hand, \(d_n(\varvec{\theta }_{0,n},\varvec{\theta }_0)\le \Vert w\Vert _{\infty }^{1/2} \Vert g_n-\eta _0\Vert _{\infty }=O(n^{-r\nu })\le \gamma _n\), which together with \(\gamma _n^2 d_n^2(\varvec{\theta }_{0,n},\widehat{\varvec{\theta }}) = O_{\mathbb {P}}(1)\) and the triangle inequality leads to \(\gamma _n^2 d_n^2(\varvec{\theta }_{0},\widehat{\varvec{\theta }}) = O_{\mathbb {P}}(1)\), concluding the proof.
(b) We will assume now that C5\(^{\star \star }\) holds. Therefore, using that any \(f\in {\mathcal {F}}_{n,\delta }\) can be written as \(f=f_1-f_2\) with \(f_j\in {\mathcal {F}}_{n,\epsilon _0}^\star \) and the bound given in C5\(^{\star \star }\), we get that
This implies that
If we denote \(q_n = k_n + p+1\), we obtain
Choosing
we have that \(\phi _n(\delta )/\delta \) is decreasing in \(\delta \).
Therefore, from Theorem 3.4.1 of van der Vaart and Wellner (1996), we conclude that \(\gamma _n^2 d_n^2(\varvec{\theta }_{0,n},\widehat{\varvec{\theta }}) = O_{\mathbb {P}}(1)\), where \(\gamma _n\) is any sequence satisfying \(\gamma _n \lesssim \delta _n^{-1}\) with \(\delta _n=\pi (\varvec{\theta }_0,\varvec{\theta }_{0,n} )=O(n^{-r\nu })\) and \(\gamma _n^2\phi _n \left( {1}/{\gamma _n}\right) \le \sqrt{n}\). The first condition entails that \(\gamma _n \le O(n^{r\nu })\). The second one implies that
so using that \(k_n=O(n^{\nu })\) we get that \(\gamma _n \log (\gamma _n)\le O(n^{(1-\nu )/2})\). Finally, the condition \(\mathbb {M}_n(\widehat{\varvec{\theta }})\ge \mathbb {M}_n(\varvec{\theta }_{0,n})-O_{\mathbb {P}}(\gamma _n^{-2})\) required by Theorem 3.4.1 of van der Vaart and Wellner (1996) is trivially fulfilled because \(\widehat{\varvec{\theta }}_n\) minimizes \(L_n(\varvec{\theta }, \widehat{\kappa })\).
On the other hand, \(d_n(\varvec{\theta }_{0,n},\varvec{\theta }_0)\le \Vert w\Vert _{\infty }^{1/2} \Vert g_n-\eta _0\Vert _{\infty }=O(n^{-r\nu })\le \gamma _n\), which together with \(\gamma _n^2 d_n^2(\varvec{\theta }_{0,n},\widehat{\varvec{\theta }}) = O_{\mathbb {P}}(1)\) and the triangle inequality leads to \(\gamma _n^2 d_n^2(\varvec{\theta }_{0},\widehat{\varvec{\theta }}) = O_{\mathbb {P}}(1)\). \(\square \)
1.3 Conditions guaranteeing C8
The following lemma provides conditions to ensure that C8 holds.
Lemma 1
Assume that C9 holds and that \(\rho (y,u,a)\) is twice continuously differentiable with respect to u.
- (a)
If the function \(\chi \left( y,u, a\right) ={\partial ^2 \rho (y,u,a)}/{\partial u^2} \) is such that there exists \(\epsilon _0>0\) and a neighbourhood \({\mathcal {V}}\) of \(\kappa _0\) such that
$$\begin{aligned} C_0= \inf _{a \in {\mathcal {V}}}\inf _{ \begin{array}{c} \pi ^2({\varvec{\theta }},{\varvec{\theta }}_0 )<\epsilon _0\\ {\varvec{\theta }}\in \mathbb {R}^p\times {\mathcal {G}} \end{array}}\inf _{({\mathbf {x}}_0,t_0) \in {\mathcal {S}}_w \times [0,1]} \mathbb {E}\left( \chi \left( y,{\mathbf {x}}^{\textsc {t}}\varvec{\beta }+g(t), a\right) \left| ({\mathbf {x}},t)=({\mathbf {x}}_0,t_0)\right. \right) >0\,, \end{aligned}$$(A.9)where \({\mathcal {S}}_w\) stands for the support of the function w, then C8 holds.
- (b)
If \(\pi ^2(\varvec{\theta }_1 ,\varvec{\theta }_2 ) =\Vert \varvec{\beta }_1-\varvec{\beta }_2 \Vert ^2+ \Vert \eta _1-\eta _2\Vert ^2_{\infty }\), C8 holds if w has bounded support \({\mathcal {S}}_w\subset \{\Vert {\mathbf {x}}\Vert \le A_1\}\) or \(\mathbb {P}(\Vert {\mathbf {x}}\Vert \le A_1)=1\) and for some positive constant \(A_2\)
$$\begin{aligned} C_0= \inf _{a \in {\mathcal {V}}}\inf _{({\mathbf {x}}_0,t_0) \in {\mathcal {S}}_w \times [0,1]}\inf _{|s-s_0|<A_2} \mathbb {E}\left( \chi \left( y,s, a\right) \left| ({\mathbf {x}},t)=({\mathbf {x}}_0,t_0)\right. \right) >0\,, \end{aligned}$$(A.10)where \(s_0= {\mathbf {x}}_0^{\textsc {t}}\varvec{\beta }_0+\eta _0(t_0)\).
Note that (A.9) is the robust counterpart of the assumption that the conditional variance of \(y|({\mathbf {x}},t)\) is bounded away from 0, used in Theorem 1 of Lu (2015). Assumption (A.10) is fulfilled, for instance, if \(\mathbb {E}\left( \chi \left( y,{\mathbf {x}}^{\textsc {t}}\varvec{\beta }_0+\eta _0(t), a\right) \left| ({\mathbf {x}},t)=({\mathbf {x}}_0,t_0)\right. \right) >0\) and the function \(\chi (y,s,a)\) is continuous in all its arguments. These two conditions hold, for instance, under the partly linear model (6), both for symmetric errors and for errors having a density (8), when the functions \(\phi \) and \(\upsilon \) satisfy the assumptions N3 and N5 needed to derive the asymptotic normality of the regression estimators \(\widehat{\varvec{\beta }}\).
Proof of Lemma 1
For any \(\varvec{\theta }\in \mathbb {R}^p\times {\mathcal {M}}_n({\mathcal {T}}_n, \ell )\), denote as \(M_{{\varvec{\theta }}}(s)= L(\varvec{\theta }_0+s (\varvec{\theta }-\varvec{\theta }_0),a)\), then \(M_{{\varvec{\theta }}}(1)=L(\varvec{\theta },a)\) and \(M_{{\varvec{\theta }}}(0)=L(\varvec{\theta }_0,a)\). Furthermore, denoting \(b({\mathbf {x}},t)={\mathbf {x}}^{\textsc {t}}(\varvec{\beta }-\varvec{\beta }_0)+g(t )-\eta _0(t )\), we have
Assumption C9 implies that \(M_{{\varvec{\theta }}}^{\prime }(0)=0\); hence, a second-order Taylor expansion entails that for some \(0<\xi <1\), \(M_{{\varvec{\theta }}}(1)-M_{{\varvec{\theta }}}(0)= M_{{\varvec{\theta }}}^{\prime \,\prime }(\xi )/2\).
- (a)
Denote as \(\varvec{\beta }_\xi =\varvec{\beta }_0+\xi (\varvec{\beta }-\varvec{\beta }_0)\) and \(g_\xi =\eta _0+ \xi (g-\eta _0)=(1-\xi )\eta _0+\xi g\), then \(\varvec{\theta }_\xi =(\varvec{\beta }_\xi ,g_\xi )\in \varTheta \) for \(g\in {\mathcal {G}}\) and \(\pi (\varvec{\theta }_\xi ,\varvec{\theta }_0)=\xi \pi (\varvec{\theta }, \varvec{\theta }_0)\). Therefore, for \(a\in {\mathcal {V}}\), and \(\varvec{\theta }\in \mathbb {R}^p\times {\mathcal {M}}_n({\mathcal {T}}_n, \ell )\), such that \(\pi (\varvec{\theta },\varvec{\theta }_0)<\epsilon _0\), we have that
$$\begin{aligned}&L(\varvec{\theta }, a)-L(\varvec{\theta }_0, a)\\&\quad = M_{{\varvec{\theta }}}(1)-M_{{\varvec{\theta }}}(0)=\frac{1}{2}\; \mathbb {E}\left[ w({\mathbf {x}})\, \chi (y ,{\mathbf {x}}^{\textsc {t}}\varvec{\beta }_\xi + g_\xi (t ), a) \, b^2({\mathbf {x}},t) \right] \\&\quad = \frac{1}{2}\; \mathbb {E}\left[ w({\mathbf {x}})\, \mathbb {E}\left\{ \chi (y ,{\mathbf {x}}^{\textsc {t}}\varvec{\beta }_\xi + g_\xi (t ), a) \Big | ({\mathbf {x}},t)\right\} \, b^2({\mathbf {x}},t) \mathbb {I}_{{\mathcal {S}}_{w}\times [0,1]}({\mathbf {x}},t) \right] \\&\quad \ge C_0 \mathbb {E}w({\mathbf {x}}) b^2({\mathbf {x}},t)= C_0 \pi _{\mathbb {P}}^2(\varvec{\theta }, \varvec{\theta }_0) \end{aligned}$$where we have used that \(\pi (\varvec{\theta }_\xi ,\varvec{\theta }_0) <\epsilon _0\) and (A.9), concluding the proof of (a).
- (b)
Assume that \(\pi ^2(\varvec{\theta }_1 ,\varvec{\theta }_2 ) =\Vert \varvec{\beta }_1-\varvec{\beta }_2 \Vert ^2+ \Vert \eta _1-\eta _2\Vert ^2_{\infty }\) and that (A.10) holds. Let \(s_0={\mathbf {x}}_0^{\textsc {t}}\varvec{\beta }_0+\eta _0(t_0)\) with \({\mathbf {x}}_0\in {\mathcal {S}}_w\). Using that \( |{\mathbf {x}}_0 ^{\textsc {t}}\varvec{\beta }_\xi + g_\xi (t )- s_0|\le A_1 \Vert \varvec{\beta }_\xi -\varvec{\beta }_0\Vert + |g_\xi (t_0 )- \eta _0(t_0)|\), we get that \(|{\mathbf {x}}_0 ^{\textsc {t}}\varvec{\beta }_\xi + g_\xi (t )- s_0|\le A_2\), whenever \(\pi (\varvec{\theta }, \varvec{\theta }_0)\le \epsilon _0\), with \(\epsilon _0< A_2/(1+A_1)\). The proof follows as in (a) using (A.10). \(\square \)
Boente, G., Rodriguez, D. & Vena, P. Robust estimators in a generalized partly linear regression model under monotony constraints. TEST 29, 50–89 (2020). https://doi.org/10.1007/s11749-019-00629-7