Abstract
Although semiparametric models, in particular varying-coefficient models, alleviate the curse of dimensionality by avoiding the estimation of fully nonparametric multivariate functions, a large number of univariate functions typically still need to be estimated. We propose a dimension reduction approach to estimating a large number of nonparametric univariate functions in varying-coefficient models, in which these functions are constrained to lie in a finite-dimensional subspace spanned by a small number of smooth functions. The proposed methodology is developed in the context of quantile regression, which provides more information on the response variable than the more conventional mean regression. Finally, we present numerical illustrations to demonstrate the performance of the proposed approach.
References
Cai Z, Xiao Z (2012) Semiparametric quantile regression estimation in dynamic models with partially varying coefficients. J Econom 167:413–425
Cai Z, Xu X (2008) Nonparametric quantile estimations for dynamic smooth coefficient models. J Am Stat Assoc 103(484):1595–1608
Chen R, Tsay RS (1993) Functional-coefficient autoregressive models. J Am Stat Assoc 88(421):298–308
Fan J, Ma Y, Dai W (2014) Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. J Am Stat Assoc 109(507):1270–1284
Fan JQ, Li RZ (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Hastie T, Tibshirani R (1993) Varying-coefficient models. J R Stat Soc Ser B Methodol 55(4):757–796
He X, Shi P (1994) Convergence rate of B-spline estimators of nonparametric conditional quantile functions. J Nonparametr Stat 3:299–308
Horowitz JL, Lee S (2005) Nonparametric estimation of an additive quantile regression model. J Am Stat Assoc 100(472):1238–1249
Jiang Q, Wang H, Xia Y, Jiang G (2013) On a principal varying coefficient model. J Am Stat Assoc 108(501):228–236
Kim M (2007) Quantile regression with varying coefficients. Ann Stat 35(1):92–108
Lian H (2012a) A note on the consistency of Schwarz's criterion in linear quantile regression with the SCAD penalty. Stat Probab Lett 82(7):1224–1228
Lian H (2012b) Variable selection for high-dimensional generalized varying-coefficient models. Stat Sinica 22:1563–1588
Storlie C, Bondell H, Reich B, Zhang H (2011) Surface estimation, variable selection, and the nonparametric oracle property. Stat Sinica 21:679–705
van de Geer SA (2000) Empirical Processes in M-Estimation. Cambridge University Press, Cambridge
Wang HJ, Zhu Z, Zhou J (2009) Quantile regression in partially linear varying coefficient models. Ann Stat 37(6B):3841–3866
Wei F, Huang J, Li HZ (2011) Variable selection and estimation in high-dimensional varying-coefficient models. Stat Sinica 21:1515–1540
Xue L, Qu A (2012) Variable selection in high-dimensional varying-coefficient models with global optimality. J Mach Learn Res 13(1):1973–1998
Zhang C (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38(2):894–942
Zhao W, Jiang X, Lian H (2018) A principal varying-coefficient model for quantile regression: joint variable selection and dimension reduction. Comput Stat Data Anal 127:269–280
Zhao W, Zhang F, Wang X, Li R, Lian H (2019) Principal varying coefficient estimator for high-dimensional models. Statistics 53:1234–1250
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429
Acknowledgements
The research of Weihua Zhao is partially supported by National Social Science Fund (15BTJ027). The research of Rui Li is supported by National Social Science Fund of China (No. 17BTJ025). The research of Heng Lian is supported by City University of Hong Kong Start-up Grant 7200521, by Hong Kong RGC General Research Fund 11301718 and 11300519, by Project 11871411 from NSFC, and by the Shenzhen Research Institute, City University of Hong Kong. On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendix: Proofs
Let \({\varvec{{\theta }}}_{0j}\) be the spline coefficients satisfying \(\sup _t|g_j(t)-\mathbf{B}^{\mathrm{T}}(t){\varvec{{\theta }}}_{0j}|\le CK^{-d}\), which is possible by (A3), and let \({{\varvec{\Theta }}}_0=({\varvec{{\theta }}}_{01},\ldots ,{\varvec{{\theta }}}_{0p})^{\mathrm{T}}\) and \({\varvec{{\theta }}}_0=\mathrm{vec}({{\varvec{\Theta }}}_0^{\mathrm{T}})\). Let \(F(\cdot |T,\mathbf{X})\) be the conditional cdf of e given the covariates and \(F(\cdot )\) the unconditional cdf of e. We also define \(\mathbf{Z}_i=\mathbf{B}(T_i)\otimes \mathbf{X}_i\) and \(m_i=\mathbf{X}_i^{\mathrm{T}}\mathbf{g}(T_i)\). For a matrix, \(\Vert \cdot \Vert \) denotes its Frobenius norm and \(\Vert \cdot \Vert _{op}\) its operator norm. In the proofs, C denotes a generic positive constant which may assume different values even on the same line.
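To fix ideas, the following is a minimal numerical sketch (not the authors' code) of the spline design underlying \(\mathbf{Z}_i\): it evaluates a B-spline basis, forms \(\sum _jX_{ij}\mathbf{B}^{\mathrm{T}}(T_i){\varvec{{\theta }}}_j\), and checks that this agrees with the Kronecker-product form once the block ordering of the Kronecker factors is paired with the vectorization convention (the sketch uses the covariate-major pairing). The use of scipy's BSpline.design_matrix (available in SciPy 1.8 and later) and all variable names are illustrative assumptions.

# A minimal sketch (not the authors' code) of the spline design in the proofs:
# Z_i pairs a B-spline basis evaluation B(T_i) with covariates X_i via a
# Kronecker product, so that Z_i' theta = sum_j X_ij * B(T_i)' theta_j.
import numpy as np
from scipy.interpolate import BSpline  # design_matrix requires SciPy >= 1.8

rng = np.random.default_rng(0)
n, p, K, deg = 5, 3, 6, 3            # samples, covariates, basis size, cubic splines

# Open knot vector on [0, 1] giving exactly K basis functions
interior = np.linspace(0, 1, K - deg + 1)[1:-1]
knots = np.r_[np.zeros(deg + 1), interior, np.ones(deg + 1)]

T = np.sort(rng.uniform(0.01, 0.99, n))   # index variable T_i
X = rng.normal(size=(n, p))               # covariates X_i
Theta = rng.normal(size=(p, K))           # spline coefficients, row j is theta_j

B = BSpline.design_matrix(T, knots, deg).toarray()   # n x K, row i is B(T_i)

for i in range(n):
    direct = sum(X[i, j] * B[i] @ Theta[j] for j in range(p))
    # Kronecker form: the covariate-major ordering kron(X_i, B(T_i)) matches
    # theta = (theta_1', ..., theta_p')', i.e. Theta flattened row by row.
    kron_form = np.kron(X[i], B[i]) @ Theta.reshape(-1)
    assert np.isclose(direct, kron_form)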
Lemma 1
Let \(r_n=\sqrt{(K+p-r)r/n}+\sqrt{p}K^{-d}\). We have
Proof of Lemma 1
As in He and Shi (1994), without loss of generality and for simplicity of notation, we consider only median regression with \(\tau =1/2,\;\rho _\tau (u)=|u|/2\); the general case can be shown in the same way. Let \(\mathcal{N}=\{{\varvec{{\theta }}}^{(1)},\ldots ,{\varvec{{\theta }}}^{(N)}\}\) be a \(\delta _n\) covering of \(\{{\varvec{{\theta }}}: \Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}_0\Vert \le Cr_n\}\). By Lemma 2.5 in van de Geer (2000), we have the bound \(\log N\le C pK\log n\) if we set \(\delta _n\sim n^{-a}\) for some \(a>0\), which will be chosen large enough below.
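For completeness, the arithmetic behind \(\log N\le CpK\log n\) is the standard covering bound for a Euclidean ball in \(\mathbb{R}^{pK}\) (the constants below are illustrative):
\[ N\le \Big (\frac{3Cr_n}{\delta _n}\Big )^{pK}\quad \Longrightarrow \quad \log N\le pK\log (3Cr_nn^{a})\le C'a\,pK\log n, \]
using \(\delta _n\sim n^{-a}\) and the fact that \(r_n\) is bounded by a fixed power of n.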
Let \(M_{ni}({\varvec{{\theta }}})=\frac{1}{2}|Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}|-\frac{1}{2}|Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_{0}|+ \mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)(1/2-I\{e_i\le 0\})\), and \(M_n({\varvec{{\theta }}})=\sum _{i=1}^nM_{ni}({\varvec{{\theta }}})\). Let \(\Omega \) be the event \(\{\max _{i,j}|X_{ij}|\le C\sqrt{\log (pn)}\}\); by the assumed sub-Gaussianity of \(X_{ij}\), we have
for any \(a>0\), as long as C is sufficiently large. Let \(M_{ni}^*({\varvec{{\theta }}})=M_{ni}({\varvec{{\theta }}})I_{\Omega }\) and \(M_n^*({\varvec{{\theta }}})=M_n({\varvec{{\theta }}})I_{\Omega }\), where \(I_{\Omega }\) is the indicator function that takes value 1 when \(\Omega \) occurs. By (6), we only need to show that
Since the function |u| is Lipschitz, and using the fact that for any \({\varvec{{\theta }}}\) there exists \({\varvec{{\theta }}}^{(l)}\) in the covering with \(\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^{(l)}\Vert \le \delta _n\), we have
which can obviously be made smaller than \(nr_n^2\) by setting \(\delta _n\sim n^{-a}\) for a large enough.
Furthermore, by simple algebra,
Thus
Furthermore, we have
Using Bernstein’s inequality, we have
If we set \(a=O\left( \max \{(pK\log n)^{3/2}r_n, \sqrt{n(pK\log n)^{3/2}r_n^3}\}\right) =o(nr_n^2)\), the right-hand side above converges to zero, which proves the result. \(\square \)
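For reference, Bernstein's inequality in the form used here states that for independent mean-zero variables \(W_i\) with \(|W_i|\le M\) and \(\sum _i\mathrm{E}W_i^2\le V\),
\[ P\Big (\Big |\sum _{i=1}^nW_i\Big |>a\Big )\le 2\exp \Big (-\frac{a^2/2}{V+Ma/3}\Big ). \]
Combined with the union bound over the \(N\) covering points and \(\log N\le CpK\log n\), the stated choice of \(a\) makes \(N\) times the right-hand side converge to zero while keeping \(a=o(nr_n^2)\); the two terms in the maximum correspond, roughly, to the range-dominated and variance-dominated regimes of the exponent (a heuristic reading, not a claim about the omitted display).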
Proof of Theorem 1
Suppose \(\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}_0\Vert \le Cr_n\). Using Knight's identity \(\rho _\tau (x-y)-\rho _\tau (x)=-y(\tau -I\{x\le 0\})+\int _0^y (I\{x\le t\}-I\{x\le 0\})dt\), we have that
where \(R_i=\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_0-m_i\).
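Spelled out, taking \(x=Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_0=e_i-R_i\) and \(y=\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)\) in Knight's identity gives one plausible form of the resulting decomposition (a sketch; the omitted display may differ in presentation):
\[ \sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})-\sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_0) =-\sum _i\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)\big (\tau -I\{e_i\le R_i\}\big ) +\sum _i\int _0^{\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)}\big (I\{e_i\le R_i+t\}-I\{e_i\le R_i\}\big )dt. \]
The linear term is controlled by Bernstein-type bounds, while the conditional expectation of the integral term is, to first order, \(\frac{1}{2}f(0|T_i,\mathbf{X}_i)\) times a quadratic in \({\varvec{{\theta }}}-{\varvec{{\theta }}}_0\), which is the source of the quadratic function \(Q({\varvec{{\theta }}})\) appearing below.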
Now define
Lemma 2 shows that \(\Vert {\varvec{{\theta }}}^*-{\varvec{{\theta }}}_0\Vert =O_p(r_n)\).
Denote
If \(\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert =C r_n\), then by Lemma 1, we have
Since \(Q({\varvec{{\theta }}})\) is a quadratic function of \({\varvec{{\theta }}}\) after ignoring a small term \(O_p(n(\sqrt{pK\log n}r_n)^3)\), we have that when \(\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert =Cr_n\),
As a result, we have
This implies that \(\Vert \widehat{\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert =O_p(r_n)\), which proves the convergence rate. \(\square \)
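The final localization step is the standard convexity argument, spelled out for completeness. Write \(L_n\) for the (unpenalized) check-loss objective, a label introduced only for this remark; \(L_n\) is convex in \({\varvec{{\theta }}}\). If \(\inf _{\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert =Cr_n}L_n({\varvec{{\theta }}})>L_n({\varvec{{\theta }}}^*)\) with probability tending to one, then for any \({\varvec{{\theta }}}\) outside the ball the boundary point
\[ \tilde{{\varvec{{\theta }}}}={\varvec{{\theta }}}^*+\frac{Cr_n}{\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert }({\varvec{{\theta }}}-{\varvec{{\theta }}}^*) \]
is a convex combination of \({\varvec{{\theta }}}^*\) and \({\varvec{{\theta }}}\), so \(L_n(\tilde{{\varvec{{\theta }}}})\le \max \{L_n({\varvec{{\theta }}}^*),L_n({\varvec{{\theta }}})\}\) forces \(L_n({\varvec{{\theta }}})>L_n({\varvec{{\theta }}}^*)\); the minimizer therefore lies inside the ball.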
Lemma 2
\(\Vert {\varvec{{\theta }}}^*-{\varvec{{\theta }}}_0\Vert =O_p(r_n)\).
Proof of Lemma 2
By the definition of \({\varvec{{\theta }}}^*\) and comparing the value of (11) at \({\varvec{{\theta }}}={\varvec{{\theta }}}^*\) and at \({\varvec{{\theta }}}={\varvec{{\theta }}}_0\), we have
where \(\mathbf{Z}=(\sqrt{f(0|T_1,\mathbf{X}_1)}\mathbf{Z}_1,\ldots ,\sqrt{f(0|T_n,\mathbf{X}_n)}\mathbf{Z}_n)^{\mathrm{T}}\) and \(\mathbf{e}=(\epsilon _1,\ldots ,\epsilon _n)^{\mathrm{T}}\). The rest of the proof is the same as the proof of Theorem 2.1 in Zhao et al. (2019) and thus the details are omitted. \(\square \)
Proof of Theorem 2
We define the oracle estimator \(\widehat{\varvec{{\theta }}}^o_{(1)}\) as a local minimizer of (2) using only the first q components of \(\mathbf{X}\), those associated with nonzero \(g_j\). Adding zero components, we also call \(\widehat{\varvec{{\theta }}}^o=( (\widehat{\varvec{{\theta }}}^o_{(1)})^{\mathrm{T}},\mathbf{0}^{\mathrm{T}})^{\mathrm{T}}\) the oracle estimator. By the same arguments as in Theorem 1, the oracle estimator is consistent with convergence rate \(O_p(r_n)\). Thus it remains to show that \(\widehat{\varvec{{\theta }}}^o\) is a local minimizer of (4).
The strategy is to compare the objective function value at \(\widehat{\varvec{{\theta }}}^o\) with the value at other \({\varvec{{\theta }}}\) in a sufficiently small neighborhood of \(\widehat{\varvec{{\theta }}}^o\). For this, we first consider any \({\varvec{{\theta }}}=({\varvec{{\theta }}}_{(1)}^{\mathrm{T}},{\varvec{{\theta }}}_{(2)}^{\mathrm{T}}=\mathbf{0}^{\mathrm{T}})^{\mathrm{T}}\) with \(\Vert {\varvec{{\theta }}}_{(1)}-\widehat{\varvec{{\theta }}}_{(1)}^o\Vert \le c\), where c is sufficiently small. By the definition of \(\widehat{\varvec{{\theta }}}^o\) as a local minimizer of (2), we have
while
since \(\Vert \widehat{\varvec{{\theta }}}_j^o-{\varvec{{\theta }}}_{0j}\Vert =O_p(r_n)=o(\lambda )\) and \(\lambda =o_p(\min _{j\le q}\Vert {\varvec{{\theta }}}_{0j}\Vert _2)\).
Now consider \({\varvec{{\theta }}}=({\varvec{{\theta }}}_{(1)}^{\mathrm{T}},{\varvec{{\theta }}}_{(2)}^{\mathrm{T}})^{\mathrm{T}}\) with \(\Vert {\varvec{{\theta }}}-\widehat{\varvec{{\theta }}}^o\Vert \le c\) (again, c needs to be small enough). Assuming (by way of contradiction) that \({\varvec{{\theta }}}_{j^*}\ne 0\) for some \(j^*> q\), we only need to show that, uniformly over \(1\le j^*\le p\), we have
where \({\varvec{{\theta }}}^*\) is the same as \({\varvec{{\theta }}}\) except that we replace \({\varvec{{\theta }}}_{j^*}\) by \(\mathbf{0}\). Furthermore, considering the penalty terms, we have
Next, using the convexity of the loss function, which implies that \(\rho _\tau (x)-\rho _\tau (y)\ge (\tau -I\{y\le 0\})(x-y)\), we have
where \(\mathbf{Z}_{ij}=\mathbf{B}(T_i)X_{ij}\). For the first term above, using Bernstein’s inequality, we have \(\max _{1\le j\le p,{\varvec{{\theta }}}_{j}\ne \mathbf{0}}\sum _i(\tau -I\{e_i\le 0\})\mathbf{Z}_{ij}^{\mathrm{T}}{\varvec{{\theta }}}_{j}=O_p(\sqrt{nK\log (pn)}\Vert {\varvec{{\theta }}}_{j}\Vert )\). For the second term, we show in Lemma 3 that, if c is sufficiently small,
and thus the second term in (14) is \(O_p(nKr_n\sqrt{q\log (pn)}\Vert {\varvec{{\theta }}}_{j^*}\Vert )\).
Thus by our assumptions, we have \(\sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})-\sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*)=-O_p(nKr_n\sqrt{q\log (pn)})\Vert {\varvec{{\theta }}}_{j^*}\Vert \). This is dominated by the difference of the penalties stated in (13), which establishes (12). \(\square \)
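The convexity inequality invoked above can be verified in one line, included here for completeness: \(\rho _\tau \) is convex and piecewise linear with slope \(\tau \) on \((0,\infty )\) and \(\tau -1\) on \((-\infty ,0)\), so \(\tau -I\{y\le 0\}\) is a subgradient of \(\rho _\tau \) at y (at \(y=0\) it equals \(\tau -1\in [\tau -1,\tau ]\)), and the inequality is exactly the subgradient inequality
\[ \rho _\tau (x)\ge \rho _\tau (y)+(\tau -I\{y\le 0\})(x-y). \]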
Lemma 3
Proof of Lemma 3
We have
Application of Bernstein’s inequality yields that, for \(t_n\rightarrow 0\) and \(\delta _n\rightarrow 0\), for any \(j\in \{1,\ldots ,p\}\), \(k\in \{1,\ldots ,K\}\) and any \(u>0\),
Let \(A=\{ {\varvec{{\theta }}}:\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}_0\Vert \le t_n, {\varvec{{\theta }}}_j=\mathbf{0} \text{ for } j>q \}\). Next we derive a bound for
We construct an \(n^{-c}\) covering of A of size \(R=O(n^{CqK})\), with elements denoted by \(\{{\varvec{{\theta }}}^{(1)},\ldots ,{\varvec{{\theta }}}^{(R)}\}\). Then we write
Since the two terms above are similar, we only consider the first term in the following. Furthermore, since we can consider the positive part and the negative part of \(X_{ij}B_k(T_i)\) separately and in a similar way, we can assume without loss of generality that \(X_{ij}B_k(T_i)\ge 0\) and further restrict to the event \(\Omega \) so that \(X_{ij}B_k(T_i)\le \sqrt{K\log (n)}, j\le q\). Then we have
By (16), using the union bound, we have
where \(b_n=C(\sqrt{qK\log n}t_n+\sqrt{q}K^{-d})\) is an upper bound for \(|\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i|\). Taking \(u\asymp (\sqrt{\frac{K}{n}}+\sqrt{Kb_n})Kq\log (pn)\), we see that
For \(I_{2}\), using the monotonicity of the indicator function and that for \({\varvec{{\theta }}}\in A, \Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^{(r)}\Vert \le n^{-c}\), \(|\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}^{(r)})|\le t_n':=C\sqrt{qK\log (n)}n^{-c}\), we have
Again by (16) for \(I_{21}\) with union bound, and Taylor’s expansion for \(I_{22}\), we obtain
Combining the bounds for \(I_1\) and \(I_{2}\) we get
when c is chosen to be large enough.
Now, using \(|\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-\mathbf{Z}_i^{\mathrm{T}}\widehat{\varvec{{\theta }}}^o|\le t_n'':= C\sqrt{pK\log (pn)}c\) (note that c can be chosen arbitrarily small), we can bound the following sum (as before, we may assume \(X_{ij}B_k(T_i)\ge 0\) and drop the absolute value around the sum without loss of generality):
The second sum above can be made arbitrarily small since \(t_n''\) and \(\Vert {\varvec{{\theta }}}^*-\widehat{\varvec{{\theta }}}^o\Vert \) are. Noting that \(\widehat{\varvec{{\theta }}}^o\in A\) when \(t_n=Cr_n\), exactly the same arguments as used in the bounds for \(I_1\) and \(I_2\) show that the above is \(O_p((\sqrt{K}/n+\sqrt{K\sqrt{qK\log n}r_n/n})Kq\log (pn))\). Thus
\(\square \)