
High-dimensional quantile varying-coefficient models with dimension reduction

Published in: Metrika

Abstract

Although semiparametric models, in particular varying-coefficient models, alleviate the curse of dimensionality by avoiding the estimation of fully nonparametric multivariate functions, a large number of univariate functions typically still need to be estimated. We propose a dimension reduction approach for estimating a large number of nonparametric univariate functions in varying-coefficient models, in which these functions are constrained to lie in a finite-dimensional subspace spanned by a small number of smooth functions. The proposed methodology is developed in the context of quantile regression, which provides more information on the response variable than the more conventional mean regression. Finally, we present numerical illustrations to demonstrate the performance of the proposed method.
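Concretely, constraining each coefficient function to the linear span of a small number r of common smooth functions makes the matrix of coefficient-function values low-rank. A minimal numerical sketch (the particular factor functions and dimensions below are illustrative only, not the paper's simulation design):

```python
import numpy as np

# p coefficient functions constrained to the span of r = 2 smooth "factor" functions
p, r = 10, 2
t = np.linspace(0, 1, 200)
basis = np.vstack([np.sin(2 * np.pi * t), t ** 2])  # r smooth factor functions on a grid
rng = np.random.default_rng(1)
A = rng.normal(size=(p, r))                         # loading matrix
G = A @ basis                                       # row j: g_j evaluated on the grid

# the p x 200 matrix of coefficient-function values has rank at most r
assert np.linalg.matrix_rank(G) == r
```

Estimating the r factor functions plus the p-by-r loading matrix replaces estimation of p unrelated functions, which is the source of the dimension reduction.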

Fig. 1


References

  • Cai Z, Xiao Z (2012) Semiparametric quantile regression estimation in dynamic models with partially varying coefficients. J Econom 167:413–425

  • Cai Z, Xu X (2008) Nonparametric quantile estimations for dynamic smooth coefficient models. J Am Stat Assoc 103(484):1595–1608

  • Chen R, Tsay RS (1993) Functional-coefficient autoregressive models. J Am Stat Assoc 88(421):298–308

  • Fan J, Ma Y, Dai W (2014) Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. J Am Stat Assoc 109(507):1270–1284

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360

  • Hastie T, Tibshirani R (1993) Varying-coefficient models. J R Stat Soc Ser B Methodol 55(4):757–796

  • He X, Shi P (1994) Convergence rate of B-spline estimators of nonparametric conditional quantile functions. J Nonparametr Stat 3:299–308

  • Horowitz JL, Lee S (2005) Nonparametric estimation of an additive quantile regression model. J Am Stat Assoc 100(472):1238–1249

  • Jiang Q, Wang H, Xia Y, Jiang G (2013) On a principal varying coefficient model. J Am Stat Assoc 108(501):228–236

  • Kim M (2007) Quantile regression with varying coefficients. Ann Stat 35(1):92–108

  • Lian H (2012a) A note on the consistency of Schwarz's criterion in linear quantile regression with the SCAD penalty. Stat Probab Lett 82(7):1224–1228

  • Lian H (2012b) Variable selection for high-dimensional generalized varying-coefficient models. Stat Sinica 22:1563–1588

  • Storlie C, Bondell H, Reich B, Zhang H (2011) Surface estimation, variable selection, and the nonparametric oracle property. Stat Sinica 21:679–705

  • van de Geer SA (2000) Empirical processes in M-estimation. Cambridge University Press, Cambridge

  • Wang HJ, Zhu Z, Zhou J (2009) Quantile regression in partially linear varying coefficient models. Ann Stat 37(6B):3841–3866

  • Wei F, Huang J, Li HZ (2011) Variable selection and estimation in high-dimensional varying-coefficient models. Stat Sinica 21:1515–1540

  • Xue L, Qu A (2012) Variable selection in high-dimensional varying-coefficient models with global optimality. J Mach Learn Res 13(1):1973–1998

  • Zhang C (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38(2):894–942

  • Zhao W, Jiang X, Lian H (2018) A principal varying-coefficient model for quantile regression: joint variable selection and dimension reduction. Comput Stat Data Anal 127:269–280

  • Zhao W, Zhang F, Wang X, Li R, Lian H (2019) Principal varying coefficient estimator for high-dimensional models. Statistics 53:1234–1250

  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429


Acknowledgements

The research of Weihua Zhao is partially supported by the National Social Science Fund of China (15BTJ027). The research of Rui Li is supported by the National Social Science Fund of China (No. 17BTJ025). The research of Heng Lian is supported by City University of Hong Kong Start-up Grant 7200521, by Hong Kong RGC General Research Fund 11301718 and 11300519, by Project 11871411 from NSFC, and by the Shenzhen Research Institute, City University of Hong Kong. On behalf of all authors, the corresponding author states that there is no conflict of interest.


Corresponding author

Correspondence to Rui Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proofs

Let \({\varvec{{\theta }}}_{0j}\) be the spline coefficients satisfying \(\sup _t|g_j(t)-\mathbf{B}^{\mathrm{T}}(t){\varvec{{\theta }}}_{0j}|\le CK^{-d}\), which is possible by (A3), and let \({{\varvec{\Theta }}}_0=({\varvec{{\theta }}}_{01},\ldots ,{\varvec{{\theta }}}_{0p})^{\mathrm{T}}\) and \({\varvec{{\theta }}}_0=\mathrm{vec}({{\varvec{\Theta }}}_0^{\mathrm{T}})\). Let \(F(\cdot |T,\mathbf{X})\) be the conditional cdf of e given the covariates and \(F(\cdot )\) the unconditional cdf of e. We also define \(\mathbf{Z}_i=\mathbf{B}(T_i)\otimes \mathbf{X}_i\) and \(m_i=\mathbf{X}_i^{\mathrm{T}}\mathbf{g}(T_i)\). For a matrix, \(\Vert \cdot \Vert \) is used to denote its Frobenius norm while \(\Vert \cdot \Vert _{op}\) is used to denote the operator norm. In the proofs C denotes a generic positive constant which may assume different values even on the same line.
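The definition \(\mathbf{Z}_i=\mathbf{B}(T_i)\otimes \mathbf{X}_i\) pairs each spline basis value with each covariate, so that \(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}\) reproduces \(\mathbf{X}_i^{\mathrm{T}}{{\varvec{\Theta }}}\mathbf{B}(T_i)\). A minimal numerical sketch of this identity (random vectors stand in for \(\mathbf{B}(T_i)\) and \(\mathbf{X}_i\); the flattening order is an implementation detail chosen to match NumPy's Kronecker convention):

```python
import numpy as np

rng = np.random.default_rng(0)
p, K = 5, 7                      # number of covariates, spline basis dimension
X = rng.normal(size=p)           # covariate vector X_i
B = rng.normal(size=K)           # spline basis evaluated at T_i (random stand-in)
Theta = rng.normal(size=(p, K))  # row j: spline coefficients theta_j of g_j

Z = np.kron(B, X)                # Z_i = B(T_i) (x) X_i, length K*p
theta = Theta.T.ravel()          # flattening consistent with np.kron's ordering
lhs = Z @ theta
rhs = X @ Theta @ B              # X_i^T Theta B(T_i) = sum_j X_ij B(T_i)^T theta_j
assert np.isclose(lhs, rhs)
```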

Lemma 1

Let \(r_n=\sqrt{(K+p-r)r/n}+\sqrt{p}K^{-d}\). We have

$$\begin{aligned}&\sup _{\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}_0\Vert \le Cr_n} \left| \sum _{i=1}^n\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})-\sum _{i=1}^n\rho _\tau (Y_i- \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_{0})\right. \\&\quad +\sum _{i=1}^n \mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)(\tau -I\{e_i\le 0\})\\&\quad \left. -\sum _{i=1}^nE[\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})|T_i,\mathbf{X}_i]+\sum _{i=1}^nE[\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_{0})|T_i,\mathbf{X}_i]\right| =o_p(nr_n^2). \end{aligned}$$

Proof of Lemma 1

As in He and Shi (1994), without loss of generality and for simplicity of notation, we consider median regression with \(\tau =1/2,\;\rho _\tau (u)=|u|/2\) only; the general case can be shown in the same way. Let \(\mathcal{N}=\{{\varvec{{\theta }}}^{(1)},\ldots ,{\varvec{{\theta }}}^{(N)}\}\) be a \(\delta _n\) covering of \(\{{\varvec{{\theta }}}: \Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}_0\Vert \le Cr_n\}\). By Lemma 2.5 in van de Geer (2000), we have the bound \(\log N\le C pK\log n\) if we set \(\delta _n\sim n^{-a}\) for some \(a>0\) to be chosen large enough.
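For completeness, this covering bound is the standard volumetric estimate: a \(\delta \)-covering of a Euclidean ball of radius R in dimension D needs at most \((3R/\delta )^D\) points, so here, with \(D=pK\), \(R=Cr_n\) and \(\delta _n\sim n^{-a}\),

```latex
\log N \;\le\; pK \,\log\!\Big(\tfrac{3Cr_n}{\delta_n}\Big) \;\le\; C'\, pK \log n,
```

since \(r_n\) and \(\delta _n\) are both polynomial in n.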

Let \(M_{ni}({\varvec{{\theta }}})=\frac{1}{2}|Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}|-\frac{1}{2}|Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_{0}|+ \mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)(1/2-I\{e_i\le 0\})\), and \(M_n({\varvec{{\theta }}})=\sum _{i=1}^nM_{ni}({\varvec{{\theta }}})\). Let \(\Omega \) be the event that \(\{\max _{i,j}|X_{ij}|\le C\sqrt{\log (pn)}\}\), and by the assumed sub-Gaussianity of \(X_{ij}\), we have

$$\begin{aligned} P(\Omega ^c)\le n^{-a}, \end{aligned}$$
(6)

for any \(a>0\) as long as C is sufficiently large. Let \(M_{ni}^*({\varvec{{\theta }}})=M_{ni}({\varvec{{\theta }}})I_{\Omega }\) and \(M_n^*({\varvec{{\theta }}})=M_n({\varvec{{\theta }}})I_{\Omega }\) where \(I_{\Omega }\) is the indicator function that takes value 1 when \(\Omega \) happens. By (6), we only need to show that

$$\begin{aligned} \sup _{\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}_0\Vert \le Cr_n}|M_n^*({\varvec{{\theta }}})-E[M_n^*({\varvec{{\theta }}})|\{T_i,\mathbf{X}_i\}]|=o_p(nr_n^2). \end{aligned}$$

Since the function |u| is Lipschitz, and using the fact that for any \({\varvec{{\theta }}}\) there exists \({\varvec{{\theta }}}^{(l)}\) in the covering with \(\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^{(l)}\Vert ^2\le \delta _n^2\), we have

$$\begin{aligned}&M_n({\varvec{{\theta }}})-E[M_n({\varvec{{\theta }}})|\{T_i,\mathbf{X}_i\}]-M_n({\varvec{{\theta }}}^{(l)})+E[M_n({\varvec{{\theta }}}^{(l)})|\{T_i,\mathbf{X}_i\}]\\&\qquad = O_p(\sum _{i=1}^n|\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}^{(l)}) |)=O_p(n\sqrt{pK}\delta _n), \end{aligned}$$

which can obviously be made smaller than \(nr_n^2\) by setting \(\delta _n\sim n^{-a}\) for a large enough.

Furthermore, by simple algebra,

$$\begin{aligned} |M_{ni}({\varvec{{\theta }}})|= & {} \left| \frac{1}{2}|Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}|-\frac{1}{2}|Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_{0}|\right. \\&\left. +\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)(1/2-I\{e_i\le 0\})\right| \\= & {} \left| \frac{1}{2}|e_i+m_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}|-\frac{1}{2}|e_i+m_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_{0}|\right. \\&\left. +\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)(1/2-I\{e_i\le 0\})\right| \\\le & {} |\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)|\cdot I\{|e_i|\le |\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)|+|m_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_{0}|\}. \end{aligned}$$

Thus

$$\begin{aligned} |M_{ni}^*({\varvec{{\theta }}})|\le & {} |\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)|I_{\Omega }\\\le & {} C\sqrt{pK\log (pn)}r_n\\=: & {} A. \end{aligned}$$

Furthermore, we have

$$\begin{aligned}&E[|M_{ni}^*({\varvec{{\theta }}})-E[M_{ni}^*({\varvec{{\theta }}})|T_i,\mathbf{X}_i]|^2] \end{aligned}$$
(7)
$$\begin{aligned}&\quad \le E|M_{ni}^*({\varvec{{\theta }}})|^2\end{aligned}$$
(8)
$$\begin{aligned}&\quad \le C(\sqrt{pK\log (pn)}r_n)E [(\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0))^2]\nonumber \\&\quad \le C(\sqrt{pK\log (pn)}r_n)(r_n^2)=:D^2. \end{aligned}$$
(9)

Using Bernstein’s inequality, we have

$$\begin{aligned} P(\sup _{{\varvec{{\theta }}}\in \mathcal{N}} |M_n^*({\varvec{{\theta }}})-E[M_n^*({\varvec{{\theta }}})|\{T_i,\mathbf{X}_i\}]|>a)\le & {} C\exp \{-\frac{a^2}{aA+nD^2}+CpK\log n\}. \end{aligned}$$

When we set \(a=O\left( \max \{(pK\log n)^{3/2}r_n, \sqrt{n(pK\log n)^{3/2}r_n^3}\}\right) =o(nr_n^2)\), the right hand side above converges to zero which proves the result. \(\square \)

Proof of Theorem 1

Suppose \(\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}_0\Vert \le Cr_n\). Using Knight's identity \(\rho _\tau (x-y)-\rho _\tau (x)=-y(\tau -I\{x\le 0\})+\int _0^y (I\{x\le t\}-I\{x\le 0\})dt\), we have that

$$\begin{aligned}&E\left[ \sum _{i=1}^n \rho _\tau (e_i+m_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})|\{T_i,\mathbf{X}_i\}\right] -E\left[ \sum _{i=1}^n\rho _\tau (e_i+m_i- \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_{0})|\{T_i,\mathbf{X}_i\}\right] \nonumber \\&\quad =\sum _i\int _{\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_{0}-m_i}^{\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i} [F(t|T_i,\mathbf{X}_i)-F(0|T_i,\mathbf{X}_i)]\,dt\nonumber \\&\quad = (1/2) \sum _i f(0|T_i,\mathbf{X}_i)\left[ (\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0))^2+2 \mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)R_i\right] \nonumber \\&\quad +O_p(n(\sqrt{pK\log n}r_n)^3),\nonumber \\ \end{aligned}$$
(10)

where \(R_i=\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_0-m_i\).
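Knight's identity itself is elementary and can be spot-checked numerically. A self-contained sketch, where the integral term is evaluated in closed form (the helper `knight_integral` is ours, not from the paper):

```python
import numpy as np

def rho(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def knight_integral(x, y):
    """Closed form of int_0^y (1{x <= t} - 1{x <= 0}) dt."""
    a, b = (0.0, y) if y >= 0 else (y, 0.0)
    val = max(0.0, b - max(a, x))     # int_a^b 1{x <= t} dt
    val -= (b - a) * (x <= 0)         # int_a^b 1{x <= 0} dt
    return val if y >= 0 else -val

rng = np.random.default_rng(2)
for _ in range(1000):
    x, y = rng.normal(size=2)
    tau = rng.uniform(0.05, 0.95)
    lhs = rho(x - y, tau) - rho(x, tau)
    rhs = -y * (tau - (x <= 0)) + knight_integral(x, y)
    assert np.isclose(lhs, rhs)
```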

Now define

$$\begin{aligned} {\varvec{{\theta }}}^*:= & {} {{\,\mathrm{arg\,min}\,}}_{\mathrm{rank}({{\varvec{\Theta }}})\le r} \sum _i -\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)(\tau -I\{e_i\le 0\})\nonumber \\&+(1/2)f(0|T_i,\mathbf{X}_i)\left[ ( \mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0))^2+2 \mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)R_i\right] . \end{aligned}$$
(11)

Lemma 2 shows that \(\Vert {\varvec{{\theta }}}^*-{\varvec{{\theta }}}_0\Vert =O_p(r_n)\).

Denote

$$\begin{aligned} Q({\varvec{{\theta }}})= & {} -\sum _{i=1}^n \mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)(\tau -I\{e_i\le 0\}) +E\sum _{i=1}^n\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})\\&-E\sum _{i=1}^n\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_{0}).\end{aligned}$$

For \(\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert \le C r_n\), by Lemma 1 we have

$$\begin{aligned}\sup _{\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert \le C r_n} \left| \sum _{i=1}^n\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})-\sum _{i=1}^n\rho _\tau (Y_i- \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*)-[Q({\varvec{{\theta }}})-Q({\varvec{{\theta }}}^*)]\right| =o_p(nr_n^2).\end{aligned}$$

Since \(Q({\varvec{{\theta }}})\) is a quadratic function of \({\varvec{{\theta }}}\) after ignoring a small term \(O_p(n(\sqrt{pK\log n}r_n)^3)\), we have that when \(\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert =Cr_n\),

$$\begin{aligned}|Q({\varvec{{\theta }}})-Q({\varvec{{\theta }}}^*)|\ge C n \Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert ^2-O_p(n(\sqrt{pK\log n}r_n)^3)>Cn r_n^2.\end{aligned}$$

As a result, we have

$$\begin{aligned}\lim _{n\rightarrow \infty } P\left\{ \inf _{\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert =Cr_n}\sum _{i=1}^n\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})>\sum _{i=1}^n\rho _\tau (Y_i- \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*)\right\} =1.\end{aligned}$$

This implies that \(\Vert \widehat{\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert =O_p(r_n)\), which proves the convergence rate. \(\square \)

Lemma 2

\(\Vert {\varvec{{\theta }}}^*-{\varvec{{\theta }}}_0\Vert =O_p(r_n)\).

Proof of Lemma 2

By the definition of \({\varvec{{\theta }}}^*\) and comparing the value of (11) at \({\varvec{{\theta }}}={\varvec{{\theta }}}^*\) and at \({\varvec{{\theta }}}={\varvec{{\theta }}}_0\), we have

$$\begin{aligned} \Vert \mathbf{Z}({\varvec{{\theta }}}^*-{\varvec{{\theta }}}_0)\Vert ^2\le 2\langle \mathbf{Z}({\varvec{{\theta }}}^*-{\varvec{{\theta }}}_0),\mathbf{e}+\mathbf{R}\rangle , \end{aligned}$$

where \(\mathbf{Z}=(\sqrt{f(0|T_1,\mathbf{X}_1)}\mathbf{Z}_1,\ldots ,\sqrt{f(0|T_n,\mathbf{X}_n)}\mathbf{Z}_n)^{\mathrm{T}}\) and \(\mathbf{e}=(e_1,\ldots ,e_n)^{\mathrm{T}}\). The rest of the proof is the same as the proof of Theorem 2.1 in Zhao et al. (2019) and thus the details are omitted. \(\square \)

Proof of Theorem 2

We define the oracle estimator \(\widehat{\varvec{{\theta }}}^o_{(1)}\) as a local minimizer of (2) using only the first q components of \(\mathbf{X}\), those associated with nonzero \(g_j\). Padding with zero components, we also call \(\widehat{\varvec{{\theta }}}^o=( (\widehat{\varvec{{\theta }}}^o_{(1)})^{\mathrm{T}},\mathbf{0}^{\mathrm{T}})^{\mathrm{T}}\) the oracle estimator. As in the proof of Theorem 1, the oracle estimator is consistent with convergence rate \(O_p(r_n)\). Thus it remains to show that \(\widehat{\varvec{{\theta }}}^o\) is a local minimizer of (4).

The strategy is to compare the objective function value at \(\widehat{\varvec{{\theta }}}^o\) with the value at other \({\varvec{{\theta }}}\) in a sufficiently small neighborhood of \(\widehat{\varvec{{\theta }}}^o\). For this, we consider any \({\varvec{{\theta }}}=({\varvec{{\theta }}}_{(1)}^{\mathrm{T}},{\varvec{{\theta }}}_{(2)}^{\mathrm{T}}=\mathbf{0}^{\mathrm{T}})^{\mathrm{T}}\) with \(\Vert {\varvec{{\theta }}}_{(1)}-\widehat{\varvec{{\theta }}}_{(1)}^o\Vert \le c\), where c is sufficiently small. By the definition of \(\widehat{\varvec{{\theta }}}^o\) as a local minimizer of (2), we have

$$\begin{aligned} \sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})\ge \sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}\widehat{\varvec{{\theta }}}^o), \end{aligned}$$

while

$$\begin{aligned} n \sum _{j=1}^p p_\lambda (\Vert {\varvec{{\theta }}}_j\Vert _{\mathbf{A}_j})=n \sum _{j=1}^p p_\lambda (\Vert \widehat{\varvec{{\theta }}}_j^o\Vert _{\mathbf{A}_j}) \end{aligned}$$

since \(\Vert \widehat{\varvec{{\theta }}}_j^o-{\varvec{{\theta }}}_{0j}\Vert =O_p(r_n)=o(\lambda )\) and \(\lambda =o_p(\min _{j\le q}\Vert {\varvec{{\theta }}}_{0j}\Vert _2)\).
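The two displays above use that the penalty is flat away from zero and linear at the origin. A folded-concave penalty such as SCAD (Fan and Li 2001) has exactly these properties; a small sketch (the piecewise formula and a = 3.7 follow Fan and Li, but treating SCAD as the concrete choice of \(p_\lambda \) here is our illustrative assumption):

```python
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001): linear near 0, constant beyond a*lam."""
    t = np.abs(t)
    linear = lam * t
    quad = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    flat = lam ** 2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, flat))

lam = 0.5
# p_lambda(t) = lam * t on [0, lam]: the linear behavior used in (13)
assert np.isclose(float(scad(0.25, lam)), lam * 0.25)
# constant for t >= a * lam: penalties at two large arguments coincide,
# so the penalty difference in the preceding display vanishes
assert np.isclose(float(scad(2.0, lam)), float(scad(5.0, lam)))
```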

Now consider \({\varvec{{\theta }}}=({\varvec{{\theta }}}_{(1)}^{\mathrm{T}},{\varvec{{\theta }}}_{(2)}^{\mathrm{T}})^{\mathrm{T}}\), with \(\Vert {\varvec{{\theta }}}-\widehat{\varvec{{\theta }}}^o\Vert \le c\) (again, c needs to be small enough). Assuming (by way of contradiction) that \({\varvec{{\theta }}}_{j^*}\ne 0\) for some \(j^*> q\), it suffices to show that, uniformly over such \(j^*\), we have

$$\begin{aligned} \sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})+n \sum _{j=1}^p p_\lambda (\Vert {\varvec{{\theta }}}_j\Vert _{\mathbf{A}_j})\ge \sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*)+n \sum _{j=1}^p p_\lambda (\Vert {\varvec{{\theta }}}^*_j\Vert _{\mathbf{A}_j}),\nonumber \\ \end{aligned}$$
(12)

where \({\varvec{{\theta }}}^*\) is the same as \({\varvec{{\theta }}}\) except that we replace \({\varvec{{\theta }}}_{j^*}\) by \(\mathbf{0}\). Furthermore, considering the penalty terms, we have

$$\begin{aligned} n \sum _{j=1}^p p_\lambda (\Vert {\varvec{{\theta }}}_j\Vert _{\mathbf{A}_j})-n \sum _{j=1}^p p_\lambda (\Vert {\varvec{{\theta }}}^*_j\Vert _{\mathbf{A}_j})=n p_\lambda (\Vert {\varvec{{\theta }}}_{j^*}\Vert _{\mathbf{A}_{j^*}})=n \lambda \Vert {\varvec{{\theta }}}_{j^*}\Vert _{\mathbf{A}_{j^*}}. \end{aligned}$$
(13)

Next, by convexity of the loss function which implies that \(\rho _\tau (x)-\rho _\tau (y)\ge (\tau -I\{y\le 0\})(x-y)\), we have

$$\begin{aligned}&\sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})-\sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*)\nonumber \\&\quad \ge -\sum _i(\tau -I\{Y_i\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*\})\mathbf{Z}_{ij^*}^{\mathrm{T}}{\varvec{{\theta }}}_{j^*}\nonumber \\&\quad = -\sum _i(\tau -I\{e_i\le 0\})\mathbf{Z}_{ij^*}^{\mathrm{T}}{\varvec{{\theta }}}_{j^*}-\sum _i(I\{e_i\le 0\}-I\{e_i\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-m_i\})\mathbf{Z}_{ij^*}^{\mathrm{T}}{\varvec{{\theta }}}_{j^*},\nonumber \\ \end{aligned}$$
(14)

where \(\mathbf{Z}_{ij}=\mathbf{B}(T_i)X_{ij}\). For the first term above, using Bernstein’s inequality, we have \(\max _{1\le j\le p,{\varvec{{\theta }}}_{j}\ne \mathbf{0}}\sum _i(\tau -I\{e_i\le 0\})\mathbf{Z}_{ij}^{\mathrm{T}}{\varvec{{\theta }}}_{j}=O_p(\sqrt{nK\log (pn)}\Vert {\varvec{{\theta }}}_{j}\Vert )\). For the second term above, we show in Lemma 3 that, if c is sufficiently small,

$$\begin{aligned}&\sup _{\Vert {\varvec{{\theta }}}-\widehat{\varvec{{\theta }}}^o\Vert \le c,1\le j\le p}\Big \Vert \sum _i(I\{e_i\le 0\}-I\{e_i\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-m_i\}\nonumber \\&\qquad -F(0)+F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-m_i))\mathbf{Z}_{ij^*}\Big \Vert \nonumber \\&\quad = O_p(\sqrt{nr_n}K^{3/4}q^{5/4}\log ^{5/4}(pn)), \end{aligned}$$
(15)

and thus the second term in (14) is \(O_p(nKr_n\sqrt{q\log (pn)}\Vert {\varvec{{\theta }}}_{j^*}\Vert )\).
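The convexity bound \(\rho _\tau (x)-\rho _\tau (y)\ge (\tau -I\{y\le 0\})(x-y)\) invoked at the start of this chain is the subgradient inequality for the check function; a quick numerical spot-check:

```python
import numpy as np

def rho(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

rng = np.random.default_rng(3)
for _ in range(1000):
    x, y = rng.normal(size=2)
    tau = rng.uniform(0.05, 0.95)
    # tau - 1{y <= 0} is a subgradient of rho_tau at y (for y != 0)
    assert rho(x, tau) - rho(y, tau) >= (tau - (y <= 0)) * (x - y) - 1e-12
```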

Thus by our assumptions, we have \(\sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})-\sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*)\ge -O_p(nKr_n\sqrt{q\log (pn)})\Vert {\varvec{{\theta }}}_{j^*}\Vert \). This is dominated by the difference of the penalties stated in (13), which establishes (12). \(\square \)

Lemma 3

$$\begin{aligned}&\sup _{\Vert {\varvec{{\theta }}}^*-\widehat{\varvec{{\theta }}}^o\Vert \le c,1\le j\le p}\Big \Vert \sum _i(I\{e_i\le 0\}-I\{e_i\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-m_i\}\nonumber \\&\qquad -F(0)+F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-m_i))\mathbf{Z}_{ij}\Big \Vert \nonumber \\&\quad = O_p(\sqrt{nr_n}K^{3/4}q^{5/4}\log ^{5/4}(pn)). \end{aligned}$$

Proof of Lemma 3

We have

$$\begin{aligned} E\left[ \left| X_{ij}B_k(T_i) (I\{e_{i}\le t_n+\delta _n\}-I\{e_{i}\le t_n\})\right| ^m\right] \le Cm!(CK)^{m/2}\delta _n, \quad m=1,2,\ldots . \end{aligned}$$

Application of Bernstein’s inequality yields that for \(t_n\rightarrow 0\), \(\delta _n\rightarrow 0\), any \(j\in \{1,\ldots ,p\}\), \(k\in \{1,\ldots ,K\}\), and any \(u>0\),

$$\begin{aligned}&\Pr \left\{ \Big |\frac{1}{n}\sum _{i} X_{ij}B_k(T_i) [I\{e_{i}\le t_n+\delta _n\}-I\{e_{i}\le t_n\}\right. \nonumber \\&\quad \left. -F(t_n+\delta _n)+F(t_n)\Big |>u/\sqrt{n}\right\} \nonumber \\&\quad \le C\exp \left\{ -C\frac{u^2}{u\sqrt{K/n}+K\delta _n}\right\} . \end{aligned}$$
(16)

Let \(A=\{ {\varvec{{\theta }}}:\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}_0\Vert \le t_n, {\varvec{{\theta }}}_j=\mathbf{0} \text{ for } j>q \}\). Next we derive a bound for

$$\begin{aligned}&\sup _{{\varvec{{\theta }}}\in A,1\le k\le K, 1\le j\le p}\Big |\frac{1}{n}\sum _{i} X_{ij}B_k(T_i) [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i\}\\&\qquad -I\{e_{i}\le 0\} -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i)+F(0)]\Big |. \end{aligned}$$

We construct an \(n^{-c}\) covering of A with size \(R=O(n^{CqK})\), with elements denoted by \(\{{\varvec{{\theta }}}^{(1)},\ldots ,{\varvec{{\theta }}}^{(R)}\}\). Then we write

$$\begin{aligned}&\Pr \Big \{\sup _{{\varvec{{\theta }}}\in A,1\le k\le K, 1\le j\le p}\Big |\frac{1}{n}\sum _{i} X_{ij}B_k(T_i) [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i\}-I\{e_{i}\le 0\}\\& -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i)+F(0)]\Big |>u/\sqrt{n}\Big \} \\\le & {} \Pr \Big \{\sup _{{\varvec{{\theta }}}\in A,1\le k\le K, 1\le j\le p} \frac{1}{n}\sum _{i} X_{ij}B_k(T_i) [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i\}-I\{e_{i}\le 0\}\\& -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i)+F(0)]>u/\sqrt{n}\Big \} \\&+\Pr \Big \{\sup _{{\varvec{{\theta }}}\in A,1\le k\le K, 1\le j\le p} -\frac{1}{n}\sum _{i} X_{ij}B_k(T_i) [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i\}-I\{e_{i}\le 0\}\\& -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i)+F(0)]>u/\sqrt{n}\Big \} . \end{aligned}$$

Since the two terms above are similar, we only consider the first term in the following. Furthermore, since we can consider the positive part and the negative part of \(X_{ij}B_k(T_i)\) separately and in a similar way, we can assume without loss of generality that \(X_{ij}B_k(T_i)\ge 0\) and further restrict to the event \(\Omega \) so that \(X_{ij}B_k(T_i)\le \sqrt{K\log (n)}, j\le q\). Then we have

$$\begin{aligned}&\sup _{{\varvec{{\theta }}}\in A,1\le k\le K, 1\le j\le p} \frac{1}{n}\sum _{i} X_{ij}B_k(T_i) [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i\}-I\{e_{i}\le 0\}\\&\qquad -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i)+F(0)] \\&\quad \le \max _{1\le r\le R,k,j}\frac{1}{n}\sum _{i} X_{ij}B_k(T_i) [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i\}-I\{e_{i}\le 0\}\\&\qquad -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i)+F(0)] \\&\qquad +\sup _{1\le r\le R, {\varvec{{\theta }}}\in A,\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^{(r)}\Vert \le n^{-c},k,j} \frac{1}{n}\sum _{i}X_{ij}B_k(T_i) [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i\}\\&\qquad -I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i\}\\&\qquad -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i)+F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i) ]\\&\quad =: I_{1}+I_{2}. \end{aligned}$$

By (16), using the union bound, we have

$$\begin{aligned}&\Pr \Big (\max _{1\le r\le R,k,j}\frac{1}{n}\sum _{i} X_{ij}B_k(T_i) [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i\}-I\{e_{i}\le 0\}\\&\qquad -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i)+F(0)]>u/\sqrt{n}\Big )\\&\quad \le CKpR\exp \left\{ -C\frac{u^2}{u \sqrt{K/n}+Kb_n}\right\} , \end{aligned}$$

where \(b_n=C(\sqrt{qK\log n}t_n+\sqrt{q}K^{-d})\) is an upper bound for \(|\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i|\). Taking \(u\asymp (\sqrt{\frac{K}{n}}+\sqrt{Kb_n})Kq\log (pn)\), we see that

$$\begin{aligned} I_{1}=O_p\left( \sqrt{\frac{1}{n}}(\sqrt{\frac{K}{n}}+\sqrt{Kb_n})Kq\log (pn)\right) . \end{aligned}$$

For \(I_{2}\), using the monotonicity of the indicator function and that for \({\varvec{{\theta }}}\in A, \Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^{(r)}\Vert \le n^{-c}\), \(|\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}^{(r)})|\le t_n':=C\sqrt{qK\log (n)}n^{-c}\), we have

$$\begin{aligned}&I_2 \le \sup _{{\varvec{{\theta }}}\in A,1\le r\le R, \Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^{(r)}\Vert \le n^{-c},k,j}\frac{1}{n}\sum _{i}X_{ij}B_k(T_i) \Big [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i+ t_n'\}\\&\qquad -I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i\}\\&\qquad -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i)+F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i) \Big ]\\&\quad \le \sup _{1\le r\le R, k,j}\frac{1}{n}\sum _{i} X_{ij}B_k(T_i) \Big [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i +t_n'\}-I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i\}\\&\qquad -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i+t_n')+F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i) \Big ]\\&\qquad +\sup _{1\le r\le R, {\varvec{{\theta }}}\in A, \Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^{(r)}\Vert \le n^{-c},k,j}\frac{1}{n}\sum _{i}X_{ij}B_k(T_i)\left( F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i+t_n')\right. \\&\qquad \left. -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i) \right) \\&\quad = I_{21}+I_{22}. \end{aligned}$$

Again by (16) for \(I_{21}\) with union bound, and Taylor’s expansion for \(I_{22}\), we obtain

$$\begin{aligned} I_{2}=O_p\left( \sqrt{\frac{1}{n}}(\sqrt{\frac{K}{n}})qK\log (pn)\right) . \end{aligned}$$

Combining the bounds for \(I_1\) and \(I_{2}\) we get

$$\begin{aligned}&\sup _{{\varvec{{\theta }}}\in A,1\le k\le K, 1\le j\le p}\Big |\frac{1}{n}\sum _{i} X_{ij}B_k(T_i) [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i\}-I\{e_{i}\le 0\} \\&\qquad -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}-m_i)+F(0)]\Big |\\&\quad = O_p((\sqrt{K}/n+\sqrt{Kb_n/n})Kq\log (pn)), \end{aligned}$$

when c is chosen to be large enough.

Now, using \(|\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-\mathbf{Z}_i^{\mathrm{T}}\widehat{\varvec{{\theta }}}^o|\le t_n'':= C\sqrt{pK\log (pn)}c\) (note that c can be chosen arbitrarily small), we can bound (as before, we can assume \(X_{ij}B_k(T_i)\ge 0\) and drop the absolute value around the sum without loss of generality)

$$\begin{aligned}&\sup _{{\varvec{{\theta }}}^*,1\le k\le K, 1\le j\le p} \frac{1}{n}\sum _{i} X_{ij}B_k(T_i) [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-m_i\}-I\{e_{i}\le 0\} \\&\qquad -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-m_i)+F(0)]\\&\quad \le \sup _{1\le k\le K, 1\le j\le p} \frac{1}{n}\sum _{i} X_{ij}B_k(T_i) [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}\widehat{\varvec{{\theta }}}^o-m_i+t_n''\}-I\{e_{i}\le 0\} \\&\qquad -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-m_i)+F(0)] \\&\quad \le \sup _{1\le k\le K, 1\le j\le p}\frac{1}{n}\sum _{i} X_{ij}B_k(T_i) [I\{e_{i}\le \mathbf{Z}_i^{\mathrm{T}}\widehat{\varvec{{\theta }}}^o-m_i+t_n''\}-I\{e_{i}\le 0\} \\&\qquad -F(\mathbf{Z}_i^{\mathrm{T}}\widehat{\varvec{{\theta }}}^o-m_i+t_n'')+F(0)] \\&\qquad +\sup _{1\le k\le K, 1\le j\le p}\frac{1}{n}\sum _{i}X_{ij}B_k(T_i)[F(\mathbf{Z}_i^{\mathrm{T}}\widehat{\varvec{{\theta }}}^o-m_i+t_n'')\\&\qquad -F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-m_i)]. \end{aligned}$$

The second sum above can be made arbitrarily small since \(t_n''\) and \(\Vert {\varvec{{\theta }}}^*-\widehat{\varvec{{\theta }}}^o\Vert \) are. Noting that \(\widehat{\varvec{{\theta }}}^o\in A\) when \(t_n=Cr_n\), exactly the same arguments used in the bounds for \(I_1\) and \(I_2\) give that the above is \(O_p((\sqrt{K}/n+\sqrt{K\sqrt{qK\log n}r_n/n})Kq\log (pn))\). Thus

$$\begin{aligned}&\max _{1\le j\le p}\Big \Vert \sum _i(I\{e_i\le 0\}-I\{e_i\le \mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-m_i\}\nonumber \\&\qquad -F(0)+F(\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-m_i))\mathbf{Z}_{ij}\Big \Vert \nonumber \\&\quad = O_p((\sqrt{K}+\sqrt{nK\sqrt{qK\log n}r_n})K^{3/2}q\log (pn)). \end{aligned}$$

\(\square \)


Cite this article

Zhao, W., Li, R. & Lian, H. High-dimensional quantile varying-coefficient models with dimension reduction. Metrika 85, 1–19 (2022). https://doi.org/10.1007/s00184-021-00814-5
