Abstract
Although semiparametric models, in particular varying-coefficient models, alleviate the curse of dimensionality by avoiding the estimation of fully nonparametric multivariate functions, a large number of univariate functions typically still need to be estimated. We propose a dimension reduction approach to estimating a large number of nonparametric univariate functions in varying-coefficient models, in which these functions are constrained to lie in a finite-dimensional subspace spanned by a small number of smooth functions. The proposed methodology is developed in the context of quantile regression, which provides more information on the response variable than the more conventional mean regression. Finally, we present numerical illustrations to demonstrate the performance of the proposed approach.
References
Cai Z, Xiao Z (2012) Semiparametric quantile regression estimation in dynamic models with partially varying coefficients. J Econom 167:413–425
Cai Z, Xu X (2008) Nonparametric quantile estimations for dynamic smooth coefficient models. J Am Stat Assoc 103(484):1595–1608
Chen R, Tsay RS (1993) Functional-coefficient autoregressive models. J Am Stat Assoc 88(421):298–308
Fan J, Ma Y, Dai W (2014) Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. J Am Stat Assoc 109(507):1270–1284
Fan JQ, Li RZ (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Hastie T, Tibshirani R (1993) Varying-coefficient models. J R Stat Soc Ser B Methodol 55(4):757–796
He X, Shi P (1994) Convergence rate of B-spline estimators of nonparametric conditional quantile functions. J Nonparametr Stat 3:299–308
Horowitz JL, Lee S (2005) Nonparametric estimation of an additive quantile regression model. J Am Stat Assoc 100(472):1238–1249
Jiang Q, Wang H, Xia Y, Jiang G (2013) On a principal varying coefficient model. J Am Stat Assoc 108(501):228–236
Kim M (2007) Quantile regression with varying coefficients. Ann Stat 35(1):92–108
Lian H (2012a) A note on the consistency of Schwarz's criterion in linear quantile regression with the SCAD penalty. Stat Probab Lett 82(7):1224–1228
Lian H (2012b) Variable selection for high-dimensional generalized varying-coefficient models. Stat Sinica 22:1563–1588
Storlie C, Bondell H, Reich B, Zhang H (2011) Surface estimation, variable selection, and the nonparametric oracle property. Stat Sinica 21:679–705
van de Geer SA (2000) Empirical Processes in M-Estimation. Cambridge University Press, Cambridge
Wang HJ, Zhu Z, Zhou J (2009) Quantile regression in partially linear varying coefficient models. Ann Stat 37(6B):3841–3866
Wei F, Huang J, Li HZ (2011) Variable selection and estimation in high-dimensional varying-coefficient models. Stat Sinica 21:1515–1540
Xue L, Qu A (2012) Variable selection in high-dimensional varying-coefficient models with global optimality. J Mach Learn Res 13(1):1973–1998
Zhang C (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38(2):894–942
Zhao W, Jiang X, Lian H (2018) A principal varying-coefficient model for quantile regression: joint variable selection and dimension reduction. Comput Stat Data Anal 127:269–280
Zhao W, Zhang F, Wang X, Li R, Lian H (2019) Principal varying coefficient estimator for high-dimensional models. Statistics 53:1234–1250
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429
Acknowledgements
The research of Weihua Zhao is partially supported by National Social Science Fund (15BTJ027). The research of Rui Li is supported by National Social Science Fund of China (No. 17BTJ025). The research of Heng Lian is supported by City University of Hong Kong Start-up Grant 7200521, by Hong Kong RGC General Research Fund 11301718 and 11300519, by Project 11871411 from NSFC, and by the Shenzhen Research Institute, City University of Hong Kong. On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendix: Proofs
Let \({\varvec{{\theta }}}_{0j}\) be the spline coefficients satisfying \(\sup _t|g_j(t)-\mathbf{B}^{\mathrm{T}}(t){\varvec{{\theta }}}_{0j}|\le CK^{-d}\), which is possible by (A3), and let \({{\varvec{\Theta }}}_0=({\varvec{{\theta }}}_{01},\ldots ,{\varvec{{\theta }}}_{0p})^{\mathrm{T}}\) and \({\varvec{{\theta }}}_0=\mathrm{vec}({{\varvec{\Theta }}}_0^{\mathrm{T}})\). Let \(F(\cdot |T,\mathbf{X})\) be the conditional cdf of e given the covariates and \(F(\cdot )\) the unconditional cdf of e. We also define \(\mathbf{Z}_i=\mathbf{B}(T_i)\otimes \mathbf{X}_i\) and \(m_i=\mathbf{X}_i^{\mathrm{T}}\mathbf{g}(T_i)\). For a matrix, \(\Vert \cdot \Vert \) denotes its Frobenius norm and \(\Vert \cdot \Vert _{op}\) its operator norm. In the proofs, C denotes a generic positive constant which may assume different values even on the same line.
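To fix ideas, the following is a minimal numerical sketch (not the authors' code) of the spline design underlying \(\mathbf{Z}_i\): it evaluates a B-spline basis, forms \(\sum _jX_{ij}\mathbf{B}^{\mathrm{T}}(T_i){\varvec{{\theta }}}_j\), and checks that this agrees with the Kronecker-product form once the block ordering of the Kronecker factors is paired with the vectorization convention (the sketch uses the covariate-major pairing). The use of scipy's BSpline.design_matrix (available in SciPy 1.8 and later) and all variable names are illustrative assumptions.

# A minimal sketch (not the authors' code) of the spline design in the proofs:
# Z_i pairs a B-spline basis evaluation B(T_i) with covariates X_i via a
# Kronecker product, so that Z_i' theta = sum_j X_ij * B(T_i)' theta_j.
import numpy as np
from scipy.interpolate import BSpline  # design_matrix requires SciPy >= 1.8

rng = np.random.default_rng(0)
n, p, K, deg = 5, 3, 6, 3            # samples, covariates, basis size, cubic splines

# Open knot vector on [0, 1] giving exactly K basis functions
interior = np.linspace(0, 1, K - deg + 1)[1:-1]
knots = np.r_[np.zeros(deg + 1), interior, np.ones(deg + 1)]

T = np.sort(rng.uniform(0.01, 0.99, n))   # index variable T_i
X = rng.normal(size=(n, p))               # covariates X_i
Theta = rng.normal(size=(p, K))           # spline coefficients, row j is theta_j

B = BSpline.design_matrix(T, knots, deg).toarray()   # n x K, row i is B(T_i)

for i in range(n):
    direct = sum(X[i, j] * B[i] @ Theta[j] for j in range(p))
    # Kronecker form: the covariate-major ordering kron(X_i, B(T_i)) matches
    # theta = (theta_1', ..., theta_p')', i.e. Theta flattened row by row.
    kron_form = np.kron(X[i], B[i]) @ Theta.reshape(-1)
    assert np.isclose(direct, kron_form)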
Lemma 1
Let \(r_n=\sqrt{(K+p-r)r/n}+\sqrt{p}K^{-d}\). We have
Proof of Lemma 1
As in He and Shi (1994), without loss of generality and for simplicity of notation, we consider only median regression with \(\tau =1/2,\;\rho _\tau (u)=|u|/2\); the general case can be shown in the same way. Let \(\mathcal{N}=\{{\varvec{{\theta }}}^{(1)},\ldots ,{\varvec{{\theta }}}^{(N)}\}\) be a \(\delta _n\) covering of \(\{{\varvec{{\theta }}}: \Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}_0\Vert \le Cr_n\}\). By Lemma 2.5 in van de Geer (2000), we have the bound \(\log N\le C pK\log n\) if we set \(\delta _n\sim n^{-a}\) for some \(a>0\), which will be chosen large enough below.
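For completeness, the arithmetic behind \(\log N\le CpK\log n\) is the standard covering bound for a Euclidean ball in \(\mathbb{R}^{pK}\) (the constants below are illustrative):
\[ N\le \Big (\frac{3Cr_n}{\delta _n}\Big )^{pK}\quad \Longrightarrow \quad \log N\le pK\log (3Cr_nn^{a})\le C'a\,pK\log n, \]
using \(\delta _n\sim n^{-a}\) and the fact that \(r_n\) is bounded by a fixed power of n.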
Let \(M_{ni}({\varvec{{\theta }}})=\frac{1}{2}|Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}|-\frac{1}{2}|Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_{0}|+ \mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)(1/2-I\{e_i\le 0\})\), and \(M_n({\varvec{{\theta }}})=\sum _{i=1}^nM_{ni}({\varvec{{\theta }}})\). Let \(\Omega \) be the event \(\{\max _{i,j}|X_{ij}|\le C\sqrt{\log (pn)}\}\); by the assumed sub-Gaussianity of \(X_{ij}\), we have
for any \(a>0\), as long as C is sufficiently large. Let \(M_{ni}^*({\varvec{{\theta }}})=M_{ni}({\varvec{{\theta }}})I_{\Omega }\) and \(M_n^*({\varvec{{\theta }}})=M_n({\varvec{{\theta }}})I_{\Omega }\), where \(I_{\Omega }\) is the indicator function that takes value 1 when \(\Omega \) occurs. By (6), we only need to show that
Since the function |u| is Lipschitz, and using the fact that for any \({\varvec{{\theta }}}\) there exists \({\varvec{{\theta }}}^{(l)}\) in the covering with \(\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^{(l)}\Vert \le \delta _n\), we have
which can obviously be made smaller than \(nr_n^2\) by setting \(\delta _n\sim n^{-a}\) for a large enough.
Furthermore, by simple algebra,
Thus
Furthermore, we have
Using Bernstein’s inequality, we have
If we set \(a=O\left( \max \{(pK\log n)^{3/2}r_n, \sqrt{n(pK\log n)^{3/2}r_n^3}\}\right) =o(nr_n^2)\), the right-hand side above converges to zero, which proves the result. \(\square \)
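For reference, Bernstein's inequality in the form used here states that for independent mean-zero variables \(W_i\) with \(|W_i|\le M\) and \(\sum _i\mathrm{E}W_i^2\le V\),
\[ P\Big (\Big |\sum _{i=1}^nW_i\Big |>a\Big )\le 2\exp \Big (-\frac{a^2/2}{V+Ma/3}\Big ). \]
Combined with the union bound over the \(N\) covering points and \(\log N\le CpK\log n\), the stated choice of \(a\) makes \(N\) times the right-hand side converge to zero while keeping \(a=o(nr_n^2)\); the two terms in the maximum correspond, roughly, to the range-dominated and variance-dominated regimes of the exponent (a heuristic reading, not a claim about the omitted display).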
Proof of Theorem 1
Suppose \(\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}_0\Vert \le Cr_n\). Using Knight's identity \(\rho _\tau (x-y)-\rho _\tau (x)=-y(\tau -I\{x\le 0\})+\int _0^y (I\{x\le t\}-I\{x\le 0\})dt\), we have that
where \(R_i=\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_0-m_i\).
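Spelled out, taking \(x=Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_0=e_i-R_i\) and \(y=\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)\) in Knight's identity gives one plausible form of the resulting decomposition (a sketch; the omitted display may differ in presentation):
\[ \sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})-\sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}_0) =-\sum _i\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)\big (\tau -I\{e_i\le R_i\}\big ) +\sum _i\int _0^{\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}_0)}\big (I\{e_i\le R_i+t\}-I\{e_i\le R_i\}\big )dt. \]
The linear term is controlled by Bernstein-type bounds, while the conditional expectation of the integral term is, to first order, \(\frac{1}{2}f(0|T_i,\mathbf{X}_i)\) times a quadratic in \({\varvec{{\theta }}}-{\varvec{{\theta }}}_0\), which is the source of the quadratic function \(Q({\varvec{{\theta }}})\) appearing below.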
Now define
Lemma 2 shows that \(\Vert {\varvec{{\theta }}}^*-{\varvec{{\theta }}}_0\Vert =O_p(r_n)\).
Denote
If \(\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert =C r_n\), then by Lemma 1, we have
Since \(Q({\varvec{{\theta }}})\) is a quadratic function of \({\varvec{{\theta }}}\) after ignoring a small term \(O_p(n(\sqrt{pK\log n}r_n)^3)\), we have that when \(\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert =Cr_n\),
As a result, we have
This implies that \(\Vert \widehat{\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert =O_p(r_n)\), which proves the convergence rate. \(\square \)
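The final localization step is the standard convexity argument, spelled out for completeness. Write \(L_n\) for the (unpenalized) check-loss objective, a label introduced only for this remark; \(L_n\) is convex in \({\varvec{{\theta }}}\). If \(\inf _{\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert =Cr_n}L_n({\varvec{{\theta }}})>L_n({\varvec{{\theta }}}^*)\) with probability tending to one, then for any \({\varvec{{\theta }}}\) outside the ball the boundary point
\[ \tilde{{\varvec{{\theta }}}}={\varvec{{\theta }}}^*+\frac{Cr_n}{\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^*\Vert }({\varvec{{\theta }}}-{\varvec{{\theta }}}^*) \]
is a convex combination of \({\varvec{{\theta }}}^*\) and \({\varvec{{\theta }}}\), so \(L_n(\tilde{{\varvec{{\theta }}}})\le \max \{L_n({\varvec{{\theta }}}^*),L_n({\varvec{{\theta }}})\}\) forces \(L_n({\varvec{{\theta }}})>L_n({\varvec{{\theta }}}^*)\); the minimizer therefore lies inside the ball.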
Lemma 2
\(\Vert {\varvec{{\theta }}}^*-{\varvec{{\theta }}}_0\Vert =O_p(r_n)\).
Proof of Lemma 2
By the definition of \({\varvec{{\theta }}}^*\) and comparing the value of (11) at \({\varvec{{\theta }}}={\varvec{{\theta }}}^*\) and at \({\varvec{{\theta }}}={\varvec{{\theta }}}_0\), we have
where \(\mathbf{Z}=(\sqrt{f(0|T_1,\mathbf{X}_1)}\mathbf{Z}_1,\ldots ,\sqrt{f(0|T_n,\mathbf{X}_n)}\mathbf{Z}_n)^{\mathrm{T}}\) and \(\mathbf{e}=(\epsilon _1,\ldots ,\epsilon _n)^{\mathrm{T}}\). The rest of the proof is the same as the proof of Theorem 2.1 in Zhao et al. (2019) and thus the details are omitted. \(\square \)
Proof of Theorem 2
We define the oracle estimator \(\widehat{\varvec{{\theta }}}^o_{(1)}\) as a local minimizer of (2) using only the first q components of \(\mathbf{X}\), those associated with nonzero \(g_j\). Adding zero components, we also call \(\widehat{\varvec{{\theta }}}^o=( (\widehat{\varvec{{\theta }}}^o_{(1)})^{\mathrm{T}},\mathbf{0}^{\mathrm{T}})^{\mathrm{T}}\) the oracle estimator. By the same arguments as in Theorem 1, the oracle estimator is consistent with convergence rate \(O_p(r_n)\). Thus it remains to show that \(\widehat{\varvec{{\theta }}}^o\) is a local minimizer of (4).
The strategy is to compare the objective function value at \(\widehat{\varvec{{\theta }}}^o\) with the value at other \({\varvec{{\theta }}}\) in a sufficiently small neighborhood of \(\widehat{\varvec{{\theta }}}^o\). For this, we first consider any \({\varvec{{\theta }}}=({\varvec{{\theta }}}_{(1)}^{\mathrm{T}},{\varvec{{\theta }}}_{(2)}^{\mathrm{T}}=\mathbf{0}^{\mathrm{T}})^{\mathrm{T}}\) with \(\Vert {\varvec{{\theta }}}_{(1)}-\widehat{\varvec{{\theta }}}_{(1)}^o\Vert \le c\), where c is sufficiently small. By the definition of \(\widehat{\varvec{{\theta }}}^o\) as a local minimizer of (2), we have
while
since \(\Vert \widehat{\varvec{{\theta }}}_j^o-{\varvec{{\theta }}}_{0j}\Vert =O_p(r_n)=o(\lambda )\) and \(\lambda =o_p(\min _{j\le q}\Vert {\varvec{{\theta }}}_{0j}\Vert _2)\).
Now consider \({\varvec{{\theta }}}=({\varvec{{\theta }}}_{(1)}^{\mathrm{T}},{\varvec{{\theta }}}_{(2)}^{\mathrm{T}})^{\mathrm{T}}\) with \(\Vert {\varvec{{\theta }}}-\widehat{\varvec{{\theta }}}^o\Vert \le c\) (again, c needs to be small enough). Assuming (by way of contradiction) that \({\varvec{{\theta }}}_{j^*}\ne 0\) for some \(j^*> q\), we only need to show that, uniformly over \(1\le j^*\le p\), we have
where \({\varvec{{\theta }}}^*\) is the same as \({\varvec{{\theta }}}\) except that we replace \({\varvec{{\theta }}}_{j^*}\) by \(\mathbf{0}\). Furthermore, considering the penalty terms, we have
Next, using the convexity of the loss function, which implies that \(\rho _\tau (x)-\rho _\tau (y)\ge (\tau -I\{y\le 0\})(x-y)\), we have
where \(\mathbf{Z}_{ij}=\mathbf{B}(T_i)X_{ij}\). For the first term above, using Bernstein’s inequality, we have \(\max _{1\le j\le p,{\varvec{{\theta }}}_{j}\ne \mathbf{0}}\sum _i(\tau -I\{e_i\le 0\})\mathbf{Z}_{ij}^{\mathrm{T}}{\varvec{{\theta }}}_{j}=O_p(\sqrt{nK\log (pn)}\Vert {\varvec{{\theta }}}_{j}\Vert )\). For the second term, we show in Lemma 3 that, if c is sufficiently small,
and thus the second term in (14) is \(O_p(nKr_n\sqrt{q\log (pn)}\Vert {\varvec{{\theta }}}_{j^*}\Vert )\).
Thus by our assumptions, we have \(\sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}})-\sum _i\rho _\tau (Y_i-\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*)=-O_p(nKr_n\sqrt{q\log (pn)})\Vert {\varvec{{\theta }}}_{j^*}\Vert \). This is dominated by the difference of the penalties stated in (13), which establishes (12). \(\square \)
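The convexity inequality invoked above can be verified in one line, included here for completeness: \(\rho _\tau \) is convex and piecewise linear with slope \(\tau \) on \((0,\infty )\) and \(\tau -1\) on \((-\infty ,0)\), so \(\tau -I\{y\le 0\}\) is a subgradient of \(\rho _\tau \) at y (at \(y=0\) it equals \(\tau -1\in [\tau -1,\tau ]\)), and the inequality is exactly the subgradient inequality
\[ \rho _\tau (x)\ge \rho _\tau (y)+(\tau -I\{y\le 0\})(x-y). \]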
Lemma 3
Proof of Lemma 3
We have
Application of Bernstein’s inequality yields that, for \(t_n\rightarrow 0\) and \(\delta _n\rightarrow 0\), for any \(j\in \{1,\ldots ,p\}\), \(k\in \{1,\ldots ,K\}\) and any \(u>0\),
Let \(A=\{ {\varvec{{\theta }}}:\Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}_0\Vert \le t_n, {\varvec{{\theta }}}_j=\mathbf{0} \text{ for } j>q \}\). Next we derive a bound for
We construct an \(n^{-c}\) covering of A of size \(R=O(n^{CqK})\), with elements denoted by \(\{{\varvec{{\theta }}}^{(1)},\ldots ,{\varvec{{\theta }}}^{(R)}\}\). Then we write
Since the two terms above are similar, we only consider the first term in the following. Furthermore, since we can consider the positive part and the negative part of \(X_{ij}B_k(T_i)\) separately and in a similar way, we can assume without loss of generality that \(X_{ij}B_k(T_i)\ge 0\) and further restrict to the event \(\Omega \) so that \(X_{ij}B_k(T_i)\le \sqrt{K\log (n)}, j\le q\). Then we have
By (16), using the union bound, we have
where \(b_n=C(\sqrt{qK\log n}t_n+\sqrt{q}K^{-d})\) is an upper bound for \(|\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^{(r)}-m_i|\). Taking \(u\asymp (\sqrt{\frac{K}{n}}+\sqrt{Kb_n})Kq\log (pn)\), we see that
For \(I_{2}\), using the monotonicity of the indicator function and that for \({\varvec{{\theta }}}\in A, \Vert {\varvec{{\theta }}}-{\varvec{{\theta }}}^{(r)}\Vert \le n^{-c}\), \(|\mathbf{Z}_i^{\mathrm{T}}({\varvec{{\theta }}}-{\varvec{{\theta }}}^{(r)})|\le t_n':=C\sqrt{qK\log (n)}n^{-c}\), we have
Again by (16) for \(I_{21}\) with union bound, and Taylor’s expansion for \(I_{22}\), we obtain
Combining the bounds for \(I_1\) and \(I_{2}\) we get
when c is chosen to be large enough.
Now, using \(|\mathbf{Z}_i^{\mathrm{T}}{\varvec{{\theta }}}^*-\mathbf{Z}_i^{\mathrm{T}}\widehat{\varvec{{\theta }}}^o|\le t_n'':= C\sqrt{pK\log (pn)}c\) (note that c can be chosen arbitrarily small), we can bound the following sum (as before, we may assume \(X_{ij}B_k(T_i)\ge 0\) and drop the absolute value around the sum without loss of generality):
The second sum above can be made arbitrarily small since \(t_n''\) and \(\Vert {\varvec{{\theta }}}^*-\widehat{\varvec{{\theta }}}^o\Vert \) are. Noting that \(\widehat{\varvec{{\theta }}}^o\in A\) when \(t_n=Cr_n\), exactly the same arguments as used in the bounds for \(I_1\) and \(I_2\) show that the above is \(O_p((\sqrt{K}/n+\sqrt{K\sqrt{qK\log n}r_n/n})Kq\log (pn))\). Thus
\(\square \)