Abstract
The varying coefficient model is widely used in statistical modeling because it is more flexible than parametric models. However, model detection and variable selection for varying coefficient models are poorly understood in mode regression; existing methods for these problems are typically based on mean regression or quantile regression. In this paper, we propose a novel method that solves both problems for the mode varying coefficient model, based on B-spline approximation and the SCAD penalty. We also present a new algorithm to estimate the parameters of interest, and discuss the selection of the tuning parameters and the bandwidth. We establish the asymptotic properties of the estimated coefficients under some regularity conditions. Finally, we illustrate the proposed method with simulation studies and an empirical example.
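For readers unfamiliar with mode regression: the modal estimator maximizes a kernel-smoothed objective \(\sum_i K_h(Y_i - \varvec{X}_i^\top \varvec{\beta})\) rather than minimizing squared error. The sketch below is not the paper's estimator (which combines B-spline approximation of the varying coefficients with SCAD penalization); it only illustrates the underlying modal linear regression idea via the MEM-style iteratively reweighted least squares of Yao and Li (2013). The function name, bandwidth default, and iteration count are our own illustrative choices.

```python
import numpy as np

def modal_regression(X, y, h=0.5, n_iter=200):
    """Minimal modal linear regression sketch: maximize
    sum_i K_h(y_i - x_i' beta) with a Gaussian kernel via an
    MEM-style iteratively reweighted least-squares loop."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start at the OLS fit
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.exp(-0.5 * (r / h) ** 2)           # kernel weights: small for outliers
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # weighted least squares update
    return beta
```

Because observations far from the current fit receive near-zero kernel weight, the iterates are driven toward the conditional mode rather than the mean, which is what makes the approach robust to skewed or contaminated noise.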
Availability of data and materials
The datasets generated and analysed during the current study are available in the R package “mlbench”.
References
Cai ZW, Fan JQ, Li RZ (2000) Efficient estimation and inferences for varying-coefficient models. J Am Stat Assoc 95:888–902
Fan JQ, Huang T (2005) Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli 11:1031–1057
Fan JQ, Li RZ (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Härdle W, Liang H, Gao J (2000) Partially linear models. Springer, Berlin, Heidelberg
Hastie T, Tibshirani R (1990) Generalized additive models. Chapman and Hall, London
Hastie T, Tibshirani R (1993) Varying-coefficient models. J R Stat Soc Ser B-Stat Methodol 55:757–796
Huang JHZ, Wu CO, Zhou L (2002) Varying-coefficient models and basis function approximation for the analysis of repeated measurements. Biometrika 89:111–128
Hu T, Xia YC (2012) Adaptive semi-varying coefficient model selection. Stat Sin 22:575–599
Lee M (1989) Mode regression. J Econom 42:337–349
Li Q, Racine JS (2010) Smooth varying-coefficient estimation and inference for qualitative and quantitative data. Economet Theory 26:1607–1637
Ma XJ, Zhang JX (2016) A new variable selection approach for varying coefficient models. Metrika 79:59–72
Schumaker L (1981) Spline functions: basic theory. Wiley, New York
Stone C (1982) Optimal global rates of convergence for nonparametric regression. Ann Stat 10:1040–1053
Tang YL, Wang HJ, Zhu ZY et al (2012) A unified variable selection approach for varying coefficient models. Stat Sin 22:601–628
Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc Ser B-Stat Methodol 58:267–288
Wang HS, Xia YC (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104:747–757
Xia YC, Li WK (1999) On the estimation and testing of functional-coefficient linear models. Stat Sin 9:735–757
Yao WX, Lindsay BG, Li RZ (2012) Local modal regression. J Nonparametr Stat 24:647–663
Yao WX, Li LH (2013) A new regression model: modal linear regression. Scand J Stat 41:656–671
Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B-Stat Methodol 68:49–67
Zhao PX, Xue LG (2009) Variable selection for semiparametric varying coefficient partially linear models. Stat Probabil Lett 79:2148–2157
Zhao WH, Zhang RQ, Liu JC et al (2014) Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Ann Inst Stat Math 66:165–191
Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38:894–942
Zhang RQ, Zhao WH, Liu JC (2013) Robust estimation and variable selection for semiparametric partially linear varying coefficient model based on modal regression. J Nonparametr Stat 25:523–544
Zhu HT, Li RZ, Kong LL (2003) Multivariate varying coefficient model for functional responses. Biometrics 59:263–273
Zou H (2006) The adaptive LASSO and its oracle properties. J Am Stat Assoc 101:1418–1429
Acknowledgements
The authors thank the editor and two referees for their constructive suggestions. The research was supported by the Natural Science Foundation of Jiangsu Province (Grant No. BK20200854) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 20KJB110016).
Funding
The Natural Science Foundation of Jiangsu Province (Grant No. BK20200854) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 20KJB110016).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest/Competing interests
The authors have no relevant financial or non-financial interests to disclose.
Code availability
The analyses were carried out in R using custom code.
Appendix
Lemma A.1. Suppose conditions (C1)–(C4) hold and let \(\varvec{\gamma }^{best}\) be the best approximation of \(\varvec{\gamma }\). Then there exist positive constants \(\epsilon _1\), \(a_1\) and \(a_2\) such that
(1) \(\vert \vert \varvec{\gamma }_{j*}^{best} \vert \vert _{L_2} >\epsilon _1, j=1,\ldots ,v\); \(\varvec{\gamma }_j^{best}=(\gamma _{j1},\varvec{0}_{q-1}^{\top })^{\top }, j=v+1,\ldots ,s\); \(\varvec{\gamma }_j^{best}=\varvec{0}, j=s+1,\ldots ,p\).

(2) \(\sup \vert \alpha _j(U) - \varvec{B}(U)^{\top } \varvec{\gamma }_j^{best} \vert \le a_1 k_n^{-t}, j=1,\ldots ,v\).

(3) \(\sup \vert \varvec{\Pi }^{\top } \varvec{\gamma }^{best} -\varvec{X}\varvec{\alpha }(U) \vert \le a_2k_n^{-t}\).
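Condition (2) is the standard spline approximation bound for a \(t\)-smooth coefficient function (Schumaker 1981; Stone 1982). The following small numerical illustration (ours, not part of the proof) fits a smooth coefficient function by least squares on a cubic B-spline basis and checks that the sup-norm error is small; the helper `bspline_basis` and the choice of 10 interior knots are our own.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(u, knots_inner, degree=3):
    """B-spline basis matrix B(u) on [0, 1] with the given interior
    knots and clamped (repeated) boundary knots."""
    t = np.concatenate([np.zeros(degree + 1), knots_inner, np.ones(degree + 1)])
    n_basis = len(t) - degree - 1
    # column j is the j-th basis function, obtained via identity coefficients
    return np.column_stack([
        BSpline(t, np.eye(n_basis)[j], degree)(u) for j in range(n_basis)
    ])

u = np.linspace(0, 1, 400)
alpha = np.sin(2 * np.pi * u)                       # a smooth coefficient function
B = bspline_basis(u, np.linspace(0, 1, 12)[1:-1])   # 10 interior knots
gamma_best, *_ = np.linalg.lstsq(B, alpha, rcond=None)
sup_err = np.max(np.abs(B @ gamma_best - alpha))    # empirical sup-norm error
```

With more knots (larger \(k_n\)) the sup-norm error shrinks at the rate \(k_n^{-t}\) stated in the lemma.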
Lemma A.2. Let \(\delta =O(n^{-t/(2t+1)})\) and define \(\varvec{\gamma }=\varvec{\gamma }^{best}+\delta \varvec{v}\). Then, for any given \(\rho >0\), there exists a sufficiently large constant \(C\) such that (Zhao et al. 2014)
Proof 1. Proof of Theorem 1 Similar to the proof of Theorem 1 in Zhao et al. (2014), let
As \(b_n \rightarrow 0\), Lemma A.2 gives \(\vert \vert \hat{\varvec{\gamma }} -\varvec{\gamma }^{best} \vert \vert = O_p(n^{-t/(2t+1)} + a_n)\). Therefore
Because \(\vert \vert \int _{0}^{1}{\varvec{B}(U) \varvec{B}(U)^{\top }} dU \vert \vert =O(1)\), we have
By Lemma A.1, we have
This completes the proof. \(\square\)
Proof 2. Proof of Theorem 2 By the property of SCAD, \(\max \{\lambda _{1j},\lambda _{2j}\} \rightarrow 0\) as \(n \rightarrow \infty\) implies \(a_n=0\); then, by Theorem 1, \(\vert \vert \hat{\varvec{\gamma }} -\varvec{\gamma }^{best} \vert \vert =O_p(n^{-t/(2t+1)})\).
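The SCAD property invoked here is that its derivative vanishes for arguments larger than \(a\lambda\), so for fixed nonzero coefficients the penalty contributes nothing asymptotically and \(a_n=0\). This can be checked numerically from the SCAD derivative of Fan and Li (2001), \(p'_{\lambda }(\theta )=\lambda \{I(\theta \le \lambda )+\frac{(a\lambda -\theta )_{+}}{(a-1)\lambda }I(\theta >\lambda )\}\) with \(a=3.7\); the function below is our own sketch of that formula.

```python
import numpy as np

def scad_deriv(theta, lam, a=3.7):
    """SCAD penalty derivative p'_lambda(theta) for theta >= 0
    (Fan and Li 2001): flat lambda near zero, linearly decaying
    on (lambda, a*lambda], and exactly zero beyond a*lambda."""
    theta = np.asarray(theta, float)
    return lam * (
        (theta <= lam).astype(float)
        + np.maximum(a * lam - theta, 0.0) / ((a - 1) * lam) * (theta > lam)
    )
```

For a fixed \(\theta >0\), \(p'_{\lambda }(\theta )=0\) as soon as \(\lambda < \theta /a\), which is exactly why shrinking tuning parameters leave large coefficients unpenalized (near-unbiasedness) while still thresholding small ones.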
Firstly, if \(\varvec{\gamma }_{j*}=0\), it is clear that \(\alpha _j(U)\) is a constant. If \(\varvec{\gamma }_{j*} \ne 0\), we have
where \(\eta _i\) is between \(Y_i-\varvec{\Pi }_i^{\top } \varvec{\gamma }\) and \(\epsilon _i + D_{ni}\), \(\epsilon _i=Y_i-\varvec{X}_i^{\top } \varvec{\alpha }(U_i)\), \(D_{ni}= \varvec{X}_i^{\top } \varvec{\alpha }(U_i)- \varvec{\Pi }_i^{\top }\varvec{\gamma }^{best}\).
It is known that \(\sup _{u} \vert \vert \bar{\varvec{B}}(U) \vert \vert =O(1)\). By condition (C4), \(n^{t/(2t+1)}\min \{\lambda _{1j},\lambda _{2j}\} \rightarrow \infty\) and \({\lim \inf }_{n \rightarrow \infty } {\lim \inf }_{\vert \vert \varvec{\gamma }_{j*} \vert \vert _{L_2} \rightarrow 0^{+}} \frac{p_{\lambda _{1j}}^{'}(\vert \vert \varvec{\gamma }_{j*} \vert \vert _{L_2})}{\lambda _{1j}} >0, j=s+1, \ldots ,p\), so the sign of the derivative is completely determined by its second part. Hence, by Lemma A.2, \(l_1(\varvec{\gamma })\) attains its minimum at \(\hat{\varvec{\gamma }}_{j*}^{VC} =0\), and \({\hat{\alpha }}_j(U)\approx {\hat{\gamma }}_{j1}^{VC} + \bar{\varvec{B}}^{\top }(U) \hat{\varvec{\gamma }}_{j*}^{VC} = {\hat{\gamma }}_{j1}^{VC}\), i.e. \({\hat{\alpha }}_j(U), j=v+1,\ldots ,p\), are constants.
Secondly, since we have shown that \({\hat{\alpha }}_j, j=v+1,\ldots ,p\), are constant, it suffices to prove \({\hat{\gamma }}_{j1}^{CZ}=0\) to obtain \({\hat{\alpha }}_j=0\) for \(j=s+1,\ldots ,p\).
where \(\eta _i\), \(\epsilon _i\) and \(D_{ni}\) are defined as above.
By condition (C4), \(n^{t/(2t+1)}\min \{\lambda _{1j},\lambda _{2j}\} \rightarrow \infty\) and \({\lim \inf }_{n \rightarrow \infty } {\lim \inf }_{\gamma _{j1} \rightarrow 0^{+}} \frac{p_{\lambda _{2j}}^{'}(\vert \gamma _{j1}\vert )}{\lambda _{2j}} >0, j=s+1, \ldots ,p\), we prove
where \(\delta =O(n^{-t/(2t+1)})\). Hence, \(l_2(\varvec{\gamma })\) attains its minimum at \({\hat{\gamma }}_{j1}^{CZ} =0\), i.e. \({\hat{\alpha }}_j=0, j=s+1,\ldots ,p\). \(\square\)
Proof 3. Proof of Theorem 3 Let \(\hat{\varvec{\gamma }}=(\hat{\varvec{\gamma }}_{v}^{\top }, \hat{\varvec{\gamma }}_{s-v}^{\top },0)^{\top }\), where \(\hat{\varvec{\gamma }}_{v}^{\top }=(\hat{\varvec{\gamma }}_0, \ldots , \hat{\varvec{\gamma }}_{v})^{\top }\) and \(\hat{\varvec{\gamma }}_{s-v}^{\top }=({\hat{\gamma }}_{v+1,1},\ldots , {\hat{\gamma }}_{s,1})\); for simplicity of notation, write \(\hat{\varvec{\gamma }}_{s-v}^{\top }=({\hat{\gamma }}_{v+1},\ldots , {\hat{\gamma }}_{s})\). From Theorems 1 and 2, as \(n \rightarrow \infty\), with probability tending to 1, \(l(\varvec{\gamma })\) attains its minimum at \(\hat{\varvec{\gamma }}\). Let \(L_1=\frac{\partial l(\varvec{\gamma })}{\partial \varvec{\gamma }_{s-v}}\) and \(L_2=\frac{\partial l(\varvec{\gamma })}{\partial \varvec{\gamma }_{v}}\); then
and
where “\({\circ }\)” denotes the Hadamard (componentwise) product, the k-th component of \(p^{'}_{\lambda _{2}}(\vert \hat{\varvec{\gamma }}_{s-v} \vert )\) is \(p^{'}_{\lambda _{2k}}(\vert {\hat{\gamma }}_{k} \vert )\), \(v+1 \le k \le s\), and the j-th block subvector of \(\varvec{\kappa }\) is \(p^{'}_{\lambda _{1j}}(\vert \vert \varvec{\gamma }_{j*}\vert \vert _{L_2}) \Big (\frac{p^{'}_{\lambda _{2j}}(\vert {\hat{\gamma }}_{j} \vert )}{p^{'}_{\lambda _{1j}}(\vert \vert \varvec{\gamma }_{j*}\vert \vert _{L_2})}sign(\gamma _{j,1}), sign(\gamma _{j,2})\vert \gamma _{j,2}\vert ,\ldots ,sign(\gamma _{j,q})\vert \gamma _{j,q} \vert \Big )^{\top }\), \(0 \le j \le v\). Applying a Taylor expansion to \(p^{'}_{\lambda _{2k}}(\vert {\hat{\gamma }}_{k} \vert )\),
Note that \(p^{'}_{\lambda _{2k}}(\vert {\hat{\gamma }}_{k}^{best} \vert )=0\) as \(\lambda _{max} \rightarrow 0\) and \(b_n \rightarrow 0\); then for (8) we have
where \(\varvec{R}(U_i)=(R_0(U_i),\ldots , R_v(U_i))^{\top }\), \(R_j(U_i)=\alpha _j(U_i) - \varvec{B}(U_i)^{\top } \varvec{\gamma }_j^{best}, 0\le j \le v\), and \(\xi _i\) between \(\epsilon _i\) and \(Y_i-\varvec{\Pi }_i^{\top } \hat{\varvec{\gamma }}\).
From Theorem 3 in Zhao et al. (2014), we know that
where \(\Psi =E(K^{''}_h(\epsilon )\varvec{\Pi }_{v}\varvec{X}_{c}^{\top })\), \(\Phi =E(K^{''}_h(\epsilon )\varvec{\Pi }_{v}\varvec{\Pi }_{v}^{\top })\) and \(\Gamma _n=\frac{1}{n}\sum _{i=1}^{n}\varvec{\Pi }_{vi}\{K_h^{'}(\epsilon _i) + K_h^{''}(\epsilon _i)\varvec{X}_{vi}^{\top }\varvec{R}(U_i)\}\).
Substituting (11) into (10), we obtain
Note that
Hence
From Theorem 3 in Zhao et al. (2014), we know that \(J_2=o_p(1)\), and
where \(\Delta =E(G(\varvec{x},u,h)\tilde{\varvec{X}}_{c} \tilde{\varvec{X}}_{c}^{\top })\). Combining (13) and (14) and applying Slutsky’s theorem, it follows that
It is clear that \(\hat{\varvec{\alpha }}_c - \varvec{\alpha }_c= \hat{\varvec{\gamma }}_{s-v}- \varvec{\gamma }_{s-v}\), which completes the proof of Theorem 3. The proof of (14) can be found in Zhao et al. (2014).
Cite this article
Ma, X., Du, Y. & Wang, J. Model detection and variable selection for mode varying coefficient model. Stat Methods Appl 31, 321–341 (2022). https://doi.org/10.1007/s10260-021-00576-4