
Efficient information-based criteria for model selection in quantile regression

  • Research Article
  • Published in: Journal of the Korean Statistical Society

Abstract

Information-based model selection criteria such as the AIC and BIC employ check loss functions to measure the goodness of fit for quantile regression models. Model selection using a check loss function is robust due to its resistance to outlying observations. In the present study, we suggest modifying the check loss function to achieve a more efficient goodness of fit. Because the cusp of the check loss is quadratically adjusted in the modified version, greater efficiency (or variance reduction) in the model selection is expected. Because we focus on model selection here, we do not modify the model-fitting process. Generalized cross-validation is another common method for choosing smoothing parameters in quantile smoothing splines. We describe how this can be adjusted using the modified check loss to increase efficiency. The proposed generalized cross-validation is designed to reflect the target quantile and sample size. Two real data sets and simulation studies are presented to evaluate its performance using linear and nonlinear quantile regression models.


References

  • Chen, J., & Chen, Z. (2012). Extended BIC for small-n-large-p sparse GLM. Statistica Sinica, 22, 555–574.

  • Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.

  • Golub, G. H., Heath, M., & Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21(2), 215–223.

  • He, X., Ng, P., & Portnoy, S. (1998). Bivariate quantile smoothing splines. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 60(3), 537–550.

  • Jung, Y., MacEachern, S. N., & Kim, H. J. (2021). Modified check loss for efficient estimation via model selection in quantile regression. Journal of Applied Statistics, 48(5), 866–886.

  • Koenker, R. (1994). Quantile smoothing splines. Biometrika, 81(4), 673–680.

  • Koenker, R. (2005). Quantile regression. Cambridge University Press.

  • Koenker, R., & Bassett, G. (1978). Regression quantiles. Econometrica, 46(1), 33–50.

  • Konishi, S., & Kitagawa, G. (1996). Generalised information criteria in model selection. Biometrika, 83(4), 875–890.

  • Lee, Y., MacEachern, S. N., & Jung, Y. (2012). Regularization of case-specific parameters for robustness and efficiency. Statistical Science, 27(3), 350–372.

  • Muggeo, V. M., Sciandra, M., & Augugliaro, L. (2012). Quantile regression via iterative least squares computations. Journal of Statistical Computation and Simulation, 82(11), 1557–1569.

  • Nychka, D., Furrer, R., Paige, J., & Sain, S. (2017). fields: Tools for spatial data. R package version 11.6.

  • Nychka, D., Gray, G., Haaland, P., Martin, D., & O’Connell, M. (1995). A nonparametric regression approach to syringe grading for quality improvement. Journal of the American Statistical Association, 90(432), 1171–1178.

  • Ronchetti, E. (1985). Robust model selection in regression. Statistics & Probability Letters, 3(1), 21–23.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.

  • Yuan, M. (2006). GACV for quantile smoothing splines. Computational Statistics & Data Analysis, 50(3), 813–829.


Acknowledgements

Yoonsuh Jung’s work is partially supported by National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT) (No. 2019R1F1A1040515 and No. 2019R1A4A1028134).

Author information


Corresponding author

Correspondence to Yoonsuh Jung.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (R 3 KB)

Appendix

In practice, \(\sigma\) is unknown and it is estimated as \({\hat{\sigma }}=\sum _{i=1}^{n}\rho _q(u_i)/n\). Maximizing the log likelihood with \({\hat{\sigma }}\) is thus equivalent to maximizing

$$\begin{aligned}&\log \left( \prod _{i=1}^{n} \frac{q(1-q)}{{\hat{\sigma }}} \exp \left( -\frac{\rho _q(u_i)}{{\hat{\sigma }}}\right) \right) \\&\quad = n\log (q(1-q))-n\log ({\hat{\sigma }})-\frac{\sum _{i=1}^{n}\rho _q(u_i)}{{\hat{\sigma }}} \\&\quad = -n\log ({\hat{\sigma }})+ n\log (q(1-q))-n \end{aligned}$$

Substituting \({\hat{\sigma }}=\sum _{i=1}^{n}\rho _q(u_i)/n\) reduces the final term \(\sum _{i=1}^{n}\rho _q(u_i)/{\hat{\sigma }}\) to \(n\). Because the second and third terms in the last expression are constants, maximizing the log likelihood is equivalent to minimizing \(n\log ({\hat{\sigma }})\). Adding the penalty term \(\alpha (n,k)\) then yields the information-based criterion in (2).
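As a concrete illustration, the sketch below computes the criterion \(n\log ({\hat{\sigma }})+\alpha (n,k)\) in R for a linear quantile regression fitted with the quantreg package. The penalty \(\alpha (n,k)\) is taken here as \(2k\) (AIC-type) or \(k\log (n)\) (BIC-type); these are common choices and are assumptions of the sketch rather than necessarily the exact forms used in (2).

```r
library(quantreg)

## check loss rho_q(u) = u * (q - I(u < 0))
check_loss <- function(u, q) ifelse(u >= 0, q * u, (q - 1) * u)

## information-based criterion n * log(sigma_hat) + alpha(n, k)
qr_ic <- function(fit, q, penalty = c("AIC", "BIC")) {
  penalty   <- match.arg(penalty)
  u         <- resid(fit)                        # residuals y_i - x_i' beta_hat
  n         <- length(u)
  k         <- length(coef(fit))                 # number of fitted parameters
  sigma_hat <- sum(check_loss(u, q)) / n         # hat(sigma) = sum_i rho_q(u_i) / n
  alpha     <- if (penalty == "AIC") 2 * k else k * log(n)
  n * log(sigma_hat) + alpha
}

## toy example: compare two candidate models at q = 0.75
set.seed(1)
x1 <- rnorm(100); x2 <- rnorm(100)
y  <- 1 + 2 * x1 + rnorm(100)
fit1 <- rq(y ~ x1,      tau = 0.75)
fit2 <- rq(y ~ x1 + x2, tau = 0.75)
c(qr_ic(fit1, 0.75, "BIC"), qr_ic(fit2, 0.75, "BIC"))  # smaller value is preferred
```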

1.1 Simulations for a rule \(c_{q,n}\)

Although \(c_{q,n}\) in the GCV is not explicitly expressed in (5), it was originally set to 1 in Nychka et al. (1995). However, this value was not satisfactory in our experiments. To find an appropriate value for \(c_{q,n}\) and to provide an empirical rule, we tried several candidate values. Tables 11 and 12 show the MSE when \(c_{q,n}=1\) under Model 1 and Model 2, and Tables 13 and 14 contain the results when \(c_{q,n} = n^{(q - 0.5)^2}\). Finally, the results reported in the main text (Tables 8 and 9) use our rule \(c_{q,n} = n^{|q - 0.5|}\). We chose this rule because the overall MSE, and the reduction in MSE when GCV is replaced by EGCV, are the most satisfactory; the candidate scalings are sketched in code after the tables below.

Table 11 Mean MSE (standard error in parentheses) using GCV and EGCV from 1000 MC data sets for Model 1 with various sample sizes. MSE is multiplied by \(10^4\)
Table 12 Mean MSE (standard error in parentheses) using GCV and EGCV from 1000 MC data sets for Model 2 with various sample sizes. MSE is multiplied by \(10^4\)
Table 13 Mean MSE (standard error in parentheses) using GCV and EGCV from 1000 MC data sets for Model 1 with various sample sizes. MSE is multiplied by \(10^4\)
Table 14 Mean MSE (standard error in parentheses) using GCV and EGCV from 1000 MC data sets for Model 2 with various sample sizes. MSE is multiplied by \(10^4\)
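For reference, the three candidate scalings of \(c_{q,n}\) compared in the tables above can be written as a small R helper. Only the scalings themselves are shown; the EGCV formula in (5) is not reproduced here, and the rule labels in the helper are ours.

```r
## candidate scalings for c_{q,n}
c_rule <- function(q, n, rule = c("nychka", "squared", "absolute")) {
  rule <- match.arg(rule)
  switch(rule,
         nychka   = rep(1, length(q)),    # original choice in Nychka et al. (1995)
         squared  = n^((q - 0.5)^2),      # alternative tried: n^{(q - 0.5)^2}
         absolute = n^abs(q - 0.5))       # adopted rule: n^{|q - 0.5|}
}

## the adopted rule equals 1 at the median and grows with n for extreme quantiles
round(outer(c(0.5, 0.75, 0.9), c(50, 200, 1000),
            function(q, n) c_rule(q, n, "absolute")), 2)
```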

1.2 Modified check loss in the model fitting process

In our work, the modified check loss is used only for the model selection process. However, using it in the model fitting process can improve the overall modeling procedure. Lee et al. (2012) provide the theoretical justification and an intuitive explanation of the modified check loss in terms of model fitting, and recent work by Jung et al. (2021) presents some theoretical properties of the modified check loss under the framework of cross-validation. In this section, we replace the check loss with the modified check loss in the model fitting process. The results are summarized in Table 15. Here, GCV and EGCV use the check loss for model fitting and the modified check loss for model selection, whereas \(GCV^{M}\) and \(EGCV^{M}\) employ the modified check loss for both model fitting and model selection.

Overall, we see clear patterns. The reduction in MSE is larger when the modified check loss is used in the model fitting process: \(GCV^{M}\) and \(EGCV^{M}\) show lower MSE than GCV and EGCV, respectively. When the modified check loss is used in the model selection process, we still observe some improvement: EGCV and \(EGCV^{M}\) show lower MSE than GCV and \(GCV^{M}\), respectively. A final comparison is between using the modified check loss only for fitting (\(GCV^{M}\)) and only for selection (EGCV). Employing the modified check loss only in the fitting process gives better results. This reflects the general fact that the model fitting procedure is more important than tuning parameter selection, so improving the model fitting produces a greater reduction in MSE than improving the model selection. Nevertheless, after improving the model fitting process, there is still room for further improvement; Table 15 shows this clearly in the comparison of \(GCV^{M}\) and \(EGCV^{M}\).

Note that the main topic of this paper is the improvement of the model selection procedure. The improvement in the model fitting part due to the modified check loss is shown extensively in Lee et al. (2012), so we focus here on its role in model selection only. Of course, using the modified check loss for both model fitting and model selection produces the best results. Finally, the differences among all four methods gradually diminish as the sample size increases, because the modification is designed to disappear as n grows. A sketch of one possible quadratically adjusted check loss is given after Table 15.

Table 15 Mean MSE (standard error in parentheses) using GCV and EGCV from 1000 MC data sets for Model 1 with various sample sizes and fixed quantile at \(q = 0.5\)
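To make the discussion concrete, the sketch below shows one possible quadratically adjusted check loss: outside a window of half-width \(\delta\) around zero it coincides with the usual check loss, while inside the window the cusp is replaced by the unique quadratic that matches the value and slope of both linear pieces. Both this particular quadratic and the choice of \(\delta\) (here shrinking with \(n\) so that the modification vanishes asymptotically) are illustrative assumptions; the exact modification used in the paper follows Lee et al. (2012) and Jung et al. (2021).

```r
## quadratically adjusted check loss (illustrative C^1 smoothing of the cusp)
modified_check_loss <- function(u, q, delta) {
  out <- ifelse(u >= 0, q * u, (q - 1) * u)         # usual check loss
  mid <- abs(u) <= delta                            # neighbourhood of the cusp
  out[mid] <- u[mid]^2 / (4 * delta) +              # quadratic piece matching
              (q - 0.5) * u[mid] + delta / 4        # value and slope at +/- delta
  out
}

## example: the adjusted loss agrees with the check loss away from zero
u     <- seq(-1, 1, by = 0.01)
q     <- 0.75
delta <- 100^(-0.5)    # illustrative choice, e.g. delta = n^{-1/2} with n = 100
plot(u, modified_check_loss(u, q, delta), type = "l", ylab = "loss",
     main = "check loss vs. quadratically adjusted version")
lines(u, ifelse(u >= 0, q * u, (q - 1) * u), lty = 2)
```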


Cite this article

Shin, W., Kim, M. & Jung, Y. Efficient information-based criteria for model selection in quantile regression. J. Korean Stat. Soc. 51, 245–281 (2022). https://doi.org/10.1007/s42952-021-00137-1

