Sparse estimations in kink regression model

  • Focus
Soft Computing

Abstract

A kink regression model can include an excessive number of explanatory variables and corresponding coefficients, which leads to over-parameterization and multicollinearity. Motivated by these problems, five sparse estimation methods, namely LASSO, sparse Ridge, SCAD, MCP, and Bridge, are considered as alternatives to Ordinary Least Squares (OLS) for performing variable selection and parameter estimation simultaneously in the kink regression model. The performance of these sparse estimators is compared in both a simulation study and a real data application. The simulation results show that the sparse estimators outperform the non-sparse estimators in terms of selection accuracy and prediction, but they do not indicate clearly which sparse estimator is most appropriate for the kink regression. In the application study, the comparison indicates that SCAD is the preferable penalty function when the kink regression is applied to the life expectancy data, as it yields the lowest EBIC and the highest \({\text{Adj-}}R^{2}\).
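As an illustration only, the following Python sketch shows the two ingredients the abstract combines: a kink regressor split at a kink point, and a penalty applied to each coefficient. The abstract does not state the model or the penalty formulas, so the sketch assumes the standard textbook forms of the kink basis and of the LASSO, SCAD, and MCP penalties. All function names are hypothetical, and the defaults `a=3.7` and `gamma=3.0` are conventional choices, not values taken from the article.

```python
import numpy as np

def kink_basis(x, kink):
    """Split one regressor into its parts below and above an assumed kink point."""
    x = np.asarray(x, dtype=float)
    below = np.minimum(x - kink, 0.0)  # (x - kink)_-, zero above the kink
    above = np.maximum(x - kink, 0.0)  # (x - kink)_+, zero below the kink
    return np.column_stack([below, above])

def lasso_penalty(beta, lam):
    """L1 penalty: lam * |beta|."""
    return lam * np.abs(np.asarray(beta, dtype=float))

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty in its standard piecewise form (requires a > 2)."""
    b = np.abs(np.asarray(beta, dtype=float))
    small = lam * b                                          # |beta| <= lam
    middle = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))  # lam < |beta| <= a*lam
    large = np.full_like(b, lam**2 * (a + 1) / 2)            # |beta| > a*lam
    return np.where(b <= lam, small, np.where(b <= a * lam, middle, large))

def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax concave penalty in its standard form (requires gamma > 1)."""
    b = np.abs(np.asarray(beta, dtype=float))
    inner = lam * b - b**2 / (2 * gamma)                     # |beta| <= gamma*lam
    return np.where(b <= gamma * lam, inner, gamma * lam**2 / 2)

def penalized_objective(y, X, beta, penalty, lam):
    """Half mean squared residual plus the summed penalty over all coefficients."""
    resid = np.asarray(y, dtype=float) - X @ beta
    return 0.5 * np.mean(resid**2) + np.sum(penalty(beta, lam))
```

For example, `penalized_objective(y, kink_basis(x, 1.0), beta, scad_penalty, 0.1)` evaluates a SCAD-penalized criterion for one candidate kink point; a full estimator would minimize this over `beta` and search over the kink point, which the sketch leaves out.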

Acknowledgments

The author is grateful to two reviewers for several helpful suggestions and discussions. Thanks also go to Dr. Laxmi Worachai for her helpful comments.

Funding

This study is funded by the Center of Excellence in Econometrics, Faculty of Economics, Chiang Mai University (Grant number: R000023389).

Author information

Corresponding author

Correspondence to Woraphon Yamaka.

Ethics declarations

Conflict of interest

The author declares that there is no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by the author.

Additional information

Communicated by Vladik Kreinovich.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yamaka, W. Sparse estimations in kink regression model. Soft Comput 25, 7825–7838 (2021). https://doi.org/10.1007/s00500-021-05797-z

  • DOI: https://doi.org/10.1007/s00500-021-05797-z
