Abstract
When specifying a kink regression model, an excessive number of explanatory variables and corresponding coefficients may be included, leading to over-parameterization and multicollinearity. Motivated by these problems, five sparse estimation methods, namely LASSO, sparse Ridge, SCAD, MCP, and Bridge, are considered for simultaneous variable selection and parameter estimation in the kink regression model, as alternatives to Ordinary Least Squares (OLS). To compare the performance of these sparse estimators, both simulation studies and a real-data application are conducted. The simulation results demonstrate the superior performance of the sparse estimators over the non-sparse estimator in terms of selection accuracy and prediction. However, no single sparse estimator emerges as clearly most appropriate for estimating the kink regression. In the application study, by contrast, the comparison indicates that SCAD is the preferable penalty function for applying kink regression to the life expectancy data, as it yields the lowest EBIC and the highest \({\text{Adj - }}R^{2}\).
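The approach described above can be illustrated with a minimal sketch: a kink regression profiles an unknown threshold \(\kappa\), and at each candidate threshold a penalized (here LASSO) fit performs variable selection over the kink basis and the covariates. This is not the paper's implementation; the simulated data, the threshold grid, and the penalty level `alpha` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Simulated data (assumption, for illustration only): one kink variable x
# with true threshold 2.0, plus 10 covariates of which only the first matters.
n, p = 300, 10
x = rng.uniform(0, 4, n)
Z = rng.normal(size=(n, p))
y = (1.5 * np.minimum(x - 2.0, 0)      # slope below the kink
     - 0.8 * np.maximum(x - 2.0, 0)    # slope above the kink
     + 2.0 * Z[:, 0]
     + rng.normal(scale=0.3, size=n))

def kink_design(x, Z, kappa):
    """Kink basis (x - kappa)_- and (x - kappa)_+ stacked with covariates Z."""
    neg = np.minimum(x - kappa, 0)
    pos = np.maximum(x - kappa, 0)
    return np.column_stack([neg, pos, Z])

# Profile the threshold over a grid; at each candidate, fit a LASSO and keep
# the threshold giving the smallest residual sum of squares.
best = None
for kappa in np.linspace(0.5, 3.5, 61):
    X = kink_design(x, Z, kappa)
    fit = Lasso(alpha=0.05).fit(X, y)
    rss = np.sum((y - fit.predict(X)) ** 2)
    if best is None or rss < best[0]:
        best = (rss, kappa, fit)

rss, kappa_hat, fit = best
print("estimated threshold:", kappa_hat)
print("nonzero coefficient indices:", np.flatnonzero(fit.coef_ != 0))
```

In practice the penalty level would be chosen by cross-validation or an information criterion such as EBIC, and the same profiling loop applies with the SCAD, MCP, or Bridge penalties in place of the LASSO.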
Acknowledgments
The author is grateful to two reviewers for several helpful suggestions and discussions. Thanks also go to Dr. Laxmi Worachai for her helpful comments.
Funding
This study is funded by the Center of Excellence in Econometrics, Faculty of Economics, Chiang Mai University (Grant number: R000023389).
Ethics declarations
Conflict of interest
The author declares that there is no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by the author.
Additional information
Communicated by Vladik Kreinovich.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Yamaka, W. Sparse estimations in kink regression model. Soft Comput 25, 7825–7838 (2021). https://doi.org/10.1007/s00500-021-05797-z