Abstract
Prediction tasks in personalized medicine require models that are both accurate and interpretable. We propose an integer optimization approach for building sparse regression models with enforced coordination across the leaves of a prediction tree, fitting each model to the data partitioned into its leaf. We show that the method recovers the true underlying relationship between observations and target variables on large-scale synthetic data in seconds. We apply our method to several real-world medical prediction problems and observe that the imposed structure provides a substantial gain in interpretability at a low cost in accuracy.
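The coordination idea admits a compact mixed-integer formulation. The Julia/JuMP sketch below is a hypothetical, minimal rendering under one plausible reading of "enforced coordination": each leaf (cluster) gets its own least-squares coefficients, while shared binary indicators z, limited by a sparsity budget k, force every leaf to draw on the same small feature set through a big-M constraint. The function name sparclur_sketch, the solver choice (Gurobi), and the big-M encoding are illustrative assumptions, not the authors' formulation.

using JuMP, Gurobi

# Hypothetical sketch of sparse regression with a support shared across
# clusters; not the authors' exact formulation. Xs[c] is the n_c x p design
# matrix of cluster c, and ys[c] is its response vector.
function sparclur_sketch(Xs, ys; k::Int = 5, M::Float64 = 10.0)
    C = length(Xs)                      # number of clusters (tree leaves)
    p = size(Xs[1], 2)                  # number of features
    model = Model(Gurobi.Optimizer)
    @variable(model, beta[1:C, 1:p])    # separate coefficients per cluster
    @variable(model, z[1:p], Bin)       # shared support indicators
    @constraint(model, sum(z) <= k)     # at most k features in total
    # Coordination: feature j may appear in any cluster only if z[j] = 1.
    @constraint(model, [c = 1:C, j = 1:p], beta[c, j] <= M * z[j])
    @constraint(model, [c = 1:C, j = 1:p], -M * z[j] <= beta[c, j])
    # Minimize the total squared residual over all clusters.
    @objective(model, Min,
        sum(sum((ys[c] .- Xs[c] * beta[c, :]) .^ 2) for c in 1:C))
    optimize!(model)
    return value.(beta), value.(z)
end

A big-M model is the most direct encoding but requires a valid bound M on the coefficient magnitudes; cutting-plane and outer-approximation schemes, as used in related work on exact sparse regression, are the usual route to the large-scale running times reported above.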
Data availability
All synthetic and publicly available datasets are available to interested readers. Medical data are protected under privacy rules and are not available.
Acknowledgements
This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1122374.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Bertsimas, D., Dunn, J., Kapelevich, L. et al. Sparse regression over clusters: SparClur. Optim Lett 16, 433–448 (2022). https://doi.org/10.1007/s11590-021-01770-9