
Sparse regression over clusters: SparClur

  • Short Communication
  • Published in: Optimization Letters

Abstract

Prediction tasks in personalized medicine require models that combine accuracy and interpretability. We propose an integer optimization approach for building sparse regression models with enforced coordination, using data partitioned among leaves in a prediction tree. We show that the method recovers the true underlying relationship between observations and target variables in large-scale synthetic data in seconds. We apply our method to several real-world medical prediction problems and observe that the additional structure imposed provides a substantial gain in interpretability, at a low cost to accuracy.
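
To make the "enforced coordination" concrete, the sketch below shows one plausible way to write the idea as a mixed-integer model in JuMP: each leaf (cluster) of the prediction tree gets its own regression coefficients, while a single set of binary variables forces every leaf to share the same sparse support. The data layout, the sparsity budget k, the big-M bound, and the solver choice are illustrative assumptions, not the authors' implementation.

```julia
using JuMP, Gurobi   # assumed solver; any solver accepting a mixed-integer quadratic objective works

# X: n×p feature matrix, y: length-n targets, cluster: length-n vector of leaf labels,
# k: sparsity budget shared by all leaves, M: assumed big-M bound on coefficient magnitudes.
function shared_support_fit(X, y, cluster, k; M = 10.0)
    n, p = size(X)
    leaves = unique(cluster)
    model = Model(Gurobi.Optimizer)

    @variable(model, beta[leaves, 1:p])   # separate coefficients for every leaf
    @variable(model, z[1:p], Bin)         # z[j] = 1 if feature j may be used in any leaf

    # Coordination: a coefficient can be nonzero only on the common support
    @constraint(model, [c in leaves, j in 1:p],  beta[c, j] <=  M * z[j])
    @constraint(model, [c in leaves, j in 1:p], -M * z[j] <= beta[c, j])
    @constraint(model, sum(z) <= k)       # at most k features in the shared support

    # Ordinary least-squares loss, summed over all observations in their own leaves
    @objective(model, Min,
        sum((y[i] - sum(X[i, j] * beta[cluster[i], j] for j in 1:p))^2 for i in 1:n))

    optimize!(model)
    return value.(beta), round.(Int, value.(z))
end
```

The structural point of this sketch is that the binary vector z is shared across leaves: that is what yields a common, interpretable set of predictors, while each leaf keeps its own coefficient values.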


Data availability

All synthetic and publicly available datasets can be shared with interested readers. The medical data are protected under privacy rules and cannot be made available.


Acknowledgements

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1122374.

Author information

Correspondence to Dimitris Bertsimas.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Bertsimas, D., Dunn, J., Kapelevich, L. et al. Sparse regression over clusters: SparClur. Optim Lett 16, 433–448 (2022). https://doi.org/10.1007/s11590-021-01770-9
