Regression and subgroup detection for heterogeneous samples

  • Original paper
  • Journal: Computational Statistics

Abstract

Regression analysis of heterogeneous samples with subgroup structure is essential to the development of precision medicine. In practice, this task is often challenging because the subgroup labels are not known in advance. Detecting subgroups with similar characteristics therefore becomes critical, and the quality of this detection often determines the accuracy of the regression analysis. In this article, we investigate a new framework for detecting subgroups that have similar characteristics in feature space and similar treatment effects. The key idea is to incorporate K-means clustering into the regression framework of concave pairwise fusion, so that regression and subgroup detection are performed simultaneously. Our method is specifically tailored to situations where the sample is not homogeneous, in the sense that the response variables in different domains of feature space are generated through different mechanisms.
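
To make the setting concrete, the following is a minimal sketch of a heterogeneous sample in which two regions of feature space follow different regression mechanisms. It is a simplified two-stage illustration (K-means on the feature, then a separate least-squares fit in each cluster), not the simultaneous K-means and concave pairwise fusion estimator described in the abstract. The function names, the one-dimensional feature, and the simulated parameters are all assumptions chosen for illustration.

```python
import random


def kmeans_1d(xs, k, iters=50):
    """Plain Lloyd's algorithm on scalar features; returns (labels, centers)."""
    s = sorted(xs)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in xs]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def ols_1d(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def make_data(n_per_group=100, seed=0):
    """Simulate two subgroups (hypothetical parameters) with different mechanisms."""
    rng = random.Random(seed)
    xs, ys = [], []
    # Each tuple is (feature center, intercept, slope) for one subgroup.
    for center, intercept, slope in [(-3.0, 1.0, 2.0), (3.0, -2.0, -1.0)]:
        for _ in range(n_per_group):
            x = rng.gauss(center, 0.5)
            xs.append(x)
            ys.append(intercept + slope * x + rng.gauss(0.0, 0.1))
    return xs, ys


def fit_by_cluster(xs, ys, k=2):
    """Stage 1: cluster the features. Stage 2: fit one regression per cluster."""
    labels, centers = kmeans_1d(xs, k)
    fits = {}
    for j in range(k):
        cx = [x for x, lab in zip(xs, labels) if lab == j]
        cy = [y for y, lab in zip(ys, labels) if lab == j]
        fits[j] = ols_1d(cx, cy)
    return labels, centers, fits


xs, ys = make_data()
labels, centers, fits = fit_by_cluster(xs, ys)
for j, (intercept, slope) in fits.items():
    print(f"cluster {j}: center={centers[j]:.2f}, intercept={intercept:.2f}, slope={slope:.2f}")
```

A single pooled regression on such data would average over two incompatible mechanisms, which is why subgroup detection matters here. The two-stage version works in this toy case only because the subgroups are well separated in feature space; a joint approach of the kind the abstract describes is aimed at using the response information as well.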

Acknowledgements

The authors thank the Associate Editor and two anonymous reviewers for their helpful comments and valuable suggestions on earlier versions of this article. The authors also thank Professor Shujie Ma for her constructive comments on our work during the meeting at LICAS 2019. This research was supported by the Fundamental Research Funds for the Central Universities, the Beijing Natural Science Foundation (No. 1204031), and the National Natural Science Foundation of China (No. 11901013).

Author information

Corresponding author

Correspondence to Yanping Qiu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary material 1 (pdf 217 KB)

About this article

Cite this article

Liang, B., Wu, P., Tong, X. et al. Regression and subgroup detection for heterogeneous samples. Comput Stat 35, 1853–1878 (2020). https://doi.org/10.1007/s00180-020-00965-5
