A generalized likelihood-based Bayesian approach for scalable joint regression and covariance selection in high dimensions

Published in Statistics and Computing.

Abstract

The paper addresses joint sparsity selection in the regression coefficient matrix and the error precision (inverse covariance) matrix for high-dimensional multivariate regression models in the Bayesian paradigm. The selected sparsity patterns are crucial to help understand the network of relationships between the predictor and response variables, as well as the conditional relationships among the latter. While Bayesian methods have the advantage of providing natural uncertainty quantification through posterior inclusion probabilities and credible intervals, current Bayesian approaches either restrict to specific sub-classes of sparsity patterns and/or are not scalable to settings with hundreds of responses and predictors. Bayesian approaches that only focus on estimating the posterior mode are scalable, but do not generate samples from the posterior distribution for uncertainty quantification. Using a bi-convex regression-based generalized likelihood and spike-and-slab priors, we develop an algorithm called joint regression network selector (JRNS) for joint regression and covariance selection, which (a) can accommodate general sparsity patterns, (b) provides posterior samples for uncertainty quantification, and (c) is scalable and orders of magnitude faster than the state-of-the-art Bayesian approaches providing uncertainty quantification. We demonstrate the statistical and computational efficacy of the proposed approach on synthetic data and through the analysis of selected cancer data sets. We also establish high-dimensional posterior consistency for one of the developed algorithms.
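The JRNS algorithm itself is developed in the paper; as background, the spike-and-slab mechanism that produces the posterior inclusion probabilities mentioned above can be illustrated with a minimal single-response Gibbs sampler. This is an illustrative sketch only, not the authors' method: the function `spike_slab_gibbs`, the fixed noise variance `sigma2`, slab variance `tau2`, and prior inclusion probability `q` are assumptions made for the example, and the multivariate, bi-convex pseudolikelihood machinery of JRNS is not reproduced here.

```python
import numpy as np

def spike_slab_gibbs(X, y, sigma2=1.0, tau2=1.0, q=0.2,
                     n_iter=2000, burn=500, seed=0):
    """Coordinate-wise Gibbs sampler for spike-and-slab linear regression.

    Model: y = X b + e,  e ~ N(0, sigma2 * I),
           b_j | z_j ~ (1 - z_j) * delta_0 + z_j * N(0, tau2),
           z_j ~ Bernoulli(q).
    Returns the posterior inclusion probability of each coefficient.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    b = np.zeros(p)
    incl_counts = np.zeros(p)
    resid = y - X @ b
    for it in range(n_iter):
        for j in range(p):
            xj = X[:, j]
            resid += xj * b[j]                      # remove j's contribution
            s = xj @ xj
            v = 1.0 / (s / sigma2 + 1.0 / tau2)     # slab conditional variance
            mu = v * (xj @ resid) / sigma2          # slab conditional mean
            # Posterior odds of inclusion = prior odds * marginal likelihood
            # ratio sqrt(v / tau2) * exp(mu^2 / (2 v)).
            log_odds = (np.log(q / (1 - q))
                        + 0.5 * np.log(v / tau2)
                        + 0.5 * mu**2 / v)
            prob = 1.0 / (1.0 + np.exp(-log_odds))
            b[j] = rng.normal(mu, np.sqrt(v)) if rng.random() < prob else 0.0
            resid -= xj * b[j]                      # restore j's contribution
            if it >= burn:
                incl_counts[j] += (b[j] != 0.0)
    return incl_counts / (n_iter - burn)

# Toy data: only the first 2 of 8 predictors are truly active.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)
pip = spike_slab_gibbs(X, y)
```

With strong signals the inclusion probabilities of the active coordinates concentrate near one while the inactive ones stay low; the same principle, applied jointly to the regression coefficient matrix and the error precision matrix, underlies the uncertainty quantification described in the abstract.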



Acknowledgements

The work of GM was supported in part by NIH grants 1U01CA235489-01 and 1R01GM114029-01A1.

Author information

Corresponding author: George Michailidis.


Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF, 2254 KB)

Appendix: Pathways for TCGA cancer data


Table 13 lists the indices of all the genes and proteins in the LUAD cancer data, and Table 14 lists all the pathways that have been considered in the analysis of the TCGA cancer data in Sect. 5 and their gene members.

Table 13 Indices of genes and proteins for the LUAD lung cancer data. The first column lists the mRNA components (genes) of the dataset, and the second column lists the RPPA components (proteins)
Table 14 Pathways and gene membership

About this article


Cite this article

Samanta, S., Khare, K. & Michailidis, G. A generalized likelihood-based Bayesian approach for scalable joint regression and covariance selection in high dimensions. Stat Comput 32, 47 (2022). https://doi.org/10.1007/s11222-022-10102-5
