Tackling ordinal regression problem for heterogeneous data: sparse and deep multi-task learning approaches

Wang, Lu; Zhu, Dongxiao

doi:10.1007/s10618-021-00746-8

Published: 23 March 2021

Volume 35, pages 1134–1161, (2021)
Data Mining and Knowledge Discovery

Abstract

Many real-world datasets are labeled with natural orders, i.e., ordinal labels. Ordinal regression is a method for predicting ordinal labels that finds a wide range of applications in data-rich domains such as the natural, health and social sciences. Most existing ordinal regression approaches work well for independent and identically distributed (IID) instances by formulating a single ordinal regression task. However, for heterogeneous non-IID instances with well-defined local geometric structures, e.g., subpopulation groups, multi-task learning (MTL) provides a promising framework to encode task (subgroup) relatedness, bridge data from all tasks, and simultaneously learn multiple related tasks in an effort to improve generalization performance. Although MTL methods have been extensively studied, little existing work investigates MTL for heterogeneous data with ordinal labels. We tackle this important problem via sparse and deep multi-task approaches. Specifically, we develop a regularized multi-task ordinal regression (MTOR) model for smaller datasets and a deep neural network-based MTOR model for large-scale datasets. We evaluate the performance using three real-world healthcare datasets with applications to multi-stage disease progression diagnosis. Our experiments indicate that the proposed MTOR models markedly improve prediction performance compared with single-task ordinal regression models.
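
To make the setting concrete, the following is a minimal sketch of what a regularized multi-task ordinal regression objective could look like. It is not the authors' implementation: the abstract does not specify the model's exact form, so the cumulative logit link, the l2,1 (row-sparsity) penalty, and every name used here (`ordinal_probs`, `l21_norm`, `mtor_objective`, `lam`) are assumptions chosen for illustration. Each task (subgroup) gets its own weight column and thresholds, and the penalty couples the tasks by encouraging them to select the same features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ordinal_probs(X, w, thetas):
    """Class probabilities under a cumulative logit model (assumed link).

    P(y <= k | x) = sigmoid(theta_k - w.x), with K-1 increasing thresholds.
    Returns an (n, K) array whose rows sum to 1.
    """
    scores = X @ w                                      # shape (n,)
    cdf = sigmoid(thetas[None, :] - scores[:, None])    # shape (n, K-1)
    n = X.shape[0]
    # Pad with P(y <= -1) = 0 and P(y <= K-1) = 1, then difference.
    cdf = np.hstack([np.zeros((n, 1)), cdf, np.ones((n, 1))])
    return np.diff(cdf, axis=1)

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (features x tasks)."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()

def mtor_objective(tasks, W, thetas, lam):
    """Mean negative log-likelihood summed over tasks, plus lam * ||W||_{2,1}.

    tasks:  list of (X_t, y_t) pairs, y_t holding integer labels in {0..K-1}
    W:      (d, T) weights, one column per task
    thetas: (T, K-1) thresholds, one increasing row per task
    """
    nll = 0.0
    for t, (X, y) in enumerate(tasks):
        P = ordinal_probs(X, W[:, t], thetas[t])
        # Small constant guards against log(0).
        nll -= np.log(P[np.arange(len(y)), y] + 1e-12).mean()
    return nll + lam * l21_norm(W)

# Toy usage: two tasks, two features, four ordered classes.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
thetas_one_task = np.array([-1.0, 0.0, 1.0])
tasks = [(X, np.array([0, 3])), (X, np.array([1, 2]))]
W = np.ones((2, 2))
thetas = np.tile(thetas_one_task, (2, 1))
loss = mtor_objective(tasks, W, thetas, lam=0.1)
```

In a sketch like this, setting `lam` to zero decouples the tasks into independent single-task ordinal regressions, which is the baseline the abstract compares against; a positive `lam` shares feature selection across subgroups.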

Acknowledgements

This paper is based upon work supported by the National Science Foundation under grants CNS-1637312 and CCF-1451316.

Author information

Authors and Affiliations

Department of Computer Science, Wayne State University, Detroit, MI, 48202, USA
Lu Wang & Dongxiao Zhu

Corresponding author

Correspondence to Dongxiao Zhu.

Additional information

Responsible editor: Pierre Baldi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, L., Zhu, D. Tackling ordinal regression problem for heterogeneous data: sparse and deep multi-task learning approaches. Data Min Knowl Disc 35, 1134–1161 (2021). https://doi.org/10.1007/s10618-021-00746-8

Received: 09 January 2019
Accepted: 04 March 2021
Published: 23 March 2021
Issue Date: May 2021
DOI: https://doi.org/10.1007/s10618-021-00746-8
