Tackling ordinal regression problem for heterogeneous data: sparse and deep multi-task learning approaches, Data Mining and Knowledge Discovery


Data Mining and Knowledge Discovery (IF 2.8), Pub Date: 2021-03-23, DOI: 10.1007/s10618-021-00746-8
Lu Wang, Dongxiao Zhu


Many real-world datasets are labeled with natural orders, i.e., ordinal labels. Ordinal regression, the task of predicting ordinal labels, finds a wide range of applications in data-rich domains such as the natural, health, and social sciences. Most existing ordinal regression approaches work well for independent and identically distributed (IID) instances by formulating a single ordinal regression task. However, for heterogeneous non-IID instances with well-defined local geometric structures, e.g., subpopulation groups, multi-task learning (MTL) provides a promising framework to encode task (subgroup) relatedness, bridge data from all tasks, and simultaneously learn multiple related tasks in an effort to improve generalization performance. Even though MTL methods have been extensively studied, there is barely any existing work investigating MTL for heterogeneous data with ordinal labels. We tackle this important problem via sparse and deep multi-task approaches. Specifically, we develop a regularized multi-task ordinal regression (MTOR) model for smaller datasets and a deep neural network based MTOR model for large-scale datasets. We evaluate the performance using three real-world healthcare datasets with applications to multi-stage disease progression diagnosis. Our experiments indicate that the proposed MTOR models markedly improve the prediction performance compared with single-task ordinal regression models.
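The abstract does not give the model's equations, so the following is only a rough sketch of what a regularized multi-task ordinal regression objective could look like, not the authors' formulation. It assumes a cumulative logit (proportional odds) model per task and a row-wise L2,1 penalty on the task weight matrix, a common sparsity-inducing choice in multi-task learning. All names (`ordinal_probs`, `l21_norm`, `mtor_objective`, `lam`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ordinal_probs(X, w, thresholds):
    """Class probabilities under a cumulative logit model (assumed here).

    P(y <= k | x) = sigmoid(thresholds[k] - x @ w), with increasing thresholds.
    Returns an array of shape (n_samples, n_classes).
    """
    scores = X @ w                                          # (n,)
    cum = sigmoid(thresholds[None, :] - scores[:, None])    # (n, K-1)
    # Pad with 0 on the left and 1 on the right, then take differences
    # of adjacent cumulative probabilities to get per-class probabilities.
    n = X.shape[0]
    cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    return np.diff(cum, axis=1)


def l21_norm(W):
    """Sum of row-wise L2 norms; W has shape (n_features, n_tasks)."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()


def mtor_objective(Xs, ys, W, thresholds, lam):
    """Negative log-likelihood summed over tasks plus an L2,1 penalty.

    Xs, ys: per-task feature matrices and integer labels in {0, ..., K-1}.
    W: (n_features, n_tasks); thresholds: (n_tasks, K-1).
    """
    nll = 0.0
    for t, (X, y) in enumerate(zip(Xs, ys)):
        p = ordinal_probs(X, W[:, t], thresholds[t])
        nll -= np.log(p[np.arange(len(y)), y] + 1e-12).sum()
    return nll + lam * l21_norm(W)


# Toy usage on random data: two tasks (subgroups), three features, four ordered classes.
rng = np.random.default_rng(0)
Xs = [rng.normal(size=(5, 3)) for _ in range(2)]
ys = [rng.integers(0, 4, size=5) for _ in range(2)]
W = rng.normal(size=(3, 2))
thresholds = np.array([[-1.0, 0.0, 1.0], [-0.5, 0.5, 1.5]])
print(mtor_objective(Xs, ys, W, thresholds, lam=0.1))
```

In this sketch each column of `W` belongs to one task (subgroup), and the L2,1 penalty pushes entire rows toward zero, so tasks tend to agree on which features matter. A deep variant would replace the linear score `X @ w` with a shared network feeding task-specific output layers.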


Updated: 2021-03-23
