A novel method for clinical risk prediction with low-quality data
Artificial Intelligence in Medicine (IF 7.5), Pub Date: 2021-03-17, DOI: 10.1016/j.artmed.2021.102052
Zeyuan Wang 1, Josiah Poon 2, Shuze Wang 3, Shiding Sun 4, Simon Poon 2

In real-world data, predictive models for clinical risks (such as adverse drug reactions, hospital readmission, and chronic disease onset) constantly struggle with low-quality data, namely redundant and highly correlated features, extreme class imbalance, and, most importantly, a large number of missing values. In most existing work, each patient is represented as a fixed-length value vector from some feature space, and missing values must be imputed, which introduces considerable noise into prediction when the data set is highly incomplete. The other challenges either remain unresolved or are only partially addressed during modeling, without a systematic approach. In this paper, we propose a novel framework to address these low-quality problems. We first treat each patient as a bag containing a variable number of feature-value pairs, called instances, and map them to an embedding space through our proposed feature embedding method so that the model learns from them directly. In this way, predictive models naturally avoid the negative impact of missing data. A novel multi-instance neural network is then attached, which uses two computational modules to deal with correlated and redundant features: multi-head attention and attention-based multi-instance pooling. These modules can capture correlations between instances and locate valuable information in each instance or bag. The feature embedding and the multi-instance neural network are parameterized and optimized jointly in an end-to-end manner. Moreover, training is carried out under both main and auxiliary supervision with focal loss functions to mitigate the effect of a highly imbalanced label set. The proposed framework is named AMI-Net3. We evaluate it on three real-world data sets covering different clinical risk prediction tasks: adverse drug reactions to risperidone, schizophrenia relapse, and invasive fungal infection. The comprehensive experimental results demonstrate the effectiveness of our proposed method and its superiority over competitive baselines.
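
The ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the value-scaled embedding in `embed_bag`, the tanh scoring function, and all dimensions are assumptions made for illustration, and the multi-head attention module and auxiliary supervision are omitted. The sketch only shows how a record with missing values might become a variable-length bag without imputation, how attention-based pooling reduces a bag to a single vector, and how the standard binary focal loss down-weights easy examples.

```python
import numpy as np

def record_to_bag(record):
    """Keep only observed features as (feature_index, value) instances."""
    return [(i, v) for i, v in enumerate(record)
            if v is not None and not np.isnan(v)]

def embed_bag(bag, feature_table):
    """Hypothetical embedding: scale each feature's vector by its value."""
    return np.stack([feature_table[i] * v for i, v in bag])

def attention_pool(instances, V, w):
    """Attention-based pooling: softmax-weighted sum over the instances."""
    scores = np.tanh(instances @ V) @ w       # one score per instance
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ instances, weights

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for predicted probability p and label y."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# Illustrative sizes and random parameters (assumptions, not from the paper).
rng = np.random.default_rng(0)
n_features, dim, hidden = 5, 4, 3
feature_table = rng.normal(size=(n_features, dim))
V = rng.normal(size=(dim, hidden))
w = rng.normal(size=hidden)

# A patient record with two missing entries; no imputation is performed.
record = [0.7, None, 1.2, float("nan"), 0.1]
bag = record_to_bag(record)                   # three observed instances
instances = embed_bag(bag, feature_table)     # shape (3, dim)
pooled, weights = attention_pool(instances, V, w)
```

Because the bag holds only observed feature-value pairs, patients with different numbers of recorded features yield bags of different sizes, and the pooling step still returns a vector of fixed dimension for a downstream classifier.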




Updated: 2021-03-25