Learning hidden patterns from patient multivariate time series data using convolutional neural networks: A case study of healthcare cost prediction
Journal of Biomedical Informatics (IF 4.5), Pub Date: 2020-09-25, DOI: 10.1016/j.jbi.2020.103565
Mohammad Amin Morid 1 , Olivia R Liu Sheng 2 , Kensaku Kawamoto 3 , Samir Abdelrahman 4

Objective

To develop an effective and scalable individual-level patient cost prediction method by automatically learning hidden temporal patterns from multivariate time series data in patient insurance claims using a convolutional neural network (CNN) architecture.

Methods

We used three years of medical and pharmacy claims data (2013 to 2016) from a healthcare insurer; data from the first two years were used to build the model, which predicted costs in the third year. The data consisted of multivariate time series of cost, visit and medical features, shaped as images of patients’ health status (i.e., matrices with time windows on one dimension and the medical, visit and cost features on the other). These images were fed to a CNN with the proposed architecture. After hyper-parameter tuning, the proposed architecture consisted of three building blocks of convolution and pooling layers with an LReLU activation function and a kernel size customized at each layer for healthcare data. The temporal patterns learned by the proposed CNN became inputs to a fully connected layer. We benchmarked the proposed method against three other methods: (1) a spike temporal pattern detection method, as the most accurate method for healthcare cost prediction described to date in the literature; (2) a symbolic temporal pattern detection method, as the most common approach for leveraging healthcare temporal data; and (3) the most commonly used CNN architectures for image pattern detection (i.e., AlexNet, VGGNet and ResNet), applied via transfer learning. Moreover, we assessed the contribution of each type of data (i.e., cost, visit and medical). Finally, we externally validated the proposed method against a separate cohort of patients. All prediction performance was measured in terms of mean absolute percentage error (MAPE).
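The "patient image" described in the Methods can be illustrated with a minimal sketch. The record layout, the feature names and the helper `build_patient_image` are hypothetical and chosen only for illustration; the abstract does not specify how the authors encoded their claims data.

```python
def build_patient_image(records, features, n_windows):
    """Return a len(features) x n_windows matrix (list of rows) for one patient.

    records: iterable of (window_index, feature_name, value) tuples.
    Values that fall into the same feature and time window are summed.
    """
    row_of = {name: i for i, name in enumerate(features)}
    image = [[0.0] * n_windows for _ in features]
    for window, name, value in records:
        # Ignore unknown features and windows outside the observation period.
        if name in row_of and 0 <= window < n_windows:
            image[row_of[name]][window] += value
    return image


# Hypothetical claims for one patient over 24 monthly windows (two years).
FEATURES = ["cost", "visits", "diabetes_dx"]  # assumed feature names
records = [
    (0, "cost", 120.0),
    (0, "cost", 30.0),
    (0, "visits", 1),
    (5, "diabetes_dx", 1),
    (23, "cost", 410.5),
]
image = build_patient_image(records, FEATURES, n_windows=24)
```

Each row of `image` is one feature's time series and each column is one time window, which matches the matrix layout the abstract describes (features on one dimension, time windows on the other).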

Results

The proposed CNN configuration outperformed the spike temporal pattern detection and symbolic temporal pattern detection methods, with a MAPE of 1.67 versus 2.02 and 3.66, respectively (p < 0.01). It also outperformed ResNet, AlexNet and VGGNet, which had MAPEs of 4.59, 4.85 and 5.06, respectively (p < 0.01). Removing the medical, visit and cost features resulted in MAPEs of 1.98, 1.91 and 2.04, respectively (p < 0.01).
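For readers unfamiliar with the metric, the following is a minimal sketch of the standard MAPE definition, the mean of |actual - predicted| / |actual| over all patients. It returns a fraction rather than a percentage; the abstract does not state which scaling the reported values use, and the handling of zero-cost patients here (raising an error) is an assumption, not the authors' procedure.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.1 means 10%)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        if y == 0:
            # The ratio is undefined when the true cost is zero (assumed policy).
            raise ValueError("MAPE is undefined for an actual value of zero")
        total += abs(y - y_hat) / abs(y)
    return total / len(actual)


# Hypothetical third-year costs and predictions for three patients.
actual_costs = [100.0, 200.0, 400.0]
predicted_costs = [110.0, 180.0, 400.0]
error = mape(actual_costs, predicted_costs)  # (0.1 + 0.1 + 0.0) / 3
```

Because every error is divided by the true cost, a given absolute miss weighs far more heavily for a low-cost patient than for a high-cost one.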

Conclusions

Feature learning through the proposed CNN configuration significantly improved individual-level healthcare cost prediction. The proposed CNN outperformed temporal pattern detection methods that look for a pre-defined set of pattern shapes, because it can extract a variable number of patterns with various shapes. Temporal patterns learned from medical, visit and cost data each made significant contributions to the prediction performance. Hyper-parameter tuning showed that considering three-month data patterns yielded the highest prediction accuracy. Our results showed that patients’ images extracted from multivariate time series data differ from regular images, and hence require unique designs of CNN architectures. The proposed method for converting patients’ multivariate time series data into images and tuning them for convolutional learning could be applied in many other healthcare applications with multivariate time series data.
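To make the idea of a three-month pattern concrete, here is a small pure-Python sketch of one convolution and pooling block applied along the time axis of a patient image. The kernel width of three windows, the kernel weights, the leaky ReLU slope and the pooling size are all illustrative assumptions; the abstract does not report the actual kernel shapes or layer settings, and a trained network would learn the weights rather than use hand-picked ones.

```python
def leaky_relu(x, slope=0.01):
    """Leaky ReLU: pass positives through, scale negatives by a small slope."""
    return x if x > 0 else slope * x


def conv_time(image, kernel):
    """Slide a (features x width) kernel along the time axis ("valid" mode)."""
    n_feat, n_time = len(image), len(image[0])
    width = len(kernel[0])
    out = []
    for t in range(n_time - width + 1):
        s = sum(image[f][t + k] * kernel[f][k]
                for f in range(n_feat) for k in range(width))
        out.append(leaky_relu(s))
    return out


def max_pool(seq, size):
    """Non-overlapping max pooling over a 1-D sequence."""
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]


# Hypothetical 2-feature image over six monthly windows.
patient_image = [
    [0.0, 0.0, 5.0, 9.0, 0.0, 0.0],  # cost
    [0.0, 0.0, 1.0, 2.0, 0.0, 0.0],  # visits
]
# Hand-picked kernel that responds to rising cost across three months.
rise_kernel = [
    [-1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
activations = conv_time(patient_image, rise_kernel)
pooled = max_pool(activations, size=2)
```

The kernel fires most strongly where cost climbs over a three-window span, and pooling keeps the strongest response in each neighbourhood, which is the sense in which such a block detects a temporal pattern regardless of its exact position.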



Updated: 2020-10-19