Feature extraction from unequal length heterogeneous EHR time series via dynamic time warping and tensor decomposition, Data Mining and Knowledge Discovery

Data Mining and Knowledge Discovery (IF 2.8), Pub Date: 2021-01-04, DOI: 10.1007/s10618-020-00724-6
Chi Zhang, Hadi Fanaee-T, Magne Thoresen

Electronic Health Records (EHR) data is routinely generated patient data that can provide useful information for analytical tasks such as disease detection and clinical event prediction. However, temporal EHR data such as physiological vital signs and lab test results are particularly challenging to use. Temporal EHR features typically have different sampling frequencies: heart rate, for example, is measured almost continuously, whereas blood tests may be taken only a few times during a patient's entire stay. Different patients also have different lengths of stay. Existing approaches for extracting features from temporal EHR sequences either ignore the temporal patterns within features or use a predefined window to select a section of each sequence, discarding the information outside that window. We propose a novel approach to handling irregularly sampled, unequal-length EHR time series using dynamic time warping (DTW) and tensor decomposition. We use DTW to compute the pairwise distances between patients in the cohort for each temporal feature and stack the resulting distance matrices into a tensor. We then decompose the tensor to learn its latent structure, which is subsequently used as the patient representation. Finally, we use this patient representation to predict in-hospital mortality. We illustrate our method on two cohorts from the MIMIC-III database: a sepsis cohort and an acute kidney failure cohort. We show that our method outperforms the baseline methods, LSTM and DTW-KNN, in terms of AUROC, AUPRC and accuracy. Lastly, we provide a detailed feature-importance analysis that supports the interpretability of our method.
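The first two steps of the pipeline described above (per-feature pairwise DTW distances, then stacking the distance matrices into a tensor) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses the classic unconstrained DTW recurrence with an absolute-difference local cost, assumes every patient has at least one measurement for every feature, and ignores timestamps and differences in scale between features. The feature names and toy values are hypothetical. The result is indexed as patients × patients × features, with one symmetric distance matrix per feature. The decomposition step is left out of this sketch; a CP/PARAFAC routine such as TensorLy's `parafac` is one possible choice, not necessarily the one used in the paper.

```python
import math
from typing import Dict, List, Sequence

# Assumption: each patient is a dict mapping a (hypothetical) feature name to the
# list of observed values in time order. Timestamps are ignored in this sketch.
Patient = Dict[str, List[float]]


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping with |a - b| as the local cost."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("DTW needs non-empty series")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y[j-1] is repeated
                cost[i][j - 1],      # y advances, x[i-1] is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def distance_tensor(patients: List[Patient], features: List[str]) -> List[List[List[float]]]:
    """Stack one patient-by-patient DTW distance matrix per feature.

    Returns a nested list indexed as tensor[i][j][k]: the DTW distance between
    patients i and j on feature k (shape N x N x F). Diagonal entries stay 0.0.
    """
    n = len(patients)
    tensor = [[[0.0] * len(features) for _ in range(n)] for _ in range(n)]
    for k, feature in enumerate(features):
        for i in range(n):
            for j in range(i + 1, n):  # DTW is symmetric here, so fill both halves at once
                d = dtw_distance(patients[i][feature], patients[j][feature])
                tensor[i][j][k] = d
                tensor[j][i][k] = d
    return tensor


if __name__ == "__main__":
    # Hypothetical toy cohort: heart rate is dense, creatinine is sparse,
    # and series lengths differ across patients.
    cohort = [
        {"heart_rate": [80, 82, 85, 90, 88], "creatinine": [1.0, 1.2]},
        {"heart_rate": [78, 80, 95], "creatinine": [0.9]},
        {"heart_rate": [100, 101, 99, 98], "creatinine": [2.1, 2.5, 2.4]},
    ]
    features = ["heart_rate", "creatinine"]
    tensor = distance_tensor(cohort, features)
    for k, name in enumerate(features):
        print(name)
        for row in tensor:
            print([round(entry[k], 2) for entry in row])
```

Only the upper triangle is computed because DTW with a symmetric local cost and step pattern gives the same distance in both directions, which halves the number of O(nm) DTW calls. The total work is still quadratic in cohort size, so for realistic cohorts a constrained DTW (for example, a Sakoe-Chiba band) or a compiled implementation would likely be needed.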

Updated: 2021-01-04