The application of unsupervised deep learning in predictive models using electronic health records.
BMC Medical Research Methodology (IF 4) Pub Date: 2020-02-26, DOI: 10.1186/s12874-020-00923-1
Lei Wang 1,2, Liping Tong 3, Darcy Davis 3, Tim Arnold 4, Tina Esposito 3

BACKGROUND The main goal of this study is to explore the use, in predictive modeling, of features that represent patient-level electronic health record (EHR) data and are generated by an autoencoder, an unsupervised deep learning algorithm. Because autoencoder features are learned without supervision, this paper focuses on their value as a general lower-dimensional representation of EHR information across a wide variety of predictive tasks.

METHODS We compare a model built on autoencoder features with traditional models: a logistic model with the least absolute shrinkage and selection operator (LASSO) and the Random Forest algorithm. In addition, we include a predictive model that uses a small subset of response-specific variables (Simple Reg) and a model that combines these variables with autoencoder features (Enhanced Reg). We performed the study first on simulated data that mimic real-world EHR data and then on actual EHR data from eight Advocate hospitals.

RESULTS On simulated data with incorrect categories and missing data, the precision of the autoencoder model is 24.16% when recall is fixed at 0.7, which is higher than Random Forest (23.61%) and lower than LASSO (25.32%). The precision is 20.92% for Simple Reg and improves to 24.89% for Enhanced Reg. When real EHR data are used to predict 30-day readmission, the precision of the autoencoder model is 19.04%, which again is higher than Random Forest (18.48%) and lower than LASSO (19.70%). The precisions for Simple Reg and Enhanced Reg are 18.70% and 19.69%, respectively. That is, Enhanced Reg can achieve prediction performance competitive with LASSO. In addition, the results show that Enhanced Reg usually relies on fewer features under the simulation settings of this paper.

CONCLUSIONS We conclude that an autoencoder can create useful features that represent the entire space of EHR data and are applicable to a wide array of predictive tasks. Combined with important response-specific predictors, these features allow us to derive efficient and robust predictive models with less labor in data extraction and model training.
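The results above are reported as precision with recall fixed at 0.7. As a rough illustration of that metric only, the sketch below lowers the score threshold until recall first reaches the target and reports precision at that point. The function name `precision_at_recall`, the tie handling, and the toy labels and scores are assumptions made for this example; the abstract does not describe how the authors selected the threshold, so this is not their implementation.

```python
from typing import Sequence


def precision_at_recall(
    y_true: Sequence[int],
    scores: Sequence[float],
    target_recall: float = 0.7,
) -> float:
    """Precision at the highest score threshold whose recall >= target_recall.

    Hypothetical helper for illustration; labels are 0/1 and a higher
    score means "more likely positive".
    """
    total_pos = sum(1 for y in y_true if y == 1)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank cases from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = 0  # true positives flagged so far
    fp = 0  # false positives flagged so far
    i = 0
    n = len(order)
    while i < n:
        # Cases with equal scores cross the threshold together.
        current = scores[order[i]]
        while i < n and scores[order[i]] == current:
            if y_true[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / total_pos >= target_recall:
            return tp / (tp + fp)

    # Reached only if target_recall > 1; every case has been flagged.
    return tp / (tp + fp)


# Toy data (made up for this example, not taken from the study).
labels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
preds = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(precision_at_recall(labels, preds, target_recall=0.7))
```

Fixing recall before comparing precision puts every model at the same sensitivity, so the figures for autoencoder, LASSO, Random Forest, Simple Reg, and Enhanced Reg can be compared directly.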

Updated: 2020-04-22