Deep Representation Learning of Patient Data from Electronic Health Records (EHR): A Systematic Review
Journal of Biomedical Informatics (IF 4.5) Pub Date: 2020-12-31, DOI: 10.1016/j.jbi.2020.103671
Yuqi Si 1, Jingcheng Du 1, Zhao Li 1, Xiaoqian Jiang 1, Timothy Miller 2, Fei Wang 3, W Jim Zheng 1, Kirk Roberts 1

Objectives:

Patient representation learning refers to learning a dense mathematical representation of a patient that encodes meaningful information from Electronic Health Records (EHRs). This is generally performed using advanced deep learning methods. This study presents a systematic review of this field and provides both qualitative and quantitative analyses from a methodological perspective.

Methods:

We searched MEDLINE, EMBASE, Scopus, the Association for Computing Machinery (ACM) Digital Library, and the Institute of Electrical and Electronics Engineers (IEEE) Xplore Digital Library for studies that develop patient representations from EHRs with deep learning methods. After screening 363 articles, we included 49 papers for comprehensive data collection.

Results:

Publications developing patient representations almost doubled each year from 2015 until 2019. We noticed a typical workflow that starts with feeding in raw data, applies deep learning models, and ends with clinical outcome predictions as evaluations of the learned representations. Specifically, learning representations from structured EHR data was dominant (37 out of 49 studies). Recurrent neural networks were widely applied as the deep learning architecture (long short-term memory: 13 studies, gated recurrent unit: 11 studies). Learning was mainly performed in a supervised manner (30 studies) and optimized with cross-entropy loss. Disease prediction was the most common application and evaluation (31 studies). Benchmark datasets were mostly unavailable (28 studies) due to privacy concerns about EHR data, and code was made available in 20 studies.
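
The typical workflow described above (raw visit data in, a recurrent model in the middle, a supervised prediction scored with cross-entropy at the end) can be illustrated with a minimal sketch. It is not taken from any of the reviewed studies: the class `TinyGRU`, the helper functions, the five made-up diagnosis codes, the random untrained weights, and the two-class label are all assumptions chosen for illustration, and only the forward pass is shown (no training, biases omitted).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyGRU:
    """Hypothetical minimal GRU encoder (forward pass only, biases omitted)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One input-to-hidden (W) and one hidden-to-hidden (U) matrix per gate.
        self.Wz = rand_matrix(hidden_dim, input_dim, rng)
        self.Uz = rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wr = rand_matrix(hidden_dim, input_dim, rng)
        self.Ur = rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wh = rand_matrix(hidden_dim, input_dim, rng)
        self.Uh = rand_matrix(hidden_dim, hidden_dim, rng)

    def step(self, x, h):
        # Update gate z and reset gate r.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        # Candidate state uses the reset-gated previous state.
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # Blend previous state and candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def encode(self, visits):
        """Return the final hidden state, used here as the patient representation."""
        h = [0.0] * self.hidden_dim
        for x in visits:
            h = self.step(x, h)
        return h


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


# Toy patient: three visits, each a multi-hot vector over 5 made-up codes.
visits = [[1, 0, 0, 1, 0], [0, 1, 0, 0, 0], [0, 0, 1, 1, 0]]
encoder = TinyGRU(input_dim=5, hidden_dim=4)
W_out = rand_matrix(2, 4, random.Random(1))  # untrained 2-class output layer

representation = encoder.encode(visits)          # dense patient vector
probs = softmax(matvec(W_out, representation))   # predicted class probabilities
loss = cross_entropy(probs, label=1)             # supervised training signal
print(len(representation), round(sum(probs), 6), loss > 0)
```

In a supervised setup of the kind the review counts, the loss would be minimized over many labeled patients, and the final hidden state would then serve as the learned representation for the downstream prediction task.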

Discussion & Conclusion:

Existing predictive models mainly focus on predicting single diseases rather than considering the complex mechanisms of patients from a holistic perspective. Through a systematic review, we show the importance and feasibility of learning comprehensive representations of patient EHR data. Advances in patient representation learning techniques will be essential for powering patient-level EHR analyses. Future work will still be devoted to leveraging the richness and potential of available EHR data. Reproducibility and transparency of reported results will hopefully improve. Knowledge distillation and advanced learning techniques will be exploited to further improve the learning of patient representations.



Updated: 2020-12-31