A Deep Learning Pipeline for Patient Diagnosis Prediction Using Electronic Health Records,arXiv - CS - Computers and Society

arXiv - CS - Computers and Society, Pub Date: 2020-06-23, DOI: arxiv-2006.16926
Leopold Franz, Yash Raj Shrestha, Bibek Paudel

Augmenting disease diagnosis and decision-making in healthcare with machine learning algorithms has gained considerable momentum in recent years. In the epidemiological situation caused by the COVID-19 pandemic in particular, swift and accurate prediction of disease diagnoses with machine learning algorithms could facilitate the identification and care of vulnerable population clusters, such as people with multi-morbidity conditions. Building a useful disease diagnosis prediction system requires advances in both data representation and machine learning architectures. First, with respect to data collection and representation, we face severe problems due to the multitude of formats and the lack of coherence prevalent in Electronic Health Records (EHRs). This hinders the extraction of the valuable information that EHRs contain. Currently, no universal global data standard has been established. As a practical solution, we develop and publish a Python package that transforms a public health dataset into an easy-to-access universal format. Transforming the data into an international health data format makes it easier for researchers to combine EHR datasets with clinical datasets of diverse formats. Second, machine learning algorithms that predict multiple disease diagnosis categories simultaneously remain underdeveloped. We propose two novel model architectures to address this. The first, DeepObserver, uses structured numerical data to predict the diagnosis categories. The second, ClinicalBERT_Multi, incorporates the rich information available in clinical notes via natural language processing methods and also provides interpretable visualizations to medical practitioners. We show that both models can predict multiple diagnoses simultaneously with high accuracy.
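Predicting several diagnosis categories for the same patient at once is a multi-label classification problem: each category is an independent yes/no decision rather than one choice among competing classes. The snippet below is a minimal sketch of that general idea only, not the authors' DeepObserver or ClinicalBERT_Multi code. It assumes a model that already emits one raw score (logit) per category; the category names, the `predict_multilabel` and `multilabel_bce` helpers, and the 0.5 threshold are all hypothetical and chosen for illustration.

```python
import math
from typing import Dict, Sequence

# Hypothetical diagnosis categories, for illustration only.
CATEGORIES = ["circulatory", "respiratory", "endocrine"]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_multilabel(logits: Sequence[float], threshold: float = 0.5) -> Dict[str, bool]:
    """Turn one logit per category into independent yes/no predictions.

    Each category gets its own sigmoid (instead of a shared softmax), so the
    categories do not compete and a patient can receive several diagnoses.
    """
    if len(logits) != len(CATEGORIES):
        raise ValueError("expected one logit per category")
    return {c: sigmoid(z) >= threshold for c, z in zip(CATEGORIES, logits)}


def multilabel_bce(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over categories, a common multi-label loss."""
    if len(logits) != len(targets) or not targets:
        raise ValueError("logits and targets must be non-empty and equal length")
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(targets)


if __name__ == "__main__":
    # Scores for one hypothetical patient: two categories pass the threshold.
    print(predict_multilabel([2.0, -1.0, 0.3]))
    print(multilabel_bce([2.0, -1.0, 0.3], [1, 0, 1]))
```

With the example scores, `predict_multilabel([2.0, -1.0, 0.3])` flags both "circulatory" and "endocrine", which illustrates why independent sigmoids rather than a single softmax suit the simultaneous, multi-morbidity setting the abstract describes.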


Updated: 2020-07-02
