A Disease Inference Method Based on Symptom Extraction and Bidirectional Long Short Term Memory networks



Methods (IF 4.2), Pub Date: 2020-02-01, DOI: 10.1016/j.ymeth.2019.07.009
Donglin Guo, Guihua Duan, Ying Yu, Yaohang Li, Fang-Xiang Wu, Min Li

The wide application of automatic disease inference in many medical fields improves the efficiency of medical treatment. Many efforts have been made to predict patients' future health conditions from their full clinical texts, clinical measurements, or medical codes. Symptoms reflect the onset of diseases and can provide credible information for disease diagnosis. In this study, we propose a new disease inference method that extracts symptoms and integrates two symptom representation approaches. Electronic Medical Records (EMR) form a comprehensive clinical knowledge database containing a massive amount of data about diseases, symptoms, and their relationships. To reduce the uncertainty and irregularity of symptom descriptions in EMR, we extract symptoms with MetaMap, an existing natural language processing tool designed for biomedical texts. To take advantage of the complex relationship between symptoms and diseases and so improve the accuracy of disease inference, we present two symptom representation models: a term frequency-inverse document frequency (TF-IDF) model, which represents the relationship between symptoms and diseases, and Word2Vec, which expresses the semantic relationship between symptoms. Based on these two symptom representations, we employ bidirectional Long Short Term Memory networks (BiLSTMs) to model symptom sequences in EMR. Our proposed model shows a significant improvement in terms of AUC (0.895) and F1 (0.572) for 50 diseases in the MIMIC-III dataset. The results show that the model combining the two symptom representations performs better than a model using only one of them.
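The abstract does not give the exact TF-IDF formulation, so the following is only a minimal sketch of how symptom-disease weights of this kind could be computed. It assumes each disease is treated as one "document" whose tokens are the symptoms extracted from its records; the function name `symptom_tfidf` and the toy data are hypothetical and are not taken from the paper or from MIMIC-III.

```python
import math
from collections import Counter


def symptom_tfidf(disease_to_symptoms):
    """Return {disease: {symptom: weight}} using a plain TF-IDF scheme.

    Assumption: every disease is one "document" and its tokens are the
    symptoms extracted from records labelled with that disease.
    """
    n_diseases = len(disease_to_symptoms)

    # Document frequency: in how many diseases does each symptom occur?
    doc_freq = Counter()
    for symptoms in disease_to_symptoms.values():
        doc_freq.update(set(symptoms))

    weights = {}
    for disease, symptoms in disease_to_symptoms.items():
        counts = Counter(symptoms)
        total = sum(counts.values())
        # tf = relative frequency within the disease,
        # idf = log(number of diseases / diseases containing the symptom)
        weights[disease] = {
            symptom: (count / total) * math.log(n_diseases / doc_freq[symptom])
            for symptom, count in counts.items()
        }
    return weights


# Hypothetical toy data, invented purely for illustration.
toy = {
    "pneumonia": ["cough", "fever", "cough", "dyspnea"],
    "influenza": ["fever", "cough", "myalgia"],
    "heart failure": ["dyspnea", "edema", "edema"],
}
weights = symptom_tfidf(toy)
```

In this toy example, "edema" occurs only under "heart failure", so it receives a higher weight there than "dyspnea", which is shared with "pneumonia". A symptom that appears under every disease receives a weight of zero, which is the usual TF-IDF behaviour for uninformative terms.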


Updated: 2020-02-01
