Deep Learning versus Conventional Machine Learning for Detection of Healthcare-Associated Infections in French Clinical Narratives. Methods of Information in Medicine


Methods of Information in Medicine (IF 1.7). Pub Date: 2019-03-15. DOI: 10.1055/s-0039-1677692
Sara Rabhi¹, Jérémie Jakubowicz¹, Marie-Helene Metzger²,³,⁴


OBJECTIVE The objective of this article was to compare the performance of deep learning and conventional machine learning (ML) methods for detecting health care-associated infections (HAIs) in French medical reports.

METHODS The corpus consisted of different types of medical reports (discharge summaries, surgery reports, consultation reports, etc.). A total of 1,531 medical text documents were extracted and deidentified in three French university hospitals. Each document was labeled for the presence (1) or absence (0) of HAI. We started by normalizing the records using a list of preprocessing techniques. We calculated an overall performance metric, the F1 score, to compare a deep learning method (convolutional neural network [CNN]) with the most popular conventional ML models (Bernoulli and multinomial naïve Bayes, k-nearest neighbors, logistic regression, random forests, extra-trees, gradient boosting, support vector machines). We applied Bayesian hyperparameter optimization to each model based on its HAI identification performance. We treated the text representation as an additional hyperparameter for each model, choosing among four representations (bag of words, term frequency-inverse document frequency, word2vec, and GloVe).

RESULTS The CNN outperformed all conventional ML algorithms for HAI classification. The best F1 score of 97.7% ± 3.6% and the best area under the curve of 99.8% ± 0.41% were achieved when the CNN was applied directly to the processed clinical notes without a pretrained word2vec embedding. Through receiver operating characteristic curve analysis, using Youden's index as the reference, we could achieve a good balance between false notifications (specificity of 0.937) and system detection capability (sensitivity of 0.962).

CONCLUSIONS The main drawback of CNNs is their opacity. To address this issue, we investigated the activation values of the CNN's inner layers to visualize the most meaningful phrases in a document. This method could be used to build a phrase-based medical assistant algorithm to help infection control practitioners select relevant medical records. Our study demonstrated that a deep learning approach outperforms other classification algorithms for automatically identifying HAIs in medical reports.
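
As an illustration of the operating-point selection mentioned in the results, the following is a minimal sketch of choosing a decision threshold with Youden's index, J = sensitivity + specificity - 1, and computing the F1 score at that threshold. It is not the authors' code: the function names (`confusion_counts`, `youden_threshold`, `f1_score`) and the toy labels and scores are assumptions made up for this example. For the reported operating point, J = 0.962 + 0.937 - 1 = 0.899.

```python
def confusion_counts(y_true, y_score, threshold):
    """Count (tp, fp, tn, fn) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_threshold(y_true, y_score):
    """Return (threshold, sensitivity, specificity, J) maximizing Youden's J."""
    best = None
    # Every distinct score is a candidate threshold.
    for threshold in sorted(set(y_score)):
        tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        j = sensitivity + specificity - 1.0
        if best is None or j > best[3]:
            best = (threshold, sensitivity, specificity, j)
    return best


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data (NOT from the study): 1 = HAI present, 0 = absent.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.35, 0.60, 0.40, 0.55, 0.70, 0.80, 0.95]

threshold, sensitivity, specificity, j = youden_threshold(y_true, y_score)
tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
f1 = f1_score(tp, fp, fn)

# Youden's J at the operating point reported in the abstract.
reported_j = 0.962 + 0.937 - 1.0

print(f"threshold={threshold}, sens={sensitivity:.3f}, spec={specificity:.3f}, J={j:.3f}")
print(f"F1 at that threshold: {f1:.3f}")
print(f"J at the reported operating point: {reported_j:.3f}")
```

On a real classifier, `y_score` would be the predicted HAI probabilities on held-out documents. Maximizing J weights false notifications and missed infections equally, which matches the balance between specificity and sensitivity described above.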


Updated: 2019-03-15
