Comparison of rule-based and neural network models for negation detection in radiology reports
Natural Language Engineering (IF 2.5), Pub Date: 2020-11-18, DOI: 10.1017/s1351324920000509
D. Sykes, A. Grivas, C. Grover, R. Tobin, C. Sudlow, W. Whiteley, A. Mcintosh, H. Whalley, B. Alex

Using natural language processing, it is possible to extract structured information from raw text in the electronic health record (EHR) at reasonably high accuracy. However, accurately distinguishing negated from non-negated mentions of clinical terms remains a challenge. EHR text includes cases where diseases are stated not to be present or are only hypothesised, meaning a disease can be mentioned in a report when it is not being reported as present. This makes tasks such as document classification and summarisation more difficult. We have developed the rule-based EdIE-R-Neg, part of an existing text mining pipeline called EdIE-R (Edinburgh Information Extraction for Radiology reports; https://www.ltg.ed.ac.uk/software/edie-r/), which was developed to process brain imaging reports, and two machine learning approaches: one using a bidirectional long short-term memory network and another using a feedforward neural network. These were developed on data from the Edinburgh Stroke Study (ESS) and tested on data from routine reports from NHS Tayside (Tayside). Both datasets consist of written reports from medical scans. These models are compared with two existing rule-based models: pyConText (Harkema et al. 2009. Journal of Biomedical Informatics 42(5), 839–851), a Python implementation of a generalisation of NegEx, and NegBio (Peng et al. 2017. NegBio: A high-performance tool for negation and uncertainty detection in radiology reports. arXiv e-prints, p. arXiv:1712.05898), which identifies negation scopes through patterns applied to a syntactic representation of the sentence. On both the test set of the dataset from which our models were developed and the largely similar Tayside test set, the neural network models and our custom-built rule-based system outperformed the existing methods. EdIE-R-Neg scored highest on F1 score, particularly on the test set of the Tayside dataset, from which no development data were used in these experiments, showing the power of custom-built rule-based systems for negation detection on datasets of this size. The performance gap between the machine learning models and EdIE-R-Neg on the Tayside test set was reduced by adding Tayside development data to the ESS training set, demonstrating the adaptability of the neural network models.
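
To make the task concrete, the following is a minimal sketch of trigger-based negation detection in the general spirit of NegEx, together with the F1 computation used to compare systems. It is not the authors' EdIE-R-Neg, and it does not use the pyConText or NegBio APIs. The trigger list, the scope terminators, the five-token window, and all function names are assumptions chosen purely for illustration.

```python
import re

# Hypothetical trigger and terminator lists for illustration only;
# they are not taken from NegEx, pyConText, NegBio, or EdIE-R-Neg.
NEGATION_TRIGGERS = ("no evidence of", "no", "without", "not")
SCOPE_TERMINATORS = {"but", "however", "although"}
WINDOW = 5  # assumed number of tokens after a trigger treated as negated


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def negated_token_indices(tokens):
    """Return the indices of tokens that fall inside a negation scope."""
    # Try longer triggers first so "no evidence of" wins over "no".
    triggers = sorted((t.split() for t in NEGATION_TRIGGERS), key=len, reverse=True)
    negated = set()
    i = 0
    while i < len(tokens):
        match = next((t for t in triggers if tokens[i:i + len(t)] == t), None)
        if match is None:
            i += 1
            continue
        start = i + len(match)
        # The scope runs forward until the window ends or a terminator appears.
        for j in range(start, min(start + WINDOW, len(tokens))):
            if tokens[j] in SCOPE_TERMINATORS:
                break
            negated.add(j)
        i = start
    return negated


def is_negated(sentence, term):
    """Report whether the first mention of `term` in `sentence` is negated."""
    tokens = tokenize(sentence)
    term_tokens = tokenize(term)
    negated = negated_token_indices(tokens)
    n = len(term_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == term_tokens:
            return all(k in negated for k in range(i, i + n))
    raise ValueError(f"term {term!r} not found in sentence")


def f1_score(gold, predicted):
    """F1 for the 'negated' class, given parallel lists of booleans."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(is_negated("No evidence of acute infarct or haemorrhage.", "haemorrhage"))
    print(is_negated("There is an old infarct but no haemorrhage.", "infarct"))
```

A fixed window with a few terminators is the simplest possible scope rule. Systems such as NegBio instead derive scope from a syntactic representation of the sentence, and the neural models in the study learn it from annotated data, so this sketch should be read only as an illustration of the problem.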

Updated: 2020-11-18