Adverse Drug Reaction extraction: Tolerance to entity recognition errors and sub-domain variants
Computer Methods and Programs in Biomedicine (IF 4.9), Pub Date: 2020-12-05, DOI: 10.1016/j.cmpb.2020.105891
Sara Santiso, Alicia Pérez, Arantza Casillas

Background and Objective: This work tackles Adverse Drug Reaction (ADR) extraction from Electronic Health Records (EHRs) written in Spanish, a natural language processing task. It consists of extracting relations between drugs and diseases in which the drug is the causing agent of the reaction. To this end, a pipeline is employed: first, relevant clinical entities are recognized (e.g. drugs, active ingredients, findings, symptoms); next, drug-disease candidate pairs are judged as either ADR or non-ADR. The task poses three challenges. First, EHRs show high lexical variability. Second, EHRs are scarce because they contain sensitive information. Third, the ADR detection stage has to cope with errors propagated from the entity recognition stage.
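
The second pipeline stage operates on drug-disease candidate pairs built from the recognized entities. A minimal sketch of that pairing step follows; the `Entity` class, the coarse `"drug"`/`"disease"` labels, and the example mentions are hypothetical illustrations, not taken from the paper, and the ADR/non-ADR classifier itself is not shown.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Entity:
    """A recognized clinical mention (hypothetical structure)."""
    text: str
    kind: str   # assumed coarse label: "drug" or "disease"
    start: int  # character offset in the record


def candidate_pairs(entities):
    """Return every (drug, disease) pair among the recognized entities."""
    drugs = [e for e in entities if e.kind == "drug"]
    diseases = [e for e in entities if e.kind == "disease"]
    return list(product(drugs, diseases))


# Invented example mentions from one record.
entities = [
    Entity("amoxicilina", "drug", 23),
    Entity("exantema", "disease", 48),
    Entity("fiebre", "disease", 70),
]
pairs = candidate_pairs(entities)
```

Each pair in `pairs` would then be passed to a classifier that labels it ADR or non-ADR. An entity missed in the first stage removes all of its pairs from this list, which is how recognition errors reach the detection stage.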

Methods: To develop the ADR detection stage, we employed a deep neural network approach. To assess its tolerance to external variations, the system was exposed to two sources of noise. First, it was evaluated on three corpora that contain documents from different hospitals and differ in size and class imbalance ratio; it was also exposed to cross-corpus relation extraction. Second, we assessed the sensitivity of the ADR detection stage to noise introduced by automatic Medical Entity Recognition (MER).
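
One way to probe sensitivity to recognition errors is to remove a fraction of the gold entities before building candidate pairs. The sketch below does this by random dropping, which is an assumption made for illustration: the abstract states only that the noise came from an automatic MER, not how missed entities were distributed. The function name `drop_entities` and its parameters are hypothetical.

```python
import random


def drop_entities(entities, miss_rate, seed=0):
    """Simulate a recognizer that misses about `miss_rate` of the entities.

    Random dropping is a stand-in for a real MER system (assumption).
    """
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    n_missed = round(len(entities) * miss_rate)
    missed = set(rng.sample(range(len(entities)), n_missed))
    return [e for i, e in enumerate(entities) if i not in missed]


# Placeholder gold annotations for one corpus.
gold = [f"entity_{i}" for i in range(10)]
noisy = drop_entities(gold, miss_rate=0.2)
```

Running the detection stage on `noisy` instead of `gold` and comparing the scores gives a rough estimate of how much performance is lost to missed entities.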

Results: The system can cope with cross-hospital predictions provided that it is trained on a large corpus. In the most challenging situation, an f-measure of 75.2 was achieved. Regarding tolerance to errors from the entity recognition step, with a medical entity recognizer that missed 20% of the entities, the f-measure of the ADR detection stage decreased to 68.6.
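
For reference, the f-measure is commonly computed as the harmonic mean of precision and recall. The sketch below assumes the balanced F1 variant, which the abstract does not specify, and uses invented counts rather than figures from the paper.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 3 correct ADR pairs, 1 spurious, 1 missed.
score = 100 * f_measure(tp=3, fp=1, fn=1)
```

Pairs lost because an entity was not recognized count as false negatives, so missed entities lower recall and, through it, the f-measure.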

Conclusions: ADR extraction is tackled as a cause-effect relation extraction task between drugs and diseases. It is advisable to employ as many EHRs as possible to make ADR extraction more robust. Despite the entities missed in the MER step, the drop in performance with the proposed system is not large.



Updated: 2020-12-14