A fast, accurate, and generalisable heuristic-based negation detection algorithm for clinical text
Computers in Biology and Medicine (IF 7.7), Pub Date: 2021-01-16, DOI: 10.1016/j.compbiomed.2021.104216
Luke T Slater 1, William Bradlow 2, Dino Fa Motti 2, Robert Hoehndorf 3, Simon Ball 4, Georgios V Gkoutos 5

Negation detection is an important task in biomedical text mining. In clinical settings in particular, it is critical to determine whether findings mentioned in text are present or absent. Rule-based negation detection algorithms are a common approach to the task, and more recent investigations have produced rule-based systems that use the rich grammatical information afforded by typed dependency graphs. However, interacting with these complex representations inevitably necessitates complex rules, which are time-consuming to develop and do not generalise well. We hypothesise that a heuristic approach to determining negation via dependency graphs could offer a powerful alternative. We describe and implement an algorithm for negation detection based on grammatical distance from a negatory construct in a typed dependency graph. To evaluate the algorithm, we develop two testing corpora comprising sentences of clinical text extracted from the MIMIC-III database and documents related to hypertrophic cardiomyopathy (HCM) patients routinely collected at University Hospitals Birmingham NHS Trust. Gold-standard validation datasets were built by a combination of human annotation and examination of algorithm error. Finally, we compare the performance of our approach with four other rule-based algorithms on both gold-standard corpora. The presented algorithm exhibits the best performance by F-measure over the MIMIC-III dataset, and performance similar to that of the syntactic negation detection systems over the HCM dataset. It is also the fastest of the dependency-based negation systems explored in this study. Our results show that while a single heuristic approach to dependency-based negation detection is ignorant of certain advanced cases, it nevertheless forms a powerful and stable method, requiring minimal training and adaptation between datasets. As such, it could serve as a drop-in replacement or augmentation for many-rule negation approaches in clinical text-mining pipelines, particularly for cases where adaptation and rule development are not required or possible.
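
The abstract does not state the cue list, the distance measure, or the threshold used, so the following is only a minimal sketch of the general idea of distance-based negation over a dependency graph, not the authors' implementation. The cue set `NEGATION_CUES`, the helper names `graph_distance` and `is_negated`, the `max_distance` cut-off of 2, and the hand-built parse are all assumptions made for illustration; in practice the edges would come from a dependency parser.

```python
from collections import deque

# Hypothetical cue list for illustration; the abstract does not give the real one.
NEGATION_CUES = {"no", "not", "without"}


def graph_distance(edges, start, goal):
    """Return the number of edges on the shortest path between two tokens.

    The dependency edges are treated as undirected. Returns None when the
    two tokens are not connected.
    """
    neighbours = {}
    for head, dependent in edges:
        neighbours.setdefault(head, set()).add(dependent)
        neighbours.setdefault(dependent, set()).add(head)

    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def is_negated(tokens, edges, target_index, max_distance=2):
    """Flag a token as negated if any cue lies within max_distance edges of it.

    max_distance is an assumed threshold, not a value taken from the paper.
    """
    for index, token in enumerate(tokens):
        if token.lower() in NEGATION_CUES:
            dist = graph_distance(edges, index, target_index)
            if dist is not None and dist <= max_distance:
                return True
    return False


# Hand-built parse of "The scan shows no effusion but confirms cardiomegaly".
tokens = ["The", "scan", "shows", "no", "effusion", "but", "confirms", "cardiomegaly"]
# (head index, dependent index) pairs
edges = [(1, 0), (2, 1), (4, 3), (2, 4), (6, 5), (2, 6), (6, 7)]

effusion_negated = is_negated(tokens, edges, 4)       # "no" is 1 edge away
cardiomegaly_negated = is_negated(tokens, edges, 7)   # "no" is 4 edges away
```

In this toy parse the cue "no" attaches directly to "effusion", so that finding falls inside the threshold, while "cardiomegaly" sits in the conjoined clause and falls outside it. A single distance threshold like this needs no per-dataset rules, which is in the spirit of the trade-off the abstract describes, though it will miss cases where scope does not track graph distance.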


Updated: 2021-01-22