Supervised learning for the detection of negation and of its scope in French and Brazilian Portuguese biomedical corpora
Natural Language Engineering (IF 2.3), Pub Date: 2020-06-30, DOI: 10.1017/s1351324920000352
Clément Dalloux, Vincent Claveau, Natalia Grabar, Lucas Emanuel Silva Oliveira, Claudia Maria Cabral Moro, Yohan Bonescki Gumiel, Deborah Ribeiro Carvalho

Automatic detection of negated content is often a prerequisite for information extraction systems in various domains. The task matters especially in the biomedical domain, where negation plays an important role. This work makes two main contributions. First, we work with two languages that have received little attention so far, Brazilian Portuguese and French, and we developed new corpora for both, manually annotated with negation cues and their scopes. Second, we propose supervised machine learning methods for the automatic detection of negation cues and their scopes. The methods prove robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical language) contexts. The approach is also validated on English data from the state of the art, where it yields very good results and outperforms other existing approaches. In addition, the application is accessible and usable online. We expect that these contributions (new annotated corpora, an application accessible online, and cross-domain robustness) will improve the reproducibility of the results and the robustness of NLP applications.

Updated: 2020-06-30