Let's Stop Incorrect Comparisons in End-to-end Relation Extraction!
arXiv - CS - Computation and Language. Pub Date: 2020-09-22, DOI: arxiv-2009.10684
Bruno Taillé, Vincent Guigue, Geoffrey Scoutheeten and Patrick Gallinari

Despite efforts to distinguish three different evaluation setups (Bekoulis et al., 2018), numerous end-to-end Relation Extraction (RE) articles present unreliable performance comparisons to previous work. In this paper, we first identify several patterns of invalid comparisons in published papers and describe them to avoid their propagation. We then propose a small empirical study to quantify the impact of the most common mistake and show that it leads to overestimating final RE performance by around 5% on ACE05. We also seize this opportunity to study unexplored ablations of two recent developments: the use of language model pretraining (specifically BERT) and span-level NER. This meta-analysis emphasizes the need for rigor in reporting both the evaluation setting and the dataset statistics, and we call for unifying the evaluation setting in end-to-end RE.
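To illustrate why the choice of evaluation setting matters, below is a minimal, hypothetical Python sketch (not the authors' code, and using made-up toy data) contrasting a "strict" criterion, where a predicted relation counts as correct only if both argument boundaries and entity types match the gold annotation, with a relaxed "boundaries-only" criterion that ignores entity types. Comparing scores computed under different criteria is exactly the kind of invalid comparison the paper warns against.

```python
def relation_f1(gold, pred, strict=True):
    """gold/pred: sets of (head_span, head_type, tail_span, tail_type, rel_type)."""
    def key(r):
        head_span, head_type, tail_span, tail_type, rel_type = r
        if strict:
            # Strict: boundaries, entity types and relation type must all match.
            return (head_span, head_type, tail_span, tail_type, rel_type)
        # Boundaries-only: entity types are ignored.
        return (head_span, tail_span, rel_type)

    gold_keys = {key(r) for r in gold}
    pred_keys = {key(r) for r in pred}
    tp = len(gold_keys & pred_keys)
    precision = tp / len(pred_keys) if pred_keys else 0.0
    recall = tp / len(gold_keys) if gold_keys else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Toy example: the prediction has correct spans and relation label but a wrong
# entity type, so it is counted only under the relaxed setting.
gold = {((0, 2), "PER", (5, 7), "ORG", "WORK_FOR")}
pred = {((0, 2), "PER", (5, 7), "GPE", "WORK_FOR")}
print(relation_f1(gold, pred, strict=True))   # 0.0
print(relation_f1(gold, pred, strict=False))  # 1.0
```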

Updated: 2020-10-27