BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
arXiv - CS - Computation and Language Pub Date : 2020-06-28 , DOI: arxiv-2006.15509
Chen Liang, Yue Yu, Haoming Jiang, Siawpeng Er, Ruijia Wang, Tuo Zhao, Chao Zhang

We study the open-domain named entity recognition (NER) problem under distant supervision. Although distant supervision does not require large amounts of manual annotation, it yields highly incomplete and noisy labels from external knowledge bases. To address this challenge, we propose a new computational framework, BOND, which leverages the power of pre-trained language models (e.g., BERT and RoBERTa) to improve the prediction performance of NER models. Specifically, we propose a two-stage training algorithm: in the first stage, we adapt the pre-trained language model to the NER task using the distant labels, which significantly improves both recall and precision; in the second stage, we drop the distant labels and propose a self-training approach to further improve model performance. Thorough experiments on 5 benchmark datasets demonstrate the superiority of BOND over existing distantly supervised NER methods. The code and distantly labeled data have been released at https://github.com/cliang1453/BOND.
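The two-stage procedure described above can be sketched in code. The snippet below is a minimal, hypothetical illustration only: a trivial count-based tagger (`TinyTagger`) stands in for the fine-tuned BERT/RoBERTa model, and the stage names and update schedule are simplified from the paper's actual teacher-student algorithm.

```python
# Minimal sketch of BOND-style two-stage training.
# TinyTagger is a toy stand-in for a pre-trained language model:
# it predicts the most frequent label seen for each token ("O" if unseen).
from collections import Counter, defaultdict


class TinyTagger:
    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, sentences, labels):
        for sent, labs in zip(sentences, labels):
            for tok, lab in zip(sent, labs):
                self.counts[tok][lab] += 1
        return self

    def predict(self, sentences):
        return [
            [self.counts[t].most_common(1)[0][0] if self.counts[t] else "O"
             for t in sent]
            for sent in sentences
        ]


def bond_two_stage(sentences, distant_labels, rounds=3):
    # Stage I: adapt the model to the NER task using the distant labels.
    teacher = TinyTagger().fit(sentences, distant_labels)
    # Stage II: drop the distant labels; self-train on the teacher's
    # pseudo-labels, periodically replacing the teacher with the student.
    for _ in range(rounds):
        pseudo = teacher.predict(sentences)            # teacher labels the data
        student = TinyTagger().fit(sentences, pseudo)  # student fits pseudo-labels
        teacher = student                              # refresh the teacher
    return teacher


sents = [["Obama", "visited", "Paris"], ["Paris", "is", "nice"]]
distant = [["PER", "O", "LOC"], ["LOC", "O", "O"]]
model = bond_two_stage(sents, distant)
print(model.predict([["Obama", "in", "Paris"]]))  # → [['PER', 'O', 'LOC']]
```

In the actual framework, Stage I uses early stopping to avoid overfitting the noisy distant labels, and Stage II additionally filters low-confidence pseudo-labels; both are omitted here for brevity.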

Updated: 2020-06-30