SEPT: Improving Scientific Named Entity Recognition with Span Representation
arXiv - CS - Information Retrieval. Pub Date: 2019-11-08, DOI: arxiv-1911.03353
Tan Yan, Heyan Huang, Xian-Ling Mao

We introduce a new scientific named entity recognizer called SEPT, which stands for Span Extractor with Pre-trained Transformers. Recent papers have shown span extractors to be powerful models compared with sequence labeling models. However, we find that with the development of pre-trained language models, the performance of span extractors has become similar to that of sequence labeling models. To keep the advantages of span representation, we modify the model with under-sampling, which balances the positive and negative samples and reduces the search space. Furthermore, we simplify the original network architecture to combine the span extractor with BERT. Experiments demonstrate that even the simplified architecture achieves the same performance, and that SEPT achieves a new state-of-the-art result in scientific named entity recognition even without relation information.
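
The abstract does not say how the under-sampling is carried out, so the following is only an illustrative sketch of the general idea: enumerate candidate spans up to a maximum width, keep every gold entity span, and retain a random subset of the non-entity spans. The function names, the `max_width` limit, the `neg_ratio` parameter and the uniform random sampling are all assumptions made for this example and are not taken from the paper.

```python
import random


def enumerate_spans(n_tokens, max_width):
    """Return all (start, end) spans, end inclusive, at most max_width tokens wide."""
    return [
        (start, end)
        for start in range(n_tokens)
        for end in range(start, min(start + max_width, n_tokens))
    ]


def undersample_spans(n_tokens, gold_spans, max_width=8, neg_ratio=3, seed=0):
    """Keep all gold spans and a random subset of negative spans.

    Hypothetical scheme: at most neg_ratio negatives per positive,
    sampled uniformly. The paper's actual strategy may differ.
    """
    rng = random.Random(seed)
    gold = set(gold_spans)
    candidates = enumerate_spans(n_tokens, max_width)
    positives = [span for span in candidates if span in gold]
    negatives = [span for span in candidates if span not in gold]
    # Cap the number of negatives so the two classes stay roughly balanced.
    n_keep = min(len(negatives), neg_ratio * max(1, len(positives)))
    return positives, rng.sample(negatives, n_keep)


# A 10-token sentence with two gold entity spans.
positives, negatives = undersample_spans(10, [(0, 1), (4, 6)])
print(len(enumerate_spans(10, 8)), len(positives), len(negatives))
```

Under these assumptions the training set for the sentence shrinks from every candidate span to the gold spans plus a bounded number of negatives, which is one way to read the abstract's "balance the positive and negative samples and reduce the search space".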

Updated: 2020-10-14