Masked Sentence Model Based on BERT for Move Recognition in Medical Scientific Abstracts
Journal of Data and Information Science (IF 1.5), Pub Date: 2019-12-27, DOI: 10.2478/jdis-2019-0020
Gaihong Yu, Zhixiong Zhang, Huan Liu, Liangping Ding
Abstract

Purpose: Move recognition in scientific abstracts is the NLP task of classifying the sentences of an abstract into different types of language units. To improve performance on this task, we propose a novel move-recognition model that outperforms the BERT-based method.

Design/methodology/approach: Prevalent BERT-based models for sentence classification often classify sentences without considering their context. Inspired by BERT's masked language model (MLM), we propose a novel model, the masked sentence model, which integrates both the content and the contextual information of sentences for move recognition. Experiments are conducted on the benchmark dataset PubMed 20k RCT in three steps, and the model is then compared with the HSLN-RNN, BERT-based, and SciBERT models on the same dataset.

Findings: Our model's F1 score exceeds that of the BERT-based and SciBERT models by 4.96% and 4.34%, respectively, which demonstrates the feasibility and effectiveness of the proposed model; its results come closest to the current state-of-the-art results of HSLN-RNN.

Research limitations: The sequential features of move labels are not considered, which may be one reason why HSLN-RNN performs better. Our model is restricted to biomedical English literature because it is fine-tuned on a dataset from PubMed, a typical biomedical database.

Practical implications: The proposed model is simpler and more effective at identifying move structures in scientific abstracts, and it is worth applying in text-classification experiments that capture the contextual features of sentences.

Originality/value: The study proposes a masked sentence model based on BERT that considers the contextual features of the sentences in abstracts in a new way. The model's classification performance is significantly improved by rebuilding the input layer without changing the structure of the neural network.
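The central idea of the abstract, rebuilding the input so that each sentence is classified together with its surrounding context, can be sketched in code. The following is an illustrative sketch only, not the authors' exact implementation: the input layout ([CLS] abstract with the target sentence masked [SEP] target sentence [SEP]) and the function name are assumptions made for clarity.

```python
def build_masked_input(sentences, target_idx, mask_token="[MASK]"):
    """Build a BERT-style input string for classifying sentences[target_idx].

    The target sentence is replaced by a mask token within the abstract
    (supplying context), and the sentence itself follows the first [SEP]
    (supplying content), so the encoder sees both at once.
    """
    context = [mask_token if i == target_idx else s
               for i, s in enumerate(sentences)]
    return ("[CLS] " + " ".join(context) +
            " [SEP] " + sentences[target_idx] + " [SEP]")

# Hypothetical three-sentence abstract (labels shown for illustration only).
abstract = [
    "Move recognition in abstracts is studied.",  # e.g. BACKGROUND
    "A masked sentence model is proposed.",       # e.g. METHODS
    "The F1 score improves over the baseline.",   # e.g. RESULTS
]
print(build_masked_input(abstract, 1))
```

Each sentence of an abstract would yield one such input, which is then fed to a BERT-style encoder with a standard classification head; only the input construction changes, not the network itself.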
