TNT-KID: Transformer-based neural tagger for keyword identification
Natural Language Engineering (IF 2.5), Pub Date: 2021-06-10, DOI: 10.1017/s1351324921000127
Matej Martinc, Blaž Škrlj, Senja Pollak

With growing amounts of available textual data, the development of algorithms capable of automatic analysis, categorization, and summarization of these data has become a necessity. In this research, we present a novel algorithm for keyword identification called Transformer-Based Neural Tagger for Keyword IDentification (TNT-KID). Keyword identification is the extraction of single-word or multiword phrases that represent key aspects of a given document. By adapting the transformer architecture to the task at hand and leveraging language model pretraining on a domain-specific corpus, the model is capable of overcoming deficiencies of both supervised and unsupervised state-of-the-art approaches to keyword extraction: it offers competitive and robust performance on a variety of datasets while requiring only a fraction of the manually labeled data required by the best-performing systems. This study also offers a thorough error analysis with insights into the inner workings of the model, and an ablation study measuring the influence of specific components of the keyword identification workflow on overall performance.




Updated: 2021-06-10