Phraseformer: Multimodal Key-phrase Extraction using Transformer and Graph Embedding
arXiv - CS - Computation and Language Pub Date : 2021-06-09 , DOI: arxiv-2106.04939 Narjes Nikzad-Khasmakhi, Mohammad-Reza Feizi-Derakhshi, Meysam Asgari-Chenaghlu, Mohammad-Ali Balafar, Ali-Reza Feizi-Derakhshi, Taymaz Rahkar-Farshi, Majid Ramezani, Zoleikha Jahanbakhsh-Nagadeh, Elnaz Zafarani-Moattar, Mehrdad Ranjbar-Khadivi
Background: Keyword extraction is a popular research topic in the field of natural language processing. Keywords are terms that describe the most relevant information in a document. The main problem researchers face is how to extract the core keywords from a document efficiently and accurately. Although previous keyword extraction approaches have utilized both text and graph features, there is a lack of models that can properly learn and combine these features. Methods: In this paper, we develop a multimodal key-phrase extraction approach, named Phraseformer, using transformer and graph-embedding techniques. In Phraseformer, each keyword candidate is represented by a vector that is the concatenation of its text and structure learning representations. Phraseformer takes advantage of recent research, such as BERT and ExEm, to preserve both representations. Phraseformer also treats key-phrase extraction as a sequence labeling problem solved as a classification task. Results: We analyze the performance of Phraseformer on three datasets, Inspec, SemEval2010, and SemEval2017, using the F1-score. We also investigate the performance of different classifiers with Phraseformer on the Inspec dataset. Experimental results demonstrate the effectiveness of Phraseformer on all three datasets. Additionally, the Random Forest classifier gains the highest F1-score among all classifiers. Conclusions: The combination of BERT and ExEm is more meaningful and better represents the semantics of words; hence, Phraseformer significantly outperforms single-modality methods.
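The core idea of the abstract — concatenating a text embedding (BERT) with a graph embedding (ExEm) per candidate and feeding the result to a per-token classifier such as Random Forest — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding matrices here are random stand-ins with hypothetical dimensions (768 for the text modality, 128 for the graph modality), and the binary labels are synthetic placeholders for real key-phrase tags.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-ins for the two modalities (hypothetical dimensions):
# in the paper these would come from BERT (text) and ExEm (graph).
n_tokens, d_text, d_graph = 200, 768, 128
text_emb = rng.normal(size=(n_tokens, d_text))    # per-token text vectors
graph_emb = rng.normal(size=(n_tokens, d_graph))  # per-token graph vectors

# Multimodal representation: simple concatenation per candidate token.
features = np.concatenate([text_emb, graph_emb], axis=1)  # shape (200, 896)

# Sequence labeling reduced to per-token classification
# (synthetic 0/1 labels stand in for real key-phrase vs. non-key-phrase tags).
labels = rng.integers(0, 2, size=n_tokens)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(features, labels)
pred = clf.predict(features)
print(features.shape, pred.shape)
```

The concatenation step is what makes the approach "multimodal": each candidate's vector carries both semantic (text) and structural (graph) signal, and any off-the-shelf classifier can then score candidates token by token.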
Updated: 2021-06-10