TeKET: a Tree-Based Unsupervised Keyphrase Extraction Technique
Cognitive Computation (IF 5.4), Pub Date: 2020-03-05, DOI: 10.1007/s12559-019-09706-3
Gollam Rabby, Saiful Azad, Mufti Mahmud, Kamal Z. Zamli, Mohammed Mostafizur Rahman

Automatic keyphrase extraction techniques aim to extract quality keyphrases for higher-level summarization of a document. The majority of existing techniques are domain-specific: they require application domain knowledge, employ higher-order statistical methods, are computationally expensive, and require large amounts of training data, which is rare for many applications. To overcome these issues, this paper proposes a new unsupervised keyphrase extraction technique. The proposed technique, named TeKET (Tree-based Keyphrase Extraction Technique), is domain-independent, employs limited statistical knowledge, and requires no training data. It also introduces a new variant of a binary tree, called the KeyPhrase Extraction (KePhEx) tree, to extract final keyphrases from candidate keyphrases. In addition, a measure called the Cohesiveness Index (CI) is derived, which denotes a given node’s degree of cohesiveness with respect to the root. The CI is used to flexibly extract final keyphrases from the KePhEx tree and is also used in the ranking process. The effectiveness of the proposed technique and its domain and language independence are experimentally evaluated using available benchmark corpora, namely SemEval-2010 (a scientific articles dataset), Theses100 (a thesis dataset), and a German Research Article dataset, respectively. The results are compared with those of other relevant unsupervised techniques, both statistical and graph-based. They show that the proposed technique outperforms the compared techniques in terms of precision, recall, and F1 scores.
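
As a rough illustration of the three metrics named above, the following sketch computes exact-match precision, recall, and F1 between a list of extracted keyphrases and a gold-standard list. The abstract does not describe the matching protocol used in the paper (for example, whether phrases are stemmed before comparison), so the function name `prf1`, the lowercase-and-whitespace normalisation, and the sample phrases are all assumptions made for illustration only.

```python
from typing import Iterable, Tuple


def prf1(extracted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two keyphrase collections.

    Hypothetical helper: the normalisation here (lowercasing, collapsing
    whitespace) is an assumption, not the paper's evaluation protocol.
    """
    ext = {" ".join(p.lower().split()) for p in extracted}
    ref = {" ".join(p.lower().split()) for p in gold}
    hits = len(ext & ref)  # phrases present in both sets
    precision = hits / len(ext) if ext else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Made-up example phrases, not taken from any benchmark corpus.
extracted = ["keyphrase extraction", "binary tree", "ranking process", "Cohesiveness Index"]
gold = ["keyphrase extraction", "cohesiveness index", "unsupervised learning"]
p, r, f = prf1(extracted, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Two of the four extracted phrases match the three gold phrases, so precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7.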

Updated: 2020-03-05