Diverse feature set based Keyphrase extraction and indexing techniques
Multimedia Tools and Applications (IF 3.0), Pub Date: 2020-09-26, DOI: 10.1007/s11042-020-09423-2
Saurabh Sharma, Vishal Gupta, Mamta Juneja

The internet has changed the way people communicate, and this has produced a vast amount of text in electronic form: e-mail, technical and scientific reports, tweets, physician notes and military field reports. Providing keyphrases for these extensive text collections lets users grasp the essence of lengthy content quickly and locate information efficiently. When designing a keyword extraction and indexing system, it is essential to pick distinguishing properties, called features. In this article, we propose several unsupervised keyword extraction approaches that are independent of the structure, size and domain of the documents. The proposed method relies on a novel, cognitively inspired set of standard, phrase, word embedding and external knowledge source features. Results for individual features and for selected feature sets are reported from experiments on four datasets: SemEval, KDD, Inspec and DUC. Across all four datasets, the features chosen by feature selection and the word embedding based features are the best feature sets for keyword extraction and indexing. That is, the proposed distributed word vectors with additional knowledge improve the results significantly over individual features, over combined features after feature selection, and over the state of the art. Having met the objective of developing various keyphrase extraction methods, we also applied them to a document classification task.
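To make the idea of feature-based unsupervised extraction concrete, the following is a minimal sketch, not the authors' method. It assumes three illustrative features (phrase frequency, position of first occurrence, and phrase length in words) and combines them with an arbitrary product; the tiny stopword list, the function name `score_keyphrases` and the weighting are all hypothetical choices made for this example. The word embedding and external knowledge features described in the abstract are not modeled here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "for", "to",
             "is", "are", "with", "that", "this", "it", "as", "by"}
PUNCT = set(".,;:!?")


def score_keyphrases(text, top_k=5, max_len=3):
    """Rank candidate phrases using frequency, first position and length.

    Candidates are maximal runs of non-stopword tokens between stopwords
    or punctuation, kept only if they have at most `max_len` words.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*|[.,;:!?]", text.lower())
    n = max(len(tokens), 1)

    freq = Counter()   # phrase -> number of occurrences
    first_pos = {}     # phrase -> token index of first occurrence
    run, start = [], 0

    # A trailing "." acts as a sentinel so the last run is flushed.
    for pos, tok in enumerate(tokens + ["."]):
        if tok in STOPWORDS or tok in PUNCT:
            if 1 <= len(run) <= max_len:
                phrase = " ".join(run)
                freq[phrase] += 1
                first_pos.setdefault(phrase, start)
            run = []
        else:
            if not run:
                start = pos
            run.append(tok)

    scores = {}
    for phrase, count in freq.items():
        position_bonus = 1.0 - first_pos[phrase] / n  # earlier is better
        length_bonus = len(phrase.split())            # favor multi-word phrases
        scores[phrase] = count * length_bonus * (1.0 + position_bonus)

    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]
```

Calling `score_keyphrases(document_text, top_k=10)` returns the ten highest-scoring candidate phrases for a document. Because no labeled data is used, the same function can be applied to documents of any size or domain, which is the property the abstract emphasizes; a feature selection step would amount to choosing which of the per-phrase signals enter the score.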




Updated: 2020-09-26