Keyword extraction: Issues and methods
Natural Language Engineering (IF 2.5), Pub Date: 2019-11-11, DOI: 10.1017/s1351324919000457
Nazanin Firoozeh, Adeline Nazarenko, Fabrice Alizon, Béatrice Daille

Due to the considerable growth of the volume of text documents on the Internet and in digital libraries, manual analysis of these documents is no longer feasible. Having efficient approaches to keyword extraction in order to retrieve the ‘key’ elements of the studied documents is now a necessity. Keyword extraction has been an active research field for many years, covering various applications in Text Mining, Information Retrieval, and Natural Language Processing, and meeting different requirements. However, it is not a unified domain of research. In spite of the existence of many approaches in the field, there is no single approach that effectively extracts keywords from different data sources. This shows the importance of having a comprehensive review, which discusses the complexity of the task and categorizes the main approaches of the field based on the features and methods of extraction that they use. This paper presents a general introduction to the field of keyword/keyphrase extraction. Unlike the existing surveys, different aspects of the problem along with the main challenges in the field are discussed. This mainly includes the unclear definition of ‘keyness’, complexities of targeting proper features for capturing desired keyness properties and selecting efficient extraction methods, and also the evaluation issues. By classifying a broad range of state-of-the-art approaches and analysing the benefits and drawbacks of different features and methods, we provide a clearer picture of them. This review is intended to help readers find their way around all the works related to keyword extraction and guide them in choosing or designing a method that is appropriate for the application they are targeting.

Updated: 2019-11-11