GKEEP: An Enhanced Graph‐Based Keyword Extractor With Error‐Feedback Propagation for Geoscience Reports


Earth and Space Science (IF 2.9), Pub Date: 2021-03-11, DOI: 10.1029/2020ea001602
Qinjun Qiu (1, 2), Zhong Xie (1, 2), Hong Xie (3), Bin Wang (2)


As the volume of published geoscience literature grows, reading and summarizing large text collections has become a challenging task. Publication keywords can be regarded as basic components of knowledge structure representations and have been used to reveal knowledge about research domains. Compared with other research domains, however, work on keyword extraction from geoscience text is limited. In this paper, we propose an unsupervised algorithm, the graph‐based keyword extractor with error‐feedback propagation (GKEEP), which enhances graph‐based keyword extraction approaches with an error‐feedback mechanism similar in concept to backpropagation. The approach proceeds as follows. First, a preprocessed document is taken as input and represented as a weighted undirected graph in which the vertices represent words and the edges represent co‐occurrence relationships between words within a fixed window size. Second, the nodes are ranked by importance scores computed by a graph‐based ranking algorithm. Third, the resulting word scores are used to compute the scores of keyword candidates. Finally, the Word2Vec method is applied to recalculate the candidate scores, and the candidates are ranked to select the final keywords. GKEEP also uses error feedback to boost the rankings of salient terms that would otherwise be deemed less important. In empirical experiments on two real data sets (including our newly built data set), the proposed GKEEP model outperforms state‐of‐the‐art unsupervised models and existing graph‐based ranking models. The proposed method can effectively reflect intrinsic keyword semantics and interrelationships.
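
The first three steps described in the abstract (co‐occurrence graph, graph‐based ranking, candidate scoring) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a TextRank‐style weighted PageRank as the ranking algorithm, which the abstract does not name, and it assumes a candidate's score is the sum of its word scores. The function names, the window size, and the damping factor are all hypothetical choices made for illustration. The Word2Vec re‐scoring and the error‐feedback step are left out because the abstract gives no details on them.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=3):
    """Weighted undirected graph: graph[u][v] counts how often u and v
    occur within `window` consecutive tokens (window size is an assumption)."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (assumed TextRank-style ranking)."""
    nodes = list(graph)
    scores = {n: 1.0 / len(nodes) for n in nodes}
    # Total edge weight attached to each node, used to normalize outgoing votes.
    strength = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            incoming = sum(
                graph[m][n] / strength[m] * scores[m] for m in graph[n]
            )
            updated[n] = (1 - damping) / len(nodes) + damping * incoming
        scores = updated
    return scores


def score_candidates(candidates, word_scores):
    """Score each candidate phrase as the sum of its word scores (an assumption)."""
    return {
        phrase: sum(word_scores.get(w, 0.0) for w in phrase.split())
        for phrase in candidates
    }


# Toy, already-preprocessed token sequence (made up for illustration).
tokens = (
    "granite intrusion gold deposit granite intrusion "
    "fault zone gold deposit"
).split()
graph = build_cooccurrence_graph(tokens, window=3)
word_scores = rank_words(graph)
candidates = ["granite intrusion", "gold deposit", "fault zone"]
ranked = sorted(
    score_candidates(candidates, word_scores).items(),
    key=lambda item: item[1],
    reverse=True,
)
for phrase, score in ranked:
    print(f"{phrase}: {score:.4f}")
```

In a fuller pipeline, the candidate scores produced here would be the input to the re‐scoring stage. How GKEEP combines them with Word2Vec similarities and error feedback is described in the paper itself, not in this sketch.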


Updated: 2021-05-07
