EdgeSumm: Graph-based framework for automatic text summarization, Information Processing & Management

EdgeSumm: Graph-based framework for automatic text summarization
Information Processing & Management (IF 8.6), Pub Date: 2020-06-30, DOI: 10.1016/j.ipm.2020.102264
Wafaa S. El-Kassas, Cherif R. Salama, Ahmed A. Rafea, Hoda K. Mohamed

Searching the Internet for a certain topic can become a daunting task because users cannot read and comprehend all the resulting texts. Automatic text summarization (ATS) is clearly beneficial in this case because manual summarization is expensive and time-consuming. To enhance ATS for single documents, this paper proposes a novel extractive graph-based framework, “EdgeSumm”, that relies on four proposed algorithms. The first algorithm constructs a new text graph model representation from the input document. The second and third algorithms search the constructed text graph for sentences to be included in the candidate summary. When the resulting candidate summary still exceeds a user-required limit, the fourth algorithm is used to select the most important sentences. EdgeSumm combines a set of extractive ATS methods (namely graph-based, statistical-based, semantic-based, and centrality-based methods) to benefit from their advantages and overcome their individual drawbacks. EdgeSumm is general for any document genre (not limited to a specific domain) and is unsupervised, so it does not require any training data. The standard datasets DUC2001 and DUC2002 are used to evaluate EdgeSumm with the widely used automatic evaluation tool Recall-Oriented Understudy for Gisting Evaluation (ROUGE). EdgeSumm achieves the highest ROUGE scores on DUC2001. For DUC2002, the evaluation results show that the proposed framework outperforms the state-of-the-art ATS systems, achieving improvements of 1.2% and 4.7% over the highest scores in the literature for ROUGE-1 and ROUGE-L, respectively. In addition, EdgeSumm achieves very competitive results for ROUGE-2 and ROUGE-SU4.
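For readers unfamiliar with the metrics named above, the following is a minimal sketch of how ROUGE-1 and a sentence-level ROUGE-L recall can be computed. It is not the official ROUGE toolkit used in the paper: it assumes a single reference summary, lowercased whitespace tokenization, and no stemming or stop-word removal. The function names `rouge_1`, `lcs_length`, and `rouge_l_recall` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: unigram overlap against one reference summary."""
    # Assumption: lowercased whitespace tokens, no stemming.
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped overlap: a unigram counts at most as often as it occurs in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L recall: LCS length over reference length."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not ref_tokens:
        return 0.0
    return lcs_length(cand_tokens, ref_tokens) / len(ref_tokens)


if __name__ == "__main__":
    candidate = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print(rouge_1(candidate, reference))
    print(rouge_l_recall(candidate, reference))
```

In this toy pair, five of the six reference unigrams are matched and the longest common subsequence is "the cat on the mat", so both recall values are 5/6. Scores reported in the literature come from the official toolkit with its own preprocessing, so this sketch should not be expected to reproduce them.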

Updated: 2020-06-30
