Text summarization using topic-based vector space model and semantic measure
Information Processing & Management (IF 8.6), Pub Date: 2021-02-09, DOI: 10.1016/j.ipm.2021.102536
Ramesh Chandra Belwal , Sawan Rai , Atul Gupta

The primary shortcoming of extractive text summarization is redundancy: more than one sentence conveying similar information is included in the summary. Many extractive summarization methods have been proposed over the last two decades, but the redundancy issue has received comparatively little attention. In this paper, we propose a text summarization technique that incorporates topic modeling and a semantic measure within the vector space model to produce an extractive summary of the given text. Our main objective is to address the redundancy problem and to include in the summary only those sentences that cover as many as possible of the topics embedded in the given document. We generate the topic vector of the document by representing its sentences in an intermediate form using a vector space model and topic modeling. Moreover, to make the proposed method efficient, we incorporate a semantic similarity measure to determine the relevance of each sentence. We introduce two ways to create the topic vector from the given document: the Combined topic vector approach and the Individual topic vector approach. Evaluation results on two datasets show that the summaries generated by both variants of the proposed method are closer to human-generated summaries than those produced by existing text summarization methods.
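
The general idea of redundancy-aware extractive selection can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the topic model, the semantic measure, or how the Combined and Individual topic vectors are built. The sketch below assumes a plain term-count vector per sentence, uses the document-wide term counts as a stand-in "topic vector", and substitutes cosine similarity for the semantic measure. The names `tokenize`, `cosine`, `summarize`, and the parameter `redundancy_weight` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, k=2, redundancy_weight=0.7):
    """Greedily pick k sentences that are relevant but not redundant.

    Assumption: relevance is similarity to the whole-document term
    counts (a stand-in for a real topic vector), and redundancy is the
    highest similarity to any sentence already selected.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]

    # Stand-in "topic vector": term counts summed over all sentences.
    topic_vector = Counter()
    for v in vectors:
        topic_vector.update(v)

    selected = []
    while len(selected) < min(k, len(sentences)):
        best, best_score = None, -math.inf
        for i, v in enumerate(vectors):
            if i in selected:
                continue
            relevance = cosine(v, topic_vector)
            redundancy = max(
                (cosine(v, vectors[j]) for j in selected), default=0.0
            )
            score = relevance - redundancy_weight * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)

    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    doc = [
        "The cat sat on the mat.",
        "The cat sat on the mat.",
        "Dogs chase balls in the park.",
    ]
    for sentence in summarize(doc, k=2):
        print(sentence)
```

With the toy document above, the repeated sentence is penalized once its twin has been selected, so the second slot goes to the sentence about dogs. A faithful implementation would replace the term-count centroid with topic-model output and the cosine score with the paper's semantic measure.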




Updated: 2021-02-09