Extractive multi-document text summarization based on graph independent sets
Egyptian Informatics Journal (IF 5.0), Pub Date: 2020-01-03, DOI: 10.1016/j.eij.2019.12.002
Taner Uçkan, Ali Karcı

We propose a novel methodology for extractive, generic summarization of text documents. The method uses the Maximum Independent Set, which has not previously been used in any summarization study. We also propose a text processing tool, named KUSH, to preserve the semantic cohesion between sentences in the representation stage of introductory texts. Our expectation was that the sentences corresponding to the nodes in the independent set should be excluded from the summary. Accordingly, the nodes forming the Independent Set on the graph are identified and removed from it. Thus, before the effect of each node on the global graph is quantified, a restriction is applied to the documents to be summarized. This restriction prevents repeated word groups from being included in the summary. The performance of the proposed approach on the Document Understanding Conference (DUC-2002 and DUC-2004) datasets was measured using ROUGE evaluation metrics. The developed model achieved a ROUGE score of 0.38072 for 100-word summaries, 0.51954 for 200-word summaries, and 0.59208 for 400-word summaries. The values reported throughout the experiments demonstrate the contribution of this method.
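
The pipeline described in the abstract (build a sentence graph, find an independent set, remove its nodes, then score the remaining nodes) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several assumptions that the abstract does not state: sentences are compared with bag-of-words cosine similarity, an edge is created when similarity exceeds a hypothetical `threshold`, remaining nodes are scored by their summed edge weights, and a greedy heuristic stands in for the Maximum Independent Set (the heuristic yields a maximal independent set, which is not guaranteed to be maximum). The KUSH preprocessing tool is not reproduced, and all function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(sentences, threshold=0.1):
    """Weighted adjacency dict: node -> {neighbour: similarity}.

    Assumption: an edge exists when cosine similarity exceeds `threshold`.
    """
    bags = [Counter(tokenize(s)) for s in sentences]
    graph = {i: {} for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            sim = cosine(bags[i], bags[j])
            if sim > threshold:
                graph[i][j] = sim
                graph[j][i] = sim
    return graph


def greedy_independent_set(graph):
    """Greedy maximal independent set (a stand-in for the maximum one).

    Repeatedly picks the lowest-degree remaining node, then discards
    that node and all of its neighbours.
    """
    remaining = {n: set(nbrs) for n, nbrs in graph.items()}
    chosen = set()
    while remaining:
        node = min(remaining, key=lambda n: (len(remaining[n]), n))
        chosen.add(node)
        removed = remaining[node] | {node}
        for r in removed:
            remaining.pop(r, None)
        for nbrs in remaining.values():
            nbrs -= removed
    return chosen


def summarize(sentences, word_limit=100, threshold=0.1):
    """Drop independent-set sentences, then pick top-scoring ones."""
    graph = build_graph(sentences, threshold)
    excluded = greedy_independent_set(graph)
    kept = [n for n in graph if n not in excluded]
    # Assumption: a node's score is its summed similarity to other kept nodes.
    scores = {
        n: sum(w for m, w in graph[n].items() if m not in excluded)
        for n in kept
    }
    picked, used = [], 0
    for n in sorted(kept, key=lambda n: (-scores[n], n)):
        length = len(sentences[n].split())
        if used + length <= word_limit:
            picked.append(n)
            used += length
    # Return the selected sentences in their original document order.
    return [sentences[n] for n in sorted(picked)]
```

Calling `summarize(sentences, word_limit=100)` on a list of sentence strings returns the selected sentences in document order. Note that under these assumptions a sentence with no edges always lands in the independent set and is therefore excluded; whether the published method behaves the same way cannot be determined from the abstract alone.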




Updated: 2020-01-03