Knowledge-guided unsupervised rhetorical parsing for text summarization
Information Systems (IF 3.7), Pub Date: 2020-08-03, DOI: 10.1016/j.is.2020.101615
Shengluan Hou, Ruqian Lu

Automatic text summarization (ATS) has recently achieved impressive performance thanks to advances in deep learning and the availability of large-scale corpora. However, there is still no guarantee that the generated summaries are grammatical and concise, or that they convey all the salient information of the original documents. To make summarization results more faithful, this paper presents an unsupervised approach to ATS that combines rhetorical structure theory, a deep neural model, and domain knowledge. The architecture consists of three main components: construction of a domain knowledge base based on representation learning, an attentional encoder–decoder model for rhetorical parsing, and a subroutine-based model for text summarization. Domain knowledge can be used effectively for unsupervised rhetorical parsing, so that a rhetorical structure tree can be derived for each document. In the unsupervised rhetorical parsing module, the idea of translation is adopted to alleviate the problem of data scarcity. The subroutine-based summarization model depends purely on the derived rhetorical structure trees and can generate content-balanced results. To evaluate the summaries without a gold standard, we propose an unsupervised evaluation metric whose hyper-parameters are tuned by supervised learning. Experimental results on a large-scale Chinese dataset show that the proposed approach achieves performance comparable to that of existing methods.
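The abstract does not say how the subroutine-based model selects content from a rhetorical structure tree. As a rough illustration of how an extractive summary can be read off such a tree alone, here is a minimal Python sketch of a simplified nuclearity heuristic: elementary discourse units (EDUs) reached through fewer satellite edges are treated as more salient. This is not the authors' subroutine-based model and makes no attempt at content balancing; the `RSTNode` class, `edu_salience`, and `summarize` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RSTNode:
    """Toy RST tree node (hypothetical). Leaves carry an EDU; internal nodes carry children."""
    nuclearity: str = "N"      # role relative to the parent: "N" (nucleus) or "S" (satellite)
    edu: Optional[str] = None  # EDU text, set only on leaves
    children: List["RSTNode"] = field(default_factory=list)


def edu_salience(root: RSTNode) -> List[Tuple[int, int, str]]:
    """Return (satellite_depth, position, edu) for every leaf, in document order."""
    out: List[Tuple[int, int, str]] = []

    def walk(node: RSTNode, depth: int) -> None:
        # Each satellite edge on the path from the root lowers salience by one level.
        d = depth + (1 if node.nuclearity == "S" else 0)
        if node.edu is not None:
            out.append((d, len(out), node.edu))
        for child in node.children:
            walk(child, d)

    walk(root, 0)
    return out


def summarize(root: RSTNode, k: int) -> List[str]:
    """Pick the k most salient EDUs and return them in their original order."""
    scored = edu_salience(root)
    chosen = sorted(scored)[:k]  # lowest satellite depth first; ties broken by position
    return [edu for _, _, edu in sorted(chosen, key=lambda t: t[1])]


# Usage: a four-EDU document whose second half is a satellite of the first.
example = RSTNode(children=[
    RSTNode("N", children=[RSTNode("N", "A"), RSTNode("S", "B")]),
    RSTNode("S", children=[RSTNode("N", "C"), RSTNode("S", "D")]),
])
print(summarize(example, 2))  # -> ['A', 'B']
```

Returning the chosen EDUs in document order keeps the extract readable; a model aiming for content balance would additionally need to spread its selection across subtrees rather than rank EDUs purely by depth.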




Updated: 2020-08-03