FAR-ASS: Fact-aware reinforced abstractive sentence summarization
Introduction
With the unprecedented growth of textual information on the Internet, efficiently mining useful knowledge from large amounts of redundant information is a great challenge (Dybala et al., 2017; Nowakowski et al., 2019; Rzepka, Takishita, & Araki, 2020), which has necessitated the development of highly efficient automatic summarization systems (Barros et al., 2019; Gambhir & Gupta, 2017; Mohamed & Oussalah, 2019). The most essential purpose of a summarization system is to generate a concise, readable, and factual summary that keeps the gist of the input text (Dong et al., 2018; Jadhav & Rajan, 2018; Li et al., 2018). At present, there are two main types of summarization systems: extractive (Dong et al., 2018; Jadhav & Rajan, 2018; Zhang et al., 2018) and abstractive (Chen et al., 2016; Deng et al., 2020; Takase et al., 2016; Zheng et al., 2020). Extractive systems directly copy significant keywords from the source text to form a summary, which amounts to a simple compression of the source sentences. Abstractive systems can generate new words and linguistic phrases that are not present in the input sentences. Compared with extractive methods, abstractive summarization is considered much closer to the way humans write summaries, but it also poses more challenges, such as poor readability and factual inconsistency (Li et al., 2020; Zhang et al., 2020).
In this paper, we focus on the task of abstractive sentence summarization, which aims to generate a shorter sentence while maintaining the original meaning of the input. Unlike document-level summarization, the source text in the sentence-level task is short, so a summary cannot be formed by directly extracting existing sentences. Recently, neural network models based on the encoder-decoder architecture have demonstrated powerful capabilities on the sentence summarization task and can generate summaries with very high ROUGE scores (Cao et al., 2017). However, sentence summarization inevitably needs to trim, modify, reorganize, and fuse the input sentences. As a result, the relations expressed in the generated sentences often do not match those in the original text, producing factual errors. Several researchers have studied the factual consistency of summaries (Falke, Ribeiro, Utama, Dagan, & Gurevych, 2019; Goodrich et al., 2019; Kryściński, McCann, Xiong, & Socher, 2019) and found that nearly 30% of the summaries generated by abstractive models contain fake facts.
In fact, for downstream tasks such as data mining and information retrieval, abstractive summaries with excessive factual errors are almost useless in practice. However, previous work (Mehta & Majumder, 2018; Paulus, Xiong, & Socher, 2018) has focused on optimizing models to improve the informativeness of generated summaries, which yields high ROUGE scores even when some facts conflict with the original text. As shown in Fig. 1, the seq2seq-baseline model (Nallapati et al., 2016) and the PG (pointer-generator) network (See, Liu, & Manning, 2017) produce the same fake fact: the subject of the verb "build" becomes "intel" instead of "vietnamese government", which states an entirely different fact from the original text. Consequently, although the summaries are highly informative (ROUGE-L = 0.49) and readable, they are useless because they contradict the original facts.
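ROUGE-L measures overlap through the longest common subsequence (LCS) between a candidate summary and a reference. As a reminder of what a score such as 0.49 means, here is a minimal sentence-level sketch (the F-measure over LCS precision and recall; real evaluations use the official ROUGE toolkit, which also handles stemming and multiple references):

```python
def lcs_len(a, b):
    # Classic dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    """Sentence-level ROUGE-L F-score over whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_len(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l("a b d", "a b c d")` has LCS 3, precision 1.0, recall 0.75, giving an F-score of 6/7 ≈ 0.857 despite the candidate omitting a token.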
Intuitively, for NLG (natural language generation) tasks, fact fabrication is a serious problem that directly determines the usability of the generated text. Nevertheless, existing abstractive summarization models rarely pay attention to improving the factual correctness of generated summaries, and the few sporadic attempts have had limited success. For example, in 2017, Cao et al. (2017) used the OpenIE (Open Information Extraction) (Angeli, Premkumar, & Manning, 2015) system to extract fact descriptions of the original input and then encoded them into the attention mechanism together with the original input. In 2019, Falke et al. (2019) were the first to use natural language inference systems to evaluate the factual consistency of generated summaries, and on this basis they reranked the summaries. Also in 2019, Kryściński et al. (2019) proposed a weakly supervised model to evaluate the factual correctness of generated summaries.
In this work, our goal is to optimize the factual correctness of existing neural abstractive summarization models. To maintain factual consistency between the generated text and the original input, we must first extract fact descriptions. To this end, we take advantage of two popular tools: OpenIE and a dependency parser. OpenIE represents a fact as a relation triple of the form (subject; predicate; object). However, complete relation triples are not always available for every sentence, so we use the dependency parser to mine suitable relation tuples that further expand the facts. On this basis, we design a fact extraction scheme that extracts complete structured relation tuples from text to describe its facts. We then define a factual correctness score by comparing the relation tuples of the original text and the generated summary. Furthermore, we develop a mixed-objective learning function that linearly combines a factual correctness objective, a textual overlap objective, and a language model objective, and we use a reinforcement learning (RL) strategy to jointly optimize them.
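To illustrate the kind of relation tuples involved, the sketch below extracts (subject; predicate; object) triples from pre-parsed tokens, falling back to partial (subject; predicate) tuples when no object is present. The token format and the nsubj/dobj rules are simplified assumptions for illustration only; the actual scheme relies on OpenIE and a full dependency parser rather than this hand-rolled logic:

```python
# Each token is (index, word, head_index, dependency_label), a toy stand-in
# for real dependency-parser output; this sketch does not call a parser.
def extract_tuples(tokens):
    """Collect (subject, predicate, object) triples from nsubj/dobj arcs;
    fall back to a partial (subject, predicate) tuple when no object exists."""
    tuples = []
    for idx, word, head, dep in tokens:
        if dep != "nsubj":
            continue
        predicate = next(w for i, w, h, d in tokens if i == head)
        obj = next((w for i, w, h, d in tokens if h == head and d == "dobj"), None)
        tuples.append((word, predicate) if obj is None else (word, predicate, obj))
    return tuples

sent = [  # "government builds plant"
    (1, "government", 2, "nsubj"),
    (2, "builds", 0, "ROOT"),
    (3, "plant", 2, "dobj"),
]
# extract_tuples(sent) -> [("government", "builds", "plant")]
```

An intransitive input such as "he left" yields only the partial tuple `("he", "left")`, which is exactly the case where the dependency parser expands the facts beyond complete OpenIE triples.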
Our contributions are as follows:
- A fact extraction scheme. First, we utilize the popular OpenIE tool to extract complete relation triples. Then, we use a dependency parser to extract suitable relation tuples that further expand the facts. We generate a complete structured set of fact descriptions by filtering, cleaning, and deduplicating the extracted tuples.
- An evaluation function. We design a scoring function that quantitatively describes the factual correctness of generated summaries. We consider both the factual accuracy and the factual redundancy of generated summaries and systematically quantify their factual correctness in the open domain.
- A reinforcement learning framework. We propose a complete framework and training strategy for abstractive sentence summarization models that improves informativeness and factual correctness by jointly optimizing a mixed-objective learning function via RL.
- Extensive experiments. We conduct extensive experiments on the English Gigaword, Google News, and DUC 2004 datasets, showing that our model remarkably improves the factual correctness of generated summaries compared with competitive methods, while also reducing duplication and enhancing informativeness.
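As a rough illustration of the evaluation-function idea, the sketch below scores a summary's fact tuples against the source's, rewarding supported tuples (accuracy) and penalizing repeated ones (redundancy). The exact combination formula and the weight `beta` are assumptions for illustration, not the paper's definition:

```python
from collections import Counter

def factual_score(source_tuples, summary_tuples, beta=0.5):
    """Illustrative factual-correctness score (the paper's exact formula may
    differ): accuracy is the fraction of summary tuples supported by the
    source; redundancy is the fraction of duplicated summary tuples."""
    if not summary_tuples:
        return 0.0
    supported = set(source_tuples)
    accuracy = sum(1 for t in summary_tuples if t in supported) / len(summary_tuples)
    counts = Counter(summary_tuples)
    redundancy = sum(c - 1 for c in counts.values()) / len(summary_tuples)
    return max(0.0, accuracy - beta * redundancy)
```

Under this sketch, a summary that states a supported fact once scores 1.0; repeating the same fact twice drops the score to 0.75, and an unsupported fact such as the "intel builds" example in Fig. 1 scores 0.0.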
Neural abstractive summarization models
The seq2seq framework is one of the mainstream approaches to generating abstractive summaries. In 2015, Rush, Chopra, and Weston (2015) combined a Convolutional Neural Network (CNN) encoder with a neural network language model under the seq2seq framework, the first application of the seq2seq model to the abstractive sentence summarization task. After that, Zhou et al. (2017) and Chopra, Auli, and Rush (2016) further improved the RNN-based summarization model. In 2016, Gu et al. (2016) added a copying mechanism to the seq2seq framework, allowing the decoder to copy words directly from the source text.
Background
In this section, we introduce our baseline pointer-generator network and our fact extraction scheme. The pointer-generator network extends the seq2seq-baseline model with a copy mechanism that copies words from the original text directly into the proper positions in the generated summary. We utilize the popular OpenIE and dependency parser tools to mine the fact descriptions in the input and the generated summaries, and we generate a complete set of structured fact descriptions from the extracted tuples.
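The copy mechanism of the pointer-generator network (See et al., 2017) mixes the decoder's vocabulary distribution with attention-weighted copy probabilities: p(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σᵢ aᵢ over source positions where w occurs. A toy sketch with made-up numbers (not the trained network):

```python
def final_distribution(p_gen, p_vocab, attention, source_words):
    """Mix the generator's vocabulary distribution with attention-weighted
    copy probabilities over the source words, pointer-generator style."""
    mixed = {w: p_gen * p for w, p in p_vocab.items()}
    for word, attn in zip(source_words, attention):
        # Out-of-vocabulary source words (e.g. "intel") become copyable.
        mixed[word] = mixed.get(word, 0.0) + (1 - p_gen) * attn
    return mixed

dist = final_distribution(
    p_gen=0.7,
    p_vocab={"build": 0.6, "plant": 0.4},   # decoder's vocabulary softmax
    attention=[0.9, 0.1],                   # attention over the source
    source_words=["intel", "build"],
)
# dist["intel"] = 0.3 * 0.9 = 0.27; dist["build"] = 0.7 * 0.6 + 0.3 * 0.1 = 0.45
```

Because both input distributions sum to one, the mixture is again a valid probability distribution, and source-only words receive exactly the copy mass (1 − p_gen) · aᵢ.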
Fact-aware reinforced neural summarization
As shown in Fig. 4, our model is composed of three main parts. The blue box represents the neural summarization model; in our experiments, we use the seq2seq-baseline and PG models of Section 3.1 as the summarization models to demonstrate the effectiveness of our approach. The green part is the fact extractor, which applies the fact extraction scheme of Section 3.2 to extract fact tuples from the input text and the generated summaries. The yellow part is policy learning: in order to jointly optimize the mixed objectives, it treats the summarizer as an agent and updates it with rewards computed from the generated summaries.
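A hedged sketch of the policy-learning step: a linearly mixed reward over the three objectives, and a self-critical REINFORCE loss in which sampled summaries that beat a greedy-decoded baseline have their log-probability increased. The weights `lam` and the greedy-baseline strategy are illustrative assumptions, not the paper's exact settings:

```python
def mixed_reward(r_fact, r_rouge, r_lm, lam=(0.4, 0.4, 0.2)):
    """Linear combination of the factual-correctness, textual-overlap, and
    language-model rewards (weights here are illustrative)."""
    return lam[0] * r_fact + lam[1] * r_rouge + lam[2] * r_lm

def rl_loss(sample_logprobs, sample_reward, baseline_reward):
    """Self-critical REINFORCE loss: minimizing it raises the probability of
    sampled sequences whose reward exceeds the baseline's, and lowers it
    for sequences that fall short."""
    advantage = sample_reward - baseline_reward
    return -advantage * sum(sample_logprobs)
```

For instance, a sampled summary with reward 0.8 against a baseline of 0.5 yields a positive loss whose gradient pushes the sample's log-probabilities up; a reward of 0.2 flips the sign and suppresses that sample.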
Experiments
In this section, we introduce our experimental datasets, the main evaluation metrics, the implementation details, and the comparative methods.
Results
In this section, we show that our model performs significantly better than the competitive methods. We first present the informativeness and factual correctness evaluation results. We then perform a manual evaluation on 100 random samples to verify that our gains in ROUGE and factual F1 scores are accompanied by improvements in human-judged readability and quality.
Conclusion and future work
In this paper, we focus on the task of abstractive sentence summarization. We present a general framework and a hybrid learning strategy to improve the factual correctness of neural abstractive summarization models. We employ the popular OpenIE and dependency parser tools to extract structured fact tuples. In order to evaluate the factual correctness of generated summaries quantitatively, we define a factual correctness score function that considers the factual accuracy and factual redundancy.
CRediT authorship contribution statement
Mengli Zhang: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Writing - review & editing, Visualization. Gang Zhou: Conceptualization, Writing - review & editing, Supervision. Wanting Yu: Conceptualization, Writing - review & editing, Supervision. Wenfen Liu: Conceptualization, Writing - review & editing, Supervision.
Declaration of Competing Interest
All authors declare that they have no conflict of interest.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (61862011), Guangxi Science and Technology Foundation (2018GXNSFAA138116, 2019GXNSFGA245004).
References (46)
- Barros et al. (2019). NATSUM: Narrative abstractive summarization through cross-document timeline generation. Information Processing & Management.
- Lin (2004). ROUGE: A package for automatic evaluation of summaries.
- Mehta & Majumder (2018). Effective aggregation of various summarization techniques. Information Processing & Management.
- Mohamed & Oussalah (2019). SRL-ESA-TextSum: A text summarization approach based on semantic role labeling and explicit semantic analysis. Information Processing & Management.
- et al. (2007). Multi-candidate reduction: Sentence compression as a tool for document summarization tasks. Information Processing & Management.
- et al. (2020). Abstractive meeting summarization by hierarchical adaptive segmental network learning with multiple revising steps. Neurocomputing.
- Angeli, Premkumar, & Manning. Leveraging linguistic structure for open domain information extraction.
- Bahdanau et al. Neural machine translation by jointly learning to align and translate.
- Cao et al. Faithful to the original: Fact aware neural abstractive summarization.
- Chen et al. Distraction-based neural networks for modeling documents.
- Chen & Bansal. Fast abstractive summarization with reinforce-selected sentence rewriting.
- Chopra, Auli, & Rush. Abstractive sentence summarization with attentive recurrent neural networks.
- Clarke & Lapata. Global inference for sentence compression: An integer linear programming approach. Journal of Artificial Intelligence Research.
- Cohn & Lapata. Sentence compression beyond word deletion.
- et al. A two-stage Chinese text summarization algorithm using keyword information and adversarial learning. Neurocomputing.
- Dong et al. BanditSum: Extractive summarization as a contextual bandit.
- Dybala et al. Towards joking, humor sense equipped and emotion aware conversational systems. Advances in Affective and Pleasurable Design.
- Falke, Ribeiro, Utama, Dagan, & Gurevych. Ranking generated summaries by correctness: An interesting but challenging application for natural language inference.
- Filippova et al. Sentence compression by deletion with LSTMs.
- Filippova & Altun. Overcoming the lack of parallel data in sentence compression.
- Gambhir & Gupta. Recent automatic text summarization techniques: A survey. Artificial Intelligence Review.
- Goodrich et al. Assessing the factual accuracy of generated text.
- Gu et al. Incorporating copying mechanism in sequence-to-sequence learning.