FAR-ASS: Fact-aware reinforced abstractive sentence summarization

https://doi.org/10.1016/j.ipm.2020.102478

Highlights

  • For natural language generation tasks, fact fabrication is a serious problem.

  • An automatic fact extraction scheme that leverages open information extraction and dependency parsing tools to extract structured fact tuples.

  • A factual correctness score function that accounts for both factual accuracy and factual redundancy.

  • A framework that improves informativeness and factual correctness by jointly optimizing a mixed-objective learning function via reinforcement learning.

Abstract

Automatic summarization systems provide an effective solution to today's unprecedented growth of textual data. For real-world tasks, such as data mining and information retrieval, the factual correctness of generated summaries is critical. However, existing models usually focus on improving informativeness rather than optimizing factual correctness. In this work, we present a Fact-Aware Reinforced Abstractive Sentence Summarization framework, denoted FAR-ASS, to improve the factual correctness of neural abstractive summarization models. Specifically, we develop an automatic fact extraction scheme leveraging OpenIE (Open Information Extraction) and dependency parser tools to extract structured fact tuples. Then, to quantitatively evaluate factual correctness, we define a factual correctness score function that considers both factual accuracy and factual redundancy. We further propose to adopt reinforcement learning to improve readability and factual correctness by jointly optimizing a mixed-objective learning function. We use the English Gigaword and DUC 2004 datasets to evaluate our model. Experimental results show that, compared with competitive models, our model significantly improves the factual correctness and readability of generated summaries, and also reduces duplicates while improving informativeness.

Introduction

With the unprecedented growth of textual information on the Internet, efficiently mining useful knowledge from large amounts of redundant information is a great challenge (Dybala et al., 2017; Nowakowski et al., 2019; Rzepka, Takishita, & Araki, 2020), which has necessitated the development of highly efficient automatic summarization systems (Barros et al., 2019; Gambhir & Gupta, 2017; Mohamed & Oussalah, 2019). The essential purpose of a summarization system is to generate a concise, readable and factual summary of the input text while keeping its gist (Dong et al., 2018; Jadhav & Rajan, 2018; Li et al., 2018). At present, there are two main types of summarization systems: extractive (Dong et al., 2018; Jadhav & Rajan, 2018; Zhang et al., 2018) and abstractive (Chen et al., 2016; Deng et al., 2020; Takase et al., 2016; Zheng et al., 2020). Extractive systems directly copy a few significant keywords from the source text to form a summary, which is in effect a simple compression of the source sentences. Abstractive systems can automatically generate new words and linguistic phrases that are not present in the input sentences. Compared with extractive methods, abstractive summarization is considered much closer to the way humans write summaries, but it also poses more challenges, such as poor readability and factual discrepancy (Li et al., 2020; Zhang et al., 2020).

In this paper, we focus on the task of abstractive sentence summarization, which generates a shorter sentence while maintaining the original meaning of the input sentences. Unlike document-level summarization, the original text in the sentence-level task is short, so a summary cannot be formed by directly extracting existing sentences. Recently, neural network models based on the encoder-decoder architecture have demonstrated powerful capabilities on the sentence summarization task and can generate summaries with very high ROUGE scores (Cao et al., 2017). However, sentence summarization inevitably needs to tailor, modify, reorganize and fuse the input sentences, so the generated sentences often fail to match the original relations, resulting in factual errors. Several researchers have studied the factual consistency of summaries (Falke, Ribeiro, Utama, Dagan, & Gurevych, 2019; Goodrich et al., 2019; Kryściński, McCann, Xiong, & Socher, 2019) and found that nearly 30% of summaries generated by abstractive models contain fake facts.

In fact, for downstream tasks such as data mining and information retrieval, generated abstractive summaries with excessive factual errors are almost useless in practice. However, previous researchers (Mehta & Majumder, 2018; Paulus, Xiong, & Socher, 2018) have focused on optimizing models to improve the informativeness of generated summaries, which yields high ROUGE scores even when some facts contradict the original text. As shown in Fig. 1, the seq2seq-baseline model (Nallapati et al., 2016) and the PG (pointer-generator) network (See, Liu, & Manning, 2017) produce the same fake fact, i.e., the subject of the verb “build” becomes “intel” instead of “vietnamese government”, which results in an entirely different fact from the original text. Consequently, although the summaries are highly informative (ROUGE-L = 0.49) and readable, they are useless because they contradict the original facts.
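For reference, the ROUGE-L score cited above is based on the longest common subsequence (LCS) between a candidate summary and a reference. The sketch below is a minimal illustration of the F-score variant; the beta weight favoring recall is a conventional choice, not a value taken from this paper:

```python
def lcs_len(a, b):
    # dynamic-programming longest common subsequence length over token lists
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate, reference, beta=1.2):
    # ROUGE-L F-score: harmonic-style mix of LCS precision and recall,
    # with beta weighting recall more heavily (a common convention)
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return (1 + beta ** 2) * prec * rec / (rec + beta ** 2 * prec)
```

A candidate identical to its reference scores 1.0; a candidate sharing no tokens scores 0.0.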

Intuitively, for NLG (natural language generation) tasks, fact fabrication is a serious problem that directly determines the usability of the generated text. Nevertheless, existing abstractive summarization models rarely pay attention to improving the factual correctness of generated summaries, and the few attempts so far have had limited success. For example, in 2017, Cao et al. (2017) used the OpenIE (Open Information Extraction) (Angeli, Premkumar, & Manning, 2015) system to extract the fact descriptions of the original input and then encoded them into the attention mechanism together with the original input. In 2019, Falke et al. (2019) used natural language inference systems to evaluate the factual consistency of generated summaries for the first time, and reranked the summaries on this basis. Also in 2019, Kryściński et al. (2019) proposed a weakly supervised model to evaluate the factual correctness of generated summaries.

In this work, our goal is to optimize the factual correctness of existing neural abstractive summarization models. In order to maintain the factual consistency between the generated text and the original input, we must first extract fact descriptions. To this end, we take advantage of popular tools OpenIE and dependency parser. OpenIE represents a fact as a relation triple consisting of (subject; predicate; object). But, for different sentences, complete relation triples are not always available. Therefore, we utilize the dependency parser to mine suitable relation tuples to further expand the facts. On this basis, we design a fact extraction scheme that can extract complete structured relation tuples from text to describe the facts. Then, we define a factual correctness score by comparing the relation tuples between the original text and generated summary. Furthermore, we also develop a mixed-objective learning function by linearly combining a factual correctness objective, a textual overlap objective, and a language model objective. Finally, we utilize the reinforcement learning (RL) strategy to jointly optimizing them.
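The fact descriptions above reduce to structured (subject; predicate; object) tuples. The sketch below illustrates only the downstream filtering, cleaning, and deduplication step over raw tuples; the normalization rules are illustrative assumptions, and a real pipeline would obtain the raw tuples from OpenIE and a dependency parser:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FactTuple:
    # frozen=True makes instances hashable, so they can live in sets
    subject: str
    predicate: str
    obj: str

def normalize(span):
    # lowercase and collapse whitespace so near-duplicate spans merge
    return " ".join(span.lower().split())

def clean_tuples(raw):
    # filter incomplete tuples, normalize spans, and deduplicate,
    # preserving first-seen order
    seen, out = set(), []
    for s, p, o in raw:
        if not (s and p):  # a usable fact needs at least a subject and predicate
            continue
        t = FactTuple(normalize(s), normalize(p), normalize(o or ""))
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out
```

For example, `("Intel", "build", "plant")` and `("intel ", "build", "plant")` collapse into a single tuple, while a tuple with an empty subject is discarded.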

Our contributions are as follows:

  • A fact extraction scheme. First, we utilize the popular OpenIE tool to mine complete relation triples. Then, we use a dependency parser to extract suitable relation tuples that further expand the facts. We generate a complete structured set of fact descriptions by filtering, cleaning, and deduplicating the extracted tuples.

  • An evaluation function. We design a scoring function to describe the factual correctness of generated summaries quantitatively. In this work, we consider the factual accuracy and factual redundancy of generated summaries and systematically quantify their factual correctness in the open domain.

  • A reinforcement learning framework. We propose a complete framework and a training strategy for abstractive sentence summarization models to improve the informativeness and factual correctness by jointly optimizing a mixed-objective learning function via RL.

  • Extensive experiments. We conduct extensive experiments on the English Gigaword, Google News, and DUC 2004 datasets, showing that our model markedly improves the factual correctness of generated summaries compared with competitive methods and also reduces duplicates while enhancing informativeness.
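To make the evaluation-function contribution concrete, the sketch below scores a summary's fact tuples against the source's. The particular way accuracy and redundancy are combined here is an assumption for illustration, not the paper's exact formula:

```python
def factual_score(source_tuples, summary_tuples):
    """Hypothetical factual-correctness score.

    accuracy: fraction of summary tuples supported by the source;
    redundancy: fraction of summary tuples that are duplicates.
    The product below is one simple way to penalize both, assumed
    for illustration only.
    """
    if not summary_tuples:
        return 0.0
    source = set(source_tuples)
    supported = sum(1 for t in summary_tuples if t in source)
    accuracy = supported / len(summary_tuples)
    redundancy = 1 - len(set(summary_tuples)) / len(summary_tuples)
    return accuracy * (1 - redundancy)
```

A summary whose tuples are all supported and all distinct scores 1.0; repeating a correct fact, or stating an unsupported one, lowers the score.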

Section snippets

Neural abstractive summarization models

The seq2seq framework is one of the mainstream approaches to generating abstractive summaries. In 2015, Rush, Chopra, and Weston (2015) proposed a Convolutional Neural Network (CNN) encoder with a neural network language model under the seq2seq framework, which was the first application of the seq2seq model to the abstractive sentence summarization task. After that, Zhou et al. (2017) and Chopra, Auli, and Rush (2016) further improved the RNN-based summarization model. In 2016, Gu et al. (2016) added a

Background

In this section, we introduce our baseline pointer-generator network and fact extraction scheme. The pointer-generator network is an extension of the seq2seq-baseline model, which adds a copy mechanism to the original network structure by directly copying words from the original text into the proper positions in the generated summaries. We utilize the popular OpenIE and dependency parser tools to mine the fact descriptions in the input and generated summaries. We generate a complete set of
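The copy mechanism described above can be illustrated with a toy computation of the pointer-generator's final word distribution, where a generation probability p_gen interpolates between the decoder's vocabulary distribution and attention-weighted copying from the source (a simplified sketch of the idea in See et al., 2017, not their exact implementation):

```python
def final_distribution(p_gen, vocab_dist, attention, source_words):
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention mass
    on the source positions that hold w. Inputs are plain dicts/lists
    standing in for the model's tensors."""
    out = {w: p_gen * p for w, p in vocab_dist.items()}
    for a, w in zip(attention, source_words):
        # copy probability accumulates over repeated source occurrences of w
        out[w] = out.get(w, 0.0) + (1 - p_gen) * a
    return out
```

Note that an out-of-vocabulary source word (e.g. a rare entity name) can still receive probability mass through the copy term, which is the point of the mechanism.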

Fact-aware reinforced neural summarization

As shown in Fig. 4, our model is mainly composed of three parts. The blue box represents the neural summarization model; in our experiments, we use the seq2seq-baseline and PG models from Section 3.1 as the summarization models to demonstrate the effectiveness of our approach. The green part is the fact extractor, which applies our fact extraction scheme from Section 3.2 to extract fact tuples from the input text and generated summaries. The yellow part is policy learning. In order to
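The policy-learning component can be sketched as a self-critical policy-gradient loss mixed with a maximum-likelihood term. The weights, the reward composition, and the greedy-decode baseline below are illustrative assumptions rather than the paper's exact configuration:

```python
def mixed_loss(log_probs, sampled_reward, baseline_reward,
               ml_loss, gamma_rl=0.9, gamma_ml=0.1):
    """Hypothetical mixed-objective training loss.

    log_probs: per-token log-probabilities of the sampled summary;
    sampled_reward / baseline_reward: scalar rewards (e.g. a mix of
    ROUGE and a factual-correctness score) for the sampled and the
    greedy-decoded summary; gamma_* are assumed mixing weights.
    """
    advantage = sampled_reward - baseline_reward  # self-critical baseline
    # REINFORCE term: push up log-probs of samples that beat the baseline
    rl_loss = -advantage * sum(log_probs)
    return gamma_rl * rl_loss + gamma_ml * ml_loss
```

Keeping a maximum-likelihood term in the mix is a common way to preserve readability while the RL term optimizes the non-differentiable reward.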

Experiments

In this section, we introduce our experimental datasets, the main evaluation metrics, the implementation details, and the comparative methods.

Results

In this section, we show that our model performs significantly better than the competitive methods. We first present the results of the informativeness and factual correctness evaluations. Then, we perform a manual evaluation on 100 random samples to verify that our gains in ROUGE and factual F1 scores are accompanied by improvements in human-judged readability and quality.

Conclusion and future work

In this paper, we focus on the task of abstractive sentence summarization. We present a general framework and a hybrid learning strategy to improve the factual correctness of neural abstractive summarization models. We employ the popular OpenIE and dependency parser tools to extract structured fact tuples. In order to evaluate the factual correctness of generated summaries quantitatively, we define a factual correctness score function that considers the factual accuracy and factual redundancy.

CRediT authorship contribution statement

Mengli Zhang: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Writing - review & editing, Visualization. Gang Zhou: Conceptualization, Writing - review & editing, Supervision. Wanting Yu: Conceptualization, Writing - review & editing, Supervision. Wenfen Liu: Conceptualization, Writing - review & editing, Supervision.

Declaration of Competing Interest

All authors declare that they have no conflict of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61862011), Guangxi Science and Technology Foundation (2018GXNSFAA138116, 2019GXNSFGA245004).

References (46)

  • Y.C. Chen et al., Fast abstractive summarization with reinforce-selected sentence rewriting
  • S. Chopra et al., Abstractive sentence summarization with attentive recurrent neural networks
  • J. Clarke et al., Global inference for sentence compression: An integer linear programming approach, Journal of Artificial Intelligence Research (2008)
  • T. Cohn et al., Sentence compression beyond word deletion
  • Z. Deng et al., A two-stage Chinese text summarization algorithm using keyword information and adversarial learning, Neurocomputing (2020)
  • Y. Dong et al., BanditSum: Extractive summarization as a contextual bandit
  • P. Dybala et al., Towards joking, humor sense equipped and emotion aware conversational systems, Advances in Affective and Pleasurable Design (2017)
  • T. Falke et al., Ranking generated summaries by correctness: An interesting but challenging application for natural language inference
  • K. Filippova et al., Sentence compression by deletion with LSTMs
  • K. Filippova et al., Overcoming the lack of parallel data in sentence compression
  • M. Gambhir et al., Recent automatic text summarization techniques: A survey, Artificial Intelligence Review (2017)
  • B. Goodrich et al., Assessing the factual accuracy of generated text
  • J. Gu et al., Incorporating copying mechanism in sequence-to-sequence learning