Preserve Integrity in Realtime Event Summarization
ACM Transactions on Knowledge Discovery from Data (IF 3.6), Pub Date: 2021-05-03, DOI: 10.1145/3442344
Chen Lin 1 , Zhichao Ouyang 1 , Xiaoli Wang 1 , Hui Li 1 , Zhenhua Huang 2

Online text streams such as Twitter are the major information source for users following ongoing events. Realtime event summarization aims to generate and update coherent, concise summaries that describe the state of a given event. Because of the enormous volume of continuously arriving text, realtime event summarization has become the de facto tool for facilitating information acquisition. However, current text summarization techniques leave a challenging and unexplored issue open: how to preserve integrity, i.e., the accuracy and consistency of summaries, during the update process. The issue is critical because online text streams are dynamic and conflicting information can spread during the event period. For example, conflicting numbers of deaths and injuries might be reported after an earthquake. Such misleading information should not appear in the earthquake summary at any timestamp. In this article, we present a novel realtime event summarization framework called IAEA (Integrity-Aware Extractive-Abstractive realtime event summarization). Our key idea is to integrate an inconsistency detection module into a unified extractive-abstractive framework. In each update, an extractive module first extracts important new tweets, and the extraction is refined by explicitly detecting inconsistency between the new tweets and previous summaries. The extractive module captures sentence-level attention, which an abstractive module later uses to obtain word-level attention. Finally, the word-level attention is leveraged to rephrase words. We conduct comprehensive experiments on real-world datasets. To reduce the effort required to build sufficient training data, we also provide automatic labeling steps whose effectiveness has been empirically verified. Through experiments, we demonstrate that IAEA generates better summaries with consistent information than state-of-the-art approaches.
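The update step described in the abstract can be illustrated with a toy sketch. This is not the authors' model: IAEA is a learned neural framework, whereas the code below swaps in a hand-written rule (two different counts for the same category count as a conflict) and a simple scale-and-renormalize step for turning sentence-level attention into word-level attention. All function names, the regular expression, and the score threshold are hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy pattern: a count followed by a casualty category.
NUMBER_FACT = re.compile(r"(\d+)\s+(dead|injured)")


def extract_facts(text: str) -> Dict[str, int]:
    """Map each category ('dead', 'injured') to the count reported in the text."""
    return {label: int(count) for count, label in NUMBER_FACT.findall(text.lower())}


def is_inconsistent(tweet: str, summary: List[str]) -> bool:
    """Assumed rule: a tweet conflicts if it reports a different count
    for a category that the current summary already covers."""
    known: Dict[str, int] = {}
    for sentence in summary:
        known.update(extract_facts(sentence))
    return any(
        label in known and known[label] != count
        for label, count in extract_facts(tweet).items()
    )


def update_summary(
    summary: List[str],
    scored_tweets: List[Tuple[str, float]],
    threshold: float = 0.5,
) -> List[str]:
    """Keep new tweets that score as important and do not conflict
    with what the summary already states."""
    updated = list(summary)
    for tweet, score in scored_tweets:
        if score >= threshold and not is_inconsistent(tweet, updated):
            updated.append(tweet)
    return updated


def word_attention(
    sentence_attention: List[float],
    raw_word_attention: List[List[float]],
) -> List[List[float]]:
    """Scale each word weight by its sentence's weight, then renormalize
    so that all word weights sum to 1 (an assumed combination rule)."""
    scaled = [
        [s * w for w in words]
        for s, words in zip(sentence_attention, raw_word_attention)
    ]
    total = sum(sum(words) for words in scaled)
    if total == 0:
        return scaled
    return [[w / total for w in words] for words in scaled]


if __name__ == "__main__":
    summary = ["Earthquake hits the coast, 10 dead"]
    tweets = [
        ("Officials report 25 injured", 0.9),
        ("Rumor: 50 dead", 0.8),
        ("Nice weather today", 0.1),
    ]
    print(update_summary(summary, tweets))
    print(word_attention([0.75, 0.25], [[0.5, 0.5], [1.0]]))
```

In this sketch the tweet reporting a second death toll is dropped because it conflicts with the existing summary, and the low-scoring tweet is dropped as unimportant. A real system would also need to decide when a newer figure should replace an older one, which this rule does not attempt.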

Updated: 2021-05-03