Global Encoding for Long Chinese Text Summarization
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2). Pub Date: 2020-10-06. DOI: 10.1145/3407911
Xuefeng Xi, Zhou Pi, Guodong Zhou

Text summarization is one of the significant tasks of natural language processing: it automatically condenses a text into a summary. Summarization systems for short and long English text, as well as for short Chinese text, have benefited from advances in the neural encoder-decoder model thanks to the availability of large datasets. Research on long Chinese text summarization, however, has been limited to datasets of only a few hundred instances. This article explores the long Chinese text summarization task. First, we construct the first large-scale long Chinese text summarization corpus, the Long Chinese Summarization of Police Inquiry Record Text (LCSPIRT). Based on this corpus, we propose a sequence-to-sequence (Seq2Seq) model that incorporates a global encoding process with an attention mechanism. Our model achieves competitive results on the LCSPIRT corpus compared with several benchmark methods.
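The abstract's core architectural idea, a Seq2Seq encoder whose per-token states are gated by global (sentence-level) context before attention-based decoding, can be sketched briefly. Below is a minimal PyTorch illustration of one common form of global encoding, a convolutional gated unit over BiLSTM states (in the style of Lin et al., 2018); the class name, hyperparameters, and layer choices are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GlobalGatedEncoder(nn.Module):
    """Hypothetical sketch: a BiLSTM encoder whose outputs are filtered
    by a convolutional gate, so each token state is modulated by wider
    (global) context. Not the paper's actual implementation."""

    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim,
                           bidirectional=True, batch_first=True)
        # 1-D convolution along the time axis gathers n-gram context
        self.conv = nn.Conv1d(2 * hid_dim, 2 * hid_dim,
                              kernel_size=3, padding=1)

    def forward(self, tokens):                # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))   # (batch, seq_len, 2*hid_dim)
        # Gate each encoder state with its convolutional (global) context
        g = self.conv(h.transpose(1, 2)).transpose(1, 2)
        return h * torch.sigmoid(g)

# Usage: the refined states would feed a standard attention-based decoder.
enc = GlobalGatedEncoder(vocab_size=50_000)
states = enc(torch.randint(0, 50_000, (2, 40)))  # -> shape (2, 40, 512)
```

The gate lets the encoder suppress noisy or redundant source tokens, which matters more as input length grows; the decoder would then attend over the gated states rather than the raw BiLSTM outputs.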
