Three-step coreference-based summarizer for Polish news texts
Poznan Studies in Contemporary Linguistics (IF 0.400), Pub Date: 2019-06-26, DOI: 10.1515/psicl-2019-0015
Mateusz Kopeć

Abstract: This article addresses the problem of automatic summarization of press articles in Polish. The main novelty of this research lies in a three-step summarization algorithm that benefits from coreference information. The related work section presents all coreference-based approaches to summarization. We then describe in detail all publicly available summarization tools developed for the Polish language. We state the problem of single-document press article summarization for Polish and describe the training and evaluation dataset, the POLISH SUMMARIES CORPUS. Next, a new coreference-based extractive summarization system, NICOLAS, is introduced. Its algorithm uses advanced third-party preprocessing tools to extract coreference information from the text to be summarized. This information is transformed into a complex set of features related to coreference concepts (mentions and coreference clusters), which are used to train the summarization system on a corpus of manually prepared gold summaries. The proposed solution is compared with the best publicly available summarization systems for Polish and with two state-of-the-art tools developed for English and adapted to Polish for this article. The NICOLAS summarization system obtains the best scores and, for selected metrics, outperforms the other systems in a statistically significant way. The evaluation also includes the calculation of two upper bounds: human performance and a theoretical upper bound.

Updated: 2019-06-26