Optimizing Data-Driven Models for Summarization as Parallel Tasks, Journal of Computational Science

Optimizing Data-Driven Models for Summarization as Parallel Tasks
Journal of Computational Science (IF 3.1), Pub Date: 2020-04-09, DOI: 10.1016/j.jocs.2020.101101
Aleš Zamuda, Elena Lloret

This paper tackles a hard optimization problem in computational linguistics, automatic multi-document text summarization, using grid computing. The main challenge of multi-document summarization is to extract the most relevant and unique information effectively and efficiently from a set of topic-related documents, within a specified length limit. In the Big Data/Text era, where the volume of information grows exponentially, optimization becomes essential for selecting the most representative sentences to generate the best summaries. Therefore, a data-driven summarization model is proposed and optimized during a run of Differential Evolution (DE).
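
To make the idea of optimizing sentence selection with DE concrete, here is a minimal sketch, not the authors' model. It assumes a classic DE/rand/1/bin scheme, a hypothetical toy fitness (the number of distinct words covered by the summary), and a greedy decoder that turns a real-valued weight vector into a sentence subset within a word budget. All function names and parameter values are illustrative assumptions.

```python
import random

def select_sentences(weights, lengths, budget):
    """Pick sentences in descending weight order while they fit the word budget."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + lengths[i] <= budget:
            chosen.append(i)
            used += lengths[i]
    return sorted(chosen)

def fitness(weights, sentences, budget):
    """Toy objective (an assumption): distinct words covered by the summary."""
    lengths = [len(s) for s in sentences]
    covered = set()
    for i in select_sentences(weights, lengths, budget):
        covered.update(sentences[i])
    return len(covered)

def differential_evolution(sentences, budget, pop_size=20, generations=50,
                           F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin over one real-valued weight per sentence (maximization)."""
    rng = random.Random(seed)
    dim = len(sentences)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    fit = [fitness(x, sentences, budget) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = [
                pop[a][j] + F * (pop[b][j] - pop[c][j])
                if (rng.random() < CR or j == j_rand) else pop[i][j]
                for j in range(dim)
            ]
            f_trial = fitness(trial, sentences, budget)
            if f_trial >= fit[i]:  # greedy one-to-one replacement
                pop[i], fit[i] = trial, f_trial
    best = max(range(pop_size), key=lambda i: fit[i])
    lengths = [len(s) for s in sentences]
    return select_sentences(pop[best], lengths, budget), fit[best]
```

Sentences are passed as lists of tokens, for example `differential_evolution([s.split() for s in texts], budget=100)`. A real system would replace the toy fitness with the linguistic model's own quality measure.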

Different DE runs are distributed to a grid in parallel as optimization tasks, seeking high processing throughput despite the demanding complexity of the linguistic model, especially on longer multi-documents, where DE improves results when given more iterations. Specifically, parallelization and the grid make it possible to run several independent DE runs at the same time within a fixed real-time budget. This approach improves a Document Understanding Conference (DUC) benchmark recall metric over a previous setting.
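
The pattern of independent DE runs sharing a fixed real-time budget can be sketched as follows. This is an illustration under stated assumptions, not the paper's grid setup: a local `concurrent.futures` executor stands in for the grid, a placeholder sphere function stands in for the summarization objective, and names such as `de_run` and `run_independent` are hypothetical.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def sphere(x):
    """Placeholder objective to minimize (an assumption, not the summarization model)."""
    return sum(v * v for v in x)

def de_run(seed, budget_s, dim=5, pop_size=12, F=0.5, CR=0.9):
    """One independent DE/rand/1/bin run that stops when its wall-clock budget expires."""
    rng = random.Random(seed)
    deadline = time.monotonic() + budget_s
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    fit = [sphere(x) for x in pop]
    while time.monotonic() < deadline:
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)
            trial = [
                pop[a][j] + F * (pop[b][j] - pop[c][j])
                if (rng.random() < CR or j == j_rand) else pop[i][j]
                for j in range(dim)
            ]
            f_trial = sphere(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    return seed, min(fit)

def run_independent(seeds, budget_s, executor):
    """Submit one DE run per seed to the executor and return (best, all results)."""
    futures = [executor.submit(de_run, s, budget_s) for s in seeds]
    results = [f.result() for f in futures]
    return min(results, key=lambda r: r[1]), results
```

A call such as `run_independent([1, 2, 3, 4], 0.5, ThreadPoolExecutor(max_workers=4))` runs four seeded tasks that each get the same time budget. Threads only illustrate the task structure here, since CPython's global interpreter lock limits CPU-bound speedup. A `ProcessPoolExecutor` or a grid scheduler would take the executor's place for real workloads.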

Updated: 2020-04-09