Optimizing Data-Driven Models for Summarization as Parallel Tasks
Journal of Computational Science (IF 3.1), Pub Date: 2020-04-09, DOI: 10.1016/j.jocs.2020.101101
Aleš Zamuda, Elena Lloret

This paper tackles a hard optimization problem in computational linguistics, automatic multi-document text summarization, using grid computing. The main challenge of multi-document summarization is to extract the most relevant and unique information effectively and efficiently from a set of topic-related documents, within a specified length limit. In the Big Data/Text era, where the volume of information grows exponentially, optimization becomes essential for selecting the most representative sentences to generate the best summaries. Therefore, a data-driven summarization model is proposed and optimized during a run of Differential Evolution (DE).
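As a rough illustration of what "selecting the most representative sentences under a length limit" can look like as an optimization target, the sketch below decodes a real-valued weight vector (the kind of candidate a DE run evolves) into a summary and scores it. This is not the model from the paper: the greedy decoder, the distinct-word coverage score, and the names `decode_summary` and `coverage_fitness` are all assumptions made for illustration.

```python
from typing import List, Sequence


def decode_summary(weights: Sequence[float], sentences: List[str], max_words: int) -> List[int]:
    """Greedy decode (assumed scheme): take sentences in order of descending
    weight, skipping any sentence that would exceed the word budget."""
    order = sorted(range(len(sentences)), key=lambda i: -weights[i])
    chosen, used = [], 0
    for i in order:
        n_words = len(sentences[i].split())
        if used + n_words <= max_words:
            chosen.append(i)
            used += n_words
    return sorted(chosen)


def coverage_fitness(chosen: Sequence[int], sentences: List[str]) -> int:
    """Toy proxy for 'relevant and unique': number of distinct words covered.
    Repeating a sentence adds nothing, so redundancy is not rewarded."""
    covered = set()
    for i in chosen:
        covered.update(sentences[i].lower().split())
    return len(covered)


sentences = [
    "grid computing runs tasks in parallel",
    "grid computing runs tasks in parallel",  # duplicate content
    "differential evolution optimizes the model",
]
weights = [0.9, 0.8, 0.7]  # hypothetical candidate vector from an optimizer
chosen = decode_summary(weights, sentences, max_words=11)
score = coverage_fitness(chosen, sentences)
```

With an 11-word budget the duplicate sentence does not fit after the first one, so the decoder falls through to the third sentence, which covers new words. An optimizer would search over `weights` to maximize the score.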

Different DE runs are distributed to a grid in parallel as optimization tasks, seeking high processing throughput despite the demanding complexity of the linguistic model, especially on longer multi-documents, where DE improves results given more iterations. Namely, parallelization and the grid enable several independent DE runs to execute at the same time within a fixed real-time budget. This approach improves a Document Understanding Conference (DUC) benchmark recall metric over a previous setting.
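The idea of dispatching independent DE runs as parallel tasks can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' setup: a local process pool stands in for the grid, the sphere function stands in for the summarization model's fitness, the DE variant is the classic DE/rand/1/bin, and every name and parameter value (`de_run`, `best_of_parallel_runs`, `F`, `CR`, population size) is hypothetical.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def sphere(x):
    """Stand-in objective (assumption): minimize the sum of squares."""
    return sum(v * v for v in x)


def de_run(seed, dim=5, pop_size=20, generations=100, F=0.5, CR=0.9):
    """One independent DE/rand/1/bin run; returns the best fitness found."""
    rng = random.Random(seed)  # per-run seed keeps runs independent and repeatable
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    fit = [sphere(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = [
                pop[a][k] + F * (pop[b][k] - pop[c][k])
                if (rng.random() < CR or k == j_rand) else pop[i][k]
                for k in range(dim)
            ]
            f = sphere(trial)
            if f <= fit[i]:  # greedy selection: a slot never gets worse
                pop[i], fit[i] = trial, f
    return min(fit)


def best_of_parallel_runs(seeds, workers=4):
    """Run one DE task per seed in parallel and keep the best result."""
    with ProcessPoolExecutor(max_workers=workers) as ex:
        results = list(ex.map(de_run, seeds))
    return min(results), results


if __name__ == "__main__":
    best, all_runs = best_of_parallel_runs(range(8))
    print(best, all_runs)
```

Because the runs share no state, the wall-clock time for eight runs on enough workers is close to that of a single run, which is what lets more runs (or more generations per run) fit into a fixed time budget.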




Updated: 2020-04-09