To what extent does content selection affect surface realization in the context of headline generation?


Computer Speech & Language (IF 3.1), Pub Date: 2020-12-15, DOI: 10.1016/j.csl.2020.101179
Cristina Barros, Marta Vicente, Elena Lloret

Headline generation is the task of condensing the most important information in a news article into a single short sentence. It is normally addressed with summarization techniques, ideally combining extractive and abstractive methods with sentence compression or fusion. Although Natural Language Generation (NLG) techniques have not been directly exploited for headline generation, they may provide better mechanisms than summarization techniques for paraphrasing the information in a text. This paper therefore analyzes and evaluates the effectiveness of NLG techniques for generating headlines. In NLG, content selection and surface realization are equally important: there is no point in generating text without knowing the topic. On this premise, we take HanaNLG, a hybrid surface realization approach, as a basis and analyze the effect on the generated text when different content selection strategies are integrated at the macroplanning stage. The experiments show that, despite not using any sophisticated summarization method, the proposed approach provided the following benefits: i) it generated a coherent, linguistically structured headline; ii) it obtained results on standard datasets (DUC 2003 and DUC 2004) that were comparable to several competitive systems in terms of the content of the generated headline; and iii) the headlines generated by the whole approach (PLM-HanaNLG) were preferred by human assessors over those generated by the best-performing system in DUC 2003.


Updated: 2020-12-21
