Controlling contents in data-to-document generation with human-designed topic labels, Computer Speech & Language


Controlling contents in data-to-document generation with human-designed topic labels
Computer Speech & Language (IF 3.1), Pub Date: 2020-09-29, DOI: 10.1016/j.csl.2020.101154
Kasumi Aoki, Akira Miyazawa, Tatsuya Ishigaki, Tatsuya Aoki, Hiroshi Noji, Keiichi Goshima, Hiroya Takamura, Yusuke Miyao, Ichiro Kobayashi

We propose a data-to-document generator, based on a neural language model, that can easily control the contents of its output texts. A conventional data-to-text model is useful when a reader seeks a global summary of the data, because it only has to describe an important part that has been extracted beforehand. However, because what users are interested in differs from user to user, a method is needed that generates different summaries according to users' requests. We develop a model that generates various summaries and controls their contents by providing explicit targets for the model to refer to as controllable factors. In the experiments, we used five-minute or one-hour charts of nine indicators (e.g., Nikkei 225) as time-series data and daily summaries of Nikkei Quick News as textual data. We conducted comparative experiments using two kinds of reference information for generation: human-designed topic labels indicating the contents of a sentence, and automatically extracted keywords. Experiments show that both models using additional information about the target document achieved higher performance in terms of BLEU and human evaluation. We found that human-designed topic labels are superior to extracted keywords in terms of controllability.


Updated: 2020-10-11
