Using reinforcement learning with external rewards for open-domain natural language generation
Journal of Intelligent Information Systems (IF 3.4) | Pub Date: 2020-11-06 | DOI: 10.1007/s10844-020-00626-5
Vidhushini Srinivasan , Sashank Santhanam , Samira Shaikh

We propose a new approach to emotional natural language generation using a bidirectional seq2seq model. Our goal is to generate emotionally relevant language that accommodates the emotional tone of the prior context. To incorporate emotional information, we train our own embeddings, appending emotion values derived from valence, arousal, and dominance scores. We use a reinforcement learning framework that is tuned with the policy gradient method. Two of the internal rewards in our reinforcement learning framework, viz. Ease of Answering and Semantic Coherence, are based on prior state-of-the-art work. We propose a new internal reward, Emotional Intelligence, computed by minimizing the affective dissonance between the source and the generated text. We also train a separate external reward analyzer to predict the rewards and to maximize the expected rewards (both internal and external). We evaluate the system on two corpora commonly used for natural language generation tasks: the Cornell Movie Dialog Corpus and the Yelp Restaurant Review Corpus. We report standard evaluation metrics, including BLEU, ROUGE-L, and perplexity, as well as human evaluation, to validate our approach. We demonstrate the ability of the proposed model to generate emotionally appropriate responses on both corpora.
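To illustrate how an Emotional Intelligence reward based on valence, arousal, and dominance (VAD) scores could look, the minimal Python sketch below measures affective dissonance as the distance between the averaged VAD vectors of the source and the generated response and returns its negative as the reward. The toy lexicon, the neutral fallback, and the names `vad_vector` and `emotional_intelligence_reward` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical VAD lexicon: word -> (valence, arousal, dominance), each in [0, 1].
# The abstract does not specify the lexicon or scaling, so these values are illustrative.
VAD_LEXICON = {
    "happy": (0.92, 0.60, 0.71),
    "sad":   (0.10, 0.35, 0.25),
    "calm":  (0.72, 0.15, 0.60),
    "angry": (0.12, 0.85, 0.55),
}
NEUTRAL = np.array([0.5, 0.5, 0.5])  # fallback for out-of-lexicon words


def vad_vector(tokens):
    """Average VAD vector of a token sequence (neutral for unknown words)."""
    vecs = [np.array(VAD_LEXICON.get(t, NEUTRAL)) for t in tokens]
    return np.mean(vecs, axis=0) if vecs else NEUTRAL


def emotional_intelligence_reward(source_tokens, generated_tokens):
    """One plausible form of the Emotional Intelligence internal reward:
    the negative affective dissonance, i.e. the negative Euclidean distance
    between the average VAD vectors of the source and the generated response."""
    dissonance = np.linalg.norm(vad_vector(source_tokens) - vad_vector(generated_tokens))
    return -dissonance  # maximizing the reward minimizes affective dissonance


if __name__ == "__main__":
    src = "i am so happy today".split()
    gen = "that makes me happy and calm".split()
    print(emotional_intelligence_reward(src, gen))
```

In a policy gradient setup such as the one the paper describes, a reward of this form would be combined with the other internal and external rewards and used to weight the log-likelihood of sampled responses during training.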

Updated: 2020-11-06