AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses
arXiv - CS - Computation and Language Pub Date : 2020-01-15 , DOI: arxiv-2001.05467
Tong Niu, Mohit Bansal

Many sequence-to-sequence dialogue models tend to generate safe, uninformative responses. There have been various useful efforts to eliminate them. However, these approaches either improve decoding algorithms during inference, rely on hand-crafted features, or employ complex models. In our work, we build dialogue models that are dynamically aware of which utterances or tokens are dull, without any feature engineering. Specifically, we start with a simple yet effective automatic metric, AvgOut, which calculates the average output probability distribution over all time steps on the decoder side during training. This metric directly estimates which tokens are more likely to be generated, making it a faithful evaluation of model diversity (i.e., for diverse models, the token probabilities should be more evenly distributed rather than peaked at a few dull tokens). We then leverage this novel metric to propose three models that promote diversity without losing relevance. The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch; the second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level; the third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal. Moreover, we experiment with a hybrid model that combines the loss terms of MinAvgOut and RL. All four models outperform their base LSTM-RNN model on both diversity and relevance by a large margin, and are comparable to or better than competitive baselines (also verified via human evaluation). Moreover, our approaches are orthogonal to the base model, making them applicable as an add-on to other emerging, better dialogue models in the future.
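The core computation behind AvgOut, as described in the abstract, can be sketched in a few lines: average the decoder's per-timestep softmax distributions to estimate how likely each vocabulary token is to be generated overall. The `diversity_score` below is an illustrative assumption, not the paper's exact formulation — it simply measures how little probability mass falls on a hypothetical set of known dull tokens.

```python
import numpy as np

def avgout(decoder_probs):
    """AvgOut: average the per-timestep output distributions.

    decoder_probs: array of shape (timesteps, vocab_size), where each
    row is a softmax distribution produced by the decoder during
    training. Returns a (vocab_size,) vector estimating how likely
    each token is to be generated overall.
    """
    return decoder_probs.mean(axis=0)

def diversity_score(avg_dist, dull_token_ids):
    """Hypothetical diversity score (assumed, for illustration):
    1 minus the probability mass AvgOut assigns to dull tokens.
    A diverse model spreads mass more evenly, so the score is higher.
    """
    return 1.0 - avg_dist[dull_token_ids].sum()

# Toy example: two decoder steps over a 3-token vocabulary,
# where token 0 plays the role of a dull token (e.g., "i").
probs = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1]])
avg = avgout(probs)              # [0.65, 0.25, 0.10]
score = diversity_score(avg, [0])  # 1 - 0.65 = 0.35
```

In MinAvgOut, a score of this kind would be maximized directly as a differentiable training objective; in the RL variant it would instead serve as a reward signal. The dull-token list and scoring rule here are placeholders — the paper derives the score from the output distributions themselves rather than a hand-picked list.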

Updated: 2020-01-16