Generating unambiguous and diverse referring expressions
Computer Speech & Language (IF 4.3) Pub Date: 2020-12-31, DOI: 10.1016/j.csl.2020.101184
Nikolaos Panagiaris , Emma Hart , Dimitra Gkatzia

Neural Referring Expression Generation (REG) models have shown promising results in generating expressions that uniquely describe visual objects. However, current REG models still lack the ability to produce diverse and unambiguous referring expressions (REs). To address the lack of diversity, we propose generating a set of diverse REs, rather than one-shot REs. To reduce the ambiguity of referring expressions, we directly optimise non-differentiable test metrics using reinforcement learning (RL), and we show that our approaches achieve better results under multiple different settings. Specifically, we first present a novel RL approach to REG training which, instead of drawing one sample per input, averages over multiple samples to normalise the reward during RL training. Secondly, we present an innovative REG model that utilises an object attention mechanism that explicitly incorporates information about the target object, and is optimised using our proposed RL approach. Thirdly, we propose a novel transformer model, optimised with RL, that exploits different levels of visual information. Our human evaluation demonstrates the effectiveness of this model: we improve the state-of-the-art results on RefCOCO testA and testB in terms of task success from 76.95% to 81.66% and from 78.10% to 83.33% respectively, while on RefCOCO+ testA we show improvements from 58.85% to 83.33%. Finally, we present a thorough comparison of diverse decoding strategies (sampling-based and maximisation-based) and how they control the trade-off between quality and diversity.
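The multi-sample reward normalisation described above can be sketched as REINFORCE with a per-input baseline: for each input, several REs are sampled, the mean of their metric scores (e.g. CIDEr or a task-success proxy) serves as the baseline, and each sample's advantage weights its log-probability. This is a minimal illustrative sketch, not the authors' implementation; the function name and list-based shapes are assumptions.

```python
def rl_loss_multi_sample(log_probs, rewards):
    """REINFORCE loss with a mean-of-samples baseline per input.

    log_probs[i][j]: total log-probability of the j-th sampled RE for input i
    rewards[i][j]:   non-differentiable metric score for that sample
    """
    k = len(rewards[0])          # number of samples drawn per input
    loss, n = 0.0, 0
    for lp_row, r_row in zip(log_probs, rewards):
        baseline = sum(r_row) / k            # mean reward over the k samples
        for lp, r in zip(lp_row, r_row):
            loss += -(r - baseline) * lp     # advantage-weighted policy gradient
            n += 1
    return loss / n
```

Because the baseline is the mean of rewards drawn for the same input, samples that score above average push their probability up and below-average samples push it down, without requiring a separately trained value network.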




Updated: 2021-01-12