Coupled-dynamic learning for vision and language: Exploring Interaction between different tasks
Pattern Recognition ( IF 8 ) Pub Date : 2021-01-19 , DOI: 10.1016/j.patcog.2021.107829
Ning Xu , Hongshuo Tian , Yanhui Wang , Weizhi Nie , Dan Song , An-An Liu , Wu Liu

The vision and language communities have attracted intensive research interest. In particular, the image captioning task aims to generate natural language descriptions from image content. Conversely, the image synthesis task aims to generate realistic images from natural language descriptions. Both tasks can achieve promising results with Long Short-Term Memory (LSTM), which models the sequence dynamics at each time step as a hidden state. Nevertheless, research on dynamics is usually confined to an individual task, and the mutual relationship between the dynamics of different tasks has not been explored. In this work, we present a novel coupled-dynamic formulation that iteratively reduces the distance between task-dependent dynamics during training. To embed information from the opposite task into each individual network, we construct dual-loss architectures that interactively align the dynamics. We evaluate the proposed framework on the Flickr8k, Flickr30k and MSCOCO datasets. Experimental results show that our approach improves both tasks jointly and achieves performance competitive with state-of-the-art methods.
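
The idea of reducing the distance between task-dependent dynamics can be illustrated with a minimal sketch. The abstract does not state which distance measure is used or how it enters the two losses, so everything below is an assumption made for illustration: the squared Euclidean distance, the `weight` parameter, and the function names `dynamics_distance` and `coupled_losses` are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def dynamics_distance(h_caption: Sequence[Vector],
                      h_synthesis: Sequence[Vector]) -> float:
    """Mean squared Euclidean distance between two hidden-state sequences.

    Assumption: both sequences have the same number of time steps and the
    same hidden size. The paper may use a different measure.
    """
    if len(h_caption) != len(h_synthesis):
        raise ValueError("sequences must have the same number of time steps")
    if not h_caption:
        raise ValueError("sequences must not be empty")
    total = 0.0
    for a, b in zip(h_caption, h_synthesis):
        if len(a) != len(b):
            raise ValueError("hidden states must have the same size")
        # Squared distance between the two hidden states at this time step.
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return total / len(h_caption)


def coupled_losses(caption_loss: float,
                   synthesis_loss: float,
                   h_caption: Sequence[Vector],
                   h_synthesis: Sequence[Vector],
                   weight: float = 0.1) -> Tuple[float, float]:
    """Add the same alignment penalty to each task loss (hypothetical form)."""
    penalty = weight * dynamics_distance(h_caption, h_synthesis)
    return caption_loss + penalty, synthesis_loss + penalty
```

In this sketch each task keeps its own loss and both receive the same alignment term, so minimizing either loss also pulls the two hidden-state sequences closer together. A real implementation would compute the same quantity on tensors inside a training loop.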




Updated: 2021-01-28