Enhancing multi-modal fusion in visual dialog via sample debiasing and feature interaction
Information Fusion (IF 18.6), Pub Date: 2024-02-14, DOI: 10.1016/j.inffus.2024.102302
Chenyu Lu, Jun Yin, Hao Yang, Shiliang Sun

Visual dialog aims to carry out multiple rounds of dialog by fusing information extracted from images, captions, and previous question–answer pairs. As a vision-language task, visual dialog faces challenges from language bias and vision bias. These biases create an imbalance in multi-modal fusion, which leads to shortcut learning and significantly compromises the model's robustness. Moreover, existing multi-modal fusion methods in visual dialog exhibit a low data interaction frequency, leading to insufficient fusion. To overcome the balance and sufficiency issues in multi-modal fusion, we propose a novel method (CS-PAF). Specifically, CS-PAF consists of two core ingredients: (i) a counterfactual sample generation module for model debiasing; and (ii) a parallel attention fusion network that enhances sufficiency in multi-modal data interaction. Notably, in contrast to other debiasing methods, our counterfactual sample generation applies contrastive learning to circumvent the high cost of manual annotations and ensure seamless integration with other models. Extensive comparisons with state-of-the-art approaches, along with comprehensive ablation and transferability studies across multiple datasets, substantiate the superiority and effectiveness of our CS-PAF. Our implementation code is available at .
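
The abstract does not state which contrastive objective CS-PAF uses. As a generic illustration of how contrastive learning can use a counterfactual sample as a negative, the following minimal sketch implements the standard InfoNCE loss in plain Python. The function names, the toy two-dimensional feature vectors, and the temperature value are all assumptions made for illustration; none of them come from the paper.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss (hypothetical stand-in, not the paper's loss).

    loss = -log( exp(s_pos / t) / (exp(s_pos / t) + sum_k exp(s_neg_k / t)) )
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]

# Toy features (assumed): the anchor is a fused representation, the positive
# comes from the factual sample, the negative from a counterfactual sample.
anchor = [1.0, 0.0]
factual = [0.9, 0.1]
counterfactual = [0.0, 1.0]
loss = info_nce(anchor, factual, [counterfactual])
```

In this sketch the loss is small when the anchor is closer to the factual feature than to the counterfactual one, and grows when the two are swapped, which is the behaviour a debiasing objective of this kind would rely on.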

Updated: 2024-02-14