A semantic-driven coupled network for infrared and visible image fusion
Information Fusion (IF 14.7). Pub Date: 2024-03-11. DOI: 10.1016/j.inffus.2024.102352
Xiaowen Liu, Hongtao Huo, Jing Li, Shan Pang, Bowen Zheng

To adapt to high-level vision tasks, several infrared and visible image fusion methods cascade with a downstream network to enhance the semantic information of the fusion results. However, due to the feature-level heterogeneity between the fusion and downstream tasks, these methods suffer from the loss of pixel-level information and incomplete reconstruction of semantic-level information. To further improve the performance of fusion images in high-level vision tasks, we propose a semantic-driven coupled network for infrared and visible image fusion, termed SDCFusion. First, to address the feature heterogeneity, we couple the segmentation and fusion networks into a joint framework in which both networks share multi-level cross-modality coupled features. Through the joint optimization of the dual tasks, a joint action between the fusion and downstream tasks is formed, forcing the cross-modality coupled features to be modeled in both the pixel domain and the semantic domain. Subsequently, to guide the reconstruction of semantic information, we cascade the two networks to form a semantic-driven action that continuously optimizes the fusion image to improve its semantic representation capacity. In addition, we introduce an adaptive training strategy to reduce the complexity of dual-task training: an mIoU-based semantic measurement weight is designed to balance the joint action and the driven action throughout training. We evaluate our method at both the pixel-information and semantic-information levels. Qualitative and quantitative experiments verify the superiority of SDCFusion in terms of visual quality and metrics, and object detection and semantic segmentation experiments demonstrate that SDCFusion achieves superior performance in high-level vision tasks. The source code is available at .
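The adaptive training strategy is the abstract's most concrete algorithmic detail. Below is a minimal PyTorch sketch of one joint training step, assuming a fusion network cascaded into a segmentation network and an mIoU-based weight that trades off the pixel-level joint action against the semantic driven action. FusionNet, SegNet, the max-intensity pixel loss, and the specific (1 - w, w) weighting form are illustrative assumptions, not the paper's actual architecture or loss design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the paper's coupled fusion and segmentation networks.
class FusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(2, 16, 3, padding=1)  # stacked IR + visible channels
        self.decoder = nn.Conv2d(16, 1, 3, padding=1)

    def forward(self, ir, vis):
        feat = F.relu(self.encoder(torch.cat([ir, vis], dim=1)))
        return torch.sigmoid(self.decoder(feat))       # fused image in [0, 1]

class SegNet(nn.Module):
    def __init__(self, num_classes=9):
        super().__init__()
        self.body = nn.Conv2d(1, num_classes, 3, padding=1)

    def forward(self, fused):
        return self.body(fused)                        # per-pixel class logits

def miou(logits, labels, num_classes):
    """mIoU of the current segmentation output, used as the semantic measurement weight."""
    preds = logits.argmax(dim=1)
    ious = []
    for c in range(num_classes):
        inter = ((preds == c) & (labels == c)).sum().float()
        union = ((preds == c) | (labels == c)).sum().float()
        if union > 0:
            ious.append(inter / union)
    return torch.stack(ious).mean() if ious else torch.tensor(0.0)

fusion_net, seg_net = FusionNet(), SegNet()
opt = torch.optim.Adam(list(fusion_net.parameters()) + list(seg_net.parameters()), lr=1e-4)

ir = torch.rand(2, 1, 64, 64)                 # infrared batch
vis = torch.rand(2, 1, 64, 64)                # visible batch (single channel for simplicity)
labels = torch.randint(0, 9, (2, 64, 64))     # segmentation ground truth

fused = fusion_net(ir, vis)                   # pixel-domain fusion
logits = seg_net(fused)                       # semantic-domain prediction on the fused image

pixel_loss = F.l1_loss(fused, torch.max(ir, vis))   # placeholder intensity-preservation loss
semantic_loss = F.cross_entropy(logits, labels)

# mIoU-based adaptive weight: as segmentation quality rises, shift emphasis
# from the pixel-level joint action toward the semantic driven action.
with torch.no_grad():
    w = miou(logits, labels, num_classes=9)
loss = (1 - w) * pixel_loss + w * semantic_loss

opt.zero_grad()
loss.backward()
opt.step()
```

Because the weight is recomputed each step from the current mIoU, training starts dominated by pixel-level reconstruction and gradually hands more influence to the semantic objective, which matches the balancing role the abstract assigns to the semantic measurement weight.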
