Information Fusion

Volume 69, May 2021, Pages 128-141

Full length article
RXDNFuse: An aggregated residual dense network for infrared and visible image fusion

https://doi.org/10.1016/j.inffus.2020.11.009

Highlights

  • We propose a novel method based on aggregated residual dense networks for IR/VIS fusion.

  • It can keep both the texture details and the thermal radiation information in the source images.

  • It is an end-to-end model that does not need to design fusion rules manually.

  • Two loss function strategies are designed to optimize the model similarity constraint.

  • We generalize it to fuse RGB scale images and images with different resolutions.

Abstract

This study proposes a novel unsupervised network for the IR/VIS fusion task, termed RXDNFuse, which is based on an aggregated residual dense network. In contrast to conventional fusion networks, RXDNFuse is designed as an end-to-end model that combines the structural advantages of ResNeXt and DenseNet. Hence, it overcomes the limitations of the manual and complicated design of activity-level measurement and fusion rules. Our method formulates image fusion as the problem of proportionally maintaining the structure and intensity of the IR/VIS images. Using comprehensive feature extraction and combination, RXDNFuse automatically estimates the information preservation degrees of the corresponding source images and extracts hierarchical features to achieve effective fusion. Moreover, we design two loss function strategies to optimize the similarity constraint and the network parameter training, thus further improving the quality of detailed information. We also generalize RXDNFuse to fuse images with different resolutions and RGB scale images. Extensive qualitative and quantitative evaluations reveal that our results effectively preserve abundant textural details and highlighted thermal radiation information. In particular, our results form a comprehensive representation of scene information, which is more in line with the human visual perception system.

Introduction

Image fusion is an important modern enhancement technique that aims to fuse multiple input images into a robust and informative image, offering a more comprehensive and detailed scene representation that can facilitate subsequent processing or help in decision making [1], [2]. Image fusion technology has been used to enhance performance in terms of human visual perception, object detection, and target recognition [3], [4], [5], [6], [7]. In particular, infrared and visible image fusion plays an especially significant role in scene detection and tracking in video surveillance systems. These two types of images provide complementary scene information from different aspects. Infrared images can easily distinguish targets from the background owing to their highly discriminative thermal radiation, and they work well both day and night and under all weather conditions. However, infrared images usually lack texture and thus describe details poorly. By contrast, visible images contain textural details with high spatial resolution, which is conducive to target recognition and conforms to the human visual system. Therefore, effectively combining this complementary information is the main focus of fusion methods.

The image fusion task has been developed with different schemes in recent years. Existing fusion methods can be roughly divided into two categories. (i) Traditional methods. Most typically, multiscale transform methods have been applied to extract salient image features, such as the discrete wavelet transform (DWT) [8], [9], [10]. Representation learning-based methods have also attracted great attention, such as sparse representation (SR) [11] and joint sparse representation (JSR) [12]. Subspace-based methods [13], [14], saliency-based methods [15], [16], and hybrid models [17], [18], [19] have also been applied to the image fusion task. (ii) Deep learning-based methods. Given the rapid advances in deep learning technology, convolutional neural networks (CNNs) [20], [21] are used to obtain image features and reconstruct the fused image. CNN-based methods [22], [23] have achieved better performance in image processing owing to the strong fitting ability of neural networks, and have thus been widely used in fusion tasks. Ma et al. [24] proposed an unsupervised network to generate the decision map for fusion. Ma et al. [25] proposed an end-to-end model called FusionGAN, which generates a fused image with dominant infrared intensities and additional visible gradients.

Although existing methods can achieve good results in their corresponding fusion tasks, they still have some drawbacks that affect image fusion performance. First, the fusion rules in most current methods are increasingly complex and designed manually; these rules introduce certain artifacts into the fusion results. Second, in CNN-based fusion methods, only the output of the last feature extraction layer is used as the image fusion component. This approach undoubtedly discards large amounts of useful information obtained by the intermediate convolutional layers, which directly affects the final fusion performance. Third, existing fusion methods usually lack competitiveness in terms of time and storage space due to their computational complexity and large number of parameters.

To overcome the abovementioned challenges, we propose an end-to-end network, namely RXDNFuse, to perform the infrared and visible image fusion task. This network does not require manually designed fusion rules and can effectively utilize the deep features extracted from the source images. More specifically, infrared thermal radiation information is characterized by pixel intensities, while textural detail information in visible images is typically characterized by edges and gradients [26]; the preservation of these details from the source images often determines the clarity of the fused image. To further improve this performance, we design two loss function strategies, namely a pixel-wise strategy and a feature-wise strategy, to force the fused image to retain more texture details. Furthermore, a new feature extraction module, RXDB, is designed to further lighten the fusion framework and improve the time efficiency of image fusion. A schematic illustration of different image fusion methods is shown in Fig. 1.
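
To make the pixel-wise idea concrete, the sketch below shows a minimal loss of this kind in PyTorch: an intensity term pulls the fused image toward the infrared input, and a gradient term pulls its edges toward the visible input. The weights alpha and beta and the specific norms are illustrative assumptions, not the exact formulation used in RXDNFuse.

```python
import torch
import torch.nn.functional as F

def gradients(img):
    """Horizontal and vertical finite differences of an (N, C, H, W) tensor."""
    dx = img[:, :, :, 1:] - img[:, :, :, :-1]
    dy = img[:, :, 1:, :] - img[:, :, :-1, :]
    return dx, dy

def pixel_wise_loss(fused, ir, vis, alpha=1.0, beta=5.0):
    """Illustrative pixel-wise loss: keep infrared intensities and visible gradients.

    alpha and beta are hypothetical weights chosen only for illustration.
    """
    # Thermal radiation is characterized by pixel intensities (infrared term).
    intensity_term = F.mse_loss(fused, ir)
    # Textural detail is characterized by edges/gradients (visible term).
    f_dx, f_dy = gradients(fused)
    v_dx, v_dy = gradients(vis)
    gradient_term = F.l1_loss(f_dx, v_dx) + F.l1_loss(f_dy, v_dy)
    return alpha * intensity_term + beta * gradient_term
```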

The characteristics and contributions of our work can be summarized in the following four aspects. First, we propose an end-to-end fusion architecture based on the aggregated residual dense network to solve the infrared and visible image fusion problem. Our approach effectively avoids the need for manually designing complicated image decomposition measurements and fusion rules, and adequately utilizes the hierarchical features from the source images. Second, we propose two loss function strategies to optimize the model similarity constraint and the quality of detailed information, where the pixel-wise strategy directly exploits the original information from the source images, and the feature-wise strategy calculates a more detailed loss function based on a pre-trained VGG-19 network [28]. Third, we conduct experiments on public infrared and visible image fusion datasets with qualitative and quantitative comparisons to state-of-the-art methods. Compared with five existing methods, the fusion results of the proposed RXDNFuse obtain excellent visual quality in the background information while also containing highlighted thermal radiation target information. Finally, we generalize RXDNFuse to fuse images with different resolutions and RGB scale images, enabling it to generate clear and natural fused images.
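
As a rough illustration of the feature-wise strategy, the following PyTorch sketch compares frozen VGG-19 activations of the fused image with those of the visible image. The choice of comparison layer, the omission of ImageNet normalization, and the use of the visible image as the feature reference are assumptions made only for illustration; the exact formulation is given in Section 3.

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class FeatureWiseLoss(nn.Module):
    """Perceptual-style loss over frozen VGG-19 features (illustrative sketch)."""

    def __init__(self, layer_index=16):  # index 16 is conv3_4 in torchvision's VGG-19
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features[: layer_index + 1]
        for p in vgg.parameters():
            p.requires_grad = False  # the backbone is used only as a fixed feature extractor
        self.vgg = vgg.eval()

    def forward(self, fused, vis):
        # Single-channel fusion outputs are replicated to three channels for VGG input.
        if fused.size(1) == 1:
            fused, vis = fused.repeat(1, 3, 1, 1), vis.repeat(1, 3, 1, 1)
        return F.mse_loss(self.vgg(fused), self.vgg(vis))
```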

The remainder of this paper is structured as follows. In Section 2, we briefly review related works on deep learning frameworks. Section 3 presents the details of the proposed RXDNFuse network for infrared and visible image fusion. Abundant experimental results and analyses are presented in Section 4. Finally, Section 5 provides a discussion and summarizes the paper.

Section snippets

Related works

In this section, we briefly review the advances and relevant works in the image fusion field, including traditional infrared and visible image fusion methods, VGG network deep learning models, typical deep learning-based image processing techniques, and their improved variants.

Traditional fusion methods. With the fast-growing demand for and progress in image representation, numerous infrared and visible image fusion methods have been proposed. As reconstruction is usually an inverse process

Method

This section describes the proposed RXDNFuse for infrared and visible image fusion in detail. We start by presenting the problem formulation to describe the details of feature processing, and then discuss the detailed structure of our RXDNFuse. Finally, we design two strategy options for the loss function of our network, and elaborate on other details for model training.
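
While the full network structure is detailed in this section of the paper, the PyTorch sketch below shows what an aggregated residual dense block (RXDB-style) could look like: grouped 3x3 convolutions in the spirit of ResNeXt, wired with DenseNet-style dense connections, closed by a 1x1 fusion layer and a local residual connection. The channel widths, growth rate, cardinality, and layer count are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class RXDBlock(nn.Module):
    """Sketch of an aggregated residual dense block (assumed configuration)."""

    def __init__(self, channels=64, growth=32, cardinality=8, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=1, groups=cardinality),
                nn.ReLU(inplace=True),
            ))
            in_ch += growth  # dense connectivity: each layer sees all earlier feature maps
        self.fuse = nn.Conv2d(in_ch, channels, kernel_size=1)  # compress back to input width

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))  # local residual connection
```

For example, `RXDBlock()(torch.randn(1, 64, 128, 128))` returns a tensor of the same shape, so such blocks can be stacked without changing the feature-map width.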

Experimental results and analysis

In this section, we first briefly analyze the datasets and metrics used in the experiments. Subsequently, the effectiveness of RXDNFuse is verified through qualitative and quantitative evaluations against five state-of-the-art fusion methods on the TNO, INO, and OTCBVS datasets.
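
As one example of a quantitative measure commonly reported in fusion studies, the NumPy sketch below computes image entropy (EN); whether EN is among the metrics used in this paper is an assumption, and the snippet is included only to illustrate how such evaluations are typically computed.

```python
import numpy as np

def entropy(image, bins=256):
    """Shannon entropy (EN) of an 8-bit grayscale image given as a 2-D array."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]  # ignore empty bins so log2 is well defined
    return float(-np.sum(p * np.log2(p)))
```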

Conclusions

In this paper, we propose an effective infrared and visible image fusion method based on the aggregated residual dense network. The proposed RXDNFuse is an end-to-end model, which effectively avoids the manual design of image decomposition measurements and fusion rules. It can simultaneously better preserve the thermal radiation information in infrared images and the texture detail information in visible images. Specifically, our fused results look like detailed visible images with clear

CRediT authorship contribution statement

Yongzhi Long: Conceptualization, Methodology, Validation, Formal analysis, Visualization, Software, Writing - original draft. Haitao Jia: Conceptualization, Methodology, Validation, Formal analysis, Visualization. Yida Zhong: Resources, Writing - review & editing, Supervision, Data curation. Yadong Jiang: Writing - review & editing. Yuming Jia: Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Sichuan Science and Technology Project (Grant nos. 2018GZDZX003, 2020YFG0306, 2020YFG0055 and 2020YFG0327), and the Science and Technology Program of Hebei (Grant nos. 19255901D and 20355901D).

References (51)

  • Ma, J., et al., Infrared and visible image fusion via gradient transfer and total variation minimization, Inf. Fusion (2016).
  • Xiang, T., et al., A fusion algorithm for infrared and visible images based on adaptive dual-channel unit-linking PCNN in NSCT domain, Infrared Phys. Technol. (2015).
  • Liu, Y., et al., Multi-focus image fusion with a deep convolutional neural network, Inf. Fusion (2017).
  • Li, H., et al., Infrared and visible image fusion with ResNet and zero-phase component analysis, Infrared Phys. Technol. (2019).
  • Ma, J., et al., FusionGAN: A generative adversarial network for infrared and visible image fusion, Inf. Fusion (2019).
  • Ma, J., et al., Infrared and visible image fusion via detail preserving adversarial learning, Inf. Fusion (2020).
  • Piella, G., A general framework for multiresolution image fusion: from pixels to regions, Inf. Fusion (2003).
  • Zhang, Q., et al., Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: A review, Inf. Fusion (2018).
  • Zhao, J., et al., Fusion of visible and infrared images using global entropy and gradient constrained regularization, Infrared Phys. Technol. (2017).
  • Aslantas, V., et al., A new image quality metric for image fusion: the sum of the correlations of differences, AEU-Int. J. Electron. Commun. (2015).
  • Liu, X., et al., Remote sensing image fusion based on two-stream fusion network, Inf. Fusion (2020).
  • Dogra, A., et al., From multi-scale decomposition to non-multi-scale decomposition methods: a comprehensive survey of image fusion techniques and its applications, IEEE Access (2017).
  • Chan, A.L., et al., Fusing concurrent visible and infrared videos for improved tracking performance, Opt. Eng. (2013).
  • Li, J., et al., Infrared and visible image fusion based on saliency detection and infrared target segment.
  • Kumar, P., et al., Fusion of thermal infrared and visible spectrum video for robust surveillance.