Signal Processing

Volume 189, December 2021, 108282

DCKN: Multi-focus image fusion via dynamic convolutional kernel network

https://doi.org/10.1016/j.sigpro.2021.108282

Highlights

  • Context-aware convolutional kernels are exploited for the multi-focus image fusion task.

  • The dynamic kernels used in the proposed architecture are not only position-varying but also sample-varying.

  • The proposed network works in both supervised and unsupervised learning; the bright channel metric is introduced for unsupervised learning.

  • The proposed approach is an end-to-end architecture that does not require any post-processing algorithms.

Abstract

In current multi-focus image fusion approaches based on convolutional neural networks (CNNs), the same set of convolutional kernels is applied to multi-focus images to extract features from all regions. However, the same kernels may not be optimal for every region in multi-focus images, which incurs artifacts in textureless and edge regions of the fused image. To address these problems, this paper proposes a dynamic convolutional kernel network (DCKN) for multi-focus image fusion, in which the convolutional kernels are dynamically generated from region context conditioned on the input images. The kernels in the proposed architecture are not only position-varying but also sample-varying, so they can adapt accurately to the spatially variant blur caused by depth and texture variations in multi-focus images. Moreover, our DCKN works in both supervised and unsupervised learning. For supervised learning, the ground-truth fusion image is utilized to supervise the output fused image. For unsupervised learning, we introduce a bright channel metric and a total variation loss function to jointly constrain the DCKN. The bright channel metric can roughly determine whether source pixels are focused, and it is utilized to guide the training process of the unsupervised network. Extensive experiments on popular multi-focus images show that our DCKN, without any post-processing algorithms, is comparable to state-of-the-art approaches, and that our unsupervised model obtains high fusion quality.

Introduction

Most imaging systems have a limited depth-of-field. Objects within a limited distance from the imaging plane remain in focus, while objects closer or farther than that are blurred in the image. Consequently, imaging systems usually need to capture multiple images of the same scene to obtain different focused objects. Multi-focus image fusion is an effective technique for gathering the focused objects from multiple images of the same scene into a single image. It has been applied in a wide variety of applications, such as medical diagnosis, photography, computer vision, and remote sensing [1], [2], [3], [4]. Although multi-focus image fusion has been studied for a long time, it remains a challenging task.

For decades, researchers have devoted substantial effort to multi-focus image fusion, and one of the most effective solutions at present is to employ powerful convolutional neural network (CNN) models. Liu et al. [5] trained a CNN classifier to classify focused and defocused patches, generating a decision map for fusion. Subsequently, Tang et al. [6] created a more robust training dataset for the CNN by labeling the original images as focused, unknown, and defocused. Both Liu et al. [5] and Tang et al. [6] train the network by dividing the input images into small patches and feeding them to the CNNs, which makes the results sensitive to the patch size, and the networks are not end-to-end models. To address these issues, some researchers proposed pixel-level dense decision map prediction. For instance, Lai et al. [7] designed a visual attention fully convolutional neural network to directly produce the dense decision map. The method in [45] introduces a cascade network to generate decision maps. Guo et al. [8] introduced a generative adversarial network (GAN) to generate the decision map, and then refined it with the convolutional conditional random fields technique. In these methods, the parameters of the convolutional kernels are the same for all positions in the multi-focus images, regardless of the differences among image contents. As a result, these CNN models, which apply the same kernels at every position, can easily distinguish focused and defocused information in well-textured regions, but they have a weak capability in textureless regions, such as the sky, faces, and other smooth areas. Some such exemplar textures are shown in Fig. 1. They usually incur artifacts in textureless regions, so some methods require extensive post-processing algorithms to remove these artifacts.

To address the problems mentioned above, we propose a dynamic convolutional kernel network for multi-focus image fusion, which can be used for both supervised and unsupervised learning. The proposed network exploits context-aware convolutional kernels to detect focused information. The convolutional kernels are dynamically generated from region context conditioned on the input images. Compared with the fixed kernels used in previous CNN models, the dynamic kernels are not only position-varying but also sample-varying, which allows them to adapt accurately to the spatially variant blur caused by depth and texture variations in multi-focus images. By using the dynamic kernels, the proposed architecture has a strong capability to estimate focused information in both well-textured and textureless regions. Thus, our architecture can obtain satisfactory fusion results without any post-processing algorithms. Concretely, our method estimates position-specific convolutional kernels from the multi-focus images using a convolutional encoder-decoder neural network, and convolves these kernels with the input multi-focus images to classify focused and defocused pixels, generating a pixel-level dense prediction.
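
The following is a minimal PyTorch sketch of how position- and sample-varying kernels can be applied to an input image once they have been predicted; the helper name apply_dynamic_kernels, the 3x3 kernel size, and the stand-in kernel tensor are illustrative assumptions and do not reproduce the paper's exact encoder-decoder or fusion head.

```python
# Sketch (assumed, not the paper's exact architecture): applying per-pixel,
# per-sample kernels predicted by a network to an input image.
import torch
import torch.nn.functional as F

def apply_dynamic_kernels(image: torch.Tensor, kernels: torch.Tensor) -> torch.Tensor:
    """image:   (B, C, H, W) source image.
    kernels: (B, K*K, H, W) one KxK kernel per spatial position,
             assumed to be predicted by an encoder-decoder from the inputs."""
    B, C, H, W = image.shape
    K = int(kernels.shape[1] ** 0.5)
    # Extract a KxK neighborhood around every pixel: (B, C*K*K, H*W).
    patches = F.unfold(image, kernel_size=K, padding=K // 2)
    patches = patches.view(B, C, K * K, H, W)
    # Weight each neighborhood with its own kernel and sum over the window.
    return (patches * kernels.unsqueeze(1)).sum(dim=2)

# Usage with stand-in data: the kernels vary across positions and across
# samples, unlike the single shared kernel of a standard convolution.
img = torch.rand(1, 1, 64, 64)
kernels = torch.softmax(torch.rand(1, 9, 64, 64), dim=1)  # normalized 3x3 kernel per pixel
response = apply_dynamic_kernels(img, kernels)  # (1, 1, 64, 64)
```

In contrast to a standard convolution, where one learned kernel is shared by all positions and all samples, each spatial location of each input here receives its own kernel, which is what allows the detector to adapt to locally varying blur.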

In order to realize unsupervised training, we introduce the bright channel [9] and a total variation function. The bright channel principle is used to determine whether a pixel is focused. Concretely, the bright channel describes the maximum intensity of the pixels within an image patch. The pixel values of a defocused image can be regarded as weighted sums of the pixel values in the corresponding focused image. Obviously, the maximum intensity of the pixels in a defocused image patch, after this weighted-sum operation, is no greater than the maximum intensity of the pixels in the corresponding focused patch. Hence, we can compare the bright channel matrices of the focused and defocused images to generate score maps that guide network training. Further, we introduce the total variation function to enforce spatial consistency among the pixels of the decision map.
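
Below is a minimal sketch of how the bright-channel comparison and a total variation term could be combined into an unsupervised training signal; the 7x7 patch size, the L1 matching term, and the weight tv_weight are illustrative assumptions rather than the paper's exact loss.

```python
# Sketch (assumed settings) of bright-channel guidance and a total variation
# term for unsupervised training on grayscale tensors of shape (B, 1, H, W).
import torch
import torch.nn.functional as F

def bright_channel(img: torch.Tensor, patch: int = 7) -> torch.Tensor:
    # Maximum intensity within each local patch (a max filter).
    return F.max_pool2d(img, kernel_size=patch, stride=1, padding=patch // 2)

def focus_score_map(src_a: torch.Tensor, src_b: torch.Tensor, patch: int = 7) -> torch.Tensor:
    # A pixel whose neighborhood is brighter in A than in B is more likely
    # to be focused in A; this is rough guidance, not ground truth.
    return (bright_channel(src_a, patch) > bright_channel(src_b, patch)).float()

def total_variation(decision_map: torch.Tensor) -> torch.Tensor:
    # Penalize differences between neighboring pixels of the decision map,
    # encouraging spatial consistency.
    dh = torch.abs(decision_map[:, :, 1:, :] - decision_map[:, :, :-1, :]).mean()
    dw = torch.abs(decision_map[:, :, :, 1:] - decision_map[:, :, :, :-1]).mean()
    return dh + dw

def unsupervised_loss(decision_map, src_a, src_b, tv_weight: float = 0.1) -> torch.Tensor:
    # Push the predicted decision map toward the bright-channel score map
    # while keeping it piecewise smooth.
    score = focus_score_map(src_a, src_b)
    return F.l1_loss(decision_map, score) + tv_weight * total_variation(decision_map)
```

Extending the bright channel to color inputs (taking the maximum over color channels before the spatial max filter) follows the same pattern.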

The main contributions of this work are as follows:

  • 1.

    We propose a dynamic convolutional kernel network architecture, namely DCKN, for multi-focus image fusion. It is an end-to-end architecture, generating fusion results without applying any post-processing algorithms.

  • 2.

    Context-aware convolutional kernels are exploited for the multi-focus image fusion task. These kernels adapt to the spatially variant blur in multi-focus images.

  • 3.

    We introduce the bright channel metric for unsupervised learning, which effectively performs focus detection at the pixel level and thereby guides the network training.

The paper is organized as follows. Section 2 reviews works related to multi-focus image fusion and dynamic kernel. Section 3 describes the proposed DCKN and loss functions in detail. Section 4 presents experimental results of the proposed supervised and unsupervised model, and finally Section 5 concludes this work.

Section snippets

Transform domain based multi-focus image fusion

The core idea of transform-domain methods is to transform the source images into another domain where the fusion task can be completed more efficiently. Specifically, the source images are first decomposed into a specific multi-scale domain, then the corresponding decomposed coefficients are integrated in accordance with certain fusion criteria. Finally, the fusion result is reconstructed by applying the inverse transform. Commonly used transforms for image fusion include the discrete wavelet…
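
As a concrete illustration of this decompose-fuse-reconstruct pipeline (and not of the proposed DCKN), the sketch below uses a discrete wavelet transform from PyWavelets with a simple max-absolute-coefficient rule; the wavelet, decomposition level, and fusion rule are assumptions chosen for brevity.

```python
# Illustrative transform-domain fusion (assumed rule; not the proposed DCKN):
# decompose both sources with a DWT, fuse the coefficients, then invert.
import numpy as np
import pywt

def dwt_fuse(src_a: np.ndarray, src_b: np.ndarray, wavelet: str = "db2", level: int = 3) -> np.ndarray:
    ca = pywt.wavedec2(src_a, wavelet, level=level)
    cb = pywt.wavedec2(src_b, wavelet, level=level)
    # Approximation band: keep the coefficient with the larger magnitude.
    fused = [np.where(np.abs(ca[0]) >= np.abs(cb[0]), ca[0], cb[0])]
    # Detail bands (horizontal, vertical, diagonal) at each scale: same rule.
    for det_a, det_b in zip(ca[1:], cb[1:]):
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip(det_a, det_b)))
    # Inverse transform reconstructs the fused image.
    return pywt.waverec2(fused, wavelet)
```

Any multi-scale transform with a forward/inverse pair fits the same template; only the fusion rule applied to the coefficients changes.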

The proposed method

As mentioned above, conventional CNN-based methods use the same convolutional kernels for all regions in multi-focus images to detect focused information, which usually leads to false detections in textureless regions. Our proposed architecture exploits context-aware kernels that vary with the image content. In our architecture, different kernels are applied to different positions of the input multi-focus images to accurately detect focused information in both well-textured and…

Experimental results and discussions

In this section, we evaluate the proposed model on the most commonly used multi-focus image sets by comparing it with state-of-the-art methods. Furthermore, we discuss the proposed unsupervised model to show its effectiveness.

Conclusion

In this work, we propose a dynamic convolutional kernel network for multi-focus image fusion, which exploits context-aware convolutional kernels to detect focused information. The convolutional kernels are dynamically generated from region context conditioned on input images, which are not only position-varying but also sample-varying. Our convolutional kernels have a stronger capability to distinguish focused and defocused information in both well-textured and textureless regions. As a result,…

CRediT authorship contribution statement

Zhao Duan: Conceptualization, Methodology, Writing – original draft. Taiping Zhang: Conceptualization, Writing – review & editing, Supervision. Xiaoliu Luo: Formal analysis, Writing – review & editing, Supervision. Jin Tan: Data curation, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (55)

  • Y. Zhang et al., Boundary finding based multi-focus image fusion through multi-scale morphological focus-measure, Inf. Fusion (2017).

  • W. Huang et al., Multi-focus image fusion using pulse coupled neural network, Pattern Recognit. Lett. (2007).

  • Y. Zhang et al., IFCNN: a general image fusion framework based on convolutional neural network, Inf. Fusion (2020).

  • M. Nejati et al., Multi-focus image fusion using dictionary-based sparse representation, Inf. Fusion (2015).

  • C. Yang et al., A novel similarity based quality metric for image fusion, Inf. Fusion (2008).

  • Y. Chen et al., A new automated quality assessment algorithm for image fusion, Image Vis. Comput. (2009).

  • Y. Han et al., A new image fusion performance metric based on visual information fidelity, Inf. Fusion (2013).

  • Q. Wang et al., Performance evaluation of image fusion techniques, Image Fusion Algorithms and Applications (2008).

  • Y. Liu et al., A general framework for image fusion based on multi-scale transform and sparse representation, Inf. Fusion (2015).

  • H. Zhang et al., MFF-GAN: an unsupervised generative adversarial network with adaptive and gradient joint constraints for multi-focus image fusion, Inf. Fusion (2021).

  • M. Amin-Naji et al., Ensemble of CNN for multi-focus image fusion, Inf. Fusion (2019).

  • X. Li et al., Multi-focus image fusion based on nonsubsampled contourlet transform and residual removal, Signal Process. (2021).

  • J. Ma et al., Multi-focus image fusion using boosted random walks-based algorithm with two-scale focus maps, Neurocomputing (2019).

  • S. Li et al., Remote sensing image fusion via sparse representations over learned dictionaries, IEEE Trans. Geosci. Remote Sens. (2013).

  • S. Zheng et al., Remote sensing image fusion using multiscale mapped LS-SVM, IEEE Trans. Geosci. Remote Sens. (2008).

  • R. Lai et al., Multi-scale visual attention deep convolutional neural network for multi-focus image fusion, IEEE Access (2019).