DCKN: Multi-focus image fusion via dynamic convolutional kernel network
Introduction
Most imaging systems have a limited depth-of-field. Objects within a limited distance from the imaging plane remain in focus, while objects closer or farther than that appear blurred in the image. Imaging systems therefore usually need to capture multiple images of the same scene to obtain different focused objects. Multi-focus image fusion is an effective technique for gathering the focused objects from multiple images of the same scene into a single image. It has been applied in a wide variety of applications, such as medical diagnosis, photography, computer vision, and remote sensing [1], [2], [3], [4]. Although multi-focus image fusion has been studied for a long time, it is still a challenging task.
For decades, researchers have done extensive work on multi-focus image fusion, and one of the most effective current solutions is to employ powerful convolutional neural network (CNN) models. Liu et al. [5] trained a CNN classifier to distinguish focused and defocused patches, generating a decision map for fusion. Subsequently, Tang et al. [6] created a more robust training dataset by labeling the original images as focused, unknown, and defocused. Both Liu et al. [5] and Tang et al. [6] trained their networks by dividing the input images into small patches and then feeding them to the CNNs, which makes the results sensitive to the patch size; moreover, the resulting networks are not end-to-end models. To address these issues, some researchers proposed pixel-level dense decision map prediction. For instance, Lai et al. [7] designed a visual attention fully convolutional neural network to directly produce the dense decision map. Reference [45] introduces a cascade network to generate decision maps. Guo et al. [8] introduced a generative adversarial network (GAN) to generate the decision map, and then refined it with the convolutional conditional random fields technique. In these methods, the parameters of the convolutional kernels are the same for all positions in the multi-focus images, regardless of differences in image content. As a result, these CNN models, which apply the same kernels at every position, can easily distinguish focused and defocused information in well-textured regions, but have a weak capability in textureless regions such as the sky, faces, and other smooth areas. Some exemplar textures are shown in Fig. 1. These models usually incur artifacts in textureless regions, so extensive post-processing is required in some methods to remove them.
To address the problems mentioned above, we propose a dynamic convolutional kernel network for multi-focus image fusion, which can be used for both supervised and unsupervised learning. The proposed network exploits context-aware convolutional kernels to detect focused information. The convolutional kernels are dynamically generated from region context conditioned on the input images. Compared with the fixed kernels used in previous CNN models, the dynamic kernels are not only position-varying but also sample-varying, which allows them to adapt accurately to spatially variant blur caused by depth and texture variations in multi-focus images. By using the dynamic kernels, the proposed architecture has a strong capability to estimate focused information in both well-textured and textureless regions. Thus, our architecture can obtain satisfactory fusion results without any post-processing algorithms. Concretely, our method estimates position-specific convolutional kernels from the multi-focus images using a convolutional encoder-decoder neural network, and convolves these kernels with the input multi-focus images to classify pixels as focused or defocused, generating a pixel-level dense prediction.
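To make the position-varying operation concrete, the following is a minimal NumPy sketch of a dynamic (per-pixel) convolution and a decision-map-based fusion step. It is not the paper's actual network: the kernel tensors here would, in the proposed method, be predicted by the encoder-decoder network, and the function names (`dynamic_conv`, `fuse`) are ours.

```python
import numpy as np

def dynamic_conv(img, kernels):
    """Apply a different k x k kernel at every pixel (position-varying convolution).

    img:     (H, W) grayscale image
    kernels: (H, W, k, k) per-pixel kernels, e.g. predicted by an encoder-decoder
    """
    H, W = img.shape
    k = kernels.shape[-1]
    r = k // 2
    padded = np.pad(img, r, mode="reflect")
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + k, j:j + k]
            out[i, j] = np.sum(patch * kernels[i, j])
    return out

def fuse(img_a, img_b, kernels_a, kernels_b):
    """Score each source with its own per-pixel kernels and keep, at each
    position, the pixel with the higher (assumed in-focus) response."""
    score_a = dynamic_conv(img_a, kernels_a)
    score_b = dynamic_conv(img_b, kernels_b)
    decision = score_a >= score_b  # dense, pixel-level decision map
    return np.where(decision, img_a, img_b), decision
```

Because the kernels are indexed by position and derived from the inputs, the same code path realizes kernels that are both position-varying and sample-varying, unlike a standard convolution whose single kernel is shared across all positions.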
To enable unsupervised training, we introduce the bright channel [9] and a total variation function. The bright channel principle is used to determine whether a pixel is focused: concretely, the bright channel describes the maximum intensity of the pixels in an image patch. The pixel values of a defocused image can be regarded as a weighted sum of the pixel values in the corresponding focused image, so the maximum intensity in a defocused patch cannot exceed the maximum intensity in the corresponding focused patch. We can therefore compare the bright channel matrices of the focused and defocused images to generate score maps that guide network training. Furthermore, we introduce a total variation function to enforce spatial consistency among the pixels of the decision map.
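The bright-channel comparison and the total variation term described above can be sketched as follows. This is a simplified illustration under our own assumptions (a square local maximum filter, a hard 0/1 score map, anisotropic TV); the paper's exact formulation may differ, and the function names are ours.

```python
import numpy as np

def bright_channel(img, patch=7):
    """Per-pixel maximum intensity over a local patch.
    Defocus acts like a weighted average of focused pixels, so the local
    maximum can only drop; comparing bright channels thus scores focus."""
    r = patch // 2
    padded = np.pad(img, r, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(padded, (patch, patch))
    return win.max(axis=(-1, -2))

def focus_score_map(img_a, img_b, patch=7):
    """1.0 where source A looks focused (larger bright channel), else 0.0."""
    return (bright_channel(img_a, patch) >= bright_channel(img_b, patch)).astype(float)

def total_variation(d):
    """Anisotropic total variation of a decision map d; penalizing it
    encourages spatially consistent (piecewise-constant) decisions."""
    return np.abs(np.diff(d, axis=0)).sum() + np.abs(np.diff(d, axis=1)).sum()
```

A score map produced this way can serve as a training target without ground-truth labels, while the TV term discourages isolated misclassified pixels in the predicted decision map.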
The main contributions of this work are as follows:
- 1.
We propose a dynamic convolutional kernel network architecture, namely DCKN, for multi-focus image fusion. It is an end-to-end architecture that generates fusion results without applying any post-processing algorithms.
- 2.
Context-aware convolutional kernels are exploited for the multi-focus image fusion task. The kernels adapt to spatially variant blur in multi-focus images.
- 3.
We introduce the bright channel metric for unsupervised learning, which effectively performs focus detection at the pixel level, thereby guiding the network training.
The paper is organized as follows. Section 2 reviews works related to multi-focus image fusion and dynamic kernel. Section 3 describes the proposed DCKN and loss functions in detail. Section 4 presents experimental results of the proposed supervised and unsupervised model, and finally Section 5 concludes this work.
Transform domain based multi-focus image fusion
The core idea of transform domain based methods is to transform the source images into another domain where the fusion task can be completed more efficiently. Specifically, the source images are first decomposed into a specific multi-scale domain, then the corresponding decomposed coefficients are integrated according to certain fusion criteria. Finally, the fusion result is reconstructed by applying the inverse transform. Commonly used transforms for image fusion include the discrete wavelet transform.
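The decompose-integrate-invert pipeline described above can be sketched with a one-level 2D Haar wavelet, the simplest instance of the discrete wavelet transform. This is an illustrative sketch under our own choices (average rule for the low-pass band, maximum-absolute rule for detail bands); published methods use more elaborate transforms and fusion criteria, and the function names are ours.

```python
import numpy as np

def haar2d(x):
    """One-level 2D Haar decomposition into (LL, LH, HL, HH) subbands."""
    a = (x[0::2] + x[1::2]) / 2.0   # row-wise low-pass
    d = (x[0::2] - x[1::2]) / 2.0   # row-wise high-pass
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return LL, LH, HL, HH

def ihaar2d(LL, LH, HL, HH):
    """Inverse of haar2d (perfect reconstruction)."""
    a = np.empty((LL.shape[0], LL.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2] = LL + LH; a[:, 1::2] = LL - LH
    d[:, 0::2] = HL + HH; d[:, 1::2] = HL - HH
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2] = a + d; x[1::2] = a - d
    return x

def fuse_wavelet(img_a, img_b):
    """Average the low-pass band; keep the larger-magnitude detail
    coefficient (a common activity-level fusion rule)."""
    ca, cb = haar2d(img_a), haar2d(img_b)
    fused = [(ca[0] + cb[0]) / 2.0]
    for sa, sb in zip(ca[1:], cb[1:]):
        fused.append(np.where(np.abs(sa) >= np.abs(sb), sa, sb))
    return ihaar2d(*fused)
```

The max-absolute rule on detail coefficients embodies the usual assumption that larger high-frequency energy indicates a better-focused region at that scale and position.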
The proposed method
As mentioned above, conventional CNN-based methods use the same convolutional kernels for all regions of the multi-focus images to detect focused information, which usually leads to false detections in textureless regions. Our proposed architecture exploits context-aware kernels that vary with the image content. In our architecture, different kernels are applied at different positions of the input multi-focus images to accurately detect focused information in both well-textured and textureless regions.
Experimental results and discussions
In this section, we evaluate the proposed model on the most commonly used multi-focus image sets by comparing it with state-of-the-art methods. Furthermore, we discuss the proposed unsupervised model to show its effectiveness.
Conclusion
In this work, we propose a dynamic convolutional kernel network for multi-focus image fusion, which exploits context-aware convolutional kernels to detect focused information. The convolutional kernels are dynamically generated from region context conditioned on the input images, so they are not only position-varying but also sample-varying. These kernels have a strong capability to distinguish focused and defocused information in both well-textured and textureless regions. As a result, our method produces satisfactory fusion results without any post-processing.
CRediT authorship contribution statement
Zhao Duan: Conceptualization, Methodology, Writing – original draft. Taiping Zhang: Conceptualization, Writing – review & editing, Supervision. Xiaoliu Luo: Formal analysis, Writing – review & editing, Supervision. Jin Tan: Data curation, Writing – review & editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (55)
- et al., Medical image fusion using m-PCNN, Inf. Fusion (2008)
- Medical image fusion using multi-level local extrema, Inf. Fusion (2014)
- et al., Multi-focus image fusion with a deep convolutional neural network, Inf. Fusion (2017)
- et al., Pixel convolutional neural network for multi-focus image fusion, Inf. Sci. (2018)
- et al., Adaptive multi-focus image fusion using a wavelet-based statistical sharpness measure, Signal Process. (2012)
- et al., Multifocus image fusion using the nonsubsampled contourlet transform, Signal Process. (2009)
- et al., Image fusion by using steerable pyramid, Pattern Recognit. Lett. (2001)
- et al., A novel algorithm of image fusion using shearlets, Opt. Commun. (2011)
- et al., Multi-focus image fusion with dense SIFT, Inf. Fusion (2015)
- et al., Image matting for fusion of multi-focus images in dynamic scenes, Inf. Fusion (2013)
- Boundary finding based multi-focus image fusion through multi-scale morphological focus-measure, Inf. Fusion
- Multi-focus image fusion using pulse coupled neural network, Pattern Recognit. Lett.
- IFCNN: a general image fusion framework based on convolutional neural network, Inf. Fusion
- Multi-focus image fusion using dictionary-based sparse representation, Inf. Fusion
- A novel similarity based quality metric for image fusion, Inf. Fusion
- A new automated quality assessment algorithm for image fusion, Image Vis. Comput.
- A new image fusion performance metric based on visual information fidelity, Inf. Fusion
- Performance evaluation of image fusion techniques, in: Image Fusion Algorithms and Applications
- A general framework for image fusion based on multi-scale transform and sparse representation, Inf. Fusion
- MFF-GAN: an unsupervised generative adversarial network with adaptive and gradient joint constraints for multi-focus image fusion, Inf. Fusion
- Ensemble of CNN for multi-focus image fusion, Inf. Fusion
- Multi-focus image fusion based on nonsubsampled contourlet transform and residual removal, Signal Process.
- Multi-focus image fusion using boosted random walks-based algorithm with two-scale focus maps, Neurocomputing
- Remote sensing image fusion via sparse representations over learned dictionaries, IEEE Trans. Geosci. Remote Sens.
- Remote sensing image fusion using multiscale mapped LS-SVM, IEEE Trans. Geosci. Remote Sens.
- Multi-scale visual attention deep convolutional neural network for multi-focus image fusion, IEEE Access