Abstract
In this paper, a squeeze-and-decomposition network (SDNet) is proposed to realize multi-modal and digital photography image fusion in real time. First, we cast multiple fusion problems as the extraction and reconstruction of gradient and intensity information, and accordingly design a universal loss function composed of an intensity term and a gradient term. For the gradient term, we introduce an adaptive decision block that determines the optimization target of the gradient distribution according to the texture richness at the pixel scale, guiding the fused image to contain richer texture details. For the intensity term, we adjust the weight of each intensity loss to control the proportion of intensity information drawn from each source image, so that the loss adapts to multiple image fusion tasks. Second, we introduce the idea of squeeze and decomposition into image fusion. Specifically, we model not only the squeeze process from the source images to the fused result, but also the decomposition process from the fused result back to the source images. Because the quality of the decomposed images depends directly on the fused result, the decomposition forces the fused result to retain more scene details. Experimental results demonstrate that our method surpasses state-of-the-art methods in both subjective visual quality and quantitative metrics across a variety of fusion tasks. Moreover, our method is much faster than existing methods and can handle real-time fusion tasks.
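The universal loss described above can be illustrated with a minimal sketch. This is not the authors' implementation: the finite-difference gradient operator, the per-pixel maximum as the "texture richness" decision rule, and the MSE form of both terms are simplifying assumptions made here for illustration.

```python
import numpy as np

def grad_mag(img):
    """Approximate per-pixel gradient magnitude via forward finite differences."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:] = img[:, 1:] - img[:, :-1]
    gy[1:, :] = img[1:, :] - img[:-1, :]
    return np.abs(gx) + np.abs(gy)

def fusion_loss(fused, src1, src2, w1=0.5, w2=0.5):
    """Universal fusion loss: weighted intensity terms plus an adaptive gradient term.

    The weights w1, w2 set the proportion of intensity information drawn from
    each source image; varying them adapts the loss to different fusion tasks.
    The gradient term compares the fused gradient, at every pixel, against
    whichever source gradient is stronger (a stand-in for the adaptive
    decision block choosing the texture-richer source).
    """
    # Intensity term: weighted MSE against each source image.
    intensity = w1 * np.mean((fused - src1) ** 2) + w2 * np.mean((fused - src2) ** 2)
    # Adaptive decision: per-pixel target gradient = the stronger source gradient.
    target = np.maximum(grad_mag(src1), grad_mag(src2))
    gradient = np.mean((grad_mag(fused) - target) ** 2)
    return intensity + gradient
```

With identical inputs the loss vanishes, since both the intensity residuals and the gradient mismatch are zero; any deviation of the fused image from the sources or from the pixel-wise richer gradient raises it.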
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by Ioannis Gkioulekas.
Cite this article
Zhang, H., Ma, J. SDNet: A Versatile Squeeze-and-Decomposition Network for Real-Time Image Fusion. Int J Comput Vis 129, 2761–2785 (2021). https://doi.org/10.1007/s11263-021-01501-8