Global context guided hierarchically residual feature refinement network for defocus blur detection
Introduction
As a common phenomenon in natural images, defocus blur occurs when objects are not within the camera's depth of field during the imaging process. As an important pre-processing step, defocus blur detection aims to distinguish the in-focus and out-of-focus regions in an image. Due to its wide range of potential applications in computer vision, such as depth estimation [1], image quality assessment [2], [3], image refocusing [4], [5], [6], salient object detection [7], [8], defocus magnification [1], [9], image deblurring [10], [11], [12], medical image analysis [13], [14], and multi-focus image fusion [15], [16], defocus blur detection has attracted broad interest from researchers.
Over the past years, a variety of methods have been designed for defocus blur detection. Based on the image features used for this task, these methods can be roughly classified into two categories, i.e., traditional methods that rely on hand-crafted features, and deep learning methods that rely on deep neural networks. Traditional hand-crafted-feature-based methods usually exploit low-level cues such as gradient and frequency, since the edges of image contents are obviously affected by defocus blur [11], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26]. Although many previous methods have achieved great success by using learning techniques such as sparse coding [11], singular value decomposition [19] and image matting [18], they are mainly based on hand-crafted features, which remain vulnerable to several challenges. First, as shown in the red rectangular region of Fig. 1(a), when an image region is in-focus but lacks rich structural information, hand-crafted-feature-based methods often wrongly regard it as a blurry region, since no high-level semantic information is available. Second, as shown in the blue rectangular region of Fig. 1(a), when the image contains a cluttered background, the result is noticeably degraded. In addition, the sharp boundary between different kinds of regions cannot be well preserved (as shown in the green rectangular region of Fig. 1(a)). Another challenge is that the image scale greatly influences the perceived sharpness of an image, as shown in Fig. 2: a given image patch exhibits different degrees of blur at different image scales.
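To make the first limitation concrete, the following toy sketch (not any specific cited method; the patch size and threshold are illustrative placeholders) scores each patch by its mean gradient magnitude. A smooth but perfectly in-focus region yields weak gradients and is therefore misclassified as blurred, which is exactly the failure case discussed above.

```python
import numpy as np

def gradient_sharpness_map(gray, patch=16, thresh=0.01):
    """Toy hand-crafted blur detector: mean gradient magnitude per patch.

    `gray` is a 2-D float array in [0, 1]; `patch` and `thresh` are
    illustrative values, not taken from any cited method.
    """
    gy, gx = np.gradient(gray)                  # first-order image gradients
    mag = np.sqrt(gx ** 2 + gy ** 2)            # edge strength per pixel
    h, w = gray.shape
    score = np.zeros_like(gray)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            block = mag[y:y + patch, x:x + patch]
            score[y:y + patch, x:x + patch] = block.mean()
    return (score > thresh).astype(np.float32)  # 1 = in-focus, 0 = blurred
```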
In the past few years, Deep Convolutional Neural Networks (DCNNs), which automatically extract hierarchical features from natural images, have pushed computer vision into a new era and achieved superior performance in many tasks [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38]. As a result, many DCNN-based defocus blur detection methods have also been proposed [39], [40], [41], [42], [43]. In the early years, Yan et al. [44] used a pre-trained deep neural network to initialize the model parameters and proposed a general regression model embedded network to classify different blur types. In [26], Park et al. combined hand-crafted features and deep features extracted from image patches that contain sparse strong edges for blur amount estimation. However, in-focus regions with low contrast and without sufficient structural information still cannot be well differentiated. In addition, due to the consecutive convolution and spatial pooling operations during deep feature extraction, the fine details of image contents and the boundaries between blurred and non-blurred regions are lost. Zhao et al. [40] first proposed an end-to-end fully convolutional network with multiple streams in a bottom-top-bottom manner (BTBNet) for defocus blur detection, which can be trained with manually annotated image pairs. In BTBNet, both the low-level details extracted from shallow layers and the high-level semantic information extracted from deep layers are aggregated to improve the final detection score maps. In addition, in order to handle the sensitivity of the defocus degree to image scales, a multi-stream strategy that processes the original image at different scales is leveraged. In order to better utilize deep features, domain adaptation [45], a feature aggregation network [41], a feature refining network [43], and a cross-ensemble network [42] have also been designed to boost defocus blur detection performance.
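As a rough illustration of the multi-scale idea (not BTBNet's actual multi-stream architecture), one could run a single detector at several input scales and average the resized score maps; the scale set below is an arbitrary choice.

```python
import torch
import torch.nn.functional as F

def multi_scale_predict(model, image, scales=(0.5, 1.0, 2.0)):
    """Run a single-stream detector at several input scales and average
    the resized score maps. `model` maps a (1, 3, H, W) image to a
    (1, 1, H', W') score map; the scale set is an illustrative choice.
    """
    _, _, h, w = image.shape
    fused = image.new_zeros(1, 1, h, w)
    for s in scales:
        resized = F.interpolate(image, scale_factor=s, mode='bilinear',
                                align_corners=False)
        score = model(resized)
        fused += F.interpolate(score, size=(h, w), mode='bilinear',
                               align_corners=False)
    return fused / len(scales)
```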
Although DCNNs have substantially improved blur detection results, several problems still make the detected results unsatisfactory for some subsequent tasks. First, the high-level global context information and the low-level details are not combined sufficiently, i.e., the semantic information can be gradually lost during the top-to-bottom feature refining process, while the fine details cannot be well preserved during the bottom-to-top feature refining process [43]. Second, in the fusion step, the original deep features are not exploited to guide the fusing process. Third, the redundancy in the extracted deep features is not sufficiently suppressed, which can degrade the feature aggregation. In this paper, we introduce a new pixel-wise fully convolutional network for defocus blur detection via global context guided hierarchically residual feature refining, referred to as HRFRNet for short, which is both efficient and effective. Specifically, we design a multi-scale dilated convolution based global context generation (GCG) module to capture the global context information and use it to guide the feature refining process in a hierarchical manner. Considering that the human eye's perception of defocus is sensitive to image scale, we propose a deep features guided fusion (DFGF) module to integrate the side outputs of different feature refining stages for generating the final score map. In our network, different levels of features, including the low-level appearance details, global context features, and high-level semantic information, are aggregated in a hierarchical manner to boost the final detection performance.
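A minimal PyTorch sketch of a multi-scale dilated-convolution context module in the spirit of the GCG module is given below; the dilation rates, channel width and normalization layers are our assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class GlobalContextGeneration(nn.Module):
    """Sketch of a multi-scale dilated-convolution context module; the
    dilation rates and channel width are assumptions for illustration."""

    def __init__(self, in_channels, out_channels=256, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # 1x1 convolution to merge the parallel context branches
        self.fuse = nn.Conv2d(out_channels * len(rates), out_channels, 1)

    def forward(self, deepest_feature):
        context = torch.cat([b(deepest_feature) for b in self.branches], dim=1)
        return self.fuse(context)
```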
In a nutshell, we summarize the contributions of this paper as follows:
- We introduce a novel HRFRNet for image defocus blur detection, which can effectively aggregate the low-level appearance details, global context features and high-level semantic information.
- In order to sufficiently exploit the context information for promoting the defocus blur detection results, we design a GCG module to capture the global context information for guiding the deep feature refining process in a hierarchical manner.
- In order to overcome the human eye's defocus sensitivity to image scales, we design a DFGF module to integrate the side outputs of different feature refining stages for generating the final score map (a minimal fusion sketch is given after this list).
- Extensive experiments with ablation analysis on two commonly used datasets are carried out to validate the superiority of the proposed network compared with other state-of-the-art methods.
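As referenced in the third bullet above, the following sketch illustrates one plausible reading of deep-feature-guided fusion: per-stage fusion weights are predicted from the deepest backbone feature and used to combine the upsampled stage-wise score maps. It is an assumption-laden illustration, not the authors' exact DFGF design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepFeatureGuidedFusion(nn.Module):
    """Illustrative deep-feature-guided fusion of side-output score maps;
    the weighting scheme is our own reading of the DFGF idea."""

    def __init__(self, deep_channels, num_stages):
        super().__init__()
        self.weight_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # global pooling of deep features
            nn.Conv2d(deep_channels, num_stages, kernel_size=1),
        )

    def forward(self, deep_feature, side_outputs):
        # side_outputs: list of (N, 1, h_i, w_i) score maps from each stage
        h, w = side_outputs[0].shape[-2:]
        maps = torch.stack(
            [F.interpolate(s, size=(h, w), mode='bilinear', align_corners=False)
             for s in side_outputs], dim=1)          # (N, S, 1, H, W)
        weights = torch.softmax(self.weight_head(deep_feature), dim=1)
        weights = weights.view(weights.size(0), -1, 1, 1, 1)
        return torch.sigmoid((weights * maps).sum(dim=1))  # final score map
```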
Related work
Defocus blur detection is one of the sub-fields of computer vision and has attracted more and more attention due to its wide range of practical applications, such as image deblurring, image refocusing, defocus magnification, image quality assessment and multi-focus image fusion. To this end, many kinds of methods designed for defocus blur detection have been proposed over the last few years. In general, previous methods can be roughly categorized into two classes. The first class of methods relies on hand-crafted low-level features, while the second class builds on deep neural networks.
Proposed HRFRNet
The proposed HRFRNet takes a natural image as input and generates a score map of the same size that differentiates the defocused and in-focus regions of the image. Fig. 3 shows the entire architecture of HRFRNet.
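Before the detailed description, the sketch below illustrates one plausible top-down residual refinement stage of the kind implied by Fig. 3: the coarser score map and the global context are upsampled, combined with the skip feature from the backbone, and a residual correction is added. Channel sizes and the fusion order are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualRefineStage(nn.Module):
    """One plausible top-down residual refinement stage; not a
    re-implementation of the exact configuration in Fig. 3."""

    def __init__(self, skip_channels, context_channels, mid_channels=128):
        super().__init__()
        self.reduce = nn.Conv2d(skip_channels + context_channels + 1,
                                mid_channels, kernel_size=3, padding=1)
        self.residual = nn.Conv2d(mid_channels, 1, kernel_size=3, padding=1)

    def forward(self, coarse_map, skip_feature, context):
        size = skip_feature.shape[-2:]
        up_map = F.interpolate(coarse_map, size=size, mode='bilinear',
                               align_corners=False)
        up_ctx = F.interpolate(context, size=size, mode='bilinear',
                               align_corners=False)
        x = torch.cat([up_map, skip_feature, up_ctx], dim=1)
        refined = up_map + self.residual(F.relu(self.reduce(x)))  # residual update
        return refined                  # also serves as this stage's side output
```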
As an effective deep defocus blur detection neural network, it should be flexible enough to exploit deep features extracted from different layers to boost the final detection results. On one hand, the low-level deep features should be used to refine the scattered and
Datasets
To validate the efficacy of the proposed network, we evaluate its performance on two public datasets, described as follows:
Shi et al.'s dataset [11]: This is the first dataset collected for blur detection evaluation; it contains 704 images with partial defocus blur and 296 images with motion blur, each with a manually annotated ground truth. In this work, we use the 704 defocus blur images for our experiments. The first 604 defocus blur images are used for training, and the remaining 100 images are used for testing.
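For reference, a minimal split following the description above might look as follows; the directory layout and the file ordering are assumptions of this sketch, not the authors' released file list.

```python
import os

# Illustrative split of Shi et al.'s dataset: the 704 defocus-blur images
# are ordered by file name (an assumption), the first 604 are used for
# training and the rest for testing.
root = 'ShiCUHK/defocus'                     # hypothetical dataset path
images = sorted(os.listdir(os.path.join(root, 'image')))
assert len(images) == 704
train_files, test_files = images[:604], images[604:]
print(len(train_files), 'training /', len(test_files), 'testing images')
```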
Conclusions
In this paper, we present a global context guided hierarchically residual feature refinement network (HRFRNet) for defocus blur detection in natural images. In our network, we first generate an initial coarse defocus blur score map from the deepest feature maps of the backbone feature extraction network, and then design a hierarchical residual feature refining module (HRFRM) to refine the initial result step by step in a top-down manner. During the refining process, we develop a global context generation (GCG) module to guide the refinement with global context information.
CRediT authorship contribution statement
Yongping Zhai: Conceptualization, Methodology, Writing - original draft. Junhua Wang: Data curation, Software, Writing - original draft. Jinsheng Deng: Visualization, Investigation. Guanghui Yue: Supervision, Writing - review & editing. Wei Zhang: Formal analysis, Validation. Chang Tang: Supervision, Writing - review & editing.
Declaration of Competing Interest
None.
Acknowledgment
This work was partly supported by the National Natural Science Foundation of China (No. 62076228 and 61701451), partly by the Natural Science Foundation of Hubei Province (No. 2020CFB644), and partly by the Key Laboratory of Information Perception and Systems for Public Security of MIIT (Nanjing University of Science and Technology) (No. 202007). We would also like to thank NVIDIA Corporation for the donation of the Titan V and Titan Xp GPU cards used for this research, and we also sincerely
References (68)
- et al., Blur image identification with ensemble convolution neural networks, Signal Processing, 2019.
- et al., Multifocus image fusion with enhanced linear spectral clustering and fast depth map estimation, Neurocomputing, 2018.
- et al., Joint blur kernel estimation and CNN for blind image restoration, Neurocomputing, 2020.
- et al., Multi-focus image fusion based on depth extraction with inhomogeneous diffusion equation, Signal Processing, 2016.
- et al., Image partial blur detection and classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
- et al., Blurred image region detection and classification, ACM International Conference on Multimedia, 2011.
- et al., Learning deconvolution network for semantic segmentation, IEEE International Conference on Computer Vision, 2015.
- et al., Attention-guided CNN for image denoising, Neural Networks, 2020.
- et al., Multiscale blur detection by learning discriminative deep features, Neurocomputing, 2018.
- et al., Defocus blur detection via multi-stream bottom-top-bottom fully convolutional network, IEEE Conference on Computer Vision and Pattern Recognition, 2018.
- LBP-based segmentation of defocus blur, IEEE Trans. Image Process.
- Learning to understand image blur, IEEE Conference on Computer Vision and Pattern Recognition.
- Defocus map estimation from a single image via spectrum contrast, Opt. Lett.
- Estimating defocus blur via rank of local patches, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Venice, Italy.
- Single image focus editing, Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on.
- Single-image refocusing and defocusing, IEEE Trans. Image Process.
- Salient region detection by UFO: Uniqueness, focusness and objectness, Proceedings of the IEEE International Conference on Computer Vision.
- Salient object detection via weighted low rank matrix recovery, IEEE Signal Process. Lett.
- Defocus magnification, Comput. Graphics Forum.
- Coded apertures for defocus deblurring, Symposium Iberoamericano de Computacion Grafica.
- Discriminative blur detection features, IEEE Conference on Computer Vision and Pattern Recognition.
- Attention-related functional changes induced by imposed myopia defocus from spectacle lens, Investigative Ophthalmology & Visual Science.
- A global approach to describe retinal defocus patterns, PLoS ONE.
- Multi-focus image fusion method based on two stage of convolutional neural network, Signal Processing.
- Defocus map estimation from a single image, Pattern Recognit.
- S3: A spectral and spatial measure of local perceived sharpness in natural images, IEEE Trans. Image Process.
- Blur processing using double discrete wavelet transform, IEEE Conference on Computer Vision and Pattern Recognition.
- Estimating spatially varying defocus blur from a single image, IEEE Trans. Image Process.
- Classifying discriminative features for blur detection, IEEE Trans. Cybern.
- A spectral and spatial approach of coarse-to-fine blurred image region detection, IEEE Signal Process. Lett.
- Defocus blur-invariant scale-space feature extractions, IEEE Trans. Image Process.
- A unified approach of multi-scale deep and hand-crafted features for defocus estimation, IEEE Conference on Computer Vision and Pattern Recognition.
- ImageNet classification with deep convolutional neural networks, NIPS.
- Deep visual tracking: Review and experimental comparison, Pattern Recognit.