Global context guided hierarchically residual feature refinement network for defocus blur detection
Introduction
As a common phenomenon in natural images, defocus blur occurs when objects are not within the camera's depth of field during the imaging process. As an important pre-processing step, defocus blur detection aims to distinguish the in-focus and out-of-focus regions in an image. Due to its wide range of potential applications in computer vision, such as depth estimation [1], image quality assessment [2], [3], image refocusing [4], [5], [6], salient object detection [7], [8], defocus magnification [1], [9], image deblurring [10], [11], [12], medical image analysis [13], [14], and multi-focus image fusion [15], [16], defocus blur detection has attracted broad interest from researchers.
Over the past years, a variety of methods have been designed for defocus blur detection. Based on the image features used for this task, these methods can be roughly classified into two categories, i.e., traditional methods that rely on hand-crafted features, and deep learning methods that rely on deep neural networks. Traditional hand-crafted-feature-based methods usually exploit low-level cues such as gradient and frequency, since the edges of image contents are obviously affected by defocus blur [11], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26]. Although many previous methods have achieved great success by using learning techniques such as sparse coding [11], singular value decomposition [19] and image matting [18], they are mainly based on hand-crafted features, which remain vulnerable to several challenges. First, as shown in the red rectangular region of Fig. 1(a), when an image region is in-focus but lacks rich structural information, hand-crafted-feature-based methods often wrongly regard it as a blurry region, since no high-level semantic information is available. Second, as shown in the blue rectangular region of Fig. 1(a), when the image contains a cluttered background, the result is noticeably degraded. In addition, the sharp boundary between different kinds of regions cannot be well preserved (as shown in the green rectangular region of Fig. 1(a)). Another challenge is that the image scale greatly influences the perceived sharpness of an image, as shown in Fig. 2: a given image patch exhibits different degrees of blur at different image scales.
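To make the first limitation concrete, the following toy sketch (not any specific cited method; the patch size and threshold are illustrative placeholders) scores each patch by its mean gradient magnitude. A smooth but perfectly in-focus region yields weak gradients and is therefore misclassified as blurred, which is exactly the failure case discussed above.

```python
import numpy as np

def gradient_sharpness_map(gray, patch=16, thresh=0.01):
    """Toy hand-crafted blur detector: mean gradient magnitude per patch.

    `gray` is a 2-D float array in [0, 1]; `patch` and `thresh` are
    illustrative values, not taken from any cited method.
    """
    gy, gx = np.gradient(gray)                  # first-order image gradients
    mag = np.sqrt(gx ** 2 + gy ** 2)            # edge strength per pixel
    h, w = gray.shape
    score = np.zeros_like(gray)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            block = mag[y:y + patch, x:x + patch]
            score[y:y + patch, x:x + patch] = block.mean()
    return (score > thresh).astype(np.float32)  # 1 = in-focus, 0 = blurred
```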
In the past few years, Deep Convolutional Neural Networks (DCNNs), which automatically extract hierarchical features from natural images, have pushed computer vision into a new era and achieved superior performance in many tasks [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38]. As a result, many DCNN-based defocus blur detection methods have also been proposed [39], [40], [41], [42], [43]. In the early years, Yan et al. [44] used a pre-trained deep neural network to initialize the model parameters and proposed a general regression model embedded network to classify different blur types. In [26], Park et al. combined hand-crafted features and deep features extracted from image patches that contain sparse strong edges for blur amount estimation. However, in-focus regions with low contrast and without sufficient structural information still cannot be well differentiated. In addition, due to the consecutive convolution and spatial pooling operations during deep feature extraction, the fine details of image contents and the boundaries between blurred and non-blurred regions are lost. Zhao et al. [40] first proposed an end-to-end fully convolutional network with multiple streams in a bottom-top-bottom manner (BTBNet) for defocus blur detection, which can be trained with manually annotated image pairs. In BTBNet, both the low-level details extracted from shallow layers and the high-level semantic information extracted from deep layers are aggregated to improve the final detection score maps. In addition, in order to handle the sensitivity of the defocus degree to image scales, a multi-stream strategy that processes the original image at different scales is leveraged. In order to better utilize deep features, domain adaptation [45], a feature aggregation network [41], a feature refining network [43], and a cross-ensemble network [42] have also been designed to boost defocus blur detection performance.
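As a rough illustration of the multi-scale idea (not BTBNet's actual multi-stream architecture), one could run a single detector at several input scales and average the resized score maps; the scale set below is an arbitrary choice.

```python
import torch
import torch.nn.functional as F

def multi_scale_predict(model, image, scales=(0.5, 1.0, 2.0)):
    """Run a single-stream detector at several input scales and average
    the resized score maps. `model` maps a (1, 3, H, W) image to a
    (1, 1, H', W') score map; the scale set is an illustrative choice.
    """
    _, _, h, w = image.shape
    fused = image.new_zeros(1, 1, h, w)
    for s in scales:
        resized = F.interpolate(image, scale_factor=s, mode='bilinear',
                                align_corners=False)
        score = model(resized)
        fused += F.interpolate(score, size=(h, w), mode='bilinear',
                               align_corners=False)
    return fused / len(scales)
```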
Although DCNNs have substantially improved blur detection results, several problems still make the detected results unsatisfactory for some subsequent tasks. First, the high-level global context information and the low-level details are not combined sufficiently, i.e., the semantic information can be gradually lost during the top-to-bottom feature refining process, while the fine details cannot be well preserved during the bottom-to-top feature refining process [43]. Second, in the fusion step, the original deep features are not exploited to guide the fusing process. Third, the redundancy in the extracted deep features is not sufficiently suppressed, which can degrade the feature aggregation. In this paper, we introduce a new pixel-wise fully convolutional network for defocus blur detection via global context guided hierarchically residual feature refining, referred to as HRFRNet for short, which is both efficient and effective. Specifically, we design a multi-scale dilated convolution based global context generation (GCG) module to capture the global context information and use it to guide the feature refining process in a hierarchical manner. Considering that the human eye's perception of defocus is sensitive to image scale, we propose a deep features guided fusion (DFGF) module to integrate the side outputs of different feature refining stages for generating the final score map. In our network, different levels of features, including the low-level appearance details, global context features, and high-level semantic information, are aggregated in a hierarchical manner to boost the final detection performance.
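A minimal PyTorch sketch of a multi-scale dilated-convolution context module in the spirit of the GCG module is given below; the dilation rates, channel width and normalization layers are our assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class GlobalContextGeneration(nn.Module):
    """Sketch of a multi-scale dilated-convolution context module; the
    dilation rates and channel width are assumptions for illustration."""

    def __init__(self, in_channels, out_channels=256, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # 1x1 convolution to merge the parallel context branches
        self.fuse = nn.Conv2d(out_channels * len(rates), out_channels, 1)

    def forward(self, deepest_feature):
        context = torch.cat([b(deepest_feature) for b in self.branches], dim=1)
        return self.fuse(context)
```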
In a nutshell, we summarize the contributions of this paper as follows:
- We introduce a novel HRFRNet for image defocus blur detection, which can effectively aggregate the low-level appearance details, global context features and high-level semantic information.
- In order to sufficiently exploit the context information for promoting the defocus blur detection results, we design a GCG module to capture the global context information for guiding the deep feature refining process in a hierarchical manner.
- In order to overcome the human eye's defocus sensitivity to image scales, we design a DFGF module to integrate the side outputs of different feature refining stages for generating the final score map (a minimal fusion sketch is given after this list).
- Extensive experiments with ablation analysis on two commonly used datasets are carried out to validate the superiority of the proposed network compared with other state-of-the-art methods.
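As referenced in the third bullet above, the following sketch illustrates one plausible reading of deep-feature-guided fusion: per-stage fusion weights are predicted from the deepest backbone feature and used to combine the upsampled stage-wise score maps. It is an assumption-laden illustration, not the authors' exact DFGF design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepFeatureGuidedFusion(nn.Module):
    """Illustrative deep-feature-guided fusion of side-output score maps;
    the weighting scheme is our own reading of the DFGF idea."""

    def __init__(self, deep_channels, num_stages):
        super().__init__()
        self.weight_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # global pooling of deep features
            nn.Conv2d(deep_channels, num_stages, kernel_size=1),
        )

    def forward(self, deep_feature, side_outputs):
        # side_outputs: list of (N, 1, h_i, w_i) score maps from each stage
        h, w = side_outputs[0].shape[-2:]
        maps = torch.stack(
            [F.interpolate(s, size=(h, w), mode='bilinear', align_corners=False)
             for s in side_outputs], dim=1)          # (N, S, 1, H, W)
        weights = torch.softmax(self.weight_head(deep_feature), dim=1)
        weights = weights.view(weights.size(0), -1, 1, 1, 1)
        return torch.sigmoid((weights * maps).sum(dim=1))  # final score map
```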
Related work
Defocus blur detection is one of the sub-fields of computer vision and has attracted more and more attention due to its wide range of practical applications, such as image deblurring, image refocusing, defocus magnification, image quality assessment and multi-focus image fusion. To this end, many kinds of methods designed for defocus blur detection have been proposed over the last few years. In general, previous methods can be roughly categorized into two classes. The first class of methods relies on hand-crafted low-level features, while the second class builds on deep neural networks.
Proposed HRFRNet
The proposed HRFRNet takes a natural image as input and generates a score map of the same size that differentiates the defocused and in-focus regions of the image. Fig. 3 shows the entire architecture of HRFRNet.
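Before the detailed description, the sketch below illustrates one plausible top-down residual refinement stage of the kind implied by Fig. 3: the coarser score map and the global context are upsampled, combined with the skip feature from the backbone, and a residual correction is added. Channel sizes and the fusion order are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualRefineStage(nn.Module):
    """One plausible top-down residual refinement stage; not a
    re-implementation of the exact configuration in Fig. 3."""

    def __init__(self, skip_channels, context_channels, mid_channels=128):
        super().__init__()
        self.reduce = nn.Conv2d(skip_channels + context_channels + 1,
                                mid_channels, kernel_size=3, padding=1)
        self.residual = nn.Conv2d(mid_channels, 1, kernel_size=3, padding=1)

    def forward(self, coarse_map, skip_feature, context):
        size = skip_feature.shape[-2:]
        up_map = F.interpolate(coarse_map, size=size, mode='bilinear',
                               align_corners=False)
        up_ctx = F.interpolate(context, size=size, mode='bilinear',
                               align_corners=False)
        x = torch.cat([up_map, skip_feature, up_ctx], dim=1)
        refined = up_map + self.residual(F.relu(self.reduce(x)))  # residual update
        return refined                  # also serves as this stage's side output
```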
As an effective deep defocus blur detection neural network, it should be flexible enough to exploit deep features extracted from different layers to boost the final detection results. On one hand, the low-level deep features should be used to refine the scattered and
Datasets
To validate the efficacy of the proposed network, we evaluate its performance on two public datasets, described as follows:
Shi et al.'s dataset [11]: This is the first dataset collected for blur detection evaluation; it contains 704 images with partial defocus blur and 296 images with motion blur, each with a manually annotated ground truth. In this work, we use the 704 defocus blur images for our experiments. The first 604 defocus blur images are used for training, and the remaining 100 images are used for testing.
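For reference, a minimal split following the description above might look as follows; the directory layout and the file ordering are assumptions of this sketch, not the authors' released file list.

```python
import os

# Illustrative split of Shi et al.'s dataset: the 704 defocus-blur images
# are ordered by file name (an assumption), the first 604 are used for
# training and the rest for testing.
root = 'ShiCUHK/defocus'                     # hypothetical dataset path
images = sorted(os.listdir(os.path.join(root, 'image')))
assert len(images) == 704
train_files, test_files = images[:604], images[604:]
print(len(train_files), 'training /', len(test_files), 'testing images')
```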
Conclusions
In this paper, we present a global context guided hierarchically residual feature refinement network (HRFRNet) for defocus blur detection in natural images. In our network, we first generate an initial coarse defocus blur score map from the deepest feature maps of the backbone feature extraction network, and then design a hierarchical residual feature refining module (HRFRM) to refine the initial result step by step in a top-down manner. During the refining process, we develop a global context generation (GCG) module to guide the refinement with global context information.
CRediT authorship contribution statement
Yongping Zhai: Conceptualization, Methodology, Writing - original draft. Junhua Wang: Data curation, Software, Writing - original draft. Jinsheng Deng: Visualization, Investigation. Guanghui Yue: Supervision, Writing - review & editing. Wei Zhang: Formal analysis, Validation. Chang Tang: Supervision, Writing - review & editing.
Declaration of Competing Interest
None.
Acknowledgment
This work was partly supported by the National Natural Science Foundation of China (No. 62076228 and 61701451), partly by the Natural Science Foundation of Hubei Province (No. 2020CFB644), and partly by the Key Laboratory of Information Perception and Systems for Public Security of MIIT (Nanjing University of Science and Technology) (No. 202007). We would also like to thank NVIDIA Corporation for the donation of the Titan V and Titan Xp GPU cards used for this research, and we also sincerely
References (68)
- et al., Blur image identification with ensemble convolution neural networks, Signal Processing, 2019.
- et al., Multifocus image fusion with enhanced linear spectral clustering and fast depth map estimation, Neurocomputing, 2018.
- et al., Joint blur kernel estimation and CNN for blind image restoration, Neurocomputing, 2020.
- et al., Multi-focus image fusion based on depth extraction with inhomogeneous diffusion equation, Signal Processing, 2016.
- et al., Image partial blur detection and classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
- et al., Blurred image region detection and classification, ACM International Conference on Multimedia, 2011.
- et al., Learning deconvolution network for semantic segmentation, IEEE International Conference on Computer Vision, 2015.
- et al., Attention-guided CNN for image denoising, Neural Networks, 2020.
- et al., Multiscale blur detection by learning discriminative deep features, Neurocomputing, 2018.
- et al., Defocus blur detection via multi-stream bottom-top-bottom fully convolutional network, IEEE Conference on Computer Vision and Pattern Recognition, 2018.
- LBP-based segmentation of defocus blur, IEEE Trans. Image Process.
- Learning to understand image blur, IEEE Conference on Computer Vision and Pattern Recognition.
- Defocus map estimation from a single image via spectrum contrast, Opt. Lett.
- Estimating defocus blur via rank of local patches, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Venice, Italy.
- Single image focus editing, Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on.
- Single-image refocusing and defocusing, IEEE Trans. Image Process.
- Salient region detection by UFO: Uniqueness, focusness and objectness, Proceedings of the IEEE International Conference on Computer Vision.
- Salient object detection via weighted low rank matrix recovery, IEEE Signal Process. Lett.
- Defocus magnification, Comput. Graphics Forum.
- Coded apertures for defocus deblurring, Symposium Iberoamericano de Computacion Grafica.
- Discriminative blur detection features, IEEE Conference on Computer Vision and Pattern Recognition.
- Attention-related functional changes induced by imposed myopia defocus from spectacle lens, Investigative Ophthalmology & Visual Science.
- A global approach to describe retinal defocus patterns, PLoS ONE.
- Multi-focus image fusion method based on two stage of convolutional neural network, Signal Processing.
- Defocus map estimation from a single image, Pattern Recognit.
- S3: A spectral and spatial measure of local perceived sharpness in natural images, IEEE Trans. Image Process.
- Blur processing using double discrete wavelet transform, IEEE Conference on Computer Vision and Pattern Recognition.
- Estimating spatially varying defocus blur from a single image, IEEE Trans. Image Process.
- Classifying discriminative features for blur detection, IEEE Trans. Cybern.
- A spectral and spatial approach of coarse-to-fine blurred image region detection, IEEE Signal Process. Lett.
- Defocus blur-invariant scale-space feature extractions, IEEE Trans. Image Process.
- A unified approach of multi-scale deep and hand-crafted features for defocus estimation, IEEE Conference on Computer Vision and Pattern Recognition.
- ImageNet classification with deep convolutional neural networks, NIPS.
- Deep visual tracking: Review and experimental comparison, Pattern Recognit.