An attention-guided and prior-embedded approach with multi-task learning for shadow detection

doi:10.1016/j.knosys.2020.105540

Knowledge-Based Systems

Volume 194, 22 April 2020, 105540

https://doi.org/10.1016/j.knosys.2020.105540

Abstract

Shadow detection is a fundamental and challenging task that requires an accurate understanding of the visual semantic context of shadow regions and backgrounds. In this paper, we propose an attention-guided and prior-embedded approach with multi-task learning for the shadow detection task. Unlike most existing works, we introduce multi-task learning into this detection task to add a high-level prior to the detection process, instead of relying on a pre-trained network as the front-end module or on a complex recurrent network. In particular, we employ a channel attention-guided module so that high-level and low-level features complement each other. Moreover, we design a weighted loss function to train the proposed multi-task approach effectively. Experimental results on two publicly available benchmarks demonstrate that our approach achieves results competitive with typical existing shadow detection approaches.

Introduction

Shadows are a common natural phenomenon produced when an object blocks light. In daily life, shadows appear in most single images and video sequences captured by acquisition equipment [1], [2]. Shadows interfere with useful target information, degrade image quality, and complicate related tasks, including optical measurement [3] and image understanding [4]. Accurate shadow detection is therefore significant for many computer vision tasks [5], [6], [7].

Shadow detection can be divided into shadow detection in videos and shadow detection in still images. Shadow detection in videos mainly detects the shadow regions of moving objects in video frames, such as the shadows of pedestrians and vehicles in surveillance scenes [8], [9]. Because video carries abundant information, shadow detection in videos can take advantage of the differences between frames in a sequence [10]. Compared with videos, still images offer much less information to exploit. A shadow is a dark region of variable shape formed when an object blocks light, and it is affected by the object's shape and the angle of the light. Moreover, because light intensity differs across environments, the brightness of shadow areas is not uniform [11]. Learning to detect shadow regions in still images is therefore difficult and challenging. In this paper, shadow detection refers to shadow detection in still images, and our goal is to develop a robust and effective shadow detection approach.

In terms of the detection procedure, existing work on shadow detection falls into four categories: intrinsic image-based approaches, model-based approaches, manual feature-based approaches and convolutional neural network (CNN)-based approaches [12]. Finlayson et al. [13] proposed the intrinsic image theory, which uses entropy minimization to calculate a gray-scale invariant image for shadow detection and removal under a uniform illumination assumption. However, their approach processes low-resolution images, which restricts its detection performance. Among model-based approaches, Tian et al. [14] proposed the tricolor attenuation model to describe the attenuation relationship between shadow and background. Their approach can detect shadows in real images with complicated backgrounds without prior knowledge, but its accuracy is severely affected by the illumination intensity, which needs to be recalculated over time. Salvador et al. [15] proposed the existence of a common illumination-insensitive space and utilized it for shadow detection. The intrinsic image-based and model-based approaches mainly require high-quality images, which restricts their application scenarios. In manual feature-based approaches, shadow-related features are designed by hand, and a classifier is trained on these handcrafted features to detect shadows [16]. Zhu et al. [17] proposed a shadow detection approach for gray images that extracts brightness, texture and local maximum features, feeds them into a classifier, and combines the result with a conditional random field to generate the final detection result. Manual feature-based approaches mainly extract low-order image information and ignore semantic information, resulting in inaccurate detection in complex scenes.
With the rapid development and application of CNNs, the shadow detection task has seen remarkable progress thanks to their powerful feature extraction ability [18]. Tomas et al. [19] proposed training two stacked CNN structures on a large-scale dataset with noisy annotations and combining the semantic information of the image to detect the shadow region. Nguyen et al. [20] designed an extended conditional generative adversarial network to model high-level relations and global scene features. Hosseinzade et al. [21] proposed combining a support vector machine with a CNN to detect shadows, an approach also called patched-CNN. Existing CNN-based approaches mainly rely on pre-trained networks, complex structures and training experience. Moreover, they ignore the high-level prior information of the image, which would be beneficial for the detection task. Multi-task learning [22], [23] has recently made remarkable progress and has been widely applied in many research fields, especially in signal processing [24], [25]. For example, the method in [26] exploits inter-subject information to construct an efficient classification model for motor imagery-based brain–computer interface applications; the method in [27] combines individual functional connectivity information with a group sparse representation-based network construction framework to achieve higher between-group separability while maintaining within-group consistency; and the method in [28] designs a novel framework that simultaneously extracts common and individual features to exploit the linked nature of data. Motivated by multi-task learning, we propose a novel attention-guided and prior-embedded approach based on multi-task learning for the shadow detection task.
In the proposed multi-task approach, the classification sub-network is designed to make detection more robust across scenes, because the number of shadow pixels in a single image varies severely from scene to scene. This design is inspired by the discussions of the imbalance issue in [29], [30], [31]. The contributions of this paper are twofold.

(1)
We introduce multi-task learning into the shadow detection task. Specifically, we design a cascaded network for shadow detection: the classification sub-network extracts a high-level prior and feeds it into the detection procedure, and the main network generates the shadow detection result. The proposed approach achieves results competitive with existing complex methods and offers a new insight for research on shadow detection and similar low-level computer vision tasks. The cascaded network is easy to implement and is trained end-to-end.
(2)
We design a channel attention-guided module in the main network for the shadow detection task. The module combines deep semantic features with shallow high-resolution features through attention weighting. It effectively preserves the shallow features, which carry visual details of the image, and thus enhances the final detection performance.
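The attention weighting described in contribution (2) can be illustrated with a minimal sketch. This is not the authors' module: the function name `channel_attention_fuse`, the gating rule (sigmoid of the global average of each deep channel) and the additive fusion are all assumptions chosen for illustration, and the learned layers that a real channel attention block would contain are omitted. The deep feature map is assumed to be already upsampled to the shallow map's resolution.

```python
import math


def sigmoid(x):
    """Logistic function, used here to squash a channel statistic into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_fuse(deep, shallow):
    """Fuse two feature maps given as nested lists of shape [C][H][W].

    Hypothetical rule (an assumption, not the paper's definition): each
    shallow channel is scaled by a gate computed from the global average of
    the matching deep channel, and the deep channel is then added back.
    """
    fused = []
    for deep_ch, shallow_ch in zip(deep, shallow):
        count = sum(len(row) for row in deep_ch)
        mean = sum(sum(row) for row in deep_ch) / count  # global average pooling
        gate = sigmoid(mean)                             # per-channel weight
        fused.append([
            [gate * s + d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(shallow_ch, deep_ch)
        ])
    return fused


# Two channels of size 2x2: a weakly and a strongly activated deep channel.
deep = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
shallow = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
fused = channel_attention_fuse(deep, shallow)
```

In this toy input, the shallow details of the second channel pass through with a larger weight than those of the first, because the corresponding deep channel responds more strongly.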

Section snippets

The proposed approach

In this section, we first briefly describe the fundamental idea of the proposed approach and the general network architecture. Then we introduce the shadow scale classification sub-network and the shadow detection map generation. Lastly, we explain some details of the approach.
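As a rough illustration of how a shadow scale classification task and a detection task could share one weighted training objective, the sketch below derives a scale label from a ground-truth mask and sums two losses. Everything specific here is an assumption rather than the paper's formulation: the bin edges in `thresholds`, the choice of cross-entropy for both tasks, the weight `lam`, and all function names are hypothetical.

```python
import math


def shadow_scale_label(mask, thresholds=(0.1, 0.3)):
    """Map a binary mask (nested lists of 0/1) to a scale class index.

    `thresholds` are hypothetical bin edges on the shadow-pixel ratio.
    """
    total = sum(len(row) for row in mask)
    ratio = sum(sum(row) for row in mask) / total
    return sum(ratio >= t for t in thresholds)


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean per-pixel cross-entropy between probabilities and 0/1 labels."""
    losses = []
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            losses.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


def cross_entropy(probs, label, eps=1e-7):
    """Negative log-probability assigned to the true class."""
    return -math.log(max(probs[label], eps))


def multitask_loss(pred_map, mask, class_probs, lam=0.5):
    """Detection loss plus `lam` times the scale classification loss."""
    label = shadow_scale_label(mask)
    detection = binary_cross_entropy(pred_map, mask)
    classification = cross_entropy(class_probs, label)
    return detection + lam * classification


mask = [[1, 0], [0, 0]]               # one shadow pixel out of four
pred_map = [[0.5, 0.5], [0.5, 0.5]]   # uninformative detection output
class_probs = [0.2, 0.5, 0.3]         # predicted scale distribution
loss = multitask_loss(pred_map, mask, class_probs)
```

Because both tasks are trained from a single scalar, the weight `lam` controls how strongly the scale prior influences the shared features relative to the pixel-level detection term.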

Experiments and results

This paper first reports the performance of the proposed approach on two publicly available shadow detection benchmarks, the UCF [17] and SBU [19] datasets, and then conducts an ablation study on the designed structure of the proposed approach.

SBU is the largest public shadow dataset, with 4727 images, of which 4089 are used for training and the remaining 638 for testing. UCF contains 355 images with manually labeled pixel-based ground truth. The proposed

Conclusion

In this paper, we propose an attention-guided and prior-embedded approach with multi-task learning for shadow detection. Unlike existing shadow detection approaches, the proposed approach adopts multi-task learning to accomplish the shadow detection task. The cascaded network in the proposed approach feeds the high-level prior extracted by the shadow scale classification sub-network into the shadow detection process and sets a channel attention-guided module in the main network to

CRediT authorship contribution statement

Shihui Zhang: Funding acquisition, Conceptualization, Investigation, Methodology, Writing - original draft, Writing - review & editing. He Li: Data curation, Investigation, Methodology, Writing - original draft, Writing - review & editing. Weihang Kong: Project administration, Funding acquisition, Conceptualization, Investigation, Writing - original draft, Writing - review & editing. Xiaowei Zhang: Formal analysis, Resources, Validation. Weidong Ren: Formal analysis, Software, Validation.

Acknowledgments

This work was supported partly by the National Natural Science Foundation of China (No. 61379065), the Natural Science Foundation of Hebei province in China (No. F2019203285; No. F2019203526), the Project funded by China Postdoctoral Science Foundation (No. 2018M631763), Yanshan University, China Doctoral Foundation (BL18010), and Science and Technology Research & Development Program of Qinhuangdao City, China (No. 201902A215). The authors thank the editors and anonymous reviewers for their

References (35)

Al-Najdawi N. et al. A survey of cast shadow detection algorithms. Pattern Recognit. Lett. (2012)
Wang C. et al. Object-based change detection method for high-resolution remote sensing image combining shadow compensation and multi-scale fusion. J. Commun. (2018)
Wang J.Q. et al. Shadow extraction and application in pedestrian detection. Eurasip J. Image Video Process. (2014)
Sanin A. et al. Shadow detection: A survey and comparative evaluation of recent methods. Pattern Recognit. (2012)
Salvador E. et al. Cast shadow segmentation using invariant color features. Comput. Vis. Image Underst. (2004)
Zhang Y. et al. A multitask multiview clustering algorithm in heterogeneous situations based on LLE and LE. Knowl.-Based Syst. (2019)
Wang H. et al. A study of graph-based system for multi-view clustering. Knowl.-Based Syst. (2019)
Zhang Y. et al. Strength and similarity guided group-level brain functional network construction for MCI diagnosis. Pattern Recognit. (2019)
Zhou F.N. et al. Deep learning fault diagnosis method based on global optimization GAN for unbalanced data. Knowl.-Based Syst. (2020)
Deng T.Q. et al. Low-rank local tangent space embedding for subspace clustering. Inform. Sci. (2020)
Prati A. et al. Detecting moving shadows: Algorithms and evaluation. IEEE Trans. Pattern Anal. Mach. Intell. (2003)
Duan Z.G. et al. Outdoor illumination shadow detection based on orthogonal decomposition. Acta Opt. Sin. (2016)
Maxwell B.A. et al. A bi-illuminant dichromatic reflection model for understanding images
Saravanakumar S. et al. Multiple human object tracking using background subtraction and shadow removal techniques
Braham M. et al. Semantic background subtraction
Barcellos P. et al. Shadow detection in camera-based vehicle detection: Survey and analysis. J. Electron. Imaging (2016)
Coulon R. et al. Shadow-shielding compensation for moving sources detection. IEEE Trans. Nucl. Sci. (2017)

Cited by (6)

Occlusion-aware particle size distribution detection of gravel material based on the improved Bilayer Convolutional Network
2023, Construction and Building Materials
Particle size distribution (PSD) detection is an important part of the construction process, which may delay subsequent construction if it takes too much time. Image-based methods have been proposed for effective and inexpensive PSD detection based on digital image processing (DIP) or deep learning. However, current image-based PSD detection methods mostly ignore the occluded regions of gravel particles, which causes errors in the detection process. Hence, this study proposed a novel occlusion-aware PSD detection method, which adapted an amodal instance segmentation model, improved Bilayer Convolutional Network (BCNet) with an Optimized Feature Pyramid Network (FPN), and a backbone embedding with global context network (GCNet) module, for occlusion-aware particle image segmentation. The improved BCNet has achieved 6.7% AP₅₀ and 6% AP₇₅ improvement compared to the original BCNet. For the results of PSD analysis, the maximum and average absolute errors of the mass fraction of various particle size intervals were 3.38% and 1.27%, respectively, less than 4.65% and 2.05% of the way only considering visible regions of particles, which indicates the proposed method can reduce the error of image-based PSD detection method only considering visible regions of particles. Furthermore, the detection time of the proposed method is less than 120 s which is more rapid than mechanical sieving.
Exploring better target for shadow detection
2023, Knowledge-Based Systems
Shadow detection aims to identify shadow regions from images, which plays a significant role in scene understanding. Existing approaches tend to ignore the annotation noises in ground truths, which will be overfitted in the later training phase and potentially degrade detection performance. To alleviate the impact of such noisy labels, this work proposes a framework for robust shadow detection (RSD) by locating and correcting them. Specifically, we first introduce a noise-rate blind sample selection scheme based on the prediction-level stability to identify the reliable parts from all pixel-level samples. Next, we design a label correction strategy based on the graph convolutional network, which can propagate the label information between reliable and unreliable parts. Finally, we enable subsequent robust learning by using a new training target with fewer noisy labels for each image. Experimental results on public benchmarks (i.e., SBU, ISTD, UCF and CUHK-Shadow) show that our method can be favorable against SOTAs. Our source code is available at https://github.com/wuwen1994/RSD.
SEAT-YOLO: A squeeze-excite and spatial attentive you only look once architecture for shadow detection
2023, Optik
Recently, several deep learning-based architectures have been utilized for the recognition of shadow regions in images and videos with the aim of classification and segmentation of shadow instances. The present work aims towards solving the problem of shadow recognition as a detection and regression problem. To solve the problem of shadow detection in images and videos, we have developed SEAT-YOLO, a deep learning-based shadow detection architecture. The backbone network of the proposed architecture is based on convolution layers, squeeze-excite blocks, spatial attention module, and spatial pyramid pooling layer and the detection network utilizes YOLO detection heads. The proposed SEAT-YOLO architecture is trained and tested on the publicly available SBU shadow detection dataset. For shadow detection on the SBU dataset, it achieved a mAP value of 59.53 %. Furthermore, the proposed SEAT-YOLO is capable of detecting small shadow regions with high accuracy which is a challenge with most deep learning-based detection architectures. Moreover, the other advantage of the proposed SEAT-YOLO is its ability to detect multiple shadow regions in a single image by drawing precise bounding boxes for overlapping shadows. The proposed SEAT-YOLO architecture has high practical implications in driverless vehicles for detection of shadow regions and take effective driving decisions.
TSASNet: Tooth segmentation on dental panoramic X-ray images by Two-Stage Attention Segmentation Network
2020, Knowledge-Based Systems
Citation Excerpt :
3) Different patients have various teeth statuses and some dental instruments such as dental implants, root canal and metal rack are obstacles of accurate tooth segmentation, as shown as in Fig. 1. Inspired by the great success of the attention models on various computer vision tasks [13–17] and the two-stage strategy on the object detection [18], in this paper we propose a two-stage segmentation strategy to handle the great challenges suffered in various tooth segmentation scenarios on low-contrast dental panoramic X-ray images. We first adopt global and local attention modules to roughly localize the dental regions in the first stage and then use a fully convolutional network to further search for the exact dental region in the second stage.
Tooth segmentation acts as a crucial and fundamental role in dentistry for doctors to make diagnosis and treatment plans. In this paper, we propose a Two-Stage Attention Segmentation Network (TSASNet) on dental panoramic X-ray images to address the issues suffered in the tooth boundary and tooth root segmentation task which are caused by the low contrast and uneven intensity distribution. We firstly adopt an attention model which is embedded with global and local attention modules to roughly localize the tooth region in the first stage. Without any interactive operator, the attention model so constructed can automatically aggregate pixel-wise contextual information and identify coarse tooth boundaries. To better obtain final boundary information, we use a fully convolutional network as the second stage to further segment the real tooth area from the attention maps obtained from the first stage. The effectiveness of TSASNet is substantiated on the benchmark dataset containing 1,500 dental panoramic X-ray images, our proposed method achieves 96.94% of accuracy, 92.72% of dice and 93.77% of recall, significantly superior to the current state-of-the-art methods.
Accurate and Reliable Service Recommendation Based on Bilateral Perception in Multi-Access Edge Computing
2023, IEEE Transactions on Services Computing
Learning to Detect Shadows from Synthetic Data with Domain Adaption
2022, SSRN

^☆: No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105540.
