Elsevier

Knowledge-Based Systems

Volume 194, 22 April 2020, 105540
An attention-guided and prior-embedded approach with multi-task learning for shadow detection

https://doi.org/10.1016/j.knosys.2020.105540

Abstract

Shadow detection is a fundamental and challenging task that requires an accurate understanding of the visual semantic context of shadow regions and their backgrounds. In this paper, we propose an attention-guided and prior-embedded approach with multi-task learning for the shadow detection task. Unlike most existing works, we introduce multi-task learning into this detection task to embed a high-level prior into the detection process, instead of relying on a network with pre-trained weights as the front-end module or on a complex recurrent network. In particular, we employ a channel attention-guided module so that the high-level and low-level features complement each other. Moreover, we design a weighted loss function for effective training of the proposed multi-task approach. Experimental results on two publicly available benchmarks demonstrate that our approach achieves results competitive with existing typical shadow detection approaches.

Introduction

Shadows are a common natural phenomenon produced when an object blocks light. In daily life, shadows appear in most single images and video sequences captured by acquisition equipment [1], [2]. Shadows interfere with useful target information, degrade image quality, and create difficulties for related tasks, including optical measurement [3] and image understanding [4]. Therefore, accurate shadow detection is important for many computer vision tasks [5], [6], [7].

Shadow detection can be divided into shadow detection in videos and shadow detection in still images. Shadow detection in videos mainly detects the shadow regions of moving objects in video frames, such as the shadows of pedestrians and vehicles in surveillance scenes [8], [9]. Because video carries abundant information, shadow detection in videos can take advantage of the differences between frames in a video sequence [10]. Compared with shadow detection in videos, shadow detection in still images has less information to draw on. A shadow is a dark region of no fixed shape formed when an object blocks light, and it is affected by the object's shape and the angle of the light. Moreover, because light intensity varies across environments, the brightness of shadow areas is not uniform [11]. Learning to detect shadow regions in still images is therefore difficult and challenging. In this paper, shadow detection refers to shadow detection in still images, and our aim is to develop a robust and effective shadow detection approach.

In terms of the detection procedure, existing work on shadow detection falls into four categories: the intrinsic image-based approach, the model-based approach, the manual feature-based approach and the convolutional neural network (CNN)-based approach [12]. Finlayson et al. [13] proposed the intrinsic image theory, which uses entropy minimization to compute a gray-scale invariant image for shadow detection and removal under a uniform illumination assumption. Their approach processes low-resolution images, so its detection performance is restricted. For the model-based approach, Tian et al. [14] proposed the tricolor attenuation model to describe the attenuation relationship between shadow and background. Their approach can detect shadows in real images with complicated backgrounds without prior knowledge, but its accuracy is severely affected by the illumination intensity, which needs to be recalculated over time. Salvador et al. [15] posited the existence of a common illumination-insensitive space and used it for shadow detection. The intrinsic image-based and model-based approaches generally require high-quality images, which restricts their application scenarios. In the manual feature-based approach, shadow-related features are designed by hand, and a classifier is trained on these handcrafted features to detect shadows [16]. Zhu et al. [17] proposed a shadow detection approach for gray images that extracts brightness, texture and local maximum features, feeds them into a classifier, and combines the result with a conditional random field to generate the final detection result. The manual feature-based approach mainly extracts low-order image information and ignores semantic information, which leads to inaccurate detection in complex scenes.
With the rapid development and application of CNNs, shadow detection has made remarkable progress thanks to their strong feature extraction ability [18]. Tomas et al. [19] proposed training two stacked CNNs on a large-scale dataset with noisy annotations and combining the semantic information of the image to detect shadow regions. Nguyen et al. [20] designed an extended conditional generative adversarial network to model high-level relations and global scene features. Hosseinzade et al. [21] proposed combining a support vector machine with a CNN to detect shadows, an approach also called patched-CNN. Existing CNN-based approaches mainly rely on pre-trained networks, complex structures and empirical training experience. Moreover, they ignore the high-level prior information of the image, which would benefit the detection task. Multi-task learning [22], [23] has recently made remarkable progress and has been widely applied in many research fields, especially signal processing [24], [25]. For example, the method in [26] exploits inter-subject information to construct an efficient classification model for motor imagery-based brain–computer interface applications; the method in [27] combines individual functional connectivity information with a group sparse representation-based network construction framework to achieve higher between-group separability while maintaining within-group consistency; and the method in [28] designs a novel framework that simultaneously extracts common and individual features to exploit the linked nature of the data. Motivated by multi-task learning, we propose a novel attention-guided and prior-embedded approach based on multi-task learning for the shadow detection task.
For the multi-task learning in the proposed approach, the classification sub-network is designed to make the approach more robust across various scenes, given the imbalance issue that the number of shadow pixels in a single image varies greatly among scenes. This design is inspired by the discussions of the imbalance issue in [29], [30], [31]. The contributions of this paper are twofold.

  • (1) We introduce multi-task learning into the shadow detection task. That is, we design a cascaded network for shadow detection: the classification sub-network is designed to bring a high-level prior into the detection procedure, and the main network is designed to generate the shadow detection result. The proposed approach achieves results competitive with existing complex methods and also offers a new insight for research on shadow detection and similar low-level computer vision tasks. The proposed cascaded network is easy to implement and is trained end-to-end.

  • (2) We design a channel attention-guided module in the main network for the shadow detection task. This module combines the deep semantic feature with the shallow high-resolution feature through attention weighting. It effectively preserves the shallow feature, which carries the visual details of the image, and thus enhances the final detection performance.
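As a rough illustration of the second contribution, the sketch below gates a shallow feature with per-channel weights derived from a deep feature and then adds the two. It is not the authors' module, whose exact layers are not given in this excerpt: the squeeze-style global average pooling, the sigmoid gate, the additive fusion, and the `scale` and `bias` parameters (hypothetical stand-ins for learned layers) are all assumptions, and the deep feature is assumed to be upsampled to the shallow feature's size already.

```python
import math


def global_avg_pool(feature):
    """Average each channel's H x W map down to a single scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_fuse(deep, shallow, scale, bias):
    """Weight shallow channels by attention computed from the deep feature.

    deep, shallow: lists of C channels, each an H x W nested list of floats.
    scale, bias:   hypothetical per-channel parameters (assumption) that
                   stand in for the learned layers of a real module.
    Returns (fused feature, per-channel attention weights).
    """
    if len(deep) != len(shallow):
        raise ValueError("deep and shallow features need the same channel count")
    pooled = global_avg_pool(deep)
    weights = [sigmoid(a * p + b) for p, a, b in zip(pooled, scale, bias)]
    fused = []
    for w, deep_ch, shallow_ch in zip(weights, deep, shallow):
        # Re-weight the shallow detail, then add the deep semantic response.
        fused.append([[w * s + d for s, d in zip(s_row, d_row)]
                      for s_row, d_row in zip(shallow_ch, deep_ch)])
    return fused, weights


deep = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
shallow = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
fused, weights = channel_attention_fuse(deep, shallow, scale=[1.0, 1.0], bias=[0.0, 0.0])
```

In this toy input the second channel has the stronger deep response, so its shallow detail receives the larger weight. A trained module would learn the gate's parameters instead of fixing them by hand.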

Section snippets

The proposed approach

In this section, we first briefly describe the fundamental idea of the proposed approach and the general network architecture. Then we introduce the shadow scale classification sub-network and the generation of the shadow detection map. Lastly, we explain some details of the approach.
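To make the multi-task setup more concrete, here is a minimal sketch of how a shadow scale label and a weighted two-term loss could be computed. The excerpt does not give the authors' class definition or loss formula, so everything specific here is an assumption: the scale class is taken to be a bin of the shadow pixel ratio with hypothetical thresholds, the detection term is per-pixel binary cross-entropy, the classification term is cross-entropy, and `lam` is a hypothetical weighting factor.

```python
import math


def shadow_ratio(mask):
    """Fraction of pixels labeled shadow (1) in a binary H x W mask."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def scale_label(ratio, thresholds=(0.1, 0.3)):
    """Bin the shadow ratio into a class index; thresholds are hypothetical."""
    return sum(ratio > t for t in thresholds)


def binary_cross_entropy(pred, mask, eps=1e-7):
    """Mean per-pixel BCE between predicted probabilities and the mask."""
    losses = []
    for p_row, m_row in zip(pred, mask):
        for p, m in zip(p_row, m_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            losses.append(-(m * math.log(p) + (1 - m) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


def cross_entropy(class_probs, label, eps=1e-7):
    """Cross-entropy of the predicted class distribution against one label."""
    return -math.log(max(class_probs[label], eps))


def multitask_loss(pred, mask, class_probs, lam=0.5):
    """Detection loss plus lam times the scale classification loss."""
    label = scale_label(shadow_ratio(mask))
    return binary_cross_entropy(pred, mask) + lam * cross_entropy(class_probs, label)


mask = [[1, 0], [0, 0]]                 # one shadow pixel out of four
pred = [[0.5, 0.5], [0.5, 0.5]]         # uninformative detection map
class_probs = [0.25, 0.5, 0.25]         # predicted scale distribution
loss = multitask_loss(pred, mask, class_probs)
```

Deriving the scale label from the ground-truth mask means no extra annotation is needed for the auxiliary task. Whether the authors take the same route is not stated in this excerpt.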

Experiments and results

This paper first reports the performance of the proposed approach on two publicly available shadow detection benchmarks, the UCF [17] and SBU [19] datasets, and then conducts an ablation study on the designed structure of the proposed approach.

SBU is the largest public shadow dataset, with 4727 images, of which 4089 are for training and the remaining images are for testing. UCF contains 355 images with manually labeled pixel-based ground truth. The proposed

Conclusion

In this paper, we propose an attention-guided and prior-embedded approach with multi-task learning for shadow detection. Unlike existing shadow detection approaches, the proposed approach uses multi-task learning to accomplish the shadow detection task. The cascaded network in the proposed approach brings the high-level prior from the shadow scale classification sub-network into the shadow detection process and sets a channel attention-guided module in the main network to

CRediT authorship contribution statement

Shihui Zhang: Funding acquisition, Conceptualization, Investigation, Methodology, Writing - original draft, Writing - review & editing. He Li: Data curation, Investigation, Methodology, Writing - original draft, Writing - review & editing. Weihang Kong: Project administration, Funding acquisition, Conceptualization, Investigation, Writing - original draft, Writing - review & editing. Xiaowei Zhang: Formal analysis, Resources, Validation. Weidong Ren: Formal analysis, Software, Validation.

Acknowledgments

This work was supported partly by the National Natural Science Foundation of China (No. 61379065), the Natural Science Foundation of Hebei province in China (No. F2019203285; No. F2019203526), the Project funded by China Postdoctoral Science Foundation (No. 2018M631763), Yanshan University, China Doctoral Foundation (BL18010), and Science and Technology Research & Development Program of Qinhuangdao City, China (No. 201902A215). The authors thank the editors and anonymous reviewers for their

References (35)

  • Prati A. et al. Detecting moving shadows: Algorithms and evaluation. IEEE Trans. Pattern Anal. Mach. Intell. (2003)
  • Duan Z.G. et al. Outdoor illumination shadow detection based on orthogonal decomposition. Acta Opt. Sin. (2016)
  • Maxwell B.A. et al. A bi-illuminant dichromatic reflection model for understanding images
  • Saravanakumar S. et al. Multiple human object tracking using background subtraction and shadow removal techniques
  • Braham M. et al. Semantic background subtraction
  • Barcellos P. et al. Shadow detection in camera-based vehicle detection: Survey and analysis. J. Electron. Imaging (2016)
  • Coulon R. et al. Shadow-shielding compensation for moving sources detection. IEEE Trans. Nucl. Sci. (2017)
No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105540.