Detection and tracking of infrared small target by jointly using SSD and pipeline filter

https://doi.org/10.1016/j.dsp.2020.102949

Abstract

Infrared imaging is an efficient anti-drone approach owing to its low cost, resistance to interference, and all-weather operation. However, detecting an Unmanned Aerial Vehicle (UAV) with an infrared camera remains challenging because infrared targets in the field of view are usually small and lack shape and texture features. In this paper, we propose an infrared small target detection and tracking method based on deep learning. We adapt the network architecture of the Single Shot MultiBox Detector (SSD) to infrared small target detection by dropping low-resolution layers and enhancing the high-resolution layer; we call the result Single Shot MultiBox Detector for Small Target (SSD-ST). In addition, to further reduce the false alarm rate and improve precision, we design an Adaptive Pipeline Filter (APF) based on temporal correlation and motion information to correct the detection results. We evaluate our method on a dataset with 16177 infrared images and 30 trajectories. The results show that our method is more robust than traditional methods in complex scenes and achieves a recall rate higher than 90% and a precision higher than 95%, which demonstrates that it can effectively detect and track infrared small targets.

Introduction

With the development of UAV technology, UAVs have been widely used in both military and civil fields. In the military field, UAVs offer good concealment and strong survivability, which makes them increasingly important in modern warfare. In the civil field, owing to their high efficiency and low cost, UAVs are widely used in aerial photography, land monitoring, express delivery, and other applications [1]. However, while UAV technology brings convenience, it also brings problems such as illegal intrusion and interference with civil aviation, which pose a great threat to public safety and territorial security [2]. To address these problems, a variety of anti-UAV systems have been developed around the world. One of the most important technologies in an anti-UAV system is target monitoring and tracking. Radar, infrared imaging, and visible light imaging have all been tested for anti-UAV applications [3]. Among these technologies, infrared imaging has attracted growing attention owing to its low price, resistance to interference, and all-weather operation.

However, owing to factors such as the working environment and limited resolution, the UAV target in an infrared image is often very small. In extreme cases, the infrared target may be only a bright point. The infrared image therefore contains little structure and texture information for target detection, which defeats detection algorithms that rely on the structural characteristics of the target. In addition, infrared small targets are susceptible to atmospheric cloud radiation and imaging noise [4], which result in a relatively low signal-to-noise ratio (SNR). Together, these factors make it very difficult to detect and track infrared small targets.

To address this challenge, a variety of infrared small target detection algorithms have been proposed in the past few decades. These algorithms can be divided into two categories: sequential detection methods and single-frame detection methods. Sequential detection methods mainly use the temporal correlation and motion information of the target for multi-frame joint detection. Qu et al. [5] proposed the Discontinuous Frame Difference to remove most stationary pixels and then applied an optical flow algorithm to detect the moving target. Based on the wavelet packet transform and kurtosis, Wu et al. [6] proposed a new de-noising method and applied it to detect weak, moving point targets in image sequences. In general, sequential detection methods are less affected by the environment and can be applied to detection with high SNR. However, when the target moves slowly, the detection performance of such methods is often poor. Therefore, the existing algorithms are mainly based on single-frame detection.
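The dependence of sequential methods on target motion can be illustrated with plain two-frame differencing. The sketch below is not the Discontinuous Frame Difference of Qu et al. [5] or the method of Wu et al. [6]; the function name, the threshold value, and the synthetic frames are assumptions chosen only for illustration.

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold=20.0):
    """Return a boolean mask of pixels whose intensity changed by more than `threshold`."""
    diff = np.abs(curr_frame.astype(np.float64) - prev_frame.astype(np.float64))
    return diff > threshold

# Synthetic scene (assumed values): a static noisy background shared by both frames.
rng = np.random.default_rng(0)
background = rng.normal(100.0, 2.0, size=(32, 32))

# A bright 2x2 target that moves 3 pixels to the right between the two frames.
prev_frame = background.copy()
curr_frame = background.copy()
prev_frame[10:12, 10:12] += 80.0
curr_frame[10:12, 13:15] += 80.0

# The stationary background cancels; only the old and new target positions remain.
moving_mask = frame_difference_mask(prev_frame, curr_frame)

# A target that does not move between frames cancels as well and is missed.
static_mask = frame_difference_mask(prev_frame, prev_frame)

print("changed pixels (moving target):", int(moving_mask.sum()))
print("changed pixels (static target):", int(static_mask.sum()))
```

In this toy case the moving target leaves a response at both its old and new positions, while a target that stays in place produces no response at all, which is the slow-target weakness noted above.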

Conventional single-frame detection methods mainly use a series of filters to achieve detection. Legacy target detection algorithms based on spatial filters, such as the median subtraction filter [7], the max-mean and max-median filters [8], and the morphological top-hat filter [9], rely heavily on manually designed parameters, so their robustness and accuracy are often poor. Inspired by the human visual system (HVS), many contrast-based algorithms have been proposed in recent years. Kim et al. [10] proposed a contrast-mechanism-based algorithm that achieves target enhancement and background clutter suppression by tuning and maximizing the signal-to-clutter ratio (TMSCR) in Laplacian scale-space. Shao et al. [11] used Kim's method to increase image contrast and then used a morphological method to further eliminate residual clutter. Chen et al. [12] proposed a two-stage Local Contrast Measure (LCM) to measure the dissimilarity between the current location and its neighborhoods. They first used LCM to obtain the local contrast map of the input image and then segmented the target with an adaptive threshold. Based on LCM, a series of improved infrared target detection algorithms have been developed, such as ILCM [13], NLCM [14] and WLCM [15]. Contrast-based detection algorithms can improve detection accuracy compared with algorithms based on spatial filters. However, when the background becomes complex, the performance of contrast-based detection algorithms decreases significantly.
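The two stages of a contrast-based detector (contrast map, then adaptive threshold) can be sketched as follows. This is a simplified single-scale illustration in the spirit of LCM, not a faithful reimplementation of [12]: the cell size, the mean-plus-k-standard-deviations threshold, the value of k, and the synthetic image are all assumptions.

```python
import numpy as np

def local_contrast_map(image, cell=3):
    """Simplified single-scale local contrast map.

    A 3x3 grid of cell-by-cell patches is centred on each pixel. The contrast
    is min_i(L^2 / m_i), where L is the maximum of the centre patch and m_i is
    the mean of the i-th surrounding patch. Border pixels are left at zero.
    """
    img = image.astype(np.float64)
    h, w = img.shape
    half = cell // 2
    reach = cell + half  # distance from the centre pixel to the edge of the grid
    out = np.zeros_like(img)
    for y in range(reach, h - reach):
        for x in range(reach, w - reach):
            centre = img[y - half:y + half + 1, x - half:x + half + 1]
            l_max = centre.max()
            ratios = []
            for dy in (-cell, 0, cell):
                for dx in (-cell, 0, cell):
                    if dy == 0 and dx == 0:
                        continue  # skip the centre patch itself
                    cy, cx = y + dy, x + dx
                    patch = img[cy - half:cy + half + 1, cx - half:cx + half + 1]
                    ratios.append(l_max ** 2 / max(patch.mean(), 1e-6))
            out[y, x] = min(ratios)
    return out

def adaptive_threshold(contrast_map, k=4.0):
    """Segment pixels whose contrast exceeds mean + k * std (k is an assumed value)."""
    threshold = contrast_map.mean() + k * contrast_map.std()
    return contrast_map > threshold

# Synthetic image (assumed values): flat background with one bright point target.
image = np.full((32, 32), 50.0)
image[16, 16] = 200.0

contrast = local_contrast_map(image)
detections = adaptive_threshold(contrast)
print("detected pixels:", int(detections.sum()))
```

Because the centre patch takes its maximum, every pixel whose centre patch contains the bright point receives the same high contrast, so the response is a small blob rather than a single pixel. On a flat background this works well; the hand-chosen `cell` and `k` are exactly the kind of parameters that become fragile when the background is cluttered.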

In recent years, with the development of deep learning, a series of high-performance deep learning-based object detection algorithms have been proposed in the field of visible light target detection. The best known are the two-stage schemes represented by R-CNN (Regions with Convolutional Neural Network features) [16], Fast R-CNN [17] and Faster R-CNN [18], and the one-stage schemes represented by SSD [19] and YOLO (You Only Look Once) [20], [21], [22]. Inspired by these algorithms, researchers have applied deep learning to infrared target detection. Du et al. [23] proposed a two-stage infrared target detection algorithm. They first used a Convolutional Neural Network (CNN) to extract features and then used a support vector machine (SVM) to perform the classification. Sommer et al. [24] also proposed a two-stage infrared target detection algorithm based on deep learning. They first used the Region Proposal Network (RPN) proposed in Faster R-CNN to generate a series of candidate regions and then sent these candidate regions to a CNN to determine whether they are real targets. In addition, CNN-based infrared target detection has been used in [25], [26], [27]. Some works also use improved YOLO-based methods to detect small targets, such as [28], [29]. In general, compared with traditional single-frame detection algorithms, deep learning-based detection algorithms are more adaptive and more accurate. However, as the target becomes smaller, the false alarm rate of these algorithms remains high.

In this paper, we propose a novel infrared small target detection and tracking method based on deep learning. Our method includes two stages: single-frame detection and multi-frame filtering. In the single-frame detection stage, we propose a detection algorithm called Single Shot MultiBox Detector for Small Target (SSD-ST), which drops the deep low-resolution detection layers of SSD and further exploits the shallow high-resolution feature layer to suit the detection of infrared small targets. In the multi-frame filtering stage, we propose an Adaptive Pipeline Filter (APF) based on temporal correlation and motion information to correct the detector's results and reduce false detections. We have evaluated our method on a dataset with 16177 infrared images and 30 trajectories. The results show that our method is more robust than traditional methods in complex scenes and can achieve a recall rate higher than 90% and a precision higher than 95%, which demonstrates that it can effectively detect and track infrared small targets. The main contributions of this paper can be summarized as follows:

  1) A new object detection algorithm called Single Shot MultiBox Detector for Small Target (SSD-ST) has been proposed for infrared small target detection.

  2) A new temporal filter called Adaptive Pipeline Filter (APF) has been proposed to correct the detection results based on the temporal correlation and motion information, which can effectively remove false alarms and improve precision.

  3) A novel two-stage detection algorithm for infrared small targets based on deep learning has been proposed, which can achieve a recall rate higher than 90% and a precision higher than 95%.
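To see why deep low-resolution detection layers contribute little when the target spans only a few pixels, it helps to look at the feature map sizes of the standard SSD300 configuration. The layer names, map sizes, and default box counts below come from the original SSD design, not from this paper, and the excerpt does not state which layers SSD-ST keeps, so this is only a rough sketch of the motivation.

```python
# (layer name, feature map side, default boxes per location) for standard SSD300.
SSD300_LAYERS = [
    ("conv4_3", 38, 4),
    ("fc7", 19, 6),
    ("conv8_2", 10, 6),
    ("conv9_2", 5, 6),
    ("conv10_2", 3, 4),
    ("conv11_2", 1, 4),
]
INPUT_SIZE = 300  # SSD300 input resolution in pixels

def layer_summary(layers, input_size):
    """Return (name, side, approx. input pixels per cell, default box count) per layer."""
    rows = []
    for name, side, boxes_per_location in layers:
        pixels_per_cell = input_size / side
        rows.append((name, side, pixels_per_cell, side * side * boxes_per_location))
    return rows

summary = layer_summary(SSD300_LAYERS, INPUT_SIZE)
total_boxes = sum(row[3] for row in summary)

for name, side, pixels_per_cell, box_count in summary:
    print(f"{name:9s} {side:2d}x{side:<2d} ~{pixels_per_cell:6.1f} px/cell {box_count:5d} boxes")
print("total default boxes:", total_boxes)
```

Each cell of the shallowest layer already covers roughly 8 input pixels per side, and the deepest layers cover 100 pixels or more, so a point-like target occupies a tiny fraction of one cell there.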

The rest of this paper is organized as follows: In Section 2, we will introduce the proposed two-stage infrared small target detection and tracking method in detail. In Section 3, we will first introduce the dataset and evaluation system used in the experiments and then give the experimental results. Finally, we will draw the conclusions in Section 4.

Method

In this section, we will introduce our infrared small target detection and tracking method, which combines the advantages of single-frame detection and multi-frame filtering. We will first introduce the overall framework of our method, and then introduce the details of each part.
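The excerpt does not give the details of the APF. As a rough illustration of the general idea of multi-frame filtering, the sketch below implements a classic fixed-radius pipeline filter, not the authors' adaptive version: a detection is kept only if detections also appear nearby in enough of the most recent frames. The function name and the `radius`, `window`, and `min_hits` values are hypothetical.

```python
import math

def pipeline_filter(detections_per_frame, radius=5.0, window=5, min_hits=3):
    """Keep a detection only if enough recent frames contain a nearby detection.

    detections_per_frame: list with one entry per frame, each a list of (x, y)
    centres. A detection in frame t is confirmed when at least `min_hits` of the
    frames t-window+1 .. t (including t itself) contain a detection within
    `radius` pixels of it.
    """
    confirmed = []
    for t, detections in enumerate(detections_per_frame):
        start = max(0, t - window + 1)
        kept = []
        for (x, y) in detections:
            hits = 0
            for frame in detections_per_frame[start:t + 1]:
                if any(math.hypot(x - px, y - py) <= radius for (px, py) in frame):
                    hits += 1
            if hits >= min_hits:
                kept.append((x, y))
        confirmed.append(kept)
    return confirmed

# Toy sequence (assumed values): a target moving 1 pixel per frame,
# plus an isolated false alarm in frame 3.
frames = [[(10 + t, 20.0)] for t in range(6)]
frames[3].append((50.0, 50.0))

confirmed = pipeline_filter(frames)
for t, kept in enumerate(confirmed):
    print(t, kept)
```

The isolated false alarm never accumulates enough hits and is removed, at the cost of discarding the first two frames of the true trajectory before it is confirmed. A fixed radius also assumes bounded target speed, which is presumably one of the limitations an adaptive filter is meant to relax.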

Experiment

In this section, we will first introduce the infrared small target dataset and evaluation system used in our experiments, and then design a series of experiments to evaluate the detection performance of the SSD-ST and APF proposed in this paper for small infrared targets.
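Recall and precision, the two metrics reported in this paper, have standard definitions in terms of true positives, false positives, and false negatives. The sketch below uses those standard definitions; the excerpt does not state how a detection is matched to a ground-truth target, and the counts in the example are made up for illustration rather than taken from the paper.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision = TP / (TP + FP) and recall = TP / (TP + FN)."""
    detected = true_positives + false_positives   # everything the detector reported
    actual = true_positives + false_negatives     # everything that was really there
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall

# Hypothetical counts: 1000 real targets, 920 found, 40 false alarms.
precision, recall = precision_recall(
    true_positives=920, false_positives=40, false_negatives=80
)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

Removing false alarms with a temporal filter lowers the false positive count and therefore raises precision, while recall drops only if true detections are removed along with them.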

Conclusions

In this paper, we have presented an infrared small target detection and tracking method based on deep learning. Our method includes two stages: single-frame detection and multi-frame filtering. In the single-frame detection stage, we have proposed an improved SSD object detection algorithm called SSD-ST to better adapt to the detection task of infrared small targets. In the multi-frame filtering stage, we have designed an adaptive pipeline filter (APF) to further reduce the false detection and

CRediT authorship contribution statement

Lianghui Ding: Conceptualization, Methodology, Experiment. Xin Xu: Data curation, Writing – original draft preparation, Experiment. Yuan Cao: Investigation, Experiment. Guangtao Zhai: Visualization, Investigation, Experiment. Feng Yang: Writing – reviewing and editing, Validation. Liang Qian: Supervision, Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This paper is supported in part by NSFC China (61771309, 61671301), Shanghai Commission of Science and Technology Funding (SCST 15DZ2270400), and Shanghai Key Laboratory Funding (STCSM 18DZ1200102).

References (34)

  • S.D. Deshpande et al., Max-mean and max-median filters for detection of small targets

  • S. Kim et al., Small target detection utilizing robust methods of the human visual system for IRST, J. Infrared Millim. Terahertz Waves (2009)

  • C.L.P. Chen et al., A local contrast method for small infrared target detection, IEEE Trans. Geosci. Remote Sens. (2014)

  • J. Han et al., A robust infrared small target detection algorithm based on human visual system, IEEE Geosci. Remote Sens. Lett. (2014)

  • Y. Qin et al., Effective infrared small target detection utilizing a novel local contrast method, IEEE Geosci. Remote Sens. Lett. (2016)

  • J. Liu et al., Tiny and dim infrared target detection based on weighted local contrast, IEEE Geosci. Remote Sens. Lett. (2018)

  • R. Girshick et al., Rich feature hierarchies for accurate object detection and semantic segmentation

Lianghui Ding (Member, IEEE) received his Ph.D. in 2009 from Shanghai Jiao Tong University (SJTU), China. From Sep. 2009 to Dec. 2010, he was a researcher in Signals and Systems, Uppsala University, Sweden. Currently, he is an Associate Professor in the Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University. His research areas include wireless communication, wireless power transfer, and image processing. He has published more than 60 papers and applied for more than 20 patents.

Xin Xu received the B.E. degree from Hangzhou Dianzi University, China, in 2018. He is currently pursuing the master's degree with the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University. His research interests include image processing, computer vision and deep learning.

Yuan Cao received his Ph.D. in 2014 from Beijing Institute of Technology, China. Currently, he is an engineer in Naval Research Academy. His research areas include image processing and image recognition. He has published more than 10 papers.

Guangtao Zhai (M'10) received the B.E. and M.E. degrees from Shandong University, Shandong, China, in 2001 and 2004, respectively, and the Ph.D. degree from Shanghai Jiao Tong University, Shanghai, China, in 2009.

From 2008 to 2009, he was a Visiting Student with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON, Canada, where he was a Post-Doctoral Fellow from 2010 to 2012. From 2012 to 2013, he was a Humboldt Research Fellow with the Institute of Multimedia Communication and Signal Processing, Friedrich Alexander University of Erlangen–Nuremberg, Erlangen, Germany. He is currently a Research Professor with the Institute of Image Communication and Information Processing, Shanghai Jiao Tong University. His research interests include multimedia signal processing and perceptual signal processing.

Prof. Zhai was the recipient of the National Excellent Ph.D. Thesis Award from the Ministry of Education of China in 2012.

Feng Yang (Member, IEEE) received the Ph.D. degree in information and communication from Shanghai Jiao Tong University. Since 2008, he has been on the Faculty of Shanghai Jiao Tong University, where he is currently an Associate Professor with the Department of Electronic Engineering. He takes part in the program of Beyond 3G Wireless Communication Testing System and is in charge of system design. He is also the PI of some national projects, including the National High Technology Research and Development Program of China (863 Program) and the National Natural Science Foundation of China. His research interests include wireless video communication and multihop communication.

Liang Qian received the Ph.D. degree in communications and information processing from Shanghai Jiao Tong University, China, in 2004. He was a Visiting Scholar with the Institute of Information Processing, University of Karlsruhe, in 2002. He is currently an Associate Professor with the Department of Electronic Engineering, Shanghai Jiao Tong University. His research interests include digital signal processing for wireless cellular systems, satellite signal processing for navigation systems, and emergency wireless access for public security.
