Detection and tracking of infrared small target by jointly using SSD and pipeline filter

doi:10.1016/j.dsp.2020.102949

Digital Signal Processing

Volume 110, March 2021, 102949

https://doi.org/10.1016/j.dsp.2020.102949

Abstract

Infrared imaging has become an efficient anti-drone approach owing to its low cost, resistance to interference, and all-weather operation. However, detecting Unmanned Aerial Vehicles (UAVs) with an infrared camera remains challenging, because infrared targets in the field of view are usually small and lack shape and texture features. In this paper, we propose an infrared small target detection and tracking method based on deep learning. We improve the network architecture of the Single Shot MultiBox Detector (SSD) for infrared small target detection by dropping low-resolution layers and enhancing the high-resolution layer; we call the result the Single Shot MultiBox Detector for Small Target (SSD-ST). In addition, to further reduce the false alarm rate and improve precision, we design an Adaptive Pipeline Filter (APF) that uses temporal correlation and motion information to correct the detection results. We have evaluated our method on a dataset of 16177 infrared images and 30 trajectories. The results show that our method is more robust than traditional methods in complex scenes and achieves a recall rate higher than 90% and a precision higher than 95%, demonstrating that it can effectively detect and track infrared small targets.

Introduction

With the development of UAV technology, UAVs have been widely used in both military and civil fields. In the military field, UAVs offer good concealment and strong survivability, which make them increasingly important in modern warfare. In the civil field, owing to their high efficiency and low cost, UAVs are widely used in aerial photography, land monitoring, express delivery, and other applications [1]. However, while UAV technology brings convenience, it also brings a series of problems, such as illegal intrusion and interference with civil aviation, which pose a great threat to public safety and territorial security [2]. To address these problems, a variety of anti-UAV systems have been developed around the world. One of the most important technologies in an anti-UAV system is target monitoring and tracking. Radar, infrared imaging, and visible-light imaging have all been tested for anti-UAV applications [3]. Among these technologies, infrared imaging has attracted increasing attention because of its low cost, resistance to interference, and all-weather operation.

However, owing to factors such as the working environment and limited resolution, a UAV target in an infrared image is often very small; in extreme cases it appears as only a single bright point. The infrared image therefore contains little structural and texture information about the target, which renders detection algorithms that rely on the target's structural characteristics ineffective. In addition, infrared small targets are susceptible to atmospheric cloud radiation and imaging noise [4], which results in a relatively low signal-to-noise ratio (SNR). Together, these factors make infrared small targets very difficult to detect and track.

To address this challenge, a variety of infrared small target detection algorithms have been proposed over the past few decades. They fall into two categories: sequential detection methods and single-frame detection methods. Sequential detection methods mainly exploit the temporal correlation and motion information of the target to perform joint detection over multiple frames. Qu et al. [5] proposed the Discontinuous Frame Difference to remove most stationary pixels and then applied an optical flow algorithm to detect the moving target. Based on the wavelet packet transform and kurtosis, Wu et al. [6] proposed a new denoising method and applied it to detect weak moving point targets in image sequences. In general, sequential detection methods are less affected by the environment and can be applied to detection with high SNR. However, when the target moves slowly, their detection performance is often poor. As a result, most existing algorithms are based on single-frame detection.
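The frame-differencing idea behind sequential methods such as [5] can be sketched in a few lines. This is a minimal Python sketch, not Qu et al.'s implementation: it performs only the differencing step (no optical flow), and the function names, the frame gap, and the fixed threshold are illustrative assumptions.

```python
from typing import List, Tuple

Frame = List[List[float]]


def discontinuous_frame_difference(frames: List[Frame], t: int, gap: int,
                                   threshold: float) -> List[List[bool]]:
    """Flag pixels whose intensity changes by more than `threshold` between
    frame t and frame t - gap. A gap > 1 (a non-consecutive pair) gives a slowly
    moving target time to shift, while stationary background pixels cancel out."""
    if gap < 1 or t - gap < 0:
        raise ValueError("need gap >= 1 and at least `gap` frames before t")
    cur, prev = frames[t], frames[t - gap]
    return [[abs(c - p) > threshold for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(cur, prev)]


def make_frame(size: int, target: Tuple[int, int],
               background: float = 10.0, peak: float = 100.0) -> Frame:
    """Hypothetical synthetic frame: flat background plus one bright pixel at (row, col)."""
    frame = [[background] * size for _ in range(size)]
    r, c = target
    frame[r][c] = peak
    return frame


if __name__ == "__main__":
    # A point target moving one pixel to the right per frame.
    frames = [make_frame(5, (2, t)) for t in range(3)]
    mask = discontinuous_frame_difference(frames, t=2, gap=2, threshold=20.0)
    print([(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v])
    # prints [(2, 0), (2, 2)]
```

Both the old and the new target positions light up. A target that moves less than a pixel over the chosen gap barely registers, which illustrates why such methods degrade for slowly moving targets.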

Conventional single-frame detection methods mainly rely on a series of filters. Legacy detection algorithms based on spatial filters, such as the median subtraction filter [7], the max-mean and max-median filters [8], and the morphological top-hat filter [9], depend heavily on manually designed parameters, so their robustness and accuracy are often poor. Inspired by the human visual system (HVS), many contrast-based algorithms have been proposed in recent years. Kim et al. [10] proposed a contrast-mechanism-based algorithm that enhances targets and suppresses background clutter by tuning and maximizing the signal-to-clutter ratio (TMSCR) in Laplacian scale space. Shao et al. [11] used Kim's method to increase image contrast and then applied morphological operations to eliminate residual clutter. Chen et al. [12] proposed a two-stage Local Contrast Measure (LCM) that measures the dissimilarity between the current location and its neighborhood: they first use LCM to compute a local contrast map of the input image and then segment the target with an adaptive threshold. Building on LCM, a series of improved infrared target detection algorithms have been developed, such as ILCM [13], NLCM [14], and WLCM [15]. Contrast-based algorithms improve detection accuracy compared with algorithms based on spatial filters. However, when the background becomes complex, their performance decreases significantly.
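To make the LCM idea concrete, here is a simplified single-scale Python sketch, assuming the formulation usually attributed to Chen et al. [12]: the window around a pixel is split into 3 x 3 cells, L is the maximum gray value of the centre cell, m_i is the mean of the i-th surrounding cell, and the contrast is C = min_i L^2 / m_i. The multi-scale maximum is omitted, and the threshold factor `k = 4`, the cell size, and all function names are hypothetical choices for illustration.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Image = List[List[float]]


def _cell_values(img: Image, top: int, left: int, size: int) -> List[float]:
    """All pixel values of the size x size cell whose top-left corner is (top, left)."""
    return [img[r][c] for r in range(top, top + size) for c in range(left, left + size)]


def local_contrast_map(img: Image, size: int = 3, eps: float = 1e-6) -> Image:
    """Single-scale LCM: C = min_i L^2 / m_i over the 8 neighbouring cells.
    Pixels whose 3x3-cell window does not fit inside the image keep contrast 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            top, left = r - size // 2, c - size // 2  # top-left of the centre cell
            if top - size < 0 or left - size < 0 or top + 2 * size > h or left + 2 * size > w:
                continue
            peak = max(_cell_values(img, top, left, size))  # L
            out[r][c] = min(
                peak * peak / (mean(_cell_values(img, top + dr * size, left + dc * size, size)) + eps)
                for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)
            )
    return out


def lcm_detect(img: Image, size: int = 3, k: float = 4.0) -> List[Tuple[int, int]]:
    """Adaptive threshold T = mean + k * std of the contrast map (k is an assumed value)."""
    cmap = local_contrast_map(img, size)
    flat = [v for row in cmap for v in row]
    threshold = mean(flat) + k * pstdev(flat)
    return [(r, c) for r, row in enumerate(cmap) for c, v in enumerate(row) if v > threshold]


if __name__ == "__main__":
    img = [[10.0] * 15 for _ in range(15)]
    img[7][7] = 100.0  # one bright point target on a flat background
    print(lcm_detect(img))
```

On this synthetic image the detector returns the 3 x 3 block of pixels whose centre cell contains the target, i.e. a small blob around the true position rather than a single point.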

In recent years, with the development of deep learning, a series of high-performance deep learning-based object detection algorithms have been proposed for visible-light target detection. The best known are the two-stage schemes represented by R-CNN (Regions with Convolutional Neural Network features) [16], Fast R-CNN [17], and Faster R-CNN [18], and the one-stage schemes represented by SSD [19] and YOLO (You Only Look Once) [20], [21], [22]. Inspired by these algorithms, researchers have applied deep learning to infrared target detection. Du et al. [23] proposed a two-stage infrared target detection algorithm that first uses a Convolutional Neural Network (CNN) to extract features and then uses a support vector machine (SVM) to classify them. Sommer et al. [24] also proposed a two-stage deep learning-based infrared target detection algorithm: they first used the Region Proposal Network (RPN) from Faster R-CNN to generate candidate regions and then fed these regions to a CNN that classifies whether each one is a real target. CNN-based infrared target detection has also been used in [25], [26], [27], and some works detect small targets with improved YOLO-based methods, such as [28], [29]. In general, deep learning-based detection algorithms are more adaptive and more accurate than traditional single-frame detection algorithms. However, as the target becomes smaller, their false alarm rate remains high.
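A short calculation helps explain why a standard SSD struggles with very small targets, and why SSD-ST drops the low-resolution layers. The Python sketch below assumes the standard SSD300 layout (six prediction layers with 38, 19, 10, 5, 3 and 1 cells per side) and the default-box scale rule from the SSD paper with s_min = 0.2 and s_max = 0.9. Which layers SSD-ST actually keeps is not stated in this excerpt, and `layers_for_target` applies a hypothetical stride criterion.

```python
from typing import List, Tuple

# Feature-map sizes of the six prediction layers in the standard SSD300 configuration
# (conv4_3, fc7, conv8_2, conv9_2, conv10_2, conv11_2).
SSD300_FEATURE_MAPS: Tuple[int, ...] = (38, 19, 10, 5, 3, 1)


def default_box_scales(num_layers: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """SSD scale rule: s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1), k = 1..m."""
    return [s_min + (s_max - s_min) * k / (num_layers - 1) for k in range(num_layers)]


def layer_geometry(img_size: int = 300,
                   fmaps: Tuple[int, ...] = SSD300_FEATURE_MAPS) -> List[Tuple[int, float, float]]:
    """(feature-map size, stride in pixels, square default-box side in pixels) per layer."""
    scales = default_box_scales(len(fmaps))
    return [(f, img_size / f, s * img_size) for f, s in zip(fmaps, scales)]


def layers_for_target(target_px: float, img_size: int = 300,
                      fmaps: Tuple[int, ...] = SSD300_FEATURE_MAPS) -> List[int]:
    """Hypothetical criterion: keep layers whose stride does not exceed the target size,
    so that the target covers at least about one feature-map cell."""
    return [f for f, stride, _ in layer_geometry(img_size, fmaps) if stride <= target_px]


if __name__ == "__main__":
    for f, stride, side in layer_geometry():
        print(f"{f:>2}x{f:<2}  stride {stride:6.1f} px  default-box side {side:6.1f} px")
    print(layers_for_target(8))  # prints [38]: only the 38x38 layer has a stride below 8 px
```

Under these assumptions, a target of about 8 pixels is smaller than a single cell of every layer except the 38 x 38 one, so the deeper layers can barely localize it.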

In this paper, we propose a novel infrared small target detection and tracking method based on deep learning. Our method includes two stages: single-frame detection and multi-frame filtering. In the single-frame detection stage, we propose a detection algorithm called Single Shot MultiBox Detector for Small Target (SSD-ST), which drops the deep low-resolution detection layers of SSD and further mines its shallow high-resolution feature layer to suit the detection of infrared small targets. In the multi-frame filtering stage, we propose an Adaptive Pipeline Filter (APF) that uses temporal correlation and motion information to correct the detector's results and reduce false detections. We have evaluated our method on a dataset of 16177 infrared images and 30 trajectories. The results show that our method is more robust than traditional methods in complex scenes and achieves a recall rate higher than 90% and a precision higher than 95%, demonstrating that it can effectively detect and track infrared small targets. The main contributions of this paper can be summarized as follows:

1) A new object detection algorithm, the Single Shot MultiBox Detector for Small Target (SSD-ST), is proposed for infrared small target detection.
2) A new temporal filter, the Adaptive Pipeline Filter (APF), is proposed to correct the detection results based on temporal correlation and motion information; it effectively removes false alarms and improves precision.
3) A novel two-stage deep learning-based detection algorithm for infrared small targets is proposed, which achieves a recall rate higher than 90% and a precision higher than 95%.
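For readers unfamiliar with pipeline filtering, the following Python sketch shows a classic M-of-N pipeline filter, the general technique the name APF refers to. It is not the authors' APF: how APF adapts is not described in this excerpt, and the window length `n`, the hit count `m`, and the fixed `radius` are assumed parameters.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def pipeline_filter(detections: List[List[Point]], n: int = 5, m: int = 3,
                    radius: float = 3.0) -> List[List[Point]]:
    """Classic M-of-N pipeline filter (illustrative, not the paper's APF).

    detections[t] holds the (x, y) centres reported by a single-frame detector in
    frame t. A detection is confirmed only if, within a temporal window of n frames
    centred on t, at least m frames contain a detection no farther than `radius`
    pixels away (the "pipeline"). Frame t itself counts as one hit.
    """
    half = n // 2
    confirmed: List[List[Point]] = []
    for t, frame_dets in enumerate(detections):
        lo, hi = max(0, t - half), min(len(detections), t + half + 1)
        kept = []
        for x, y in frame_dets:
            hits = sum(
                any(hypot(x - u, y - v) <= radius for u, v in detections[s])
                for s in range(lo, hi)
            )
            if hits >= m:
                kept.append((x, y))
        confirmed.append(kept)
    return confirmed


if __name__ == "__main__":
    # A target moving one pixel per frame, plus an isolated false alarm in frame 3.
    dets = [[(10 + t, 10)] for t in range(7)]
    dets[3].append((50, 50))
    print(pipeline_filter(dets)[3])  # prints [(13, 10)]
```

Because the window is centred on frame t, confirmation is delayed by n // 2 frames, and a fixed radius must be large enough to cover the target's displacement across the window.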

The rest of this paper is organized as follows: In Section 2, we will introduce the proposed two-stage infrared small target detection and tracking method in detail. In Section 3, we will first introduce the dataset and evaluation system used in the experiments and then give the experimental results. Finally, we will draw the conclusions in Section 4.


Method

In this section, we will introduce our infrared small target detection and tracking method, which combines the advantages of single-frame detection and multi-frame filtering. We will first introduce the overall framework of our method, and then introduce the details of each part.

Experiment

In this section, we first introduce the infrared small target dataset and the evaluation system used in our experiments, and then design a series of experiments to evaluate how well the proposed SSD-ST and APF detect small infrared targets.
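The recall and precision figures reported for this method follow the usual definitions, recall = TP / (TP + FN) and precision = TP / (TP + FP). The Python sketch below assumes that each ground-truth target is greedily matched to the nearest unmatched detection in the same frame, and that a match counts as a true positive when the centres are within `max_dist` pixels. This matching rule and threshold are assumptions; the evaluation system used in the paper is not detailed in this excerpt.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def precision_recall(gt: List[List[Point]], pred: List[List[Point]],
                     max_dist: float = 5.0) -> Tuple[float, float]:
    """Frame-by-frame precision and recall with greedy nearest-centre matching.
    gt[t] and pred[t] hold the target centres for frame t (equal-length lists assumed)."""
    tp = fp = fn = 0
    for g_frame, p_frame in zip(gt, pred):
        unmatched = list(p_frame)
        for g in g_frame:
            best = min(unmatched, key=lambda p: hypot(p[0] - g[0], p[1] - g[1]), default=None)
            if best is not None and hypot(best[0] - g[0], best[1] - g[1]) <= max_dist:
                tp += 1                  # detection close enough to this target
                unmatched.remove(best)   # each detection may match at most once
            else:
                fn += 1                  # target missed in this frame
        fp += len(unmatched)             # leftover detections are false alarms
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    gt = [[(10, 10)], [(11, 10)], [(12, 10)], [(13, 10)]]
    pred = [[(10, 11)], [(11, 10), (40, 40)], [], [(13, 12)]]
    print(precision_recall(gt, pred))  # prints (0.75, 0.75): 3 TP, 1 FP, 1 FN
```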

Conclusions

In this paper, we have presented an infrared small target detection and tracking method based on deep learning. Our method includes two stages: single-frame detection and multi-frame filtering. In the single-frame detection stage, we have proposed an improved SSD object detection algorithm called SSD-ST to better suit the detection of infrared small targets. In the multi-frame filtering stage, we have designed an adaptive pipeline filter (APF) to further reduce false detections and

CRediT authorship contribution statement

Lianghui Ding: Conceptualization, Methodology, Experiment. Xin Xu: Data curation, Writing – original draft preparation, Experiment. Yuan Cao: Investigation, Experiment. Guangtao Zhai: Visualization, Investigation, Experiment. Feng Yang: Writing – reviewing and editing, Validation. Liang Qian: Supervision, Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This paper is supported in part by NSFC China (61771309, 61671301), Shanghai Commission of Science and Technology Funding (SCST 15DZ2270400), and Shanghai Key Laboratory Funding (STCSM 18DZ1200102).

Lianghui Ding (Member, IEEE) received his Ph.D. in 2009 from Shanghai Jiao Tong University (SJTU), China. From Sep. 2009 to Dec. 2010, he was a researcher in Signals and Systems, Uppsala University, Sweden. Currently, he is an Associate Professor with the Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University. His research areas include wireless communication, wireless power transfer, and image processing. He has published more than 60 papers and applied for more

References (34)

Y. Li et al., Robust infrared small target detection using local steering kernel reconstruction, Pattern Recognit. (2018).

M. Zeng et al., The design of Top-Hat morphological filter and application to infrared target detection, Infrared Phys. Technol. (2006).

X. Shao et al., An improved infrared dim and small target detection algorithm based on the contrast mechanism of human visual system, Infrared Phys. Technol. (2012).

Z. Fan et al., Dim infrared image enhancement based on convolutional neural network, Neurocomputing (2018).

C. Jing et al., Detection capability of infrared thermal imaging system analysis.

C. Wang et al., Flying small target detection for anti-UAV based on a Gaussian mixture model in a compressive sensing domain, Sensors (2019).

S.R. Ganti et al., Implementation of detection and tracking mechanism for small UAS.

Y. Qu et al., Detecting small moving target in image sequences using optical flow based on the discontinuous frame difference.

Y. Wu et al., A weak moving point target detection method based on high frame rate image sequences.

J. Barnett, Statistical analysis of median subtraction filtering with application to point target detection in infrared backgrounds.

S.D. Deshpande et al., Max-mean and max-median filters for detection of small targets.

S. Kim et al., Small target detection utilizing robust methods of the human visual system for IRST, J. Infrared Millim. Terahertz Waves (2009).

C.L.P. Chen et al., A local contrast method for small infrared target detection, IEEE Trans. Geosci. Remote Sens. (2014).

J. Han et al., A robust infrared small target detection algorithm based on human visual system, IEEE Geosci. Remote Sens. Lett. (2014).

Y. Qin et al., Effective infrared small target detection utilizing a novel local contrast method, IEEE Geosci. Remote Sens. Lett. (2016).

J. Liu et al., Tiny and dim infrared target detection based on weighted local contrast, IEEE Geosci. Remote Sens. Lett. (2018).

R. Girshick et al., Rich feature hierarchies for accurate object detection and semantic segmentation.


Xin Xu received the B.E. degree from Hangzhou Dianzi University, China, in 2018. He is currently pursuing the master's degree with the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University. His research interests include image processing, computer vision and deep learning.

Yuan Cao received his Ph.D. in 2014 from Beijing Institute of Technology, China. Currently, he is an engineer at the Naval Research Academy. His research areas include image processing and image recognition. He has published more than 10 papers.

Guangtao Zhai (M'10) received the B.E. and M.E. degrees from Shandong University, Shandong, China, in 2001 and 2004, respectively, and the Ph.D. degree from Shanghai Jiao Tong University, Shanghai, China, in 2009.

From 2008 to 2009, he was a Visiting Student with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON, Canada, where he was a Post-Doctoral Fellow from 2010 to 2012. From 2012 to 2013, he was a Humboldt Research Fellow with the Institute of Multimedia Communication and Signal Processing, Friedrich Alexander University of Erlangen–Nuremberg, Erlangen, Germany. He is currently a Research Professor with the Institute of Image Communication and Information Processing, Shanghai Jiao Tong University. His research interests include multimedia signal processing and perceptual signal processing.

Prof. Zhai was the recipient of the National Excellent Ph.D. Thesis Award from the Ministry of Education of China in 2012.

Feng Yang (Member, IEEE) received the Ph.D. degree in information and communication from Shanghai Jiao Tong University. Since 2008, he has been on the Faculty of Shanghai Jiao Tong University, where he is currently an Associate Professor with the Department of Electronic Engineering. He takes part in the program of Beyond 3G Wireless Communication Testing System and is in charge of system design. He is also the PI of some national projects, including the National High Technology Research and Development Program of China (863 Program) and the National Natural Science Foundation of China. His research interests include wireless video communication and multihop communication.

Liang Qian received the Ph.D. degree in communications and information processing from Shanghai Jiao Tong University, China, in 2004. He was a Visiting Scholar with the Institute of Information Processing, University of Karlsruhe, in 2002. He is currently an Associate Professor with the Department of Electronic Engineering, Shanghai Jiao Tong University. His research interests include digital signal processing for wireless cellular systems, satellite signal processing for navigation systems, and emergency wireless access for public security.
