Regular article
Infrared small target segmentation with multiscale feature representation
Introduction
Infrared small target segmentation, which aims to assign a class label to each pixel of a given infrared image, plays an essential role in computer vision tasks such as maritime surveillance [1] and infrared guidance [2]. Unlike common RGB small targets, infrared small targets have distinctive characteristics. First, because of the long distance between the target and the infrared sensor, targets are extremely small in infrared images and may occupy only a single pixel. In other words, background pixels usually dominate the image. Second, targets appear very dim because infrared radiation energy attenuates markedly with distance. Consequently, the target is easily confused with the background and with sensor noise. These characteristics significantly complicate infrared small target segmentation.
Traditional methods design hand-crafted features based on various assumptions. The early morphological filtering-based methods [3] need to optimize the parameters of the morphological structural elements for each scene and cannot stably adapt to scenes in which target sizes vary over a large range. The local contrast measure-based methods [5], [27], [28], [29], [30], whose core idea is to enhance the target and suppress the noise, assume that the target is brighter than the background. These methods usually slide a predefined window over an infrared image and calculate the ratio or difference between the central pixel of the window and the surrounding pixels as an indicator of local contrast. Pixels with higher local contrast are more likely to belong to the target. However, these methods fail when the target is close to a bright background, because the local contrast of the target is then low. The robust principal component analysis (RPCA)-based methods [4], [31] assume that the background and the target can be approximately represented as low-rank and sparse matrices, respectively. However, methods of this type suffer from frequent missed detections. In general, traditional methods are limited by their multiple assumptions and prior knowledge and therefore generalize poorly.
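To make the sliding-window idea behind local contrast measures concrete, the following is a minimal sketch in plain Python. It is not the formulation of any of the cited methods: it simply takes the ratio between each central pixel and the mean of the surrounding pixels in the window. The function name `local_contrast`, the `radius` parameter, and the toy image are assumptions introduced for illustration.

```python
def local_contrast(image, radius=1, eps=1e-6):
    """Ratio-based local contrast for a 2D grayscale image (list of rows).

    radius is a hypothetical half-width of the sliding window; eps avoids
    division by zero on an all-dark background.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Surrounding pixels inside the window, clipped at the image border.
            neighbors = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            if not neighbors:
                continue  # a 1x1 image has no surroundings; keep 0.0
            background = sum(neighbors) / len(neighbors)
            out[r][c] = image[r][c] / (background + eps)
    return out


# Toy example (assumed data): a flat background with one bright pixel.
image = [[10.0] * 5 for _ in range(5)]
image[2][2] = 50.0
contrast = local_contrast(image)
```

In this toy image the single bright pixel yields the highest ratio. If the same pixel sat next to a bright background region, the mean of its surroundings would rise and the ratio would drop, which is the failure case described above.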
Recently, deep convolutional neural networks (CNNs), especially fully convolutional networks (FCNs) [6], have achieved remarkable progress in segmentation tasks [7], [8]. Infrared small target segmentation is itself a segmentation task, which raises a natural question: can deep CNN-based methods designed for RGB images be applied to segment infrared small targets? Revisiting the distinction between conventional segmentation and infrared small target segmentation, we find that it lies mainly in the multiscale appearance of the targets to be segmented. Intuitively, a method can segment infrared small targets if it is endowed with a strong capability to model multiscale features. One solution is to employ powerful pyramid structures, such as pyramid pooling modules (PPMs) [9] and atrous spatial pyramid pooling (ASPP) [7], [10], which capture the rich features of different targets at multiple scales. To address the other difficulty, namely that infrared small targets are easily confused with background noise, we follow [11], [12] and measure local similarity by explicitly modeling the differences between pixels. Once multiscale features have been obtained, the next question is how to aggregate them, that is, how to decode. Existing methods usually rely on simple, direct merge operations [6], [8]. However, these merge operations do not provide a seamless connection between shallow features and multiscale deep features [13]. Shallow features are low-level features that contain abundant coarse information about edges, lines, and corners, whereas deep features are high-level features that contain abstract information about the target. Pang et al. [13] refer to this difference as a semantic gap. A simple merge operation retains the coarse information in the fused features, and this coarse information may be regarded as background noise and degrade the segmentation of infrared small targets.
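As a rough illustration of how a pyramid structure gathers context at several scales, the sketch below average-pools a single-channel feature map into grids of different sizes, upsamples each grid back to the input resolution, and stacks the results per pixel. It is written in the spirit of pyramid pooling and is not the module proposed in this paper: real modules operate on multi-channel tensors and include learned convolutions. The bin sizes `(1, 2, 4)` and all function names are assumptions.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def adaptive_avg_pool(feature, bins):
    """Average-pool a 2D map (list of rows) into a bins x bins grid."""
    height, width = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        r0, r1 = (i * height) // bins, ceil_div((i + 1) * height, bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * width) // bins, ceil_div((j + 1) * width, bins)
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(grid, height, width):
    """Nearest-neighbor upsampling of a square grid to height x width."""
    bins = len(grid)
    return [
        [grid[r * bins // height][c * bins // width] for c in range(width)]
        for r in range(height)
    ]


def pyramid_features(feature, bin_sizes=(1, 2, 4)):
    """Per-pixel vector: the original value followed by one value per scale."""
    height, width = len(feature), len(feature[0])
    levels = [feature] + [
        upsample_nearest(adaptive_avg_pool(feature, b), height, width)
        for b in bin_sizes
    ]
    return [
        [[level[r][c] for level in levels] for c in range(width)]
        for r in range(height)
    ]


# Toy example (assumed data): a 4x4 map holding the values 0..15.
feature = [[float(4 * r + c) for c in range(4)] for r in range(4)]
stacked = pyramid_features(feature)
```

Each pixel then carries its own value together with progressively coarser averages of its neighborhood, which is the kind of multiscale context the pyramid structures above are meant to provide.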
Following the above ideas, we propose a CNN-based method with multiscale feature representation for infrared small target segmentation; the overall pipeline is illustrated in Fig. 1. Because infrared targets are tiny, the network needs a powerful feature representation ability to model them. To strengthen the representation learning of multiscale targets, we build several pyramid modules in parallel at different scales. As mentioned above, this pyramid structure captures the multiscale features of the target. Each scale corresponds to a local similarity pyramid module (LSPM). In addition, dim infrared targets are easily confused with complicated background regions. To address this issue, the LSPM calculates a local similarity weight for each pixel, which quantifies how similar the pixel is to other pixels in the feature maps. However, pixel-level similarity alone may lose region information: not all pixels are equally important, whereas the regions that contain the target provide useful information. Inspired by [12], the LSPM transforms the pixel-level similarity into a region-level similarity that represents the degree of similarity among the pixels in specific regions of the feature maps. In this way, the LSPM further distinguishes infrared small targets from the background by learning region information. Finally, we stack several LSPMs in parallel to obtain pyramid features that carry rich information about the targets. Note that we also add upsampled pyramid features in parallel, termed global guidance, as a supplement to the feature aggregation module (FAM). The FAM takes three inputs: shallow features, deep features, and global guidance. During feature aggregation, it is reasonable to retain important feature maps and ignore irrelevant ones. We therefore formulate the fusion process with channel attention [14], which aggregates each input with a specific weight.
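The step from pixel-level to region-level similarity can be pictured with the sketch below. It is only one plausible reading of the description above, not the LSPM as defined in the Methods section: it assumes cosine similarity between a pixel's feature vector and its immediate neighbors, and it assumes that the region-level value is the average over non-overlapping blocks. The names `pixel_similarity`, `region_similarity`, `radius`, and `region` are hypothetical.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)


def pixel_similarity(features, radius=1):
    """Mean cosine similarity of each pixel to its neighbors.

    features[r][c] is the channel vector at pixel (r, c).
    """
    height, width = len(features), len(features[0])
    sim = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            neighbors = [
                features[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            if neighbors:
                total = sum(cosine(features[r][c], n) for n in neighbors)
                sim[r][c] = total / len(neighbors)
    return sim


def region_similarity(sim, region=2):
    """Average the pixel-level map over non-overlapping region x region blocks."""
    height, width = len(sim), len(sim[0])
    out = []
    for r0 in range(0, height, region):
        row = []
        for c0 in range(0, width, region):
            block = [
                sim[r][c]
                for r in range(r0, min(r0 + region, height))
                for c in range(c0, min(c0 + region, width))
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Toy example (assumed data): uniform background with one distinct pixel.
features = [[[1.0, 0.0] for _ in range(4)] for _ in range(4)]
features[1][1] = [0.0, 1.0]
sim = pixel_similarity(features)
regions = region_similarity(sim, region=2)
```

In this toy case the distinct pixel has the lowest pixel-level similarity, and the block that contains it has the lowest region-level similarity, so the region map points to where the target lies rather than to a single isolated response.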
In these ways, the network is able to perform infrared small target segmentation with spontaneous perception of multiscale features. The contributions are summarized as follows:
We propose an infrared small target segmentation method with multiscale feature representation and achieve state-of-the-art performance on benchmark datasets.
The multiscale feature pyramid and feature aggregation with channel attention boost the multiscale representation learning ability of the network.
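To show what aggregating each input with a channel-specific weight can look like, here is a small sketch of squeeze-and-gate channel weighting applied to three inputs and summed. It is a simplification under stated assumptions, not the FAM itself: the three inputs are assumed to already share the same shape, a single matrix stands in for the learned excitation layers, and the fusion is a plain sum. All names (`channel_gates`, `reweight`, `aggregate`) are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(feature, weight):
    """One gate in (0, 1) per channel.

    feature[ch] is a 2D map; weight is a C x C matrix standing in for the
    learned layers of a channel attention block (assumption).
    """
    # Squeeze: global average pooling of every channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    # Excite: one linear layer followed by a sigmoid.
    return [sigmoid(sum(w * s for w, s in zip(row, squeezed))) for row in weight]


def reweight(feature, gates):
    """Scale every channel by its gate."""
    return [[[g * v for v in row] for row in ch] for ch, g in zip(feature, gates)]


def aggregate(inputs, weights):
    """Sum the channel-reweighted inputs (e.g. shallow, deep, global guidance)."""
    fused = None
    for feature, weight in zip(inputs, weights):
        scaled = reweight(feature, channel_gates(feature, weight))
        if fused is None:
            fused = scaled
        else:
            fused = [
                [[a + b for a, b in zip(row_f, row_s)]
                 for row_f, row_s in zip(ch_f, ch_s)]
                for ch_f, ch_s in zip(fused, scaled)
            ]
    return fused


# Toy example (assumed data): two channels of size 2x2 per input.
identity = [[1.0, 0.0], [0.0, 1.0]]
shallow = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
deep = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
guidance = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
fused = aggregate([shallow, deep, guidance], [identity] * 3)
```

Channels with a stronger pooled response receive a gate closer to one and therefore contribute more to the fused map, while weak channels are damped instead of being merged in unchanged.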
Section snippets
Related work
Our network is highly related to the following aspects.
Infrared small target segmentation: Conventional methods usually leverage morphological filtering [3], local contrast measures [5], [27], [28], [29], [30], and RPCA [4], [31], followed by adaptive thresholding, to segment infrared small targets. Zeng et al. [3] design a Top-Hat morphological filter whose structural-element parameters are determined via global search. Wei et al. [5] construct a local contrast measure based on a
Methods
In this section, we start with an illustration of the local similarity pyramid module and then describe the feature aggregation module with channel attention.
Experiments
In this section, we first describe the benchmark datasets and metrics. We then provide the implementation details for our network. Next, the proposed network is qualitatively and quantitatively compared with state-of-the-art infrared small target segmentation (ISTS) methods. Finally, we conduct an ablation study to demonstrate the effectiveness of our modules.
Conclusion
In this paper, we propose a network that leverages the local similarity pyramid module and the feature aggregation module to segment infrared small targets. Extensive experiments show that the network can extract the multiscale features of infrared small targets and spontaneously distinguish targets from noise using local similarity. The network outperforms existing state-of-the-art methods. Meanwhile, our network can be extended to other generic small target object segmentation or semantic
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61671094).
References (33)
- et al. Infrared target detection in backlighting maritime environment based on visual attention model. Infrared Phys. Technol. (2019)
- et al. An improved weak light detector used for infrared imaging guidance system. Optik (2016)
- et al. The design of top-hat morphological filter and application to infrared target detection. Infrared Phys. Technol. (2006)
- et al. Multiscale patch-based contrast measure for small infrared target detection. Pattern Recogn. (2016)
- et al. Small infrared target detection using absolute average difference weighted by cumulative directional derivatives. Infrared Phys. Technol. (2019)
- et al. Infrared small target detection via line-based reconstruction and entropy-induced suppression. Infrared Phys. Technol. (2016)
- et al. Infrared patch-image model for small target detection in a single image. IEEE Trans. Image Process. (2013)
- et al.
- et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. (2018)
- et al.