Dynamic imposter based online instance matching for person search
Introduction
Person re-identification is the task of searching for a person of interest across non-overlapping camera views [1]. It has attracted growing research interest because of its value in applications such as criminal spotting [2], multi-pedestrian tracking [3] and intelligent security [4]. Numerous efforts on person re-identification have been made over recent decades [5], [6]. However, current person re-identification techniques are still far from being applicable to practical intelligent monitoring systems. One of the key reasons is that typical re-identification systems assume that the person images have already been well cropped and aligned from the scene images. In real-world applications, by contrast, we usually need to find a target person in whole images or video frames for which no pedestrian boxes are available.
Person search is a new and valuable topic that bridges the gap between person re-identification and real-world applications [7], [8]. We illustrate the difference between person search and conventional re-identification in Fig. 1. The new task requires close cooperation between the detector and the identifier. Recently, great effort has been devoted to person search. Existing techniques can be coarsely divided into two categories: detection-free methods and detection-based methods. The detection-free methods attempt to recursively shrink the focus area until the target is precisely localized [9], [10]. However, they become computationally prohibitive as the gallery size increases. For the detection-based methods, the most common way is to divide the problem into separate pedestrian detection and person re-identification tasks [8], [11]. However, the two tasks are highly correlated. Firstly, feature information can be shared between them, which avoids accumulated error and saves considerable time on images of crowds. Secondly, detection and re-identification can complement each other: the quality of the detections largely determines the accuracy of recognition, while the results of recognition provide feedback to refine the locations of the detections. Therefore, it is beneficial to learn pedestrian detection and person re-identification jointly.
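To make the detection-based setting concrete, the following minimal sketch ranks every detected box in a gallery of whole scene images against a query feature. It is an illustration under stated assumptions, not the pipeline of any cited method: `detect` and `embed` are hypothetical callables standing in for a pedestrian detector and a re-identification feature extractor, the box format is arbitrary, and cosine similarity is assumed as the matching score.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def search(query_feat, gallery, detect, embed):
    """Rank all detected boxes in all gallery images by similarity to the query.

    gallery: list of (image_id, image) pairs of uncropped scene images.
    detect:  hypothetical callable, image -> list of boxes.
    embed:   hypothetical callable, (image, box) -> feature vector.
    """
    ranked = []
    for image_id, image in gallery:
        for box in detect(image):
            score = cosine(query_feat, embed(image, box))
            ranked.append((score, image_id, box))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


# Toy stand-ins: each "image" is a dict mapping a box to a precomputed feature.
toy_gallery = [
    ("img_a", {(0, 0, 10, 20): [1.0, 0.0], (5, 5, 15, 25): [0.0, 1.0]}),
    ("img_b", {(2, 2, 12, 22): [0.9, 0.1]}),
]
toy_detect = lambda image: list(image.keys())
toy_embed = lambda image, box: image[box]

results = search([1.0, 0.0], toy_gallery, toy_detect, toy_embed)
```

In a two-stage system the detector and the feature extractor are trained separately, so detection errors propagate to matching; a jointly trained model would share the backbone features behind both callables.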
Despite the considerable progress achieved in recent years, learning powerful features for person matching remains a challenging problem. The main reason is that the number of training samples for each identity is very small, and a large number of unlabeled identities exist in person search datasets. It is hard to learn discriminative person representations with many classes and few samples per class. Therefore, some approaches attempt to exploit the information of unlabeled pedestrians to strengthen the representation power. For example, the Online Instance Matching (OIM) loss [7] treats all the unlabeled persons as a negative class. It forces a labeled person to keep away from the other labeled identities stored in a lookup table and from the unlabeled persons maintained in a circular queue. Nevertheless, the unlabeled persons do not participate in the training process. To solve this problem, the Instance Enhancing Loss (IEL) [12] was proposed to integrate unlabeled persons into the feature learning process. It selectively assigns new unlabeled persons to the labeled identities that they are most similar to. However, the selected unlabeled persons are actually hard negative samples. To learn discriminative representations, those hard negative samples should be kept away from the corresponding labeled identities.
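The roles of the lookup table and the circular queue in the OIM loss can be sketched as follows. This is a minimal illustration rather than the implementation from [7]: it assumes L2-normalized features, a softmax over temperature-scaled inner products, and a momentum update of the lookup table. The function names, the temperature of 0.1 and the momentum of 0.5 are illustrative assumptions.

```python
import math
from collections import deque


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def oim_loss(x, target, lookup_table, queue, temperature=0.1):
    """Negative log-probability that feature x matches labeled identity `target`.

    The softmax runs over all lookup-table entries (labeled identities) and all
    queue entries (unlabeled persons), so both act as negatives for x.
    """
    logits = [dot(v, x) / temperature for v in lookup_table]
    logits += [dot(u, x) / temperature for u in queue]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_partition = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_partition - logits[target]


def update_memory(x, target, lookup_table, queue, momentum=0.5):
    """Store x: unlabeled features enter the queue, labeled ones refresh the table."""
    if target is None:
        queue.append(list(x))  # a full deque silently drops its oldest entry
    else:
        mixed = [momentum * v + (1.0 - momentum) * xi
                 for v, xi in zip(lookup_table[target], x)]
        lookup_table[target] = l2_normalize(mixed)


lookup_table = [[1.0, 0.0], [0.0, 1.0]]     # one row per labeled identity
queue = deque([[-1.0, 0.0]], maxlen=2)      # circular queue of unlabeled features
loss_correct = oim_loss([1.0, 0.0], 0, lookup_table, queue)
loss_wrong = oim_loss([1.0, 0.0], 1, lookup_table, queue)
```

Note that queue entries only ever appear in the denominator of the softmax: an unlabeled person is never the target of the loss, which is the limitation that IEL and the loss proposed in this paper aim to address.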
To address the above issues, in this paper we propose a novel end-to-end person search framework that integrates pedestrian detection and person identification to improve the overall accuracy and reduce computation. To make better use of the unlabeled persons, a novel Dynamic Imposter based Online Instance Matching (DI-OIM) loss is proposed. The proposed loss is inspired by the observation that pedestrians appearing in the same image obviously have different identities. Thus, we assign dynamic pseudo-labels to unlabeled persons. The representations of pseudo-labeled persons are defined as imposters, since they do not belong to any of the labeled identities. The features of all the labeled persons are stored in a lookup table. The imposters, along with the lookup table, are used to optimize the proposed framework, so that all the different persons are forced to keep away from each other. With the proposed DI-OIM loss, our end-to-end model demonstrates good efficiency and effectiveness.
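The idea of dynamic pseudo-labels and imposters can be illustrated with the sketch below. It is one possible reading of the description above, not the DI-OIM formulation itself, which is defined later in the paper and is not reproduced here. The assumptions are: each unlabeled person in an image receives a temporary identity index placed after the labeled identities, its own feature serves as the imposter entry for that index, and a temperature-scaled softmax (temperature 0.1, an illustrative value) is taken over the lookup table plus the imposters. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def assign_pseudo_labels(labels, num_labeled_ids):
    """Give every unlabeled person (None) in one image its own temporary id."""
    assigned, next_id = [], num_labeled_ids
    for label in labels:
        if label is None:
            assigned.append(next_id)
            next_id += 1
        else:
            assigned.append(label)
    return assigned


def softmax_nll(x, target, bank, temperature=0.1):
    """Negative log-softmax of x against every entry of the feature bank."""
    logits = [dot(v, x) / temperature for v in bank]
    peak = max(logits)
    return peak + math.log(sum(math.exp(z - peak) for z in logits)) - logits[target]


def imposter_image_loss(feats, labels, lookup_table, temperature=0.1):
    """Average loss over all persons in one image, labeled or pseudo-labeled."""
    pseudo = assign_pseudo_labels(labels, len(lookup_table))
    # Imposters are appended in the same order the pseudo-labels were issued,
    # so pseudo-label k indexes the matching imposter row of the bank.
    imposters = [f for f, label in zip(feats, labels) if label is None]
    bank = list(lookup_table) + imposters
    losses = [softmax_nll(f, p, bank, temperature) for f, p in zip(feats, pseudo)]
    return sum(losses) / len(losses) if losses else 0.0


lookup_table = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, None, None]  # one labeled person and two unlabeled persons
near = imposter_image_loss([[1.0, 0.0], [0.0, -1.0], [0.0, -1.0]], labels, lookup_table)
far = imposter_image_loss([[1.0, 0.0], [0.0, -1.0], [-1.0, 0.0]], labels, lookup_table)
```

In this toy example the loss is lower when the two unlabeled persons have distinct features (`far`) than when their features coincide (`near`), which reflects the stated goal of keeping all different persons away from each other.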
In summary, our main contributions are threefold:
- An end-to-end trainable learning framework is proposed for person search. It unifies pedestrian detection and person re-identification. By co-learning the two tasks, the learned features are more informative.
- A novel DI-OIM loss is proposed to exploit the information of the unlabeled pedestrians. The proposed loss not only distinguishes labeled pedestrians of different identities, but also pushes the unlabeled pedestrians away from each other.
- By unifying the detection and re-identification tasks, the proposed model achieves state-of-the-art performance on the CUHK-SYSU [7] and PRW [8] datasets.
Pedestrian detection
Pedestrian detection aims to localize pedestrians in images and generate bounding boxes for them. It plays an important role in person search systems. Considerable effort has been devoted to automatically detecting pedestrians in natural scenes. Traditional methods are mainly based on handcrafted features and linear classifiers, e.g. Aggregated Channel Features (ACF) [13] and Locally Decorrelated Channel Features (LDCF) [14]. Recently, Convolutional Neural Networks (CNNs)
The proposed approach
In this section, we first describe the overall architecture of our framework. Then we briefly explain the OIM loss and the IEL. After that, we elaborate on the proposed DI-OIM loss and describe the inference process.
Experiments
In this section, we thoroughly evaluate our method on two public person search datasets. First, we briefly introduce the datasets, the evaluation protocols and the implementation details. Second, we analyze the proposed loss and compare it with other related losses. To validate the effectiveness of our method, we then make extensive comparisons with state-of-the-art algorithms. Finally, we conduct further analysis and discussion.
Conclusion
In this work, we focus on the problem of unconstrained person search, where pedestrian bounding boxes are unavailable. We propose an end-to-end framework to simultaneously consider pedestrian detection and person re-identification. Since many unlabeled pedestrians exist in person search datasets, a novel DI-OIM loss is proposed to exploit the information of unlabeled persons. Inspired by the observation that pedestrians within the same image obviously have different identities, we assign
Acknowledgment
This work is supported in part by the National Natural Science Foundation of China (NSFC), Nos. 61725202, 61751212 and 61771088.
References (39)
- Intelligent multi-camera video surveillance: a review. Pattern Recognit. Lett. (2013)
- Deep visual tracking: review and experimental comparison. Pattern Recognit. (2018)
- Cross-view semantic projection learning for person re-identification. Pattern Recognit. (2018)
- Regularized local metric learning for person re-identification. Pattern Recognit. Lett. (2015)
- Deep feature learning with relative distance comparison for person re-identification. Pattern Recognit. (2015)
- Deep ranking model by large adaptive margin learning for person re-identification. Pattern Recognit. (2017)
- IAN: the individual aggregation network for person search. Pattern Recognit. (2019)
- Multi-target tracking by learning local-to-global trajectory models. Pattern Recognit. (2014)
- Person re-identification by local maximal occurrence representation and metric learning. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2015)
- Learning a discriminative null space for person re-identification. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
- Joint detection and identification feature learning for person search. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
- Person re-identification in the wild. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
- Neural person search machines. Proceedings of IEEE International Conference on Computer Vision
- RCAA: relational context-aware agents for person search. Proceedings of European Conference on Computer Vision
- Person search via a mask-guided two-stream CNN model. Proceedings of European Conference on Computer Vision
- Instance enhancing loss: deep identity-sensitive feature embedding for person search. Proceedings of IEEE International Conference on Image Processing
- Fast feature pyramids for object detection
- Local decorrelation for improved pedestrian detection. Advances in Neural Information Processing Systems