RGB-IR cross-modality person ReID based on teacher-student GAN model
Introduction
Person ReID, also referred to as pedestrian ReID ([29]), is designed to match specific pedestrians in images or video sequences. The main challenge of ReID is that intra-class variations (the same person in different situations) are usually significant due to changes in camera viewing conditions, such as viewpoint or situation differences, which makes it challenging to identify the same person. Meanwhile, inter-class variations (different people in the same situation) also influence ReID performance.
In recent years, most existing works in person ReID learn discriminative features of person identity with a specifically designed backbone model ([1], [21], [22]). There are also works focusing on the problems of occlusion ([11], [37]), different poses ([17]), illumination changes ([31]), lack of labels ([23], [24]) and resolution changes ([12]). These works all assume the RGB-RGB camera setting, where both the query images and the gallery images are in RGB mode.
However, RGB-RGB camera ReID is greatly restricted when lighting is weak or unavailable. A person may appear in one camera during the day and reemerge in another camera at night. In such a case, the RGB images captured at night carry little information that is useful for ReID because of the darkness. As shown in Fig 1, even human eyes can hardly extract any person identity information from these dark, noisy images.
An infrared camera forms a grey-scale (single-channel) image from infrared radiation, which provides in-the-dark visibility without requiring a visible light source. Thus, using both RGB and IR images allows the two modalities to complement each other and enhances person ReID performance. As shown in Fig 2, the query images are all in the IR modality and provide much more information than those in Fig 1.
However, few researchers have studied such RGB-IR cross-modality person ReID. The main challenge is that different pedestrians can appear to be very similar in the same modality, while the same pedestrian under different modalities can look quite different. Another challenge is that IR images only have grey-scale pixels, which provide much less information compared to RGB images, making it more challenging to extract effective features for the task of ReID.
In this paper, we propose a Teacher-Student GAN based cross-modality person ReID model (TS-GAN). The critical insight of our approach is a novel network in which the Student ReID module is guided by a pretrained Teacher module to encourage closeness between RGB and IR ReID features. This tremendously improves the quality of the features used for ReID classification and reduces the gap between the two modalities. To improve the model's effectiveness, we introduce several innovations, summarised as follows:
- 1.
The IR ReID Teacher module is pretrained on the Real IR images in the training set and obtains very high accuracy. We then use it as the teacher to guide feature learning in the Student ReID module.
- 2.
We use a joint cycle-consistency GAN with a joint discriminator to generate the corresponding Fake IR person images from the input Real RGB person images, thus obtaining pairwise person images under different modalities. The (Real RGB, Fake IR) image pairs are then used to train the Student ReID module with an MSE reconstruction loss so that the cross-modality gap can be reduced.
- 3.
To enhance the feature extraction ability of the Student module, we also use two further MSE losses: one between the Real IR image features from the Teacher module and those from the Student module, and one between the Fake IR image features from both modules (see the sketch after this list).
- 4.
Unlike other GAN-based cross-modality ReID methods, our model requires the GAN module only at the training stage. During testing, images are simply fed forward through the ReID backbone module without the involvement of the GAN, which is more resource-efficient.
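A minimal sketch of how these feature-level objectives could be combined, assuming hypothetical module names (teacher_enc is the frozen pretrained IR Teacher, student_enc is the Student ReID encoder, g_rgb2ir is the RGB-to-IR generator) that stand in for the paper's actual components; identification losses, loss weights, and the GAN training itself are omitted:

```python
import torch
import torch.nn.functional as F

def ts_feature_losses(real_rgb, real_ir, teacher_enc, student_enc, g_rgb2ir):
    # Generator and Teacher are treated as fixed here; only the Student receives
    # gradients from these losses in this simplified sketch.
    with torch.no_grad():
        fake_ir = g_rgb2ir(real_rgb)              # paired Fake IR for each Real RGB
        t_real_ir = teacher_enc(real_ir)          # Teacher features of Real IR
        t_fake_ir = teacher_enc(fake_ir)          # Teacher features of Fake IR

    s_rgb = student_enc(real_rgb)                 # Student features of Real RGB
    s_real_ir = student_enc(real_ir)              # Student features of Real IR
    s_fake_ir = student_enc(fake_ir)              # Student features of Fake IR

    loss_pair = F.mse_loss(s_rgb, s_fake_ir)      # (Real RGB, Fake IR) reconstruction loss
    loss_real = F.mse_loss(s_real_ir, t_real_ir)  # Teacher guidance on Real IR features
    loss_fake = F.mse_loss(s_fake_ir, t_fake_ir)  # Teacher guidance on Fake IR features
    return loss_pair + loss_real + loss_fake
```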
Section snippets
RGB-RGB person ReID
Most researchers have focused on traditional RGB-RGB person ReID. One primary approach is metric learning, which formalizes the problem as supervised metric learning in which a projection matrix is sought ([5], [28], [30]). Another primary approach is to learn appropriate features associated with the same identity using feature distance information ([9]) on a backbone module ([1], [18]), such as ResNet50 ([8]). All of these works focus on RGB-RGB person ReID, which may fail in some
Overview
The overall model structure of our TS-GAN model is shown in Fig 3. The whole model consists of three main parts: (1) the RGB-IR image generation module, (2) the ReID backbone, and (3) the RGB-IR TS module. We use subscripts “S” and “T” to distinguish blocks belonging to the Student or Teacher modules.
As shown in Fig 3, the blocks in the green dotted rectangle are the generator and joint discriminator for IR images. The block in the red dotted rectangle is the ReID feature encoder. The block in the blue
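A minimal skeleton of the three-part layout described above, with hypothetical class and argument names (the concrete generator, discriminator, and backbone architectures are placeholders, not the paper's exact designs):

```python
import torch.nn as nn

class TSGAN(nn.Module):
    """Sketch of the three main parts: image generation module, ReID backbone, TS module."""
    def __init__(self, generator, discriminator, teacher_backbone, student_backbone):
        super().__init__()
        self.g_rgb2ir = generator            # (1) RGB-IR image generation module
        self.d_joint = discriminator         #     joint discriminator for IR images
        self.student = student_backbone      # (2) ReID backbone (Student encoder)
        self.teacher = teacher_backbone      # (3) RGB-IR TS module: pretrained IR Teacher
        for p in self.teacher.parameters():  # Teacher is pretrained and kept fixed
            p.requires_grad_(False)

    def forward(self, images):
        # At test time only the Student backbone is used; the GAN is not involved.
        return self.student(images)
```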
Dataset and evaluation protocol
SYSU-MM01 ([27]) is the most popular and newest dataset for RGB-IR cross-modality person ReID. It contains images captured by six cameras, including two IR cameras and four RGB ones. RegDB ([15]) was collected with a dual-camera system. It contains 412 identities, each with 10 different thermal (IR) images and 10 different visible (RGB) images.
Our experiments follow the standard evaluation protocol of existing RGB-IR cross-modality ReID methods. For SYSU-MM01, there are two evaluation
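As an illustration of the standard metrics used by these protocols (CMC rank-1 accuracy and mAP between IR queries and RGB galleries), a minimal single-trial sketch follows; the official SYSU-MM01 protocol additionally averages over repeated random gallery selections and camera settings, which is omitted here, and the function name is hypothetical:

```python
import numpy as np

def rank1_and_map(q_feats, g_feats, q_ids, g_ids):
    # L2-normalize features and use cosine distance (smaller = more similar).
    q = q_feats / np.linalg.norm(q_feats, axis=1, keepdims=True)
    g = g_feats / np.linalg.norm(g_feats, axis=1, keepdims=True)
    dist = 1.0 - q @ g.T

    rank1_hits, aps = [], []
    for i in range(len(q_ids)):
        order = np.argsort(dist[i])                         # gallery sorted by similarity
        matches = (g_ids[order] == q_ids[i]).astype(float)  # 1 where identity matches
        rank1_hits.append(matches[0])                       # rank-1 hit or miss
        if matches.sum() > 0:                               # average precision per query
            precision = np.cumsum(matches) / (np.arange(len(matches)) + 1)
            aps.append((precision * matches).sum() / matches.sum())
    return float(np.mean(rank1_hits)), float(np.mean(aps))
```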
Conclusion
In this paper, we proposed a novel TS-GAN model to learn common representation features for RGB-IR cross-modality person images. We designed the IR joint GAN and the IR Teacher module to enhance the Student ReID backbone and reduce the domain gap between inputs from different modalities. Comprehensive experiments on the challenging cross-modality person ReID datasets SYSU-MM01 and RegDB have demonstrated that our approach outperforms state-of-the-art methods in ReID accuracy.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (37)
- et al., Deep feature learning with relative distance comparison for person re-identification, Pattern Recognit. (2015)
- et al., Multi-level factorisation net for person re-identification, CVPR (2018)
- et al., Darkrank: accelerating deep metric learning via cross sample similarities transfer, AAAI (2018)
- et al., Cross-modality person re-identification with generative adversarial training, IJCAI (2018)
- et al., Histograms of oriented gradients for human detection, CVPR (2005)
- et al., Generative adversarial nets
- et al., Hsme: hypersphere manifold embedding for visible thermal person re-identification, Proceedings of the AAAI Conference on Artificial Intelligence (2019)
- et al., Deep residual learning for image recognition, CVPR (2016)
- A. Hermans, L. Beyer, B. Leibe, In defense of the triplet loss for person re-identification, ...
- G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network, ...
- Vrstc: occlusion-free video person re-identification, CVPR
- Recover and identify: a generative dual model for cross-resolution person re-identification, ICCV
- Person re-identification by local maximal occurrence representation and metric learning, CVPR
- Bag of tricks and a strong baseline for deep person re-identification, CVPR Workshops
- Person recognition system based on a combination of body images from visible light and thermal cameras, Sensors
- Pytorch: an imperative style, high-performance deep learning library, NeurIPS
- Pose-normalized image generation for person re-identification, ECCV
- Dual attention matching network for context-aware feature sequence based person re-identification, CVPR