Abstract
Visible thermal person re-identification, also known as RGB-infrared person re-identification, is an emerging cross-modality search problem: identifying the same person across different modalities. Solving it requires knowing what a person looks like in each modality. Ideally, images of the same person would be captured at the same time from the same camera view in both modalities, so that similarities and differences can be discovered. However, existing datasets do not fully satisfy these requirements. Thus, a modality-transfer generative adversarial network is proposed to generate a cross-modality counterpart in the target modality for each source image, yielding paired images of the same person. Because query images come from one modality and gallery images from another, a unified representation for both modalities is needed so that cross-modality matching can be performed. In this study, a novel dual-level unified latent representation is proposed for the visible thermal person re-identification task. It comprises an image-level patch fusion strategy and a feature-level hierarchical granularity triplet loss, and produces a more general and robust unified feature embedding. Extensive experiments on both the SYSU-MM01 dataset (visible and near-infrared images) and the RegDB dataset (visible and far-infrared images) demonstrate the efficiency and generality of the proposed method, which achieves state-of-the-art performance. The code will be publicly released.
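For readers unfamiliar with the triplet loss mentioned in the abstract, the following is a minimal sketch of the standard margin-based triplet loss applied to a cross-modality triplet. It is not the hierarchical granularity triplet loss proposed in the paper, whose details the abstract does not give. The function names, the margin value of 0.3, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin of 0.3 is an illustrative default, not a value from the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical embeddings in a shared feature space (toy values):
# the anchor comes from a visible image, the positive from a thermal image
# of the same person, and the negative from a thermal image of someone else.
visible_anchor = [1.0, 0.0]
thermal_same_id = [1.0, 0.5]
thermal_other_id = [0.0, 2.0]

loss = triplet_loss(visible_anchor, thermal_same_id, thermal_other_id)
```

The loss is zero once the same-identity pair is closer than the different-identity pair by at least the margin, which is the sense in which such a loss pulls both modalities toward one unified embedding.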
Funding
This work is supported by the National Natural Science Foundation of China (No. 61633019), the Public Projects of Zhejiang Province (No. LGF18F030002), and the Science Foundation of Chinese Aerospace Industry (JCKY2018204B053).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Fan, X., Jiang, W., Luo, H. et al. Modality-transfer generative adversarial network and dual-level unified latent representation for visible thermal Person re-identification. Vis Comput 38, 279–294 (2022). https://doi.org/10.1007/s00371-020-02015-z
DOI: https://doi.org/10.1007/s00371-020-02015-z