Self-Supervised Modality-Aware Multiple Granularity Pre-Training for RGB-Infrared Person Re-Identification
IEEE Transactions on Information Forensics and Security (IF 6.8), Pub Date: 2023-05-12, DOI: 10.1109/tifs.2023.3273911
Lin Wan 1 , Qianyan Jing 1 , Zongyuan Sun 1 , Chuang Zhang 2 , Zhihang Li 3 , Yehansen Chen 1

RGB-Infrared person re-identification (RGB-IR ReID) aims to associate people across disjoint RGB and IR camera views. Currently, the state-of-the-art performance of RGB-IR ReID is not as impressive as that of conventional ReID. Much of this is due to the notorious modality bias training issue brought by single-modality ImageNet pre-training, which can yield RGB-biased representations that severely hinder cross-modality image retrieval. This paper makes the first attempt to tackle the task from a pre-training perspective. We propose a self-supervised pre-training solution, named Modality-Aware Multiple Granularity Learning (MMGL), which trains models from scratch only on multi-modal ReID datasets, yet achieves competitive results against ImageNet pre-training without using any external data or sophisticated tuning tricks. First, we develop a simple-but-effective 'permutation recovery' pretext task that globally maps shuffled RGB-IR images into a shared latent permutation space, providing modality-invariant global representations for downstream ReID tasks. Second, we present a part-aware cycle-contrastive (PCC) learning strategy that utilizes cross-modality cycle-consistency to maximize agreement between semantically similar RGB-IR image patches. This enables contrastive learning in unpaired multi-modal scenarios, further improving the discriminability of local features without laborious instance augmentation. Based on these designs, MMGL effectively alleviates the modality bias training problem. Extensive experiments demonstrate that it learns better representations (+8.03% Rank-1 accuracy) with faster training speed (converging in only a few hours) and higher data efficiency (< 5% of the data size) than ImageNet pre-training. The results also suggest that it generalizes well to various existing models and losses, and transfers well across datasets. The code will be released at https://github.com/hansonchen1996/MMGL.
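The 'permutation recovery' pretext task described above can be illustrated with a minimal sketch: an image is split into horizontal stripes, the stripes are shuffled, and the shuffled permutation serves as the self-supervised recovery target. The function name, stripe-based patching, and all details below are illustrative assumptions; the paper's exact patching scheme and permutation space are not reproduced here.

```python
import numpy as np

def shuffle_patches(img, n_parts, rng):
    """Split an image (H x W x C array) into n_parts horizontal stripes,
    shuffle them, and return the shuffled image plus the permutation used,
    which acts as the ground-truth label the model must recover.
    (Hypothetical sketch; not the paper's actual implementation.)"""
    h = img.shape[0] // n_parts
    stripes = [img[i * h:(i + 1) * h] for i in range(n_parts)]
    perm = rng.permutation(n_parts)  # recovery target
    shuffled = np.concatenate([stripes[p] for p in perm], axis=0)
    return shuffled, perm

rng = np.random.default_rng(0)
img = np.arange(6 * 4 * 3).reshape(6, 4, 3).astype(float)  # toy 6x4 "RGB" image
shuffled, perm = shuffle_patches(img, 3, rng)

# Sanity check: inverting the predicted permutation restores the image,
# which is what a model solving this pretext task implicitly learns to do.
inv = np.argsort(perm)
h = img.shape[0] // 3
restored = np.concatenate([shuffled[p * h:(p + 1) * h] for p in inv], axis=0)
assert np.array_equal(restored, img)
```

In the actual method, a network would predict `perm` (or its index in a shared permutation set) from the shuffled RGB or IR image, so the learned global representation must ignore modality and focus on body-structure cues.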

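The cross-modality cycle-consistency underlying PCC can likewise be sketched in a few lines: for each RGB patch feature, find its nearest IR patch, then walk back from that IR patch to its nearest RGB patch; pairs whose round trip returns to the starting patch are treated as positives for contrastive learning. The function name and nearest-neighbour matching below are illustrative assumptions; the actual PCC loss and feature extractor are not reproduced here.

```python
import numpy as np

def cycle_consistent_pairs(rgb_feats, ir_feats):
    """Return (rgb_idx, ir_idx) pairs that survive an RGB -> IR -> RGB
    nearest-neighbour round trip under cosine similarity.
    (Hypothetical sketch of cross-modality cycle-consistency.)"""
    rgb = rgb_feats / np.linalg.norm(rgb_feats, axis=1, keepdims=True)
    ir = ir_feats / np.linalg.norm(ir_feats, axis=1, keepdims=True)
    sim = rgb @ ir.T                 # cosine similarity matrix
    fwd = sim.argmax(axis=1)         # RGB patch i -> nearest IR patch
    bwd = sim.argmax(axis=0)         # IR patch j -> nearest RGB patch
    return [(i, int(fwd[i])) for i in range(len(rgb)) if bwd[fwd[i]] == i]

# Toy patch features: each RGB patch has an obvious IR counterpart.
rgb_t = np.array([[1.0, 0.0], [0.0, 1.0]])
ir_t = np.array([[0.9, 0.1], [0.1, 0.9]])
print(cycle_consistent_pairs(rgb_t, ir_t))  # -> [(0, 0), (1, 1)]
```

The surviving pairs would then serve as positives in an InfoNCE-style contrastive objective, which is how matching can be made without paired RGB-IR supervision.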
Updated: 2023-05-12