CM-NAS: Rethinking Cross-Modality Neural Architectures for Visible-Infrared Person Re-Identification
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2021-01-21 , DOI: arxiv-2101.08467
Chaoyou Fu, Yibo Hu, Xiang Wu, Hailin Shi, Tao Mei, Ran He

Visible-Infrared person re-identification (VI-ReID) aims at matching cross-modality pedestrian images, breaking through the limitation of single-modality person ReID in dark environments. To mitigate the impact of the large modality discrepancy, existing works manually design various two-stream architectures to separately learn modality-specific and modality-sharable representations. Such a manual design routine, however, depends heavily on massive experiments and empirical practice, which is time-consuming and labor-intensive. In this paper, we systematically study the manually designed architectures and identify that appropriately splitting Batch Normalization (BN) layers to learn modality-specific representations brings a significant boost to cross-modality matching. Based on this observation, the essential objective is to find the optimal splitting scheme for each BN layer. To this end, we propose a novel method, named Cross-Modality Neural Architecture Search (CM-NAS). It consists of a BN-oriented search space in which standard NAS optimization can be carried out subject to the cross-modality task. In addition, to better guide the search process, we formulate a new Correlation Consistency based Class-specific Maximum Mean Discrepancy (C3MMD) loss. Beyond the modality discrepancy, it also accounts for the similarity correlations between the two modalities, which previous work has overlooked. Owing to these advantages, our method outperforms state-of-the-art counterparts in extensive experiments, improving Rank-1/mAP by 6.70%/6.13% on SYSU-MM01 and 12.17%/11.23% on RegDB. The source code will be released soon.
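To make the two key ideas concrete, here is a minimal, hypothetical sketch (not the authors' released code, which is not yet available): a BN layer that can be either shared across the visible and infrared modalities or split into per-modality branches, which is the binary choice CM-NAS searches over at every BN layer, plus a simplified class-specific MMD term that penalizes the distance between same-identity feature means of the two modalities. The correlation-consistency component of the full C3MMD loss is omitted, and all names (`ModalitySplitBN`, `class_mmd`) are illustrative assumptions.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch dimension, using the
    # batch's own statistics (inference-style, for simplicity).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

class ModalitySplitBN:
    """Toy illustration of the searchable BN choice: when `split` is
    True, the layer keeps separate affine parameters per modality
    (modality-specific); otherwise one set is shared (modality-sharable).
    CM-NAS searches which of these two options to use at each BN layer."""

    def __init__(self, dim, split):
        self.split = split
        n = 2 if split else 1  # two branches when split, one when shared
        self.gamma = [np.ones(dim) for _ in range(n)]
        self.beta = [np.zeros(dim) for _ in range(n)]

    def __call__(self, x, modality):
        # modality: 0 = visible, 1 = infrared
        idx = modality if self.split else 0
        return self.gamma[idx] * batch_norm(x) + self.beta[idx]

def class_mmd(feat_vis, feat_ir):
    """Simplified class-specific MMD for one identity: squared distance
    between the mean embeddings of that identity's visible and infrared
    samples. The full C3MMD loss additionally enforces correlation
    consistency across modalities, which is omitted here."""
    return float(np.linalg.norm(feat_vis.mean(axis=0)
                                - feat_ir.mean(axis=0)) ** 2)
```

In this sketch, the per-layer architecture decision reduces to the boolean `split`, so the search space over a backbone with K BN layers has 2^K candidate architectures, which is what makes an automated search preferable to manual two-stream design.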

Updated: 2021-01-22