VTBR: Semantic-based Pretraining for Person Re-Identification
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-10-11, DOI: arxiv-2110.05074
Suncheng Xiang, Zirui Zhang, Mengyuan Guan, Hao Chen, Binjie Yan, Ting Liu, Yuzhuo Fu

Pretraining is a dominant paradigm in computer vision. Supervised ImageNet pretraining is commonly used to initialize the backbones of person re-identification (Re-ID) models. However, recent works show the surprising result that ImageNet pretraining has limited impact on Re-ID systems, owing to the large domain gap between ImageNet and person Re-ID data. To seek an alternative to traditional pretraining, we manually construct a diversified FineGPR-C caption dataset, the first of its kind for person Re-ID events. Based on it, we propose a purely semantic-based pretraining approach named VTBR, which uses dense captions to learn visual representations with fewer images. Specifically, we train convolutional networks from scratch on the captions of the FineGPR-C dataset and transfer them to downstream Re-ID tasks. Comprehensive experiments on benchmarks show that our VTBR achieves competitive performance compared with ImageNet pretraining -- despite using up to 1.4x fewer images, revealing its potential for Re-ID pretraining.
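The abstract does not spell out the VTBR architecture, but the recipe it describes (training a convolutional backbone from scratch by predicting captions, then transferring the backbone to downstream Re-ID) can be sketched roughly as below. Every concrete choice here (ResNet-50 backbone, GRU caption decoder, the CaptionPretrainModel and pretrain_step names) is an illustrative assumption, not the authors' implementation.

# Hypothetical sketch of caption-supervised backbone pretraining, assuming a
# VirTex-style setup: a CNN trained from scratch to predict caption tokens,
# after which only the visual backbone is kept for Re-ID fine-tuning.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class CaptionPretrainModel(nn.Module):
    """CNN backbone plus a lightweight textual head predicting caption tokens."""

    def __init__(self, vocab_size: int, hidden_dim: int = 512):
        super().__init__()
        backbone = resnet50(weights=None)  # from scratch, no ImageNet initialization
        # Drop the avgpool/fc layers, keep the convolutional feature extractor.
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Linear(2048, hidden_dim)       # visual features -> decoder state
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, caption_tokens):
        feats = self.backbone(images)                 # (B, 2048, H, W)
        feats = feats.mean(dim=(2, 3))                # global average pool -> (B, 2048)
        h0 = self.proj(feats).unsqueeze(0)            # image feature initializes the decoder
        emb = self.embed(caption_tokens)              # (B, T, hidden_dim)
        out, _ = self.decoder(emb, h0)
        return self.classifier(out)                   # (B, T, vocab_size) token logits


def pretrain_step(model, images, tokens, optimizer):
    """One caption-prediction step: predict token t+1 from tokens up to t."""
    logits = model(images, tokens[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

After pretraining under this kind of objective, the self.backbone weights would be extracted and used to initialize a Re-ID model in place of ImageNet weights; the caption head is discarded.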

Updated: 2021-10-12