AXM-Net: Cross-Modal Context Sharing Attention Network for Person Re-ID
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-01-19, DOI: arxiv-2101.08238
Ammarah Farooq, Muhammad Awais, Josef Kittler, Syed Safwan Khalid

Cross-modal person re-identification (Re-ID) is critical for modern video surveillance systems. The key challenge is to align inter-modality representations according to the semantic information present for a person while ignoring background information. In this work, we present AXM-Net, a novel CNN-based architecture designed for learning semantically aligned visual and textual representations. The underlying building block consists of multiple streams of feature maps coming from the visual and textual modalities and a novel learnable context-sharing semantic alignment network. We also propose complementary intra-modal attention learning mechanisms to focus on finer-grained local details in the features, along with a cross-modal affinity loss for robust feature matching. Our design is unique in its ability to implicitly learn feature alignments from data. The entire AXM-Net can be trained in an end-to-end manner. We report results on both person search and cross-modal Re-ID tasks. Extensive experimentation validates the proposed framework and demonstrates its superiority by outperforming the current state-of-the-art methods by a significant margin.
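The abstract gives no implementation details for the cross-modal affinity loss it mentions, but a common stand-in for this kind of objective is a bidirectional hinge ranking loss over a cosine affinity matrix between image and text embeddings. The sketch below is a minimal PyTorch illustration under that assumption; the function name, margin value, and hardest-negative formulation are illustrative choices, not the authors' actual method.

```python
import torch
import torch.nn.functional as F

def cross_modal_affinity_loss(visual_feats, text_feats, margin=0.2):
    """Bidirectional hardest-negative hinge loss over a cosine affinity matrix.

    visual_feats, text_feats: (B, D) pooled embeddings where row i of both
    tensors describes the same person identity (a matched image/text pair).
    """
    v = F.normalize(visual_feats, dim=1)
    t = F.normalize(text_feats, dim=1)
    sim = v @ t.t()                                   # (B, B) affinity matrix
    pos = sim.diag().view(-1, 1)                      # matched-pair scores

    # Every mismatched pair should score at least `margin` below its match.
    cost_i2t = (margin + sim - pos).clamp(min=0)      # image query -> texts
    cost_t2i = (margin + sim - pos.t()).clamp(min=0)  # text query -> images

    # Exclude the matched pairs themselves from the hinge terms.
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_i2t = cost_i2t.masked_fill(eye, 0)
    cost_t2i = cost_t2i.masked_fill(eye, 0)

    # Penalize only the hardest negative per query, in both directions.
    return cost_i2t.max(dim=1).values.mean() + cost_t2i.max(dim=0).values.mean()

# Toy usage with random embeddings standing in for the visual and textual streams.
if __name__ == "__main__":
    vis = torch.randn(8, 512)   # visual embeddings for 8 identities
    txt = torch.randn(8, 512)   # textual embeddings for the same identities
    print(cross_modal_affinity_loss(vis, txt))
```

Keeping only the hardest negative per query (rather than summing all margin violations) is a common choice in image-text matching losses; the paper's exact affinity formulation may differ.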

Updated: 2021-01-21