AXM-Net: Cross-Modal Context Sharing Attention Network for Person Re-ID
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2021-01-19 , DOI: arxiv-2101.08238 Ammarah Farooq, Muhammad Awais, Josef Kittler, Syed Safwan Khalid
Cross-modal person re-identification (Re-ID) is critical for modern video
surveillance systems. The key challenge is to align inter-modality
representations according to the semantic information pertaining to a person
while ignoring background information. In this work, we present AXM-Net, a
novel CNN-based architecture designed for learning semantically aligned visual
and textual representations. The underlying building block consists of multiple
streams of feature maps coming from the visual and textual modalities, together
with a novel learnable context-sharing semantic alignment network. We also
propose complementary intra-modal attention learning mechanisms to focus on
finer-grained local details in the features, along with a cross-modal affinity
loss for robust feature matching. Our design is unique in its ability to
implicitly learn feature alignments from data. The entire AXM-Net can be
trained in an end-to-end manner. We report results on both the person search
and cross-modal Re-ID tasks. Extensive experimentation validates the proposed
framework and demonstrates its superiority, outperforming the current
state-of-the-art methods by a significant margin.
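The abstract mentions a cross-modal affinity loss for matching visual and textual features but does not give its formulation. As an illustration only, the sketch below shows one plausible shape for such a loss: a bidirectional hinge ranking penalty over a cosine affinity matrix between L2-normalized image and text embeddings, where diagonal entries correspond to matched identity pairs. The function names and the margin value are assumptions, not the paper's actual definition.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Normalize each row of x to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def cross_modal_affinity_loss(img_feats, txt_feats, margin=0.2):
    """Illustrative bidirectional ranking loss over a cosine affinity matrix.

    img_feats, txt_feats: (N, D) arrays; row i of each describes the
    same identity, so sim[i, i] is a matched (positive) pair.
    NOTE: this is a hypothetical stand-in for the paper's affinity loss,
    whose exact form is not stated in the abstract.
    """
    v = l2_normalize(img_feats)
    t = l2_normalize(txt_feats)
    sim = v @ t.T                        # (N, N) cosine affinity matrix
    pos = np.diag(sim)                   # matched-pair similarities
    # image-to-text direction: penalize any text ranked within `margin`
    # of (or above) the matched caption for that image
    cost_i2t = np.maximum(0.0, margin + sim - pos[:, None])
    # text-to-image direction: symmetric penalty over images
    cost_t2i = np.maximum(0.0, margin + sim - pos[None, :])
    n = sim.shape[0]
    mask = 1.0 - np.eye(n)               # exclude the positives themselves
    return float(((cost_i2t + cost_t2i) * mask).sum() / n)
```

With perfectly aligned, well-separated embeddings the loss vanishes, and it grows as mismatched pairs become more similar than matched ones; the same margin-ranking idea underlies many image-text matching objectives.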
Updated: 2021-01-21