Three-Dimension Transmissible Attention Network for Person Re-Identification
IEEE Transactions on Circuits and Systems for Video Technology (IF 8.4). Pub Date: 2020-12-01, DOI: 10.1109/tcsvt.2020.2977427
Yewen Huang, Sicheng Lian, Suian Zhang, Haifeng Hu, Dihu Chen, Tao Su

In this work, we propose a Three-Dimensional Transmissible Attention Network (3DTANet) for person re-identification, which transmits attention information from layer to layer and attends to the person image from a three-dimensional perspective. The main contributions of the 3DTANet are: (i) A novel Transmissible Attention (TA) mechanism is introduced, which transfers attention information between convolution layers. Unlike traditional attention mechanisms, it not only conveys accumulated attention information layer by layer but also guides the network to retain holistic attention information. (ii) We propose a Three-Dimension Attention (3DA) mechanism capable of extracting a three-dimensional attention map. Whereas previous research on image attention mechanisms extracts channel or spatial attention information separately, the 3DA mechanism attends to channel and spatial information simultaneously, so that the two play a better complementary role in attention extraction. (iii) A new loss function, the L2-norm Multi-labels Loss (L2ML), is applied to achieve higher recognition accuracy, computed from the multiple labels of the same identity and the corresponding feature representations. Quite different from common loss functions, the L2-norm Multi-labels Loss is particularly effective at optimizing feature distances. In brief, the 3DTANet gains a two-fold benefit toward higher accuracy. For one thing, the attention information is informative and can be transmitted, making the features more representative. For another, our model is computationally lightweight and can be easily applied to real-world scenarios. We conduct extensive experiments on four person re-identification benchmark datasets. Our model achieves rank-1 accuracy of 87.50% on CUHK03, 96.23% on Market-1501, 92.50% on DukeMTMC-reID, and 76.60% on MSMT17-V2. The results confirm that the 3DTANet can extract more representative features and attain higher recognition accuracy, outperforming state-of-the-art methods.
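The abstract describes the mechanisms only at a high level. The minimal PyTorch sketch below illustrates one plausible reading of the 3DA idea of producing a single full-size (C, H, W) attention map, rather than separate channel and spatial maps, and of the TA idea of handing that map on to the next stage. All module, parameter, and variable names are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class ThreeDimAttention(nn.Module):
    """Toy 3D attention block: emits a (C, H, W) map and reweights the input."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel bottleneck keeps the block lightweight (assumed design choice).
        self.reduce = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.expand = nn.Conv2d(channels // reduction, channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x, prev_attention=None):
        # x: (N, C, H, W) feature map from a convolution stage.
        a = self.sigmoid(self.expand(self.relu(self.reduce(x))))  # (N, C, H, W)
        if prev_attention is not None:
            # "Transmissible" flavour: blend in the attention map handed down from
            # the previous stage (assumed already resized/projected to match x),
            # so earlier attention information is not discarded.
            a = 0.5 * (a + prev_attention)
        return x * a, a  # reweighted features plus the map to pass onward

# Usage sketch: a later stage receives the attention map of an earlier one.
block = ThreeDimAttention(channels=256)
feats = torch.randn(4, 256, 24, 8)             # e.g. mid-level ReID feature maps
out1, att1 = block(feats)                      # first stage: no incoming attention
out2, att2 = block(out1, prev_attention=att1)  # next stage reuses the transmitted map

Because the map covers every channel-position pair jointly, channel and spatial cues are weighted in a single step instead of in two separate branches, which is one way to read the complementary role the abstract describes.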

Updated: 2020-12-01