Comprehensive feature fusion mechanism for video-based person re-identification via significance-aware attention
Signal Processing: Image Communication (IF 3.4), Pub Date: 2020-03-19, DOI: 10.1016/j.image.2020.115835
Lin Chen , Hua Yang , Zhiyong Gao

Video-based person re-identification (Re-ID) is an important capability for artificial intelligence and human–computer interaction. Spatial and temporal features both play indispensable roles in comprehensively representing person sequences. In this paper, we propose a comprehensive feature fusion mechanism (CFFM) for video-based Re-ID that uses multiple significance-aware attention modules to learn an attention-based spatial–temporal feature fusion and better represent person sequences. Specifically, CFFM consists of spatial attention, periodic attention, significance attention and residual learning. The spatial attention and periodic attention make the system focus on the more useful spatial features extracted by the CNN and the temporal features extracted by the recurrent network, respectively. The significance attention measures how much each of the two features contributes to the sequence representation. Residual learning then operates between the spatial and temporal features, weighted by the significance scores, to produce the final significance-aware feature fusion. We apply our approach to several representative state-of-the-art networks, proposing improved variants for the video-based Re-ID task. We conduct extensive experiments on the widely used video Re-ID datasets PRID-2011, i-LIDS-VID and MARS. The results show that the improved networks perform favorably against existing approaches, demonstrating the effectiveness of the proposed CFFM for comprehensive feature fusion. Furthermore, we compare the performance of the different modules in CFFM, investigating the varied significance of the different networks, features and sequential feature aggregation modes.
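To make the fusion step concrete, below is a minimal PyTorch sketch of how significance-aware fusion of a pooled CNN spatial feature and a pooled recurrent temporal feature could be implemented. This is not the authors' code: the module name SignificanceFusion, the single-linear scoring layer, and the specific residual combination are illustrative assumptions about one way the significance scores and residual learning might interact.

```python
# Minimal sketch (assumed, not the authors' implementation) of
# significance-aware fusion of a spatial feature and a temporal feature,
# assuming both are already pooled to fixed-length vectors of size `dim`.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SignificanceFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # One scalar significance score per feature branch.
        self.score = nn.Linear(dim, 1)

    def forward(self, spatial_feat: torch.Tensor,
                temporal_feat: torch.Tensor) -> torch.Tensor:
        # spatial_feat, temporal_feat: (batch, dim)
        feats = torch.stack([spatial_feat, temporal_feat], dim=1)  # (batch, 2, dim)
        scores = self.score(feats).squeeze(-1)                     # (batch, 2)
        weights = F.softmax(scores, dim=1)                         # significance weights
        fused = (weights.unsqueeze(-1) * feats).sum(dim=1)         # weighted sum, (batch, dim)
        # Residual connection (an assumption here): keep the raw branch
        # signals alongside the significance-weighted combination.
        return fused + spatial_feat + temporal_feat

# Usage sketch
if __name__ == "__main__":
    model = SignificanceFusion(dim=256)
    s = torch.randn(4, 256)   # spatial features, e.g. from a CNN backbone
    t = torch.randn(4, 256)   # temporal features, e.g. from a recurrent network
    print(model(s, t).shape)  # torch.Size([4, 256])
```

The softmax over the two branch scores plays the role of the significance attention described in the abstract, while the added raw features stand in for the residual learning between the spatial and temporal streams.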



Updated: 2020-03-22