Hierarchical Temporal Modeling with Mutual Distance Matching for Video Based Person Re-Identification
IEEE Transactions on Circuits and Systems for Video Technology (IF 8.4). Pub Date: 2021-02-01. DOI: 10.1109/tcsvt.2020.2988034
Peike Li, Pingbo Pan, Ping Liu, Mingliang Xu, Yi Yang

Compared to image-based person re-identification (re-ID), video-based person re-ID can exploit richer cues from both appearance and temporal information, and has therefore received widespread attention recently. However, varying poses, occlusion, misalignment, and multiple temporal granularities in video sequences produce inter-sequence and intra-sequence variations that inevitably make feature learning and matching in videos more difficult. Under these circumstances, it is necessary to design an effective discriminative representation learning mechanism, as well as a matching solution, to handle these variations in video-based person re-ID. To this end, this paper introduces a multi-granularity temporal convolution network and a mutual distance matching measurement, aimed at alleviating the intra-sequence variation and the inter-sequence variation, respectively. In the feature learning stage, we model different temporal granularities by hierarchically stacking temporal convolution blocks with different dilation factors. In the feature matching stage, we propose a clip-level probe-gallery mutual distance measurement and consider only the most convincing clip pairs via top-k selection. Our method achieves state-of-the-art results on three video-based person re-ID benchmarks; moreover, we conduct extensive ablation studies to demonstrate the conciseness and effectiveness of our method in video re-ID tasks.
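To make the feature learning stage concrete, the PyTorch sketch below stacks residual 1-D temporal convolution blocks whose dilation factors grow (1, 2, 4), so successive blocks cover increasingly coarse temporal granularities before pooling over time. The block layout, channel width, and dilation schedule are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of hierarchical temporal modeling via stacked dilated
# 1-D temporal convolutions (assumed design; not the authors' exact one).
import torch
import torch.nn as nn

class TemporalConvBlock(nn.Module):
    """One temporal convolution block over per-frame features.

    Input/output shape: (batch, channels, time). The dilation factor
    controls the temporal granularity this block captures.
    """
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        padding = dilation  # keeps temporal length unchanged for kernel 3
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              dilation=dilation, padding=padding)
        self.bn = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Residual connection keeps deep stacks easy to optimize.
        return self.relu(self.bn(self.conv(x)) + x)

class HierarchicalTemporalNet(nn.Module):
    """Stacks blocks with growing dilation (1, 2, 4, ...) so deeper
    blocks see coarser temporal granularities, then pools over time."""
    def __init__(self, channels: int = 2048, num_blocks: int = 3):
        super().__init__()
        self.blocks = nn.Sequential(
            *[TemporalConvBlock(channels, dilation=2 ** i)
              for i in range(num_blocks)])
        self.pool = nn.AdaptiveAvgPool1d(1)

    def forward(self, frame_feats):
        # frame_feats: (batch, time, channels) per-frame CNN features.
        x = frame_feats.transpose(1, 2)      # -> (batch, channels, time)
        x = self.blocks(x)
        return self.pool(x).squeeze(-1)      # -> (batch, channels)

feats = torch.randn(4, 8, 2048)              # 4 clips, 8 frames each
clip_embedding = HierarchicalTemporalNet()(feats)
print(clip_embedding.shape)                  # torch.Size([4, 2048])
```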
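For the feature matching stage, the sketch below gives one plausible reading of a clip-level probe-gallery mutual distance with top-k selection: distances are computed between all clip pairs, each direction keeps only its k most convincing (smallest) best matches, and the two directions are averaged. The aggregation rule and the choice of k here are assumptions, not the paper's exact measurement.

```python
# Minimal sketch of clip-level probe-gallery mutual distance matching
# with top-k selection (assumed formulation).
import torch

def mutual_distance(probe_clips: torch.Tensor,
                    gallery_clips: torch.Tensor,
                    k: int = 3) -> torch.Tensor:
    """probe_clips: (P, D) clip embeddings of the probe sequence.
    gallery_clips: (G, D) clip embeddings of one gallery sequence.
    Returns a scalar sequence-to-sequence distance."""
    # Pairwise Euclidean distances between all probe/gallery clip pairs.
    dists = torch.cdist(probe_clips, gallery_clips)  # (P, G)
    # Probe -> gallery: each probe clip's closest gallery clip; keep
    # only the k most convincing (smallest) of those best matches.
    p2g = dists.min(dim=1).values.topk(k, largest=False).values.mean()
    # Gallery -> probe: the symmetric direction.
    g2p = dists.min(dim=0).values.topk(k, largest=False).values.mean()
    # Mutual distance: a match must be supported from both sides.
    return 0.5 * (p2g + g2p)

probe = torch.randn(6, 2048)    # 6 probe clips
gallery = torch.randn(8, 2048)  # 8 gallery clips
print(mutual_distance(probe, gallery, k=3).item())
```

Averaging only the top-k clip-pair distances, rather than all of them, is what lets the measurement ignore occluded or misaligned clips and rely on the most convincing pairs.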

Updated: 2021-02-01