Learning Person Re-Identification Models From Videos With Weak Supervision
IEEE Transactions on Image Processing ( IF 10.8 ) Pub Date : 2021-02-11 , DOI: 10.1109/tip.2021.3056223
Xueping Wang, Min Liu, Dripta S. Raychaudhuri, Sujoy Paul, Yaonan Wang, Amit K. Roy-Chowdhury

Most person re-identification methods, being supervised techniques, suffer from the burden of massive annotation requirements. Unsupervised methods overcome this need for labeled data, but perform poorly compared to the supervised alternatives. To cope with this issue, we introduce the problem of learning person re-identification models from videos with weak supervision. The weak nature of the supervision arises from requiring only video-level labels, i.e., the identities of the persons who appear in the video, in contrast to the more precise frame-level annotations. Towards this goal, we propose a multiple instance attention learning framework for person re-identification using such video-level labels. Specifically, we first cast the video person re-identification task into a multiple instance learning setting, in which person images in a video are collected into a bag. The relations between videos with similar labels can be utilized to identify persons. On top of that, we introduce a co-person attention mechanism which mines the similarity correlations between videos with person identities in common. The attention weights are obtained based on all person images instead of person tracklets in a video, making our learned model less affected by noisy annotations. Extensive experiments demonstrate the superiority of the proposed method over the related methods on two weakly labeled person re-identification datasets.
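
The multiple instance learning setting described in the abstract can be illustrated with a small sketch. This is not the authors' framework: it is a generic attention-style pooling over a bag of per-image scores, trained against multi-hot video-level labels, and it omits the co-person attention mechanism entirely. The function names (`pool_bag`, `video_level_loss`) and the toy scores are hypothetical, chosen only for illustration.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def pool_bag(instance_scores):
    """Aggregate per-image identity scores into one score per identity.

    instance_scores[i][c] is a hypothetical score of image i for identity c.
    For each identity, images are weighted by a softmax over their own scores,
    so confident images dominate the video-level score.
    """
    num_ids = len(instance_scores[0])
    video_scores = []
    for c in range(num_ids):
        column = [row[c] for row in instance_scores]
        weights = softmax(column)
        video_scores.append(sum(w * s for w, s in zip(weights, column)))
    return video_scores

def video_level_loss(video_scores, video_labels):
    """Mean binary cross-entropy against multi-hot video-level labels."""
    loss = 0.0
    for score, label in zip(video_scores, video_labels):
        prob = 1.0 / (1.0 + math.exp(-score))
        prob = min(max(prob, 1e-7), 1 - 1e-7)  # guard against log(0)
        loss -= label * math.log(prob) + (1 - label) * math.log(1 - prob)
    return loss / len(video_labels)

# Toy bag: three person images from one video, scored against two identities.
# The video-level label says identity 0 appears and identity 1 does not;
# no image carries its own label.
bag = [[4.0, -3.0], [3.5, -2.0], [-2.0, -4.0]]
scores = pool_bag(bag)
loss = video_level_loss(scores, [1, 0])
```

The point of the sketch is that supervision enters only through `video_level_loss`: the loss sees one label vector per video, and the pooling step decides which images in the bag account for it.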

Updated: 2021-02-19