Learning attention for object tracking with adversarial learning network, EURASIP Journal on Image and Video Processing


EURASIP Journal on Image and Video Processing (IF 2.0), Pub Date: 2020-11-11, DOI: 10.1186/s13640-020-00535-1
Xu Cheng, Chen Song, Yongxiang Gu, Beijing Chen

Artificial intelligence has been widely studied in recent years as a way to solve intelligent surveillance analysis and security problems. Although many multimedia security approaches based on deep learning network models have been proposed, their performance still faces challenges that deserve in-depth research. On the one hand, the high computational complexity of current deep learning methods makes them hard to apply in real-time scenarios. On the other hand, it is difficult to obtain the specific features of a video by fine-tuning the network online with only the object state of the first frame, which fails to capture the rich appearance variations of the object. To address these two issues, this paper proposes an effective object tracking method with learned attention that localizes the object and reduces training time within an adversarial learning framework. First, a prediction network is designed to track the object in video sequences. The object positions in the first ten frames are used to fine-tune the prediction network, which allows it to fully mine the specific features of the object. Second, the prediction network is integrated into a generative adversarial network framework, which randomly generates masks to capture object appearance variations by adaptively dropping out input features. Third, a spatial attention mechanism is presented to improve tracking performance. The proposed network can identify the mask that maintains the most robust features of the object over a long temporal span. Extensive experiments on two large-scale benchmarks demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
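
As a rough illustration of the mask-based feature dropout and spatial attention mentioned in the abstract, the following minimal sketch builds candidate masks that each drop one region of a 2-D feature map, applies them, and ranks them with a placeholder score. It is not the authors' implementation: the function names (make_region_masks, apply_mask, spatial_attention, score, pick_mask), the 3x3 grid, the attention normalisation, and the selection rule are all assumptions made for illustration, and the score stands in for a trained prediction network.

```python
def make_region_masks(height, width, grid=3):
    """Build grid*grid binary masks; each one zeroes a single rectangular cell."""
    masks = []
    for gi in range(grid):
        for gj in range(grid):
            r0, r1 = gi * height // grid, (gi + 1) * height // grid
            c0, c1 = gj * width // grid, (gj + 1) * width // grid
            mask = [[0.0 if (r0 <= r < r1 and c0 <= c < c1) else 1.0
                     for c in range(width)] for r in range(height)]
            masks.append(mask)
    return masks


def apply_mask(features, mask):
    """Element-wise product: positions where the mask is 0 are dropped."""
    return [[f * m for f, m in zip(frow, mrow)]
            for frow, mrow in zip(features, mask)]


def spatial_attention(features):
    """Normalise non-negative activations into weights that sum to 1 (assumed form)."""
    total = sum(max(v, 0.0) for row in features for v in row)
    if total == 0.0:
        n = len(features) * len(features[0])
        return [[1.0 / n for _ in row] for row in features]
    return [[max(v, 0.0) / total for v in row] for row in features]


def score(features):
    """Placeholder for a prediction network: attention-weighted sum of activations."""
    attn = spatial_attention(features)
    return sum(a * v for arow, frow in zip(attn, features)
               for a, v in zip(arow, frow))


def pick_mask(features, masks):
    """Hypothetical selection rule: index of the mask that keeps the score highest."""
    scores = [score(apply_mask(features, m)) for m in masks]
    return max(range(len(masks)), key=scores.__getitem__)


# Toy 6x6 feature map with one strong activation in the top-left corner.
features = [[1.0] * 6 for _ in range(6)]
features[0][0] = 5.0
masks = make_region_masks(6, 6)
best = pick_mask(features, masks)
```

In this toy setting, the mask covering the top-left cell removes the strongest activation, so the hypothetical rule avoids it. In an adversarial setup the masks would typically be produced by a learned generator rather than enumerated on a fixed grid.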


Updated: 2020-11-12
