Robust Visual Tracking via Hierarchical Convolutional Features
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2018-08-13, DOI: 10.1109/tpami.2018.2865311
Chao Ma , Jia-Bin Huang , Xiaokang Yang , Ming-Hsuan Yang

Visual tracking is challenging because target objects often undergo significant appearance changes caused by deformation, abrupt motion, background clutter, and occlusion. In this paper, we propose to exploit the rich hierarchical features of deep convolutional neural networks to improve the accuracy and robustness of visual tracking. Deep neural networks trained on object recognition datasets consist of multiple convolutional layers. These layers encode target appearance at different levels of abstraction. For example, the outputs of the last convolutional layers encode the semantic information of targets, and such representations are invariant to significant appearance variations. However, their spatial resolutions are too coarse to precisely localize the target. In contrast, features from earlier convolutional layers provide more precise localization but are less invariant to appearance changes. We interpret the hierarchical features of convolutional layers as a nonlinear counterpart of an image pyramid representation and explicitly exploit these multiple levels of abstraction to represent target objects. Specifically, we learn adaptive correlation filters on the outputs of each convolutional layer to encode the target appearance. We infer the maximum response of each layer to locate targets in a coarse-to-fine manner. To further handle scale estimation and the re-detection of target objects after tracking failures caused by heavy occlusion or out-of-view movement, we conservatively learn another correlation filter that maintains a long-term memory of target appearance and serves as a discriminative classifier. We apply this classifier to two types of object proposals: (1) proposals generated with a small step size, tightly centered around the estimated location, for scale estimation; and (2) proposals generated with a large step size across the whole image for target re-detection.
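As an illustration of the core building block (not the authors' released code), the sketch below learns a single-channel correlation filter by closed-form ridge regression in the Fourier domain and evaluates its response map; the function names (`gaussian_label`, `learn_filter`, `respond`) and the single-channel simplification are assumptions for brevity — the paper learns one such filter per convolutional layer on multi-channel features.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    # Desired response: a Gaussian peak centered on the target,
    # circularly shifted so the peak sits at index (0, 0).
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    return np.roll(np.roll(g, -cy, axis=0), -cx, axis=1)

def learn_filter(feat, y, lam=1e-4):
    # Closed-form ridge regression over all circular shifts of `feat`:
    #   W_hat = (Y_hat * conj(X_hat)) / (X_hat * conj(X_hat) + lambda)
    X = np.fft.fft2(feat)
    Y = np.fft.fft2(y)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def respond(W_hat, feat):
    # Correlation response map; the peak location gives the
    # estimated translation of the target.
    return np.real(np.fft.ifft2(W_hat * np.fft.fft2(feat)))
```

Applied to the same feature map it was trained on, the filter's response peaks at the (0, 0) origin, mirroring the Gaussian label; at test time the peak offset relative to the origin is the estimated motion.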
Extensive experimental results on large-scale benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art tracking methods.
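The coarse-to-fine inference described above can be sketched as follows: the response of the deepest (most semantic) layer fixes a coarse location, and each shallower layer refines it only within a small neighborhood, so precise early-layer responses cannot drag the estimate to a distant distractor. The function name and the fixed search `radius` are illustrative assumptions; the paper weights and combines layer responses rather than using a hard window.

```python
import numpy as np

def coarse_to_fine(responses, radius=2):
    # responses: list of equally sized response maps, ordered from the
    # deepest (semantic) layer to the shallowest (fine-grained) layer.
    h, w = responses[0].shape
    y, x = np.unravel_index(np.argmax(responses[0]), (h, w))
    for r in responses[1:]:
        # Refine only inside a (2*radius+1)^2 window around the
        # location proposed by the previous (deeper) layer.
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        dy, dx = np.unravel_index(np.argmax(r[y0:y1, x0:x1]),
                                  (y1 - y0, x1 - x0))
        y, x = y0 + dy, x0 + dx
    return int(y), int(x)
```

A stronger spurious peak in a shallow layer that lies outside the window around the deep layer's estimate is ignored, which is the point of the coarse-to-fine ordering.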

Updated: 2019-10-23