Adaptive Feature Fusion for Visual Object Tracking
Pattern Recognition (IF 8) Pub Date: 2021-03-01, DOI: 10.1016/j.patcog.2020.107679
Shaochuan Zhao , Tianyang Xu , Xiao-Jun Wu , Xue-Feng Zhu

Abstract Recent advanced trackers, consisting of a discriminative classification component and a dedicated bounding box estimation module, have achieved improved performance in the visual tracking community. The most essential factor in this development is the use of different Convolutional Neural Networks (CNNs), which significantly improve model capacity via offline-trained deep feature representations. Although powerful deep structures emphasize semantic appearance through high-dimensional latent variables, how to achieve effective feature adaptation in the online tracking stage has not yet been sufficiently considered. To this end, we argue for the necessity of exploring hierarchical and complementary appearance descriptors from different convolutional layers to achieve online tracking adaptation. Therefore, in this paper, we propose an adaptive feature fusion mechanism that balances the detection granularities of shallow to deep convolutional layers. Specifically, the correlation between template and instance is employed to generate adaptive weights that improve saliency and discrimination. In addition, considering temporal appearance variation, the projection matrix for the multi-channel inputs is jointly updated with the correlation classifier to further enhance robustness. Experimental results on four recent benchmarks, i.e., OTB-2015, VOT2018, LaSOT and TrackingNet, demonstrate the effectiveness and robustness of the proposed method, with superior performance compared to state-of-the-art approaches.
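The core idea — deriving per-layer fusion weights from the template–instance correlation response — can be illustrated with a minimal sketch. This is not the authors' implementation: the dense cross-correlation, the use of each layer's peak response as its reliability score, and the softmax normalization are illustrative assumptions, and all feature maps are assumed to share the same spatial output size.

```python
import numpy as np

def correlation_map(template, instance):
    """Dense cross-correlation of a template feature map over an
    instance feature map (valid positions only, channels summed)."""
    c, th, tw = template.shape
    _, ih, iw = instance.shape
    oh, ow = ih - th + 1, iw - tw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            # Inner product between the template and the instance patch
            out[y, x] = np.sum(instance[:, y:y + th, x:x + tw] * template)
    return out

def adaptive_fusion(templates, instances):
    """Fuse per-layer correlation responses, weighting each layer by a
    softmax over its peak response (a simple confidence proxy)."""
    maps = [correlation_map(t, s) for t, s in zip(templates, instances)]
    peaks = np.array([m.max() for m in maps])
    w = np.exp(peaks - peaks.max())   # numerically stable softmax
    w /= w.sum()
    fused = sum(wi * m for wi, m in zip(w, maps))
    return fused, w
```

In this toy form, a layer whose correlation response is sharp and strong dominates the fused map, while a layer giving a flat, ambiguous response is down-weighted — the "adaptive" behaviour the abstract describes, realised here with the simplest possible scoring rule.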

Updated: 2021-03-01