Toward Robust Visual Object Tracking With Independent Target-Agnostic Detection and Effective Siamese Cross-Task Interaction
IEEE Transactions on Image Processing ( IF 10.8 ) Pub Date : 2023-02-24 , DOI: 10.1109/tip.2023.3246800
Tianyang Xu 1 , Zhenhua Feng 2 , Xiao-Jun Wu 3 , Josef Kittler 2
Advanced Siamese visual object tracking architectures are jointly trained on pair-wise input images to perform target classification and bounding box regression. They have achieved promising results in recent benchmarks and competitions. However, the existing methods suffer from two limitations. First, although the Siamese structure can estimate the target state in an instance frame, provided the target appearance does not deviate too much from the template, the detection of the target cannot be guaranteed in the presence of severe appearance variations. Second, although the classification and regression tasks share the same output of the backbone network, their task-specific modules and loss functions are invariably designed independently, without any interaction. Yet, in a general tracking task, centre classification and bounding box regression work collaboratively to estimate the final target location. To address these issues, it is essential to perform target-agnostic detection and to promote cross-task interaction in a Siamese-based tracking framework. In this work, we endow a novel network with a target-agnostic object detection module that complements direct target inference and avoids, or at least minimises, the misalignment of the key cues of potential template-instance matches. To unify the multi-task learning formulation, we develop a cross-task interaction module that ensures consistent supervision of the classification and regression branches, improving their synergy. To eliminate potential inconsistencies within the multi-task architecture, we assign adaptive labels, rather than fixed hard labels, to supervise the network training more effectively.
The experimental results obtained on several benchmarks, i.e., OTB100, UAV123, VOT2018, VOT2019, and LaSOT, demonstrate the effectiveness of the target-agnostic detection module as well as the cross-task interaction, exhibiting superior tracking performance compared with state-of-the-art tracking methods.
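The adaptive-label idea in the abstract can be illustrated with a minimal sketch: instead of assigning a fixed hard label (0 or 1) to each candidate location, the classification branch is supervised with a soft target derived from the overlap between the predicted box and the ground truth, so the two branches receive consistent supervision. The helper names below (`iou`, `adaptive_labels`) are hypothetical for illustration; the paper's exact assignment scheme may differ.

```python
def iou(box_a, box_b, eps=1e-7):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + eps)

def adaptive_labels(pred_boxes, gt_box):
    """Soft classification targets in [0, 1]: each candidate is labelled
    by its IoU with the ground truth rather than a hard 0/1 value, which
    ties the classification target to the regression quality."""
    return [iou(p, gt_box) for p in pred_boxes]
```

A candidate that perfectly overlaps the ground truth receives a target close to 1, a disjoint one receives 0, and partial overlaps fall in between, giving the classification branch a signal that is automatically aligned with the regression branch.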
