Towards real-time object tracking with deep Siamese network and layerwise aggregation, Signal, Image and Video Processing

Towards real-time object tracking with deep Siamese network and layerwise aggregation
Signal, Image and Video Processing (IF 2.0), Pub Date: 2021-01-25, DOI: 10.1007/s11760-021-01861-1
Lasheng Yu, Yongpeng Zhao, Xiaopeng Zheng

Siamese networks have drawn increasing interest in visual object tracking because they balance precision and efficiency. However, Siamese trackers typically use relatively shallow backbone networks, such as AlexNet, and therefore do not take full advantage of modern deep convolutional neural networks (CNNs). Moreover, a Siamese tracker extracts the feature representation of the target object from the last CNN layer, which mainly captures semantic information. As a result, the tracker's precision is relatively low and it drifts easily in the presence of similar distractors. In this paper, a new nonpadding residual unit is designed and stacked to build a 22-layer deep ResNet, referred to as ResNet22. Using ResNet22 as the backbone, we build a deep Siamese network that greatly improves tracking performance. Because feature maps at different levels of a CNN represent different aspects of the target object, we aggregate several deep convolutional layers to exploit ResNet22's multilevel feature maps and form hyperfeature representations of targets. The final network architecture is named DSiamLA. Experimental results show that DSiamLA achieves significant improvements on multiple benchmark datasets.
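The two mechanisms described above, matching a target template against a search region and aggregating multilevel features into a hyperfeature, can be illustrated with a minimal sketch. This is not the authors' DSiamLA implementation. It assumes toy feature maps given as nested Python lists, uses "valid" (no-padding) cross-correlation as the similarity function in the style of SiamFC-type trackers, assumes that "aggregation" means channel-wise concatenation, and assumes that maps from different layers have already been brought to the same spatial size. The helper names `xcorr`, `hyperfeature`, and `peak` are hypothetical.

```python
from typing import List, Tuple

Map = List[List[float]]  # one feature channel, H x W
Feature = List[Map]      # C channels, all with the same H x W


def xcorr(template: Feature, search: Feature) -> Map:
    """'Valid' cross-correlation (no padding), summed over channels.

    The template slides over the search region; each output cell is the
    sum, over channels and template positions, of elementwise products.
    """
    if len(template) != len(search):
        raise ValueError("template and search need the same channel count")
    th, tw = len(template[0]), len(template[0][0])
    sh, sw = len(search[0]), len(search[0][0])
    if th > sh or tw > sw:
        raise ValueError("template must not be larger than the search region")
    out_h, out_w = sh - th + 1, sw - tw + 1
    response = [[0.0] * out_w for _ in range(out_h)]
    for t_ch, s_ch in zip(template, search):
        for i in range(out_h):
            for j in range(out_w):
                acc = 0.0
                for u in range(th):
                    for v in range(tw):
                        acc += t_ch[u][v] * s_ch[i + u][j + v]
                response[i][j] += acc
    return response


def hyperfeature(layers: List[Feature]) -> Feature:
    """Hypothetical aggregation: concatenate the channels of several layers.

    Assumes every layer's maps were already resized/cropped to one size.
    """
    return [ch for feat in layers for ch in feat]


def peak(response: Map) -> Tuple[int, int]:
    """(row, col) of the maximum response, i.e. the predicted target offset."""
    _, i, j = max((val, i, j) for i, row in enumerate(response)
                  for j, val in enumerate(row))
    return i, j


# Toy example: a "shallow" layer matching a diagonal edge pattern and a
# "deep" layer matching a blob; both are single-channel 2x2 templates.
low_t = [[[1.0, 0.0], [0.0, 1.0]]]
low_s = [[[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]]
high_t = [[[1.0, 1.0], [1.0, 1.0]]]
high_s = [[[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]]

fused = xcorr(hyperfeature([low_t, high_t]), hyperfeature([low_s, high_s]))
print(peak(fused))  # (1, 1)
```

Because the correlation sums over channels, correlating the concatenated hyperfeature yields the same response map as adding the per-layer responses, so in this sketch every layer contributes equally. A real tracker might instead weight or learn the contribution of each layer; the abstract does not state how DSiamLA fuses its layers.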

Updated: 2021-01-26
