Triple attention and global reasoning Siamese networks for visual tracking, Machine Vision and Applications



Machine Vision and Applications (IF 2.4), Pub Date: 2022-05-10, DOI: 10.1007/s00138-022-01301-1
Ping Shu¹, Keying Xu¹, Hua Bao¹,²


As a fundamental problem in computer vision, object tracking aims to capture accurate information about a given target throughout a video sequence, with the initial information determined in the first frame. Despite significant progress over the past decades, trackers still face various challenges, including occlusion, deformation, and fast motion. To attain robust performance, this work presents a tracking algorithm based on a triple attention mechanism and a global reasoning model, inspired by recent progress in Siamese networks. First, to address insufficient feature extraction, a triple attention model is proposed that consists of three parts: a squeeze-and-excitation (SE) block, a spatial SE (sSE) block, and a channel SE (cSE) block. Second, to tackle the lack of context information during tracking, a global reasoning model is added to the template branch and the search branch, which generates two different score maps. As tracking proceeds, these two score maps are summed with their respective weights to construct a regression confidence map. Extensive experiments on existing benchmarks, including OTB50, OTB100, VOT2016, VOT2018, GOT-10k, LaSOT, NFS, and TC128, demonstrate that the proposed method achieves competitive results.
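
The weighted combination of the two score maps can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the weights, the map sizes, or how the peak is used, so the function names (`fuse_score_maps`, `peak_location`), the weights `w_a` and `w_b`, and the toy 3x3 maps below are all assumptions chosen for illustration.

```python
# Hypothetical sketch of fusing two score maps by a weighted sum.
# The weights and the toy maps are illustrative assumptions; the abstract
# does not state how the weights are chosen.

def fuse_score_maps(map_a, map_b, w_a=0.5, w_b=0.5):
    """Return the element-wise weighted sum of two equally sized 2-D score maps."""
    same_rows = len(map_a) == len(map_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(map_a, map_b)):
        raise ValueError("score maps must have the same shape")
    return [[w_a * a + w_b * b for a, b in zip(ra, rb)]
            for ra, rb in zip(map_a, map_b)]


def peak_location(score_map):
    """Return (row, col) of the highest response in a 2-D score map."""
    positions = ((r, c) for r, row in enumerate(score_map) for c in range(len(row)))
    return max(positions, key=lambda rc: score_map[rc[0]][rc[1]])


# Two toy score maps that disagree about where the target is.
map_a = [[0.1, 0.2, 0.1],
         [0.2, 0.9, 0.3],
         [0.1, 0.3, 0.2]]
map_b = [[0.0, 0.1, 0.0],
         [0.1, 0.4, 0.8],
         [0.0, 0.2, 0.1]]

fused = fuse_score_maps(map_a, map_b, w_a=0.6, w_b=0.4)
print(peak_location(fused))  # position of the strongest fused response
```

In this toy case the two maps peak at different positions, so the chosen weights decide which position dominates the fused confidence map. Shifting the weight toward `map_b` moves the peak to its preferred location.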


Updated: 2022-05-11
