
Dual Siamese network for RGBT tracking via fusing predicted position maps

Original article · Published in The Visual Computer

Abstract

Visual object tracking is a fundamental task in computer vision. Despite its rapid development, tracking with visible-light images alone is unreliable in some situations. Because visible and thermal infrared images have complementary imaging characteristics, using them as a joint input for tracking has attracted increasing attention; this paradigm is known as RGBT tracking. Existing RGBT trackers can be divided into image-level, feature-level, and response-level fusion methods. Compared with the first two, response-level fusion can exploit deeper information from both modalities, but most existing response-level trackers rely on traditional tracking methods and introduce modality weights at inappropriate stages. Motivated by these observations, we propose a response-level fusion tracking algorithm based on deep learning that moves the weight assignment into the feature extraction stage, for which we design a joint modal channel attention module. We adopt the Siamese framework and extend it into a dual Siamese subnetwork; in addition, we improve the region proposal subnetwork and propose a strategy for fusing the predicted position maps of the two modalities. To verify its performance, we conducted experiments on two tracking benchmarks. Our algorithm performs very well and runs at 116 frames per second, far exceeding the real-time requirement of 25 frames per second.
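To make the two key ideas in the abstract concrete, the sketch below illustrates in PyTorch what a joint modal channel attention module and a response-level fusion of predicted position maps could look like. This is a minimal illustration under our own assumptions, not the authors' implementation: the names `JointModalChannelAttention` and `fuse_position_maps`, the SE-style squeeze-and-excitation structure, and the fixed fusion weight `alpha` are all hypothetical.

```python
import torch
import torch.nn as nn


class JointModalChannelAttention(nn.Module):
    """Hypothetical sketch of a joint modal channel attention block.

    Pools the RGB and thermal feature maps jointly, then predicts
    per-channel weights for each modality (SE-style), so the
    modality weighting happens during feature extraction.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, 2 * channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(2 * channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, f_rgb: torch.Tensor, f_t: torch.Tensor):
        b, c, _, _ = f_rgb.shape
        # Global average pooling over both modalities together,
        # so each modality's weights can depend on the other.
        pooled = torch.cat([f_rgb, f_t], dim=1).mean(dim=(2, 3))  # (b, 2c)
        w = self.fc(pooled)                                       # (b, 2c)
        w_rgb, w_t = w[:, :c], w[:, c:]
        # Reweight each modality's channels.
        return (f_rgb * w_rgb.view(b, c, 1, 1),
                f_t * w_t.view(b, c, 1, 1))


def fuse_position_maps(p_rgb: torch.Tensor, p_t: torch.Tensor,
                       alpha: float = 0.5) -> torch.Tensor:
    """Response-level fusion: combine the two modalities' predicted
    position (score) maps into a single map used to locate the target.
    A fixed convex combination is only one plausible strategy."""
    return alpha * p_rgb + (1.0 - alpha) * p_t


# Usage sketch with assumed feature/map sizes.
att = JointModalChannelAttention(channels=256)
f_rgb, f_t = att(torch.randn(1, 256, 25, 25), torch.randn(1, 256, 25, 25))
fused = fuse_position_maps(torch.randn(1, 1, 17, 17), torch.randn(1, 1, 17, 17))
```

The split visible here mirrors the point the abstract emphasizes: the modality weighting is applied while features are being extracted (inside the attention block), while the two branches' predicted position maps are only combined afterward, at the response level.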



Acknowledgements

This work was supported by the Natural Science Foundation of Hebei Province (F2017202009), and Hebei Province Innovation Capability Improvement Plan (18961604H).

Author information

Corresponding author

Correspondence to Dedong Yang.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Guo, C., Yang, D., Li, C. et al. Dual Siamese network for RGBT tracking via fusing predicted position maps. Vis Comput 38, 2555–2567 (2022). https://doi.org/10.1007/s00371-021-02131-4

