
Dual Siamese network for RGBT tracking via fusing predicted position maps

Original article · Published in The Visual Computer

Abstract

Visual object tracking is a fundamental task in computer vision. Despite its rapid development, tracking with visible-light images alone is unreliable in some situations. Because visible and thermal infrared images have complementary imaging characteristics, using them as a joint input for tracking has attracted increasing attention; this paradigm is known as RGBT tracking. Existing RGBT trackers can be divided into image-level, feature-level, and response-level fusion methods. Compared with the first two, response-level fusion can exploit deeper information from both modalities, but most existing response-level trackers rely on traditional tracking methods and introduce modality weights at inappropriate stages. Motivated by these observations, we propose a response-level fusion tracking algorithm based on deep learning that moves the weight assignment into the feature extraction stage, for which we design a joint modal channel attention module. We adopt the Siamese framework and extend it into a dual Siamese subnetwork; in addition, we improve the region proposal subnetwork and propose a strategy for fusing the predicted position maps of the two modalities. To verify its performance, we conducted experiments on two tracking benchmarks. Our algorithm performs very well and runs at 116 frames per second, far exceeding the real-time requirement of 25 frames per second.
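To make the two key ideas in the abstract concrete, the sketch below illustrates in PyTorch what a joint modal channel attention module and a response-level fusion of predicted position maps could look like. This is a minimal illustration under our own assumptions, not the authors' implementation: the names `JointModalChannelAttention` and `fuse_position_maps`, the SE-style squeeze-and-excitation structure, and the fixed fusion weight `alpha` are all hypothetical.

```python
import torch
import torch.nn as nn


class JointModalChannelAttention(nn.Module):
    """Hypothetical sketch of a joint modal channel attention block.

    Pools the RGB and thermal feature maps jointly, then predicts
    per-channel weights for each modality (SE-style), so the
    modality weighting happens during feature extraction.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, 2 * channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(2 * channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, f_rgb: torch.Tensor, f_t: torch.Tensor):
        b, c, _, _ = f_rgb.shape
        # Global average pooling over both modalities together,
        # so each modality's weights can depend on the other.
        pooled = torch.cat([f_rgb, f_t], dim=1).mean(dim=(2, 3))  # (b, 2c)
        w = self.fc(pooled)                                       # (b, 2c)
        w_rgb, w_t = w[:, :c], w[:, c:]
        # Reweight each modality's channels.
        return (f_rgb * w_rgb.view(b, c, 1, 1),
                f_t * w_t.view(b, c, 1, 1))


def fuse_position_maps(p_rgb: torch.Tensor, p_t: torch.Tensor,
                       alpha: float = 0.5) -> torch.Tensor:
    """Response-level fusion: combine the two modalities' predicted
    position (score) maps into a single map used to locate the target.
    A fixed convex combination is only one plausible strategy."""
    return alpha * p_rgb + (1.0 - alpha) * p_t


# Usage sketch with assumed feature/map sizes.
att = JointModalChannelAttention(channels=256)
f_rgb, f_t = att(torch.randn(1, 256, 25, 25), torch.randn(1, 256, 25, 25))
fused = fuse_position_maps(torch.randn(1, 1, 17, 17), torch.randn(1, 1, 17, 17))
```

The split visible here mirrors the point the abstract emphasizes: the modality weighting is applied while features are being extracted (inside the attention block), while the two branches' predicted position maps are only combined afterward, at the response level.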



Acknowledgements

This work was supported by the Natural Science Foundation of Hebei Province (F2017202009), and Hebei Province Innovation Capability Improvement Plan (18961604H).

Author information

Corresponding author

Correspondence to Dedong Yang.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Guo, C., Yang, D., Li, C. et al. Dual Siamese network for RGBT tracking via fusing predicted position maps. Vis Comput 38, 2555–2567 (2022). https://doi.org/10.1007/s00371-021-02131-4

