
Multiple convolutional features in Siamese networks for object tracking

  • Original Paper
  • Published:
Machine Vision and Applications

Abstract

Siamese trackers have demonstrated high performance in object tracking due to their balance between accuracy and speed. Unlike classification-based CNNs, deep similarity networks are specifically designed to address the image similarity problem, and are thus inherently better suited to the tracking task. However, Siamese trackers mainly use the last convolutional layer for similarity analysis and target search, which restricts their performance. In this paper, we argue that using a single convolutional layer as the feature representation is not an optimal choice within a deep similarity framework. We present the Multiple Features-Siamese Tracker (MFST), a novel tracking algorithm exploiting several hierarchical feature maps for robust tracking. Since convolutional layers provide several levels of abstraction in characterizing an object, fusing hierarchical features yields a richer and more effective representation of the target. Moreover, we handle target appearance variations by calibrating the deep features extracted from two different CNN models. Based on this advanced feature representation, our method achieves high tracking accuracy, outperforming the standard Siamese tracker on object tracking benchmarks. The source code and trained models are available at https://github.com/zhenxili96/MFST.
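The core idea, fusing similarity (response) maps computed from several convolutional levels rather than only the last one, can be sketched as follows. This is an illustrative simplification, not the paper's exact formulation: the function names (`xcorr`, `fused_response`) and the fixed weighted-sum fusion are assumptions, and MFST additionally calibrates features drawn from two different CNN models.

```python
import numpy as np

def xcorr(z, x):
    """Slide the exemplar feature z over the search feature x (valid mode),
    summing over channels -- the SiamFC-style cross-correlation similarity map.
    z: (C, Hz, Wz) target template features; x: (C, Hx, Wx) search features."""
    c, hz, wz = z.shape
    _, hx, wx = x.shape
    out = np.zeros((hx - hz + 1, wx - wz + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(z * x[:, i:i + hz, j:j + wz])
    return out

def fused_response(z_feats, x_feats, weights):
    """Fuse per-layer response maps by a weighted sum (illustrative choice).
    z_feats/x_feats: lists of feature maps from different convolutional levels,
    assumed spatially aligned so their response maps have the same size."""
    maps = [xcorr(z, x) for z, x in zip(z_feats, x_feats)]
    return sum(w * m for w, m in zip(weights, maps))
```

The predicted target location is the argmax of the fused map; using several levels lets low-level detail and high-level semantics vote jointly, which is the intuition the abstract describes.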



Acknowledgements

This work was supported by the Fonds de recherche du Québec - Nature et technologies (FRQNT) and Mitacs. We also thank Nvidia for providing us with a TITAN X GPU.

Author information


Corresponding author

Correspondence to Guillaume-Alexandre Bilodeau.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, Z., Bilodeau, GA. & Bouachir, W. Multiple convolutional features in Siamese networks for object tracking. Machine Vision and Applications 32, 59 (2021). https://doi.org/10.1007/s00138-021-01185-7

