
A real-time siamese tracker deployed on UAVs

  • Original Research Paper
  • Published:
Journal of Real-Time Image Processing


Abstract

Visual object tracking is an essential enabler for the automation of UAVs. Recently, Siamese-network-based trackers have achieved excellent performance on offline benchmarks. These trackers usually adopt classic deep and wide networks, such as AlexNet, VGGNet, and ResNet, to extract features from the template frame and the detection frame. However, because embedded devices offer limited computing power, such unmodified models are too computationally heavy to deploy on UAVs. In this paper, we propose a guideline for designing a slim backbone: for every layer, the dimension of the output should be smaller than that of the input. Following this guideline, we reduce the computational cost of AlexNet by 59.4% while the tracker maintains comparable accuracy. In addition, we adopt an anchor-free network as the tracking head, which requires less computation than an anchor-based one. With these approaches, our tracker achieves an AUC of 60.9% on the UAV123 data set and runs at 30 frames per second on an NVIDIA Jetson TX2, and can therefore be embedded in UAVs. To the best of our knowledge, it is the first real-time Siamese tracker deployed on the embedded system of a UAV. The code is available at GitHub.
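The slim-backbone guideline can be checked mechanically: for each layer k, compare the output volume w_k x h_k x c_k against the input volume. The sketch below is illustrative only (not the authors' code), and the layer shapes in it are hypothetical:

```python
# Sketch: check the slim-backbone guideline that every layer's output
# dimension (w_k * h_k * c_k) is no larger than its input dimension.
# The layer shapes below are hypothetical, not taken from the paper.

def follows_guideline(shapes):
    """shapes: list of (w, h, c) per layer, index 0 = the input frame."""
    dims = [w * h * c for (w, h, c) in shapes]
    return all(dims[k] <= dims[k - 1] for k in range(1, len(dims)))

# A hypothetical slim backbone: each stage shrinks the feature volume.
slim = [(127, 127, 3), (59, 59, 12), (29, 29, 32), (13, 13, 64)]
# A hypothetical wide backbone that violates the rule at its first conv
# layer, where the channel count jumps from 3 to 96.
wide = [(127, 127, 3), (63, 63, 96), (31, 31, 256), (15, 15, 384)]

print(follows_guideline(slim))  # True
print(follows_guideline(wide))  # False
```

Under this reading of the guideline, a backbone is "slim" exactly when the per-layer feature volumes form a non-increasing sequence.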


Notes

  1. https://github.com/bitshenwenxiao/SiamSlim.

  2. https://www.youtube.com/channel/UCFww_uAXBJSC2jcrRDL940w.

Abbreviations

\(x^0\) :

the detection frame.

\(x^k\) :

the feature of detection frame of kth layer.

\(z^0\) :

the template frame.

\(z^k\) :

the feature of template frame of kth layer.

\(f(\cdot )\) :

the mapping of the feature extractor network.

\(f_k(\cdot )\) :

the mapping of kth layer of the feature extractor network.

\(g(\cdot )\) :

the mapping of the tracking head network.

\(w_k\) :

the width of the feature map of kth layer.

\(h_k\) :

the height of the feature map of kth layer.

\(c_k\) :

the number of channels of the feature map of kth layer.

\(d_k\) :

the dimension of discrete state space of kth layer.

\(X^d\) :

the state space of which the dimension is d.

\(c(\cdot )\) :

the cardinality of state space.

\(C_{cls}^x(\cdot )\) :

a convolution layer for the feature of the detection frame in classification branch.

\(C_{cls}^z(\cdot )\) :

a convolution layer for the feature of the template frame in classification branch.

\(C_{reg}^x(\cdot )\) :

a convolution layer for the feature of the detection frame in regression branch.

\(C_{reg}^z(\cdot )\) :

a convolution layer for the feature of the template frame in regression branch.

\(n_s\) :

the number of strides of the backbone.

\((i, j)\) :

the position in output map.

\(P_{cls}^{w \times h \times 2}\) :

the output of the classification branch.

\(P_{reg}^{w \times h \times 4} = \left[ {\begin{array}{*{20}{c}}{d_l}&{d_t}&{d_r}&{d_b}\end{array}} \right] \) :

the output of the regression branch.

\(*\) :

multi-channel convolution.

box :

the bounding box of the target.

\((box_{l}, box_{t})\) :

the top-left corner point of the target.

\((box_{r}, box_{b})\) :

the bottom-right corner point of the target.

L :

the loss of training.
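The regression output \(P_{reg}^{w \times h \times 4}\) maps each position \((i, j)\) back to a bounding box. As a hedged sketch of the usual anchor-free decoding (the FCOS/SiamFC++-style convention, which may differ from the paper's exact mapping, e.g. by a crop offset), the corners are recovered from the per-position offsets \((d_l, d_t, d_r, d_b)\) and the backbone stride \(n_s\):

```python
# Sketch of anchor-free box decoding at output position (i, j).
# Assumes the common FCOS-style convention: the position is mapped to
# image coordinates via the backbone stride n_s, and the four offsets
# (d_l, d_t, d_r, d_b) are distances to the box sides. This convention
# is an assumption, not the paper's verified formula.

def decode_box(i, j, d_l, d_t, d_r, d_b, n_s):
    """Return (box_l, box_t, box_r, box_b) in image coordinates."""
    x = j * n_s  # column index -> horizontal image coordinate
    y = i * n_s  # row index -> vertical image coordinate
    return (x - d_l, y - d_t, x + d_r, y + d_b)

print(decode_box(i=8, j=8, d_l=20, d_t=10, d_r=20, d_b=10, n_s=8))
# (44, 54, 84, 74)
```

Conversely, training targets for a position are obtained by inverting this mapping: \(d_l = x - box_l\), \(d_t = y - box_t\), \(d_r = box_r - x\), \(d_b = box_b - y\).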


Author information


Corresponding author

Correspondence to Tao Song.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Shen, H., Lin, D. & Song, T. A real-time siamese tracker deployed on UAVs. J Real-Time Image Proc 19, 463–473 (2022). https://doi.org/10.1007/s11554-021-01190-z

