Multi-view aggregation for real-time accurate object detection of a moving camera

Hu, Jiyuan; Wang, Tao; Zhu, Shiqiang

doi:10.1007/s11554-022-01253-9

Multi-view aggregation for real-time accurate object detection of a moving camera

Original Research Paper
Published: 21 September 2022

Volume 19, pages 1169–1179, (2022)
Cite this article

Journal of Real-Time Image Processing Aims and scope Submit manuscript

Jiyuan Hu¹,
Tao Wang¹ &
Shiqiang Zhu²

376 Accesses
1 Citation
Explore all metrics

Abstract

Object detection plays an important role on various mobile robot tasks. However, directly applying existing detectors on videos from a mobile robot will cause a sharp accuracy decline, because such videos introduce some extra difficulties on accurate detection. This paper proposes a viewpoint-based memory mechanism to handle detection performance deterioration and improve detection accuracy of the videos in real time. The mechanism positively organizes previous results from multiple viewpoints of target objects as prior knowledge to enhance detection accuracy for succeeding frames, and it is designed as an extension module of an existing image detector. In experiments, we collect testing dataset from an indoor mobile robot, and compare performance of several sole image detectors and the same detectors extended by the extension module. The result shows the mechanism module achieves 20.7% object localization rate margin in average at a cost of 18.1 ms, and the mechanism can give positive impact on various existing detectors. The result indicates the proposed method achieves good accuracy margin, has acceptable time cost, and gets a degree of universal applicability.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

BoostTrack: boosting the similarity measure and detection confidence for improved multiple object tracking

Article Open access 12 April 2024

ByteTrack: Multi-object Tracking by Associating Every Detection Box

HOTA: A Higher Order Metric for Evaluating Multi-object Tracking

Article Open access 08 October 2020

References

Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: European conference on computer vision, pp. 740–755. Springer (2014)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: Ssd: single shot multibox detector. In: European conference on computer vision, pp. 21–37. Springer (2016)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)
Redmon, J., Farhadi, A.: Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: Yolov4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Dai, J., Li, Y., He, K., Sun, J.: R-fcn: object detection via region-based fully convolutional networks. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 379–387 (2016)
Kang, K., Ouyang, W., Li, H., Wang, X.: Object detection from video tubelets with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 817–825 (2016)
Kang, K., Li, H., Yan, J., Zeng, X., Yang, B., Xiao, T., Zhang, C., Wang, Z., Wang, R., Wang, X., et al.: T-cnn: tubelets with convolutional neural networks for object detection from videos. IEEE Trans. Circuits Syst. Video Technol. 28(10), 2896–2907 (2017)
Article Google Scholar
Feichtenhofer, C., Pinz, A., Zisserman, A.: Detect to track and track to detect. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3038–3046 (2017)
Deng, H., Hua, Y., Song, T., Zhang, Z., Xue, Z., Ma, R., Robertson, N., Guan, H.: Object guided external memory network for video object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6678–6687 (2019)
Chen, Y., Cao, Y., Hu, H., Wang, L.: Memory enhanced global-local aggregation for video object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10337–10346 (2020)
Zhu, X., Xiong, Y., Dai, J., Yuan, L., Wei, Y.: Deep feature flow for video recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2349–2358 (2017)
Zhu, X., Wang, Y., Dai, J., Yuan, L., Wei, Y.: Flow-guided feature aggregation for video object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 408–417 (2017)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Ieee (2009)
Zhu, X., Dai, J., Yuan, L., Wei, Y.: Towards high performance video object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7210–7218 (2018)
Hu, J., Wang, T., Li, Y., Zhu, S.: A novel memory mechanism for video object detection from indoor mobile robots. SIViP 15(8), 1785–1795 (2021)
Article Google Scholar
Kang, K., Li, H., Xiao, T., Ouyang, W., Yan, J., Liu, X., Wang, X.: Object detection in videos with tubelet proposal networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 727–735 (2017)
Hu, J., Wang, T., Zhu, S.: Viewpoint-based memory mechanism for object detection of moving sensors. In: Proceedings of the IEEE Conference on Intelligent Systems and Computer Vision, pp. 1–7 (2022)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587 (2014)
Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015)
Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. IEEE transactions on pattern analysis and machine intelligence. 39(6), 1137–1149 (2017)
Han, W., Khorrami, P., Paine, T.L., Ramachandran, P., Babaeizadeh, M., Shi, H., Li, J., Yan, S., Huang, T.S.: Seq-nms for video object detection. arXiv preprint arXiv:1602.08465 (2016)
Konno, T., Amma, A., Kanezaki, A.: Incremental multi-view object detection from a moving camera. In: Proceedings of the 2nd ACM International Conference on Multimedia in Asia, pp. 1–7 (2021)
Tripathi, S., Lipton, Z.C., Belongie, S., Nguyen, T.: Context matters: refining object detection in video with recurrent neural networks. arXiv preprint arXiv:1607.04648 (2016)
Broad, A., Jones, M., Lee, T.Y.: Recurrent multi-frame single shot detector for video object detection. In: BMVC, p. 94 (2018)
Bertasius, G., Torresani, L., Shi, J.: Object detection in video with spatiotemporal sampling networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 331–346 (2018)
Xiao, F., Lee, Y.J.: Video object detection with an aligned spatial-temporal memory. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 485–501 (2018)
Liu, M., Zhu, M.: Mobile video object detection with temporally-aware feature maps. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5686–5695 (2018)
Liu, M., Zhu, M., White, M., Li, Y., Kalenichenko, D.: Looking fast and slow: memory-guided mobile video object detection. arXiv preprint arXiv:1903.10172 (2019)
Melonee W., T.F.: Specification for turtlebot compatible platforms. ROSWeb. https://www.ros.org/reps/rep-0119.html (2021). (Accessed 1 March 2022)
Deselaers, T., Alexe, B., Ferrari, V.: Localizing objects while learning their appearance. In: European conference on computer vision, pp. 452–466. Springer (2010)

Download references

Acknowledgements

This work was supported by Robotics Institute of Zhejiang University under Grant 110202-I21707.

Author information

Authors and Affiliations

Ocean College, Zhejiang University, Zhoushan, 316000, China
Jiyuan Hu & Tao Wang
Zhejiang Lab, Hangzhou, 310000, China
Shiqiang Zhu

Authors

Jiyuan Hu
View author publications
You can also search for this author in PubMed Google Scholar
Tao Wang
View author publications
You can also search for this author in PubMed Google Scholar
Shiqiang Zhu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tao Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Hu, J., Wang, T. & Zhu, S. Multi-view aggregation for real-time accurate object detection of a moving camera. J Real-Time Image Proc 19, 1169–1179 (2022). https://doi.org/10.1007/s11554-022-01253-9

Download citation

Received: 31 March 2022
Accepted: 06 September 2022
Published: 21 September 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s11554-022-01253-9

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Multi-view aggregation for real-time accurate object detection of a moving camera

Abstract

Access this article

Similar content being viewed by others

BoostTrack: boosting the similarity measure and detection confidence for improved multiple object tracking

ByteTrack: Multi-object Tracking by Associating Every Detection Box

HOTA: A Higher Order Metric for Evaluating Multi-object Tracking

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Multi-view aggregation for real-time accurate object detection of a moving camera

Abstract

Access this article

Similar content being viewed by others

BoostTrack: boosting the similarity measure and detection confidence for improved multiple object tracking

ByteTrack: Multi-object Tracking by Associating Every Detection Box

HOTA: A Higher Order Metric for Evaluating Multi-object Tracking

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation