A novel memory mechanism for video object detection from indoor mobile robots

Hu, Jiyuan; Wang, Tao; Li, Yuehua; Zhu, Shiqiang

doi:10.1007/s11760-021-01926-1

Original Paper
Published: 07 May 2021

Signal, Image and Video Processing, Volume 15, pages 1785–1795 (2021)

Jiyuan Hu^1, Tao Wang^1,3,4,5 (ORCID: orcid.org/0000-0001-6500-2224), Yuehua Li^2 & Shiqiang Zhu^1,2

Abstract

Video object detection has great potential to enhance the visual perception of indoor mobile robots in a variety of scenarios. In this paper, a novel memory mechanism is proposed to improve detection performance on moving sensor videos (MSVs) captured by indoor mobile robots. The mechanism can be applied as an extension module to a number of existing image object detectors. First, we analyze the characteristics of indoor MSVs and identify three key ones: mild changes, complicated contents and relative movements. Second, we devise a memory-unit dispatching and application method that maintains prior memory contents and uses them to achieve better detection performance. Finally, we create a corresponding indoor MSV dataset and package the mechanism as a module to evaluate its localization performance. Experimental results show that the proposed mechanism achieves an average localization margin of 19.8% over the original versions of several representative detectors.
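The abstract does not specify how memory units are dispatched or applied, so the following is only a hypothetical sketch of the general idea of a detection memory for a moving camera, not the authors' method. It assumes that objects change only mildly between consecutive frames: recent detections are kept as memory units, matched to new detections of the same class by intersection-over-union (IoU), refreshed when matched, and re-emitted with a decayed score when the detector misses them. All names (`DetectionMemory`, `iou_thr`, `max_age`, `decay`, `min_score`) and threshold values are illustrative assumptions.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Detection:
    box: Box
    label: str
    score: float


@dataclass
class MemoryUnit:
    det: Detection
    age: int = 0  # frames since this unit was last matched to a detection


class DetectionMemory:
    """Hypothetical memory of recent detections (illustration only)."""

    def __init__(self, iou_thr=0.5, max_age=3, decay=0.8, min_score=0.3):
        self.iou_thr = iou_thr      # minimum IoU to treat two boxes as the same object
        self.max_age = max_age      # frames a unit may survive without a match
        self.decay = decay          # score multiplier applied on each missed frame
        self.min_score = min_score  # units below this score are forgotten
        self.units: list[MemoryUnit] = []

    def update(self, dets: list[Detection]) -> list[Detection]:
        matched = set()
        for unit in self.units:
            # Greedily pick the best-overlapping unmatched detection of the same class.
            best, best_iou = None, self.iou_thr
            for i, d in enumerate(dets):
                if i in matched or d.label != unit.det.label:
                    continue
                overlap = iou(unit.det.box, d.box)
                if overlap >= best_iou:
                    best, best_iou = i, overlap
            if best is None:
                # Missed in this frame: keep the remembered box, decay its score.
                unit.age += 1
                unit.det = Detection(unit.det.box, unit.det.label,
                                     unit.det.score * self.decay)
            else:
                # Matched: take the new box, keep the stronger of the two scores.
                matched.add(best)
                d = dets[best]
                unit.det = Detection(d.box, d.label, max(d.score, unit.det.score))
                unit.age = 0
        # Detections with no memory counterpart start new units.
        for i, d in enumerate(dets):
            if i not in matched:
                self.units.append(MemoryUnit(d))
        # Forget units that are too old or too weak.
        self.units = [u for u in self.units
                      if u.age <= self.max_age and u.det.score >= self.min_score]
        return [u.det for u in self.units]
```

Calling `update` once per frame returns the current detections plus any remembered boxes still in memory, so a chair detected in one frame and missed in the next is still reported with a reduced score. Greedy per-unit matching keeps the sketch short; an optimal assignment (for example the Hungarian algorithm) would be more robust when many same-class objects overlap.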

Acknowledgements

This work was supported by the Robotics Institute of Zhejiang University under Grant K11804, and by the Stable Support Project of the State Administration of Science, Technology and Industry for National Defence, PRC, under Grant HTKJ2019KL502005.

Author information

Authors and Affiliations

Ocean College, Zhejiang University, Zhoushan, 316000, China
Jiyuan Hu, Tao Wang & Shiqiang Zhu
Zhejiang Lab, Hangzhou, 310014, China
Yuehua Li & Shiqiang Zhu
State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou, 310007, China
Tao Wang
Engineering Research Center of Oceanic Sensing Technology and Equipment, Ministry of Education, Zhoushan, 316000, China
Tao Wang
Key Laboratory of Ocean Observation-Imaging Testbed of Zhejiang Province, Zhoushan, 316000, China
Tao Wang

Corresponding author

Correspondence to Tao Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hu, J., Wang, T., Li, Y. et al. A novel memory mechanism for video object detection from indoor mobile robots. SIViP 15, 1785–1795 (2021). https://doi.org/10.1007/s11760-021-01926-1

Received: 21 January 2021
Revised: 28 March 2021
Accepted: 23 April 2021
Published: 07 May 2021
Issue Date: November 2021
DOI: https://doi.org/10.1007/s11760-021-01926-1
