A novel memory mechanism for video object detection from indoor mobile robots

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Video object detection has great potential to enhance the visual perception of indoor mobile robots in various scenarios. In this paper, a novel memory mechanism is proposed to improve detection performance on moving sensor videos (MSVs), which are captured by indoor mobile robots. The proposed mechanism can be applied as an extension module to a number of existing image object detectors. First, we analyze the characteristics of indoor MSVs and summarize the key ones as mild changes, complicated contents, and relative movements. Second, we devise a memory-unit dispatching and application method that maintains prior memory contents and uses them to achieve better detection performance. Finally, we create a corresponding indoor MSV dataset and package the mechanism into a module to evaluate its localization performance. Experimental results illustrate the proposed mechanism, which improves localization by an average margin of 19.8% compared with several representative original detectors.

Acknowledgements

This work was supported by the Robotics Institute of Zhejiang University under Grant K11804, and by the Stable Support Project of the State Administration of Science, Technology and Industry for National Defence, PRC, under Grant HTKJ2019KL502005.

Author information

Corresponding author

Correspondence to Tao Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hu, J., Wang, T., Li, Y. et al. A novel memory mechanism for video object detection from indoor mobile robots. SIViP 15, 1785–1795 (2021). https://doi.org/10.1007/s11760-021-01926-1

  • DOI: https://doi.org/10.1007/s11760-021-01926-1
