Multi-scale ResNet for real-time underwater object detection

  • Original Paper
Signal, Image and Video Processing

Abstract

An automatic underwater object recognition system is essential for reducing the cost of underwater inspection. In this study, we propose a novel convolutional neural network architecture trained on underwater video frames. The method, Multi-scale ResNet (M-ResNet), is a modified residual neural network (ResNet) for underwater object detection. It improves efficiency by using multi-scale operations to accurately detect objects of various sizes, especially small objects. Experimental results show that the proposed method achieves a mean average precision (mAP) of 96.5%. On this basis, we propose a novel system for automatic object detection in marine environments.
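
For readers unfamiliar with the reported metric, the following is a minimal sketch of one common way to compute mAP: the average precision (AP) of each class is taken as the area under its precision-recall curve, and mAP is the mean of the per-class values. The abstract does not state the evaluation protocol, so the all-point interpolation used here, the function names, and the class labels in the usage note are illustrative assumptions rather than the authors' procedure. Each detection is assumed to have already been matched against the ground truth (for example by an IoU threshold) and marked as a true or false positive.

```python
from typing import Dict, List, Tuple

# A detection is (confidence, is_true_positive). The matching step that
# decides is_true_positive is assumed to have been done beforehand.
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """Area under the precision-recall curve for a single class."""
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions: List[float] = []
    recalls: List[float] = []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Replace each precision by the maximum precision at any higher recall,
    # which makes the curve monotonically non-increasing.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangle areas wherever recall increases.
    ap = 0.0
    prev_recall = 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(
    per_class: Dict[str, Tuple[List[Detection], int]]
) -> float:
    """Mean of the per-class AP values; per_class maps a class name to
    (detections, number of ground-truth objects)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

As a usage example, calling `mean_average_precision({"fish": ([(0.9, True), (0.8, False), (0.7, True)], 2), "crab": ([(0.6, True)], 1)})` averages the AP of two hypothetical classes. Other protocols, such as 11-point interpolation or averaging over several IoU thresholds, yield different values for the same detections.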

Author information

Corresponding author

Correspondence to Huang-Chu Huang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pan, TS., Huang, HC., Lee, JC. et al. Multi-scale ResNet for real-time underwater object detection. SIViP 15, 941–949 (2021). https://doi.org/10.1007/s11760-020-01818-w

  • DOI: https://doi.org/10.1007/s11760-020-01818-w
