Abstract
Automatic underwater object recognition is essential for reducing the cost of underwater inspection. In this study, we propose a novel convolutional neural network architecture trained on underwater video frames. The method, Multi-scale ResNet (M-ResNet), is a modified residual neural network (ResNet) for underwater object detection. It uses multi-scale operations to improve efficiency and to detect objects of various sizes accurately, especially small objects. Experimental results show that the proposed method achieves a mean average precision (mAP) of 96.5%. On this basis, we propose a system for automatic object detection in marine environments.
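To make the reported metric concrete, the following is a minimal sketch of how mAP can be computed from ranked detections. It assumes all-point interpolation of the precision-recall curve (as in later Pascal VOC evaluations) and assumes that each detection has already been matched to ground truth by an IoU threshold. The abstract does not state which evaluation protocol was used, and the function names `average_precision` and `mean_average_precision` are illustrative, not taken from the paper.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for a single class.

    scores: confidence of each detection.
    is_true_positive: True if the detection matched a ground-truth box
        (matching by IoU is assumed to have been done beforehand).
    num_ground_truth: number of ground-truth objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0

    # Rank detections by descending confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = 0
    fp = 0
    recalls = []
    precisions = []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Area under the enveloped precision-recall curve.
    ap = 0.0
    prev_recall = 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """Mean of per-class APs.

    per_class maps a class name to a tuple
    (scores, is_true_positive, num_ground_truth).
    """
    aps = [average_precision(*args) for args in per_class.values()]
    return sum(aps) / len(aps)
```

For example, a class with three detections scored 0.9, 0.8 and 0.7, of which the first and third are correct, and two ground-truth objects gives an AP of 5/6; averaging such per-class values over all object classes gives the mAP.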
About this article
Cite this article
Pan, TS., Huang, HC., Lee, JC. et al. Multi-scale ResNet for real-time underwater object detection. SIViP 15, 941–949 (2021). https://doi.org/10.1007/s11760-020-01818-w
DOI: https://doi.org/10.1007/s11760-020-01818-w