Abstract
Traditional object recognition algorithms cannot meet the accuracy requirements of real-world warehousing and logistics applications. In recent years, the rapid development of deep learning theory has provided a technical approach to this problem, and a number of deep-learning-based object recognition algorithms have been proposed, promoted, and applied. However, deep learning still faces two problems when applied to object recognition: first, the activation functions in deep learning models have poor nonlinear modeling ability; second, deep learning models perform a large number of repeated pooling operations, during which feature information is lost. To address these shortcomings, this paper proposes multiple-parameter exponential linear units with a uniform, learnable parameter form: two learned parameters are introduced into the exponential linear unit (ELU), enabling it to represent both piecewise-linear and exponential nonlinear functions and thus giving it good nonlinear modeling capability. At the same time, to mitigate the information loss caused by repeated pooling, this paper proposes a new global convolutional neural network structure that makes full use of the local and global information in the feature maps of different layers, reducing the loss of feature information across the many pooling operations. Based on these ideas, this paper presents an object recognition algorithm built on the optimized nonlinear activation function and the global convolutional neural network. Experiments were carried out on the CIFAR100 and ImageNet datasets using the proposed algorithm.
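The abstract does not fully specify the two-parameter activation; a minimal sketch, assuming (as in published multi-parameter ELU variants) that one learned parameter `alpha` scales the negative branch and the other, `beta`, controls its curvature (both names are assumptions, not from the paper):

```python
import numpy as np

def mpelu(x, alpha=1.0, beta=1.0):
    """Sketch of a multiple-parameter ELU.

    alpha scales the negative saturation level and beta controls the
    curvature of the negative branch; alpha = beta = 1 recovers the
    standard ELU, while a small alpha makes the unit approach a
    piecewise-linear, ReLU-like shape.
    """
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * (np.exp(beta * x) - 1.0))
```

In a network, `alpha` and `beta` would be trained jointly with the weights (per channel or per layer), which is what lets a single unit interpolate between piecewise-linear and exponential nonlinear behavior.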
The results show that the proposed method not only achieves better recognition accuracy than traditional machine learning and other deep learning models but also exhibits good stability and robustness.
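The global network structure is likewise only described at a high level; one common way to "make full use of the local and global information of different layer feature maps" is to globally pool each layer's feature maps and concatenate the results, so that detail from shallow, lightly pooled layers survives alongside deep semantics. A hedged sketch of that idea (the function name and layout are assumptions, not the paper's implementation):

```python
import numpy as np

def fuse_global_features(*feature_maps):
    """Globally average-pool each layer's feature maps and concatenate.

    Each input is assumed to have shape (channels, height, width).
    Pooling over the spatial axes yields one vector per layer; the
    concatenated vector mixes shallow local detail with deep global
    context instead of relying only on the last, heavily pooled layer.
    """
    pooled = [fm.mean(axis=(1, 2)) for fm in feature_maps]
    return np.concatenate(pooled)
```

A classifier head fed with this fused vector sees information that would otherwise be discarded by the repeated pooling stages.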
Data availability statement
The data used to support the findings of this study are included within the paper.
Acknowledgements
This paper is supported by National Natural Science Foundation of China (No. 61701188), China Postdoctoral Science Foundation (No. 2019M650512), Beijing Intelligent Logistics System Collaborative Innovation Center (No. BILSCIC-2019KF-22), and Hebei IoT Monitoring Engineering Technology Research Center funded project (No. IOT202004).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Cite this article
An, FP., Liu, Je. & Bai, L. Object recognition algorithm based on optimized nonlinear activation function-global convolutional neural network. Vis Comput 38, 541–553 (2022). https://doi.org/10.1007/s00371-020-02033-x