Abstract
With the advent of image processing and computer vision in automotive applications under real-time constraints, fast and architecture-optimized arithmetic operations are crucial. Alternative, efficient representations for real numbers are starting to be explored, and among them, the recently introduced posit\(^{\mathrm{TM}}\) number system is highly promising. Furthermore, architecture-specific mathematical libraries that thoroughly target single-instruction multiple-data (SIMD) engines are providing increasing acceleration to deep neural network frameworks. In this paper, we present the implementation of core image processing operations exploiting posit arithmetic and the ARM Scalable Vector Extension (SVE) SIMD engine. Moreover, we present applications of real-time image processing to the autonomous driving scenario, with benchmarks on the tinyDNN deep neural network (DNN) framework.
Acknowledgements
This work is partially funded by H2020 European Processor Initiative (Grant agreement No 826647) and partially by the Italian Ministry of Education and Research (MIUR) in the framework of the CrossLab project (Departments of Excellence).
Appendix: The posit designer tool
When choosing a posit configuration, we need to take into account multiple factors, such as the target dynamic range and the target decimal precision. We developed a MATLAB tool to analyse different alternative representations for real numbers; given a format, it suggests the posit configurations that match its requirements, converting it into its closest posit alternative by evaluating the range and resolution of both. The tool provides the following information:
1. Number-type statistics such as the total number of bits, the maximum value and the \(\epsilon \) value (i.e. the smallest step we can make from a number of that format). Figure 8 shows the output of this functionality. (In that figure, bin32_8 is a 32-bit IEEE 754 float with 8 bits for the exponent, i.e. a standard single-precision representation.)
2. Graphical evaluation of the \(\epsilon \) value against the maximum value (on a logarithmic scale).
3. The next posit with 0 exponent bits that covers the dynamic range of a given number format. Figure 9 shows the output of this functionality.
4. Posit-to-fixed-point conversion, to build an appropriate quire space for deferred rounding operations (such as exact multiply-and-accumulate).
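The number-type statistics in point 1 can be sketched as follows. This is a minimal illustration, not the MATLAB tool itself: it assumes the standard posit parameters (\(useed = 2^{2^{es}}\), \(maxpos = useed^{N-2}\), and \(\epsilon\) taken as the gap between 1 and the next representable value, where the minimal 2-bit regime leaves \(N-3-es\) fraction bits), and an IEEE-754-like layout for the float case. The function names are hypothetical.

```python
def posit_stats(nbits, es):
    """Statistics for posit<nbits, es>:
    useed = 2^(2^es), maxpos = useed^(nbits - 2),
    eps = gap between 1 and the next representable posit."""
    useed = 2.0 ** (2 ** es)
    maxpos = useed ** (nbits - 2)
    # around 1 the regime takes 2 bits, leaving nbits-3-es fraction bits
    eps = 2.0 ** -(nbits - 3 - es)
    return {"bits": nbits, "maxpos": maxpos, "eps": eps}


def float_stats(nbits, exp_bits):
    """Statistics for an IEEE-754-like float with the given exponent width
    (e.g. bin32_8 is float_stats(32, 8), i.e. standard single precision)."""
    frac_bits = nbits - 1 - exp_bits
    bias = 2 ** (exp_bits - 1) - 1
    maxpos = (2 - 2.0 ** -frac_bits) * 2.0 ** bias
    eps = 2.0 ** -frac_bits
    return {"bits": nbits, "maxpos": maxpos, "eps": eps}
```

For example, `posit_stats(16, 0)` reports a maximum value of \(2^{14}\) and \(\epsilon = 2^{-13}\), while `float_stats(32, 8)` reproduces the familiar single-precision \(\epsilon = 2^{-23}\).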
Furthermore, we derived a general formula that allows us to convert any posit\(\langle X,Y\rangle \) to a posit\(\langle Z,W\rangle \) (with \(X > Z\)) without losing the dynamic range coverage. Since the maximum value of a posit\(\langle N,es\rangle \) is \(2^{2^{es}(N-2)}\), covering the range requires \(2^{W}(Z-2) \ge 2^{Y}(X-2)\), hence:
$$W = \left\lceil Y + \log _2 \frac{X-2}{Z-2} \right\rceil $$
This may be useful when trying to reduce the number of bits of neural network weights after training. Table 6 shows an example of application of this formula.
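The conversion above can be sketched as a one-liner. This sketch assumes the standard posit maximum value \(2^{2^{es}(N-2)}\), so that covering the range of posit\(\langle X,Y\rangle \) with posit\(\langle Z,W\rangle \) requires \(2^{W}(Z-2) \ge 2^{Y}(X-2)\); the function name is hypothetical.

```python
import math


def min_es_for_range(x, y, z):
    """Smallest exponent size W such that posit<Z, W> covers the dynamic
    range of posit<X, Y> (requires X > Z > 2). Derived from
    maxpos = 2^(2^es * (N - 2)): coverage needs 2^W*(Z-2) >= 2^Y*(X-2)."""
    return math.ceil(y + math.log2((x - 2) / (z - 2)))
```

For instance, to shrink posit\(\langle 32,2\rangle \) weights to 16 bits while preserving the range, `min_es_for_range(32, 2, 16)` yields \(W = 4\), i.e. posit\(\langle 16,4\rangle \).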
Cite this article
Cococcioni, M., Rossi, F., Ruffaldi, E. et al. Fast deep neural networks for image processing using posits and ARM scalable vector extension. J Real-Time Image Proc 17, 759–771 (2020). https://doi.org/10.1007/s11554-020-00984-x