Abstract
Recognizing handwritten digits is a classic application of machine learning and deep learning in information retrieval. Although the field is mature, recognizing digits with an effective soft-computing optimization technique remains challenging: training such systems on large datasets often fails because of the computation and storage required. In this paper, a recurrent deep neural network with a hybrid mini-batch and stochastic Hessian-free optimization (MBSHF) is proposed for accurate and faster convergence of predictions. A second-order approximation is used to achieve better performance in solving the quadratic subproblems, which depends heavily on computation and storage. The proposed technique also uses an iterative minimization algorithm for faster convergence from a random initialization, even though many additional parameters are involved. As a solution, a convex approximation of the MBSHF optimization is formulated, and its performance on the standard MNIST dataset is discussed. A recurrent deep neural network of up to 20 layers is successfully trained using the proposed MBSHF optimization, yielding better computation and storage performance. The results are compared with other standard optimization techniques: mini-batch stochastic gradient descent (MBSGD), stochastic gradient descent (SGD), stochastic Hessian-free optimization (SHF), Hessian-free optimization (HF), and nonlinear conjugate gradient (NCG). On a testing sample of 50,000, the proposed technique produced recognition accuracy on average 12.2% better than MBSGD, 27.2% better than SHF, 35.4% better than HF, 40.2% better than NCG, and 32% better than SGD.
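The core idea of a mini-batch stochastic Hessian-free step can be illustrated with a minimal NumPy sketch. This is not the authors' MBSHF implementation; it is a hedged illustration on a toy logistic-regression problem standing in for digit classification. The gradient is computed on a mini-batch, the curvature (Hessian-vector products, here via finite differences of gradients) on a smaller sub-batch, and the damped quadratic subproblem is solved by a few conjugate-gradient iterations. All names (`hvp`, `cg_solve`, the batch sizes) are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data standing in for digit features.
X = rng.normal(size=(512, 20))
true_w = rng.normal(size=20)
y = (X @ true_w > 0).astype(float)

def loss(w, Xb, yb):
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
    return -np.mean(yb * np.log(p + 1e-12) + (1 - yb) * np.log(1 - p + 1e-12))

def grad(w, Xb, yb):
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
    return Xb.T @ (p - yb) / len(yb)

def hvp(w, v, Xb, yb, eps=1e-6):
    # Hessian-vector product via finite differences of the gradient:
    # H v ~ (grad(w + eps*v) - grad(w)) / eps  -- no explicit Hessian stored.
    return (grad(w + eps * v, Xb, yb) - grad(w, Xb, yb)) / eps

def cg_solve(hv, b, iters=10, damping=1e-2):
    # Conjugate gradient on the damped system (H + lambda*I) p = b, from p = 0.
    p = np.zeros_like(b)
    r = b.copy()          # residual b - A p, with p = 0
    d = r.copy()
    rr = r @ r
    for _ in range(iters):
        Ad = hv(d) + damping * d
        alpha = rr / (d @ Ad)
        p += alpha * d
        r -= alpha * Ad
        rr_new = r @ r
        d = r + (rr_new / rr) * d
        rr = rr_new
    return p

w = np.zeros(20)
batch = 128    # mini-batch for the gradient
cbatch = 64    # smaller sub-batch for curvature (the "stochastic" part)
for step in range(30):
    idx = rng.choice(len(X), size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    Xc, yc = Xb[:cbatch], yb[:cbatch]
    g = grad(w, Xb, yb)
    # Solve the local quadratic model for the update direction p.
    p = cg_solve(lambda v: hvp(w, v, Xc, yc), -g)
    w += p
```

The key saving, as in Hessian-free methods generally, is that curvature enters only through Hessian-vector products, so the full Hessian is never formed or stored; the mini-batch/sub-batch split trades curvature accuracy for per-step cost.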
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
Animals and humans are not involved in this research work.
Additional information
Communicated by V. Loia.
Cite this article
Senthil, T., Rajan, C. & Deepika, J. An improved optimization technique using Deep Neural Networks for digit recognition. Soft Comput 25, 1647–1658 (2021). https://doi.org/10.1007/s00500-020-05262-3