
An improved optimization technique using Deep Neural Networks for digit recognition

  • Methodologies and Application
Soft Computing

Abstract

In the world of information retrieval, recognizing hand-written digits is a classic application of machine learning (deep learning). Although the field is mature, recognizing digits with an effective soft-computing optimization technique remains challenging: training such a system on larger datasets often fails because of the computation and storage it demands. In this paper, a recurrent deep neural network with a hybrid mini-batch and stochastic Hessian-free optimization (MBSHF) is proposed for accurate and faster convergence of predictions. A second-order approximation is used to solve the underlying quadratic sub-problems more efficiently, which otherwise depend heavily on computation and storage. The proposed technique also uses an iterative minimization algorithm for faster convergence from a random initialization, even though a large number of additional parameters are involved. As a solution, a convex approximation of the MBSHF optimization is formulated, and its performance on the standard MNIST dataset is discussed. A recurrent deep neural network up to a depth of 20 layers is successfully trained using the proposed MBSHF optimization, resulting in better performance in both computation and storage. The results are compared with other standard optimization techniques such as mini-batch stochastic gradient descent (MBSGD), stochastic gradient descent (SGD), stochastic Hessian-free optimization (SHF), Hessian-free optimization (HF) and nonlinear conjugate gradient (NCG). On average, the proposed technique produced recognition accuracy 12.2% higher than MBSGD, 27.2% higher than SHF, 35.4% higher than HF, 40.2% higher than NCG and 32% higher than SGD when applied to a testing sample of 50,000 images.
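To make the optimization concrete, below is a minimal Python sketch of one hybrid mini-batch Hessian-free update. It is not the authors' implementation: the loss is a stand-in logistic-regression objective rather than the recurrent network from the paper, the Hessian-vector product is approximated by finite differences of the gradient rather than an exact curvature product, and all names (hvp, cg_solve, mbshf_step) and hyperparameter values are illustrative assumptions.

import numpy as np

# Stand-in objective: binary logistic regression (the paper trains a recurrent
# network on MNIST; this toy loss just keeps the sketch self-contained).
def loss_grad(w, X, y):
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def hvp(w, X, y, v, eps=1e-5):
    # Hessian-vector product approximated by central finite differences of the
    # gradient (Pearlmutter's R-operator would give the exact product).
    _, g_plus = loss_grad(w + eps * v, X, y)
    _, g_minus = loss_grad(w - eps * v, X, y)
    return (g_plus - g_minus) / (2.0 * eps)

def cg_solve(w, X, y, b, damping=1e-2, iters=20):
    # Conjugate gradient for the damped Newton system (H + damping*I) p = b,
    # using only Hessian-vector products on the current mini-batch.
    p = np.zeros_like(b)
    r = b.copy()              # residual b - A p, with p = 0
    d = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ad = hvp(w, X, y, d) + damping * d
        alpha = rs / (d @ Ad)
        p += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < 1e-8:
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return p

def mbshf_step(w, X, y, batch_size=128, lr=1.0):
    # One hybrid update: sample a mini-batch, compute its gradient, solve the
    # second-order system on that batch, then move along the resulting direction.
    idx = np.random.choice(len(y), batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    _, g = loss_grad(w, Xb, yb)
    p = cg_solve(w, Xb, yb, -g)
    return w + lr * p

# Toy usage on synthetic data (a stand-in for MNIST-style training data).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
true_w = rng.normal(size=20)
y = (X @ true_w + 0.1 * rng.normal(size=1000) > 0).astype(float)
w = np.zeros(20)
for _ in range(50):
    w = mbshf_step(w, X, y)
print("final training loss:", loss_grad(w, X, y)[0])

The design point this sketch illustrates is that each update costs only a handful of Hessian-vector products on a mini-batch, so second-order curvature information can be exploited without ever forming or storing the full Hessian.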




Author information

Corresponding author

Correspondence to T. Senthil.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

No animals or humans were involved in this research work.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Senthil, T., Rajan, C. & Deepika, J. An improved optimization technique using Deep Neural Networks for digit recognition. Soft Comput 25, 1647–1658 (2021). https://doi.org/10.1007/s00500-020-05262-3

