Abstract
Recognizing handwritten digits is a classic application of machine learning and deep learning in information retrieval. Although the field is mature, recognizing digits with an effective soft-computing optimization technique remains challenging: training such systems on large datasets often fails because of the computation and storage required. In this paper, a recurrent deep neural network with a hybrid mini-batch and stochastic Hessian-free optimization (MBSHF) is proposed for accurate and faster convergence of predictions. A second-order approximation is used to achieve better performance in solving the quadratic subproblems, which depends heavily on computation and storage. The proposed technique also uses an iterative minimization algorithm for faster convergence from a random initialization, even though many additional parameters are involved. As a solution, a convex approximation of the MBSHF optimization is formulated, and its performance on the standard MNIST dataset is discussed. A recurrent deep neural network of up to 20 layers is successfully trained using the proposed MBSHF optimization, yielding better computation and storage performance. The results are compared with other standard optimization techniques: mini-batch stochastic gradient descent (MBSGD), stochastic gradient descent (SGD), stochastic Hessian-free optimization (SHF), Hessian-free optimization (HF), and nonlinear conjugate gradient (NCG). On a testing sample of 50,000, the proposed technique produced recognition accuracy on average 12.2% better than MBSGD, 27.2% better than SHF, 35.4% better than HF, 40.2% better than NCG, and 32% better than SGD.
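The core idea of a mini-batch stochastic Hessian-free step can be illustrated with a minimal NumPy sketch. This is not the authors' MBSHF implementation; it is a hedged illustration on a toy logistic-regression problem standing in for digit classification. The gradient is computed on a mini-batch, the curvature (Hessian-vector products, here via finite differences of gradients) on a smaller sub-batch, and the damped quadratic subproblem is solved by a few conjugate-gradient iterations. All names (`hvp`, `cg_solve`, the batch sizes) are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data standing in for digit features.
X = rng.normal(size=(512, 20))
true_w = rng.normal(size=20)
y = (X @ true_w > 0).astype(float)

def loss(w, Xb, yb):
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
    return -np.mean(yb * np.log(p + 1e-12) + (1 - yb) * np.log(1 - p + 1e-12))

def grad(w, Xb, yb):
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
    return Xb.T @ (p - yb) / len(yb)

def hvp(w, v, Xb, yb, eps=1e-6):
    # Hessian-vector product via finite differences of the gradient:
    # H v ~ (grad(w + eps*v) - grad(w)) / eps  -- no explicit Hessian stored.
    return (grad(w + eps * v, Xb, yb) - grad(w, Xb, yb)) / eps

def cg_solve(hv, b, iters=10, damping=1e-2):
    # Conjugate gradient on the damped system (H + lambda*I) p = b, from p = 0.
    p = np.zeros_like(b)
    r = b.copy()          # residual b - A p, with p = 0
    d = r.copy()
    rr = r @ r
    for _ in range(iters):
        Ad = hv(d) + damping * d
        alpha = rr / (d @ Ad)
        p += alpha * d
        r -= alpha * Ad
        rr_new = r @ r
        d = r + (rr_new / rr) * d
        rr = rr_new
    return p

w = np.zeros(20)
batch = 128    # mini-batch for the gradient
cbatch = 64    # smaller sub-batch for curvature (the "stochastic" part)
for step in range(30):
    idx = rng.choice(len(X), size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    Xc, yc = Xb[:cbatch], yb[:cbatch]
    g = grad(w, Xb, yb)
    # Solve the local quadratic model for the update direction p.
    p = cg_solve(lambda v: hvp(w, v, Xc, yc), -g)
    w += p
```

The key saving, as in Hessian-free methods generally, is that curvature enters only through Hessian-vector products, so the full Hessian is never formed or stored; the mini-batch/sub-batch split trades curvature accuracy for per-step cost.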
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
Animals and humans are not involved in this research work.
Additional information
Communicated by V. Loia.
Cite this article
Senthil, T., Rajan, C. & Deepika, J. An improved optimization technique using Deep Neural Networks for digit recognition. Soft Comput 25, 1647–1658 (2021). https://doi.org/10.1007/s00500-020-05262-3