Abstract
The success of sigmoidal feedforward networks (SFNs) in solving complex learning tasks can be attributed to their universal approximation property. These networks are trained with non-linear iterative optimization methods (first-order or second-order). Since the convergence rate of SFN training is affected by the initial choice of weights, we propose two new weight initialization routines (Routine-1 and Routine-2) that use characteristics of the input and output data and properties of the activation function. Routine-1 exploits the linear dependence of the weight-update step size on the derivative of the activation function: it initializes weights and biases so that the activation function operates in the region near zero input, where the derivative is maximal, thereby increasing the weight-update step size and hence the convergence speed. The same principle underlies Routine-2, which initializes weights and biases so that each hidden node is activated at a distinct point in the significant range of the activation function (where the significant range is the non-saturated region); each node therefore evolves independently of the others and acts as a distinct feature identifier. Initializing weights in the significant range also reduces the chance of hidden nodes getting stuck in a saturated state. Networks initialized with the proposed routines converge faster and have a higher probability of reaching deeper minima. The efficiency of the proposed routines is evaluated by comparing them with the conventional random weight initialization routine and 11 weight initialization routines from the literature (4 well-established routines and 7 recently proposed routines) on several benchmark problems. The proposed routines are also tested on larger network sizes and larger datasets such as MNIST.
The results show that the proposed routines perform better than the conventional random weight initialization routine and the 11 established weight initialization routines.
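The idea behind Routine-1 can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the paper's exact formula: it assumes tanh hidden units, inputs scaled to [−1, 1], and a hypothetical weight bound `active_range / n_in` chosen so that the net input of each hidden node stays inside the active range of tanh, where the derivative is large.

```python
import numpy as np

def init_near_zero(n_in, n_hidden, active_range=2.1783, rng=None):
    """Illustrative initialization: keep net inputs near zero, where
    tanh'(x) = 1 - tanh(x)**2 is largest (maximum 1 at x = 0).

    Assumes inputs are scaled to [-1, 1]; the bound active_range / n_in
    is a hypothetical choice, not the routine's exact formula.
    """
    rng = np.random.default_rng(rng)
    bound = active_range / n_in          # keeps |sum_i w_i x_i| within the active range
    W = rng.uniform(-bound, bound, size=(n_hidden, n_in))
    b = np.zeros(n_hidden)               # zero bias centres each node at tanh(0) = 0
    return W, b

W, b = init_near_zero(n_in=4, n_hidden=8, rng=0)
x = np.ones(4)                           # worst-case input on the [-1, 1] scale
net = W @ x + b
assert np.all(np.abs(net) <= 2.1783)     # every node starts in the active range
```

With this bound, even the worst-case input cannot push a hidden node into the saturated region at initialization, so every node starts with a usefully large derivative.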
Notes
A linear activation function is used at the output layer in this paper, though a sigmoidal activation function can also be used.
If we augment the input with x0 = 1, the relation can also be written as \(n_{j}^{(h)}={\sum }_{i=0}^{I} w_{ji}x_{i}\), where wj0 ≡ 𝜃j. In this paper, the threshold 𝜃j is shown explicitly.
Error refers to the deviation of the actual output of the SFN from the desired output.
(−λ, λ) is the active/useful range of the activation function, in which the value of its derivative is greater than 5% of the maximum value. For the hyperbolic tangent function, λ = 2.1783 (rounded to four decimal places).
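The value of λ quoted in this note can be checked directly: since tanh'(x) = 1 − tanh(x)², with maximum 1 at x = 0, the boundary of the active range solves 1 − tanh(λ)² = 0.05, i.e. λ = atanh(√0.95).

```python
import math

# tanh'(x) = 1 - tanh(x)**2 has maximum 1 at x = 0.
# The active-range boundary solves 1 - tanh(lam)**2 = 0.05,
# i.e. tanh(lam) = sqrt(0.95), so lam = atanh(sqrt(0.95)).
lam = math.atanh(math.sqrt(0.95))
print(round(lam, 4))  # 2.1783

# At x = lam the derivative is exactly 5% of its maximum.
assert abs((1 - math.tanh(lam) ** 2) - 0.05) < 1e-12
```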
The initialization of γ for the logistic activation function follows the same considerations as for the hyperbolic tangent activation function, as stated after relation (29).
The Wine Quality dataset can be used for both regression and classification. In this work, it is used as a regression problem.
For each of the 12 training problems and each of the 3 weight initialization routines, 30 networks are trained. Thus we obtain 12 matrices of size 3 × 30 for MSEtrain and 12 matrices of size 3 × 30 for MSEtest. Due to the volume of data obtained, the results for MSEtrain and MSEtest are not reported in this work.
A t-test table is obtained for each of the 14 training problems. An entry of 1 in the i-th row and j-th column indicates that method i is statistically better than method j, whereas an entry of 0 indicates that method i is statistically similar to method j. Due to the volume of data, we report only a summarized result (all the tables obtained for each function are superimposed).
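The superposition described in this note amounts to an entrywise sum of the 0/1 tables, so each summary entry counts how many problems yielded a significant win. A hypothetical example with two methods and three problems:

```python
import numpy as np

# Hypothetical 0/1 significance tables, one per training problem:
# entry (i, j) = 1 means method i is statistically better than method j.
tables = [
    np.array([[0, 1], [0, 0]]),
    np.array([[0, 1], [0, 0]]),
    np.array([[0, 0], [1, 0]]),
]

# "Superimposing" the tables: entry (i, j) of the summary counts
# on how many problems method i was significantly better than method j.
summary = np.sum(tables, axis=0)
print(summary)  # [[0 2] [1 0]]: method 1 beat method 2 on two problems
```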
For each of the 14 training problems and each of the 12 WIRs, 30 networks are trained. Thus we obtain 14 matrices of size 12 × 30 for MSEtrain and 14 matrices of size 12 × 30 for MSEtest. Due to the volume of data obtained, the results for MSEtrain and MSEtest are not reported in this work.
Acknowledgements
This publication is an outcome of R&D work undertaken in a project under the Visvesvaraya PhD Scheme of the Ministry of Electronics & Information Technology, Government of India, implemented by Digital India Corporation. The authors would like to thank the editor of the journal for providing feedback.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Mittal, A., Singh, A.P. & Chandra, P. Weight and bias initialization routines for Sigmoidal Feedforward Network. Appl Intell 51, 2651–2671 (2021). https://doi.org/10.1007/s10489-020-01960-5