A review on weight initialization strategies for neural networks

Abstract

Over the past few years, neural networks have exhibited remarkable results across various applications in machine learning and computer vision. Weight initialization is a significant step performed before training any neural network: the weights are initialized and then adjusted repeatedly during training until the loss converges to a minimum and a suitable weight matrix is obtained. Weight initialization therefore directly influences the convergence of a network, and selecting an appropriate scheme is necessary for end-to-end training; a well-chosen technique initializes the weights so that training is accelerated and performance is improved. This paper discusses advances in weight initialization for neural networks, covering techniques from the literature for feed-forward, convolutional, recurrent and long short-term memory networks. These techniques are classified as (1) initialization techniques without pre-training, which are further divided into random initialization and data-driven initialization, and (2) initialization techniques with pre-training. Weight initialization and weight optimization techniques that select optimal weights for non-iterative training mechanisms are also discussed. We provide a close overview of the initialization schemes in each category, and conclude with a discussion of existing schemes and the scope for future research.
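To make the categories named above concrete, the following minimal NumPy sketch illustrates both kinds of initialization without pre-training: Glorot (Xavier) and He initialization as representatives of the random schemes, and a simple PCA-based first-layer initializer as an instance of the data-driven schemes. The function names, layer sizes and synthetic data are our own illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility

# --- Random initialization: no training data required ---

def glorot_uniform(fan_in, fan_out):
    # Xavier/Glorot uniform: W ~ U[-a, a] with a = sqrt(6 / (fan_in + fan_out)),
    # chosen to keep activation and gradient variance roughly constant across layers.
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out):
    # He normal: W ~ N(0, 2 / fan_in), derived for ReLU activations,
    # which zero out half of the pre-activations on average.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# --- Data-driven initialization: derived from training samples ---

def pca_init(X, fan_out):
    # Set first-layer weights to the top principal directions of the data,
    # so the initial units respond to its dominant modes of variation.
    Xc = X - X.mean(axis=0)                 # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:fan_out].T                   # shape: (n_features, fan_out)

# Illustrative 64-32-10 feed-forward network; biases commonly start at zero.
X = rng.normal(size=(500, 64))              # stand-in for real training data
W1, b1 = pca_init(X, 32), np.zeros(32)
W2, b2 = he_normal(32, 10), np.zeros(10)
```

The second top-level category, initialization with pre-training, would instead obtain W1 and W2 from an auxiliary learning task, for example a stacked autoencoder, before fine-tuning on the target objective.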

Acknowledgements

The authors would like to thank Dr. Shrinivas P. Mahajan, Head of Department, E&TC, College of Engineering, Pune, for encouraging them to carry out this research work at the department. They would also like to thank the Center of Excellence in Signal and Image Processing (CoE-S&IP) at College of Engineering, Pune, for providing the necessary resources, and the reviewers for their valuable comments.

Author information

Corresponding author

Correspondence to Meenal V. Narkhede.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Narkhede, M.V., Bartakke, P.P. & Sutaone, M.S. A review on weight initialization strategies for neural networks. Artif Intell Rev 55, 291–322 (2022). https://doi.org/10.1007/s10462-021-10033-z
