A review on weight initialization strategies for neural networks

Abstract

Over the past few years, neural networks have exhibited remarkable results across various applications in machine learning and computer vision. Weight initialization is a significant step performed before training any neural network: the weights are initialized and then adjusted repeatedly during training until the loss converges to a minimum and a suitable weight matrix is obtained. Weight initialization therefore directly influences the convergence of a network, and selecting an appropriate scheme is necessary for end-to-end training; a well-chosen technique initializes the weights so that training is accelerated and performance is improved. This paper discusses advances in weight initialization for neural networks, covering techniques from the literature for feed-forward, convolutional, recurrent and long short-term memory networks. These techniques are classified as (1) initialization techniques without pre-training, which are further divided into random initialization and data-driven initialization, and (2) initialization techniques with pre-training. Weight initialization and weight optimization techniques that select optimal weights for non-iterative training mechanisms are also discussed. We provide a close overview of the initialization schemes in each category, and conclude with a discussion of existing schemes and the scope for future research.
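To make the categories named above concrete, the following minimal NumPy sketch illustrates both kinds of initialization without pre-training: Glorot (Xavier) and He initialization as representatives of the random schemes, and a simple PCA-based first-layer initializer as an instance of the data-driven schemes. The function names, layer sizes and synthetic data are our own illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility

# --- Random initialization: no training data required ---

def glorot_uniform(fan_in, fan_out):
    # Xavier/Glorot uniform: W ~ U[-a, a] with a = sqrt(6 / (fan_in + fan_out)),
    # chosen to keep activation and gradient variance roughly constant across layers.
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out):
    # He normal: W ~ N(0, 2 / fan_in), derived for ReLU activations,
    # which zero out half of the pre-activations on average.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# --- Data-driven initialization: derived from training samples ---

def pca_init(X, fan_out):
    # Set first-layer weights to the top principal directions of the data,
    # so the initial units respond to its dominant modes of variation.
    Xc = X - X.mean(axis=0)                 # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:fan_out].T                   # shape: (n_features, fan_out)

# Illustrative 64-32-10 feed-forward network; biases commonly start at zero.
X = rng.normal(size=(500, 64))              # stand-in for real training data
W1, b1 = pca_init(X, 32), np.zeros(32)
W2, b2 = he_normal(32, 10), np.zeros(10)
```

The second top-level category, initialization with pre-training, would instead obtain W1 and W2 from an auxiliary learning task, for example a stacked autoencoder, before fine-tuning on the target objective.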

Acknowledgements

The authors would like to thank Dr. Shrinivas P. Mahajan, Head of Department, E&TC, College of Engineering, Pune, for encouraging them to carry out this research work at the department. They would also like to thank the Center of Excellence in Signal and Image Processing (CoE-S&IP) at College of Engineering, Pune, for providing the necessary resources, and the reviewers for their valuable comments.

Author information

Corresponding author

Correspondence to Meenal V. Narkhede.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Narkhede, M.V., Bartakke, P.P. & Sutaone, M.S. A review on weight initialization strategies for neural networks. Artif Intell Rev 55, 291–322 (2022). https://doi.org/10.1007/s10462-021-10033-z
