
Learning flat representations with artificial neural networks


Abstract

In this paper, we propose a method of learning representation layers with squashing activation functions within a deep artificial neural network, directly addressing the vanishing gradient problem. The proposed solution is derived by solving the maximum likelihood estimator for the components of the posterior representation, which are approximately Beta-distributed, formulated in the context of variational inference. This approach not only improves the performance of deep neural networks with squashing activation functions on some of the hidden layers, including in discriminative learning, but can also be employed to produce sparse codes.
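The full derivation is not reproduced on this page, but the core idea can be illustrated with a small sketch: treat the outputs of a squashing (sigmoid) layer as approximately Beta-distributed, estimate per-unit Beta parameters from a batch of activations, and evaluate the corresponding log-likelihood, which could then act as a regularisation term on the hidden layer. The snippet below is a minimal NumPy/SciPy illustration under these assumptions only; the method-of-moments estimator, the function names, and the toy data are placeholders for exposition, not the authors' formulation (the paper derives a maximum likelihood estimator).

```python
# Hedged sketch (not the authors' exact method): model sigmoid activations
# as Beta-distributed and compute a Beta log-likelihood term for a layer.
import numpy as np
from scipy.special import betaln


def beta_mom_estimate(h, eps=1e-6):
    """Method-of-moments estimates of per-unit Beta(alpha, beta) parameters
    from a batch of activations h with values in (0, 1)."""
    h = np.clip(h, eps, 1.0 - eps)
    m = h.mean(axis=0)                      # per-unit sample mean
    v = h.var(axis=0) + eps                 # per-unit sample variance
    common = m * (1.0 - m) / v - 1.0        # estimate of alpha + beta
    alpha = m * common
    beta = (1.0 - m) * common
    return np.maximum(alpha, eps), np.maximum(beta, eps)


def beta_neg_log_likelihood(h, alpha, beta, eps=1e-6):
    """Average negative log-likelihood of activations under per-unit
    Beta(alpha, beta) distributions; usable as a penalty term."""
    h = np.clip(h, eps, 1.0 - eps)
    ll = ((alpha - 1.0) * np.log(h)
          + (beta - 1.0) * np.log(1.0 - h)
          - betaln(alpha, beta))
    return -ll.mean()


# Toy example: a batch of 128 sigmoid activations from a 64-unit layer.
rng = np.random.default_rng(0)
h = 1.0 / (1.0 + np.exp(-rng.normal(size=(128, 64))))
a, b = beta_mom_estimate(h)
print("Beta NLL penalty:", beta_neg_log_likelihood(h, a, b))
```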


Notes

  1. Source: http://yann.lecun.com/exdb/mnist/

  2. Source: https://catalog.ldc.upenn.edu/LDC93S1

  3. Source: https://www.cs.toronto.edu/~kriz/cifar.html

  4. Source: https://www.ldc.upenn.edu/language-resources/tools/sphere-conversion-tools

  5. Source: https://github.com/jameslyons/python_speech_features (see the usage sketch after these notes)

  6. Source: https://github.com/deepmind/sonnet
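Notes 2, 4 and 5 above point to the TIMIT corpus, the LDC SPHERE conversion tools and the python_speech_features library; a typical pipeline converts the SPHERE-formatted TIMIT audio to WAV and then extracts MFCC features. The following is a minimal sketch of that last step, assuming an already converted WAV file; the filename is hypothetical and the library's default parameters are shown, not necessarily those used in the paper.

```python
# Hedged sketch: MFCC extraction with python_speech_features (note 5).
# "utterance.wav" is a hypothetical TIMIT utterance already converted
# from SPHERE to WAV (note 4); parameters are the library defaults.
import scipy.io.wavfile as wav
from python_speech_features import mfcc

rate, signal = wav.read("utterance.wav")
features = mfcc(signal, samplerate=rate)  # array of shape (num_frames, 13)
print(features.shape)
```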


Acknowledgements

This work was supported by the National Authority for Scientific Research and Innovation, and by the Ministry of European Funds, through the Competitiveness Operational Programme 2014-2020, POC-A.1-A.1.1.4-E-2015 [Grant number: 40/02.09.2016, ID: P_37_778, to RT]. We also gratefully acknowledge the support of the NVIDIA Corporation for the donation of a Titan Xp GPU, and the support of the Microsoft Corporation for a 1-year Azure Research Sponsorship. We are thankful to Dmitri Toren for helping in the making of Figure 1.

Author information

Corresponding authors

Correspondence to Vlad Constantinescu or Robi Tacutu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Constantinescu, V., Chiru, C., Boloni, T. et al. Learning flat representations with artificial neural networks. Appl Intell 51, 2456–2470 (2021). https://doi.org/10.1007/s10489-020-02032-4

