
Deep Neural Network Structures Solving Variational Inequalities

Published in Set-Valued and Variational Analysis

Abstract

Motivated by structures that appear in deep neural networks, we investigate nonlinear composite models alternating proximity and affine operators defined on different spaces. We first show that a wide range of activation operators used in neural networks are actually proximity operators. We then establish conditions for the averagedness of the proposed composite constructs and investigate their asymptotic properties. It is shown that the limit of the resulting process solves a variational inequality which, in general, does not derive from a minimization problem. The analysis relies on tools from monotone operator theory and sheds some light on a class of neural network structures whose asymptotic properties have so far remained elusive.
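To make the abstract's two central claims concrete, here is a minimal numerical sketch (not taken from the paper): ReLU is the proximity operator of the indicator function of the nonnegative orthant, i.e. the projection onto it, and iterating an averaged layer of the form x ↦ prox(Wx + b) drives the iterates to a fixed point. The matrix W, bias b, dimension, contraction factor, and averaging parameter below are illustrative assumptions, not quantities from the article.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative affine layer x -> Wx + b. Scaling W so that ||W||_2 < 1
    # makes its composition with ReLU a contraction, so a fixed point exists
    # and the iteration below converges (an assumption of this sketch; the
    # paper works under more general averagedness conditions).
    W = rng.standard_normal((4, 4))
    W *= 0.9 / np.linalg.norm(W, 2)
    b = 0.1 * rng.standard_normal(4)

    def layer(x):
        # ReLU(y) = max(y, 0) is the proximity operator of the indicator of
        # the nonnegative orthant [0, +inf)^n: it is the projection
        # argmin_{z >= 0} ||z - y||^2, computed componentwise.
        return np.maximum(W @ x + b, 0.0)

    # Krasnosel'skii-Mann averaging: x_{k+1} = (1 - t) x_k + t T(x_k).
    # The abstract's point is that the limit of such a process solves a
    # variational inequality that, in general, is not the minimizer of any
    # underlying cost function.
    x = rng.standard_normal(4)
    for _ in range(500):
        x = 0.5 * x + 0.5 * layer(x)

    print("fixed-point residual:", np.linalg.norm(x - layer(x)))  # ~ 0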



Author information


Corresponding author

Correspondence to Patrick L. Combettes.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The work of P. L. Combettes was supported by the National Science Foundation under grant CCF-1715671. The work of J.-C. Pesquet was supported by Institut Universitaire de France.


Cite this article

Combettes, P.L., Pesquet, J.-C.: Deep Neural Network Structures Solving Variational Inequalities. Set-Valued Var. Anal. 28, 491–518 (2020). https://doi.org/10.1007/s11228-019-00526-z
