
Universal Approximations of Invariant Maps by Neural Networks


Abstract

We describe generalizations of the universal approximation theorem for neural networks to maps invariant or equivariant with respect to linear representations of groups. Our goal is to establish network-like computational models that are both invariant/equivariant and provably complete in the sense of their ability to approximate any continuous invariant/equivariant map. Our contribution is threefold. First, in the general case of compact groups we propose a construction of a complete invariant/equivariant network using an intermediate polynomial layer. We invoke classical theorems of Hilbert and Weyl to justify and simplify this construction; in particular, we describe an explicit complete ansatz for approximation of permutation-invariant maps. Second, we consider groups of translations and prove several versions of the universal approximation theorem for convolutional networks in the limit of continuous signals on Euclidean spaces. Finally, we consider 2D signal transformations equivariant with respect to the group SE(2) of rigid Euclidean motions. In this case we introduce the "charge-conserving convnet", a convnet-like computational model based on the decomposition of the feature space into isotypic representations of SO(2). We prove this model to be a universal approximator for continuous SE(2)-equivariant signal transformations.


Notes

  1. Another approach to ensuring a well-defined value \({\varvec{\Phi }}({\mathbf {x}})\) is to work with shift-invariant reproducing kernel Hilbert spaces (RKHS) instead of \(L^2\) spaces. The definition of an RKHS requires the evaluation map \({\varvec{\Phi }}\mapsto {\varvec{\Phi }}({\mathbf {x}})\) to be continuous in \({\varvec{\Phi }}\) and, in particular, well-defined. An example of a shift-invariant RKHS is the space of band-limited signals with a given bandwidth. We thank the anonymous reviewer for pointing out this approach.


Acknowledgements

The author thanks the anonymous reviewer for several helpful suggestions.


Corresponding author

Correspondence to Dmitry Yarotsky.

Additional information

Communicated by Wolfgang Dahmen, Ronald A. Devore, and Philipp Grohs.


Appendix: Proof of Lemma 4.1

The proof is a slight modification of the standard proof of the Central Limit Theorem via the Fourier transform (the CLT can be used directly to prove the lemma in the case \(a=b=0\), when \({\mathcal {L}}_{\lambda }^{(a,b)}\) only includes diffusion factors).

To simplify notation, assume without loss of generality that \(d_V=1\) (in the general case the proof is essentially identical). We will use the appropriately discretized version of the Fourier transform (i.e., the Fourier series expansion). Given a discretized signal \( \Phi :(\lambda {\mathbb {Z}})^2\rightarrow {\mathbb {C}}\), we define \({\mathcal {F}}_\lambda \Phi \) as a function on \([-\frac{\pi }{\lambda },\frac{\pi }{\lambda }]^2\) by

$$\begin{aligned} {\mathcal {F}}_\lambda \Phi ({\mathbf {p}})=\frac{\lambda ^2}{2\pi }\sum _{{\gamma }\in (\lambda {\mathbb {Z}})^2} \Phi ({\gamma })e^{-i{\mathbf {p}}\cdot {\gamma }}. \end{aligned}$$
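The normalization in this definition can be checked against the unitarity statement that follows via the grid orthogonality relation (a short check, included for convenience): for \({\gamma },{\gamma }'\in (\lambda {\mathbb {Z}})^2\),

$$\begin{aligned} \int _{[-\frac{\pi }{\lambda },\frac{\pi }{\lambda }]^2}e^{i{\mathbf {p}}\cdot ({\gamma }-{\gamma }')}\mathrm{d}^2{\mathbf {p}}=\frac{4\pi ^2}{\lambda ^2}\,\delta _{{\gamma }{\gamma }'}, \qquad \text {so that}\qquad \langle {\mathcal {F}}_\lambda \Phi ,{\mathcal {F}}_\lambda \Psi \rangle =\frac{\lambda ^4}{4\pi ^2}\sum _{{\gamma },{\gamma }'}\overline{\Phi ({\gamma })}\Psi ({\gamma }')\,\frac{4\pi ^2}{\lambda ^2}\delta _{{\gamma }{\gamma }'} =\lambda ^2\sum _{{\gamma }}\overline{\Phi ({\gamma })}\Psi ({\gamma }). \end{aligned}$$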

Then, \({\mathcal {F}}_\lambda : L^2((\lambda {\mathbb {Z}})^2,{\mathbb {C}})\rightarrow L^2([-\frac{\pi }{\lambda },\frac{\pi }{\lambda }]^2,{\mathbb {C}})\) is a unitary isomorphism, assuming that the scalar product in the input space is defined by \(\langle \Phi ,\Psi \rangle =\lambda ^2\sum _{\gamma \in (\lambda {\mathbb {Z}})^2}\overline{\Phi (\gamma )}\Psi (\gamma )\) and in the output space by \(\langle \Phi ,\Psi \rangle =\int _{[-\frac{\pi }{\lambda },\frac{\pi }{\lambda }]^2} \overline{\Phi ({\mathbf {p}})}\Psi ({\mathbf {p}}) \mathrm{d}^2{\mathbf {p}}\). Let \(P_\lambda \) be the discretization projector (3.6). It is easy to check that \({\mathcal {F}}_\lambda P_\lambda \) strongly converges to the standard Fourier transform as \(\lambda \rightarrow 0:\)

$$\begin{aligned} \lim _{\lambda \rightarrow 0}{\mathcal {F}}_\lambda P_\lambda \Phi = {\mathcal {F}}_0 \Phi , \quad \Phi \in L^2({\mathbb {R}}^2,{\mathbb {C}}), \end{aligned}$$

where

$$\begin{aligned} {\mathcal {F}}_0 \Phi ({\mathbf {p}})=\frac{1}{2\pi }\int _{{\mathbb {R}}^2} \Phi ({\gamma })e^{-i{\mathbf {p}}\cdot {\gamma }}\mathrm{d}^2{\gamma } \end{aligned}$$

and where we naturally embed \(L^2([-\frac{\pi }{\lambda },\frac{\pi }{\lambda }]^2,{\mathbb {C}})\subset L^2({\mathbb {R}}^2,{\mathbb {C}})\). Conversely, let \(P_\lambda '\) denote the orthogonal projection onto the subspace \(L^2([-\frac{\pi }{\lambda },\frac{\pi }{\lambda }]^2,{\mathbb {C}})\) in \(L^2({\mathbb {R}}^2,{\mathbb {C}}):\)

$$\begin{aligned} P_\lambda ':\Phi \mapsto \Phi |_{[-\frac{\pi }{\lambda },\frac{\pi }{\lambda }]^2}. \end{aligned}$$
(A.1)

Then

$$\begin{aligned} \lim _{\lambda \rightarrow 0}{\mathcal {F}}_\lambda ^{-1} P'_\lambda \Phi = {\mathcal {F}}_0^{-1} \Phi , \quad \Phi \in L^2({\mathbb {R}}^2,{\mathbb {C}}). \end{aligned}$$
(A.2)

The Fourier transform gives us the spectral representation of the discrete differential operators (4.15), (4.16), (4.17) as operators of multiplication by a function:

$$\begin{aligned} {\mathcal {F}}_\lambda \partial _{z}^{(\lambda )} \Phi&=\Psi _{\partial _{z}^{(\lambda )}}\cdot {\mathcal {F}}_\lambda \Phi ,\\ {\mathcal {F}}_\lambda \partial _{{\overline{z}}}^{(\lambda )} \Phi&=\Psi _{\partial _{{\overline{z}}}^{(\lambda )}}\cdot {\mathcal {F}}_\lambda \Phi ,\\ {\mathcal {F}}_\lambda \Delta ^{(\lambda )} \Phi&=\Psi _{\Delta ^{(\lambda )}}\cdot {\mathcal {F}}_\lambda \Phi , \end{aligned}$$

where, denoting \({\mathbf {p}}=(p_x,p_y)\),

$$\begin{aligned} \Psi _{\partial _{z}^{(\lambda )}}(p_x,p_y)&=\frac{i}{2\lambda }(\sin \lambda p_x-i\sin \lambda p_y),\\ \Psi _{\partial _{{\overline{z}}}^{(\lambda )}}(p_x,p_y)&=\frac{i}{2\lambda }(\sin \lambda p_x+i\sin \lambda p_y),\\ \Psi _{\Delta ^{(\lambda )}}(p_x,p_y)&=-\frac{4}{\lambda ^2}\Big (\sin ^2\frac{\lambda p_x}{2}+\sin ^2\frac{\lambda p_y}{2}\Big ). \end{aligned}$$
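For concreteness, here is how the first and third multipliers arise, assuming (as these formulas suggest) that \(\partial _{z}^{(\lambda )}\) in (4.15) is the symmetric-difference discretization of \(\partial _z=\frac{1}{2}(\partial _x-i\partial _y)\) and that \(\Delta ^{(\lambda )}\) in (4.17) is the five-point discrete Laplacian. Since \({\mathcal {F}}_\lambda [\Phi (\cdot +\lambda {\mathbf {e}}_x)]({\mathbf {p}})=e^{i\lambda p_x}{\mathcal {F}}_\lambda \Phi ({\mathbf {p}})\),

$$\begin{aligned} {\mathcal {F}}_\lambda \Big [\tfrac{\Phi (\cdot +\lambda {\mathbf {e}}_x)-\Phi (\cdot -\lambda {\mathbf {e}}_x)}{2\lambda }\Big ]&=\frac{e^{i\lambda p_x}-e^{-i\lambda p_x}}{2\lambda }\,{\mathcal {F}}_\lambda \Phi =\frac{i\sin \lambda p_x}{\lambda }\,{\mathcal {F}}_\lambda \Phi ,\\ \Psi _{\partial _{z}^{(\lambda )}}(p_x,p_y)&=\frac{1}{2}\Big (\frac{i\sin \lambda p_x}{\lambda }-i\,\frac{i\sin \lambda p_y}{\lambda }\Big )=\frac{i}{2\lambda }(\sin \lambda p_x-i\sin \lambda p_y),\\ \Psi _{\Delta ^{(\lambda )}}(p_x,p_y)&=\frac{2\cos \lambda p_x+2\cos \lambda p_y-4}{\lambda ^2}=-\frac{4}{\lambda ^2}\Big (\sin ^2\frac{\lambda p_x}{2}+\sin ^2\frac{\lambda p_y}{2}\Big ), \end{aligned}$$

using \(1-\cos t=2\sin ^2\frac{t}{2}\); the multiplier of \(\partial _{{\overline{z}}}^{(\lambda )}\) is obtained in the same way.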

The operator \({\mathcal {L}}_\lambda ^{(a,b)}\) defined in (4.19) can then be written as

$$\begin{aligned} {\mathcal {F}}_\lambda {\mathcal {L}}_\lambda ^{(a,b)} \Phi =\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}\cdot {\mathcal {F}}_\lambda P_\lambda \Phi , \end{aligned}$$

where the function \(\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}\) is given by

$$\begin{aligned} \Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}=(\Psi _{\partial _{z}^{(\lambda )}})^a (\Psi _{\partial _{{\overline{z}}}^{(\lambda )}})^b\left( 1+\tfrac{\lambda ^2}{8} \Psi _{\Delta ^{(\lambda )}}\right) ^{\lceil 4/\lambda ^2\rceil }. \end{aligned}$$
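For fixed \({\mathbf {p}}\), the behaviour of this symbol as \(\lambda \rightarrow 0\) is easy to read off; this is the pointwise limit used at the end of the proof. Since \(\tfrac{\lambda ^2}{8}\Psi _{\Delta ^{(\lambda )}}({\mathbf {p}})=-\tfrac{1}{2}\big (\sin ^2\tfrac{\lambda p_x}{2}+\sin ^2\tfrac{\lambda p_y}{2}\big )=-\tfrac{\lambda ^2}{8}(p_x^2+p_y^2)+O(\lambda ^4)\),

$$\begin{aligned} \Big (1+\tfrac{\lambda ^2}{8}\Psi _{\Delta ^{(\lambda )}}({\mathbf {p}})\Big )^{\lceil 4/\lambda ^2\rceil }{\mathop {\longrightarrow }\limits ^{\lambda \rightarrow 0}}e^{-(p_x^2+p_y^2)/2},\qquad \Psi _{\partial _{z}^{(\lambda )}}({\mathbf {p}}){\mathop {\longrightarrow }\limits ^{\lambda \rightarrow 0}}\tfrac{i(p_x-ip_y)}{2},\qquad \Psi _{\partial _{{\overline{z}}}^{(\lambda )}}({\mathbf {p}}){\mathop {\longrightarrow }\limits ^{\lambda \rightarrow 0}}\tfrac{i(p_x+ip_y)}{2}, \end{aligned}$$

so that \(\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}\rightarrow \Psi _{{\mathcal {L}}_0^{(a,b)}}\) pointwise, with \(\Psi _{{\mathcal {L}}_0^{(a,b)}}\) computed explicitly below.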

We can then write \({\mathcal {L}}_\lambda ^{(a,b)}\Phi \) as a convolution of \(P_\lambda \Phi \) with the kernel

$$\begin{aligned} \Psi _{a,b}^{(\lambda )}=\frac{1}{2\pi }{\mathcal {F}}_\lambda ^{-1} \Psi _{{\mathcal {L}}_\lambda ^{(a,b)}} \end{aligned}$$

on the grid \((\lambda {\mathbb {Z}})^2:\)

$$\begin{aligned} {\mathcal {L}}_\lambda ^{(a,b)}\Phi (\gamma )=\lambda ^2\sum _{\theta \in (\lambda {\mathbb {Z}})^2}P_\lambda \Phi (\gamma -\theta )\Psi _{a,b}^{(\lambda )}(\theta ),\quad \gamma \in (\lambda {\mathbb {Z}})^2. \end{aligned}$$
(A.3)
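The factor \(\frac{1}{2\pi }\) in the kernel is accounted for by the convolution theorem in this normalization (a one-line check):

$$\begin{aligned} {\mathcal {F}}_\lambda \Big [\lambda ^2\sum _{\theta \in (\lambda {\mathbb {Z}})^2}\Phi (\cdot -\theta )\Psi (\theta )\Big ]=2\pi \,{\mathcal {F}}_\lambda \Phi \cdot {\mathcal {F}}_\lambda \Psi , \end{aligned}$$

so taking \(\Psi =\Psi _{a,b}^{(\lambda )}=\frac{1}{2\pi }{\mathcal {F}}_\lambda ^{-1}\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}\) and \(\Phi \) replaced by \(P_\lambda \Phi \) in (A.3) recovers \({\mathcal {F}}_\lambda {\mathcal {L}}_\lambda ^{(a,b)}\Phi =\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}\cdot {\mathcal {F}}_\lambda P_\lambda \Phi \), in agreement with the Fourier representation above.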

Now consider the operator \({\mathcal {L}}_0^{(a,b)}\) defined in (4.21). At each \({\mathbf {x}}\in {\mathbb {R}}^2\), the value \({\mathcal {L}}_0^{(a,b)} \Phi ({\mathbf {x}})\) can be written as a scalar product:

$$\begin{aligned} {\mathcal {L}}_0^{(a,b)} \Phi ({\mathbf {x}})=\int _{{\mathbb {R}}^2}\Phi ({\mathbf {x}}-{\mathbf {y}}) \Psi _{a,b}({\mathbf {y}})\mathrm{d}^2{\mathbf {y}}=\langle R_{-{\mathbf {x}}}{\widetilde{\Phi }},\Psi _{a,b}\rangle _{L^2({\mathbb {R}}^2)}, \end{aligned}$$
(A.4)

where \({\widetilde{\Phi }}({\mathbf {x}})={\overline{\Phi }}(-{\mathbf {x}})\), \(\Psi _{a,b}\) is defined by (4.20), and \(R_{{\mathbf {x}}}\) is our standard representation of the group \({\mathbb {R}}^2\), \(R_{{\mathbf {x}}}\Phi ({\mathbf {y}})=\Phi ({\mathbf {y}}-{\mathbf {x}})\). For \(\lambda >0\), we can write \({\mathcal {L}}_\lambda ^{(a,b)} \Phi ({\mathbf {x}})\) in a similar form. Indeed, using (A.3) and naturally extending the discretized signal \(\Psi _{a,b}^{(\lambda )}\) to the whole \({\mathbb {R}}^2\), we have

$$\begin{aligned} {\mathcal {L}}_\lambda ^{(a,b)}\Phi (\gamma )=\int _{{\mathbb {R}}^2}\Phi (\gamma -{\mathbf {y}})\Psi _{a,b}^{(\lambda )}({\mathbf {y}})\mathrm{d}^2{\mathbf {y}}=\langle R_{-\gamma }{\widetilde{\Phi }},\Psi _{a,b}^{(\lambda )}\rangle _{L^2({\mathbb {R}}^2)}. \end{aligned}$$

Then, for any \({\mathbf {x}}\in {\mathbb {R}}^2\) we can write

$$\begin{aligned} {\mathcal {L}}_\lambda ^{(a,b)}\Phi ({\mathbf {x}})=\langle R_{-{\mathbf {x}}+\delta {\mathbf {x}}}{\widetilde{\Phi }},\Psi _{a,b}^{(\lambda )}\rangle _{L^2({\mathbb {R}}^2)}, \end{aligned}$$
(A.5)

where \(-{\mathbf {x}}+\delta {\mathbf {x}}\) is the point of the grid \((\lambda {\mathbb {Z}})^2\) nearest to \(-{\mathbf {x}}\).

Now consider the formulas (A.4), (A.5) and observe that, by the Cauchy–Schwarz inequality and since R is norm-preserving, to prove statement 1) of the lemma we only need to show that the functions \(\Psi _{a,b},\Psi _{a,b}^{(\lambda )}\) have uniformly bounded \(L^2\)-norms. For \(\lambda >0\) we have

$$\begin{aligned} \Vert \Psi _{a,b}^{(\lambda )}\Vert ^2_{L^2({\mathbb {R}}^2)}&=\Big \Vert \frac{1}{2\pi }{\mathcal {F}}_\lambda ^{-1}\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}\Big \Vert ^2_{L^2({\mathbb {R}}^2)}\nonumber \\&=\frac{1}{4\pi ^2}\Vert \Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}\Vert ^2_{L^2({\mathbb {R}}^2)}\nonumber \\&=\frac{1}{4\pi ^2}\big \Vert (\Psi _{\partial _{z}^{(\lambda )}})^a (\Psi _{\partial _{{\overline{z}}}^{(\lambda )}})^b\left( 1+\tfrac{\lambda ^2}{8}\Psi _{\Delta ^{(\lambda )}}\right) ^{\lceil 4/\lambda ^2\rceil }\big \Vert _{L^2({\mathbb {R}}^2)}^2\nonumber \\&\le \frac{1}{4\pi ^2}\int _{-\pi /\lambda }^{\pi /\lambda }\int _{-\pi /\lambda }^{\pi /\lambda } \big (\tfrac{|p_x|+|p_y|}{2}\big )^{2(a+b)}\nonumber \\&\quad \exp \left( -\lceil 4/\lambda ^2\rceil \left( \sin ^2\tfrac{\lambda p_x}{2}+\sin ^2\tfrac{\lambda p_y}{2}\right) \right) \mathrm{d}p_x\mathrm{d}p_y\nonumber \\&\le \frac{1}{4\pi ^2}\int _{-\infty }^{\infty }\int _{-\infty }^{\infty } \big (\tfrac{|p_x|+|p_y|}{2}\big )^{2(a+b)} \exp (-\tfrac{4}{\pi ^2}(p_x^2+p_y^2))\mathrm{d}p_x\mathrm{d}p_y\\&<\infty ,\nonumber \end{aligned}$$
(A.6)

where we used the inequalities

$$\begin{aligned}&|\sin t| \le |t|,\\&|1+t|\le e^t,\quad t>-1,\\&|\sin t|\ge \tfrac{2|t|}{\pi },\quad t\in [-\tfrac{\pi }{2},\tfrac{\pi }{2}]. \end{aligned}$$

Expression (A.6) provides a finite bound, uniform in \(\lambda \), for the squared norms \(\Vert \Psi _{a,b}^{(\lambda )}\Vert ^2\). This bound also holds for \(\Vert \Psi _{a,b}\Vert ^2\).
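As an aside (not part of the proof), the uniformity of this bound is easy to check numerically. The following minimal Python sketch, assuming only numpy, samples the multiplier \(\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}\) on \([-\pi /\lambda ,\pi /\lambda ]^2\) and estimates its \(L^2\)-norm, which by the first equalities in (A.6) equals \(2\pi \Vert \Psi _{a,b}^{(\lambda )}\Vert _2\); the printed values remain bounded as \(\lambda \) decreases:

```python
import numpy as np

# Numerical sanity check (illustration only): estimate the L^2-norm of the
# multiplier Psi_{L_lambda^{(a,b)}} over [-pi/lambda, pi/lambda]^2 by a
# Riemann sum.  By (A.6) this equals 2*pi*||Psi_{a,b}^{(lambda)}||_2, so the
# printed values should stay bounded as lambda -> 0.

def multiplier_norm(lam, a, b, n=512):
    p = np.linspace(-np.pi / lam, np.pi / lam, n)
    px, py = np.meshgrid(p, p)
    psi_dz  = 1j / (2 * lam) * (np.sin(lam * px) - 1j * np.sin(lam * py))
    psi_dzb = 1j / (2 * lam) * (np.sin(lam * px) + 1j * np.sin(lam * py))
    psi_lap = -4 / lam**2 * (np.sin(lam * px / 2)**2 + np.sin(lam * py / 2)**2)
    diffusion = (1 + lam**2 / 8 * psi_lap) ** int(np.ceil(4 / lam**2))
    psi = psi_dz**a * psi_dzb**b * diffusion
    dp = p[1] - p[0]
    return np.sqrt(np.sum(np.abs(psi)**2) * dp * dp)

for lam in [1.0, 0.5, 0.25, 0.125]:
    print(f"lambda = {lam:5.3f}   ||Psi||_2 estimate: {multiplier_norm(lam, a=1, b=2):.4f}")
```

The case \(a=b=0\) isolates the purely diffusive factor mentioned at the beginning of the proof.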

Next, observe that to establish the strong convergence in statement 2) of the lemma, it suffices to show that

$$\begin{aligned} \lim _{\lambda \rightarrow 0}\Vert \Psi _{a,b}^{(\lambda )}-\Psi _{a,b}\Vert _{L^2({\mathbb {R}}^2)}=0. \end{aligned}$$
(A.7)

Indeed, by (A.4), (A.5), we would then have

$$\begin{aligned} \Vert {\mathcal {L}}_\lambda ^{(a,b)} \Phi -{\mathcal {L}}_0^{(a,b)} \Phi \Vert _\infty&=\sup _{{\mathbf {x}}\in {\mathbb {R}}^2}|\langle R_{-{\mathbf {x}}+\delta {\mathbf {x}}}{\widetilde{\Phi }},\Psi _{a,b}^{(\lambda )}\rangle -\langle R_{-{\mathbf {x}}}{\widetilde{\Phi }},\Psi _{a,b}\rangle |\\&=\sup _{{\mathbf {x}}\in {\mathbb {R}}^2}|\langle R_{-{\mathbf {x}}}(R_{\delta {\mathbf {x}}}-1){\widetilde{\Phi }},\Psi _{a,b}^{(\lambda )}\rangle +\langle R_{-{\mathbf {x}}}{\widetilde{\Phi }},\Psi _{a,b}^{(\lambda )}-\Psi _{a,b}\rangle |\\&\le \sup _{\Vert \delta {\mathbf {x}}\Vert \le \lambda }\Vert R_{\delta {\mathbf {x}}}{\widetilde{\Phi }}-{\widetilde{\Phi }}\Vert _2\sup _{\lambda }\Vert \Psi _{a,b}^{(\lambda )}\Vert _2+\Vert {\widetilde{\Phi }}\Vert _2\Vert \Psi _{a,b}^{(\lambda )}-\Psi _{a,b}\Vert _2\\&{\mathop {\longrightarrow }\limits ^{\lambda \rightarrow 0}}0 \end{aligned}$$

thanks to the unitarity of R, the convergence \(\lim _{\delta {\mathbf {x}}\rightarrow 0}\Vert R_{\delta {\mathbf {x}}}{\widetilde{\Phi }}-{\widetilde{\Phi }}\Vert _2=0\), the uniform boundedness of \(\Vert \Psi _{a,b}^{(\lambda )}\Vert _2\), and the convergence (A.7).

To establish (A.7), we write

$$\begin{aligned} \Psi _{a,b}^{(\lambda )}-\Psi _{a,b}=\frac{1}{2\pi }({\mathcal {F}}_\lambda ^{-1} \Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}-{\mathcal {F}}_0^{-1}\Psi _{{\mathcal {L}}_0^{(a,b)}}), \end{aligned}$$

where \(\Psi _{{\mathcal {L}}_0^{(a,b)}}=2\pi {\mathcal {F}}_0 \Psi _{a,b}\), so that \(\Psi _{a,b}=\frac{1}{2\pi }{\mathcal {F}}_0^{-1}\Psi _{{\mathcal {L}}_0^{(a,b)}}\). By the definition (4.20) of \(\Psi _{a,b}\) and standard properties of the Fourier transform, the explicit form of the function \(\Psi _{{\mathcal {L}}_0^{(a,b)}}\) is

$$\begin{aligned} \Psi _{{\mathcal {L}}_0^{(a,b)}}(p_x,p_y)=\big (\tfrac{i(p_x-ip_y)}{2}\big )^a \big (\tfrac{i(p_x+ip_y)}{2}\big )^b \exp \big (-\tfrac{p_x^2+p_y^2}{2}\big ). \end{aligned}$$

Observe that the function \(\Psi _{{\mathcal {L}}_0^{(a,b)}}\) is the pointwise limit of the functions \(\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}\) as \(\lambda \rightarrow 0\). The functions \(|\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}|^2\) are bounded uniformly in \(\lambda \) by the integrable function appearing in the integral (A.6). Therefore we can use the dominated convergence theorem and conclude that

$$\begin{aligned} \lim _{\lambda \rightarrow 0}\big \Vert \Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}-P'_\lambda \Psi _{{\mathcal {L}}_0^{(a,b)}}\big \Vert _2=0, \end{aligned}$$
(A.8)

where \(P_\lambda '\) is the cut-off projector (A.1). We then have

$$\begin{aligned} \Vert \Psi _{a,b}^{(\lambda )}-\Psi _{a,b}\Vert _2&=\frac{1}{2\pi } \big \Vert {\mathcal {F}}_\lambda ^{-1}\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}} -{\mathcal {F}}_0^{-1}\Psi _{{\mathcal {L}}_0^{(a,b)}}\big \Vert _2\\&\le \frac{1}{2\pi }\big \Vert {\mathcal {F}}_\lambda ^{-1}(\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}-P'_\lambda \Psi _{{\mathcal {L}}_0^{(a,b)}})\big \Vert _2+ \frac{1}{2\pi }\big \Vert ({\mathcal {F}}_\lambda ^{-1}P'_\lambda -{\mathcal {F}}_0^{-1}) \Psi _{{\mathcal {L}}_0^{(a,b)}}\big \Vert _2\\&{\mathop {\longrightarrow }\limits ^{\lambda \rightarrow 0}}0 \end{aligned}$$

by (A.8) and (A.2). We have thus proved (A.7).

It remains to show that the convergence \({\mathcal {L}}_\lambda ^{(a,b)} \Phi \rightarrow {\mathcal {L}}_0^{(a,b)} \Phi \) is uniform on compact sets \(K\subset V\). This follows by a standard compactness argument. For any \(\epsilon >0\), by the total boundedness of K we can choose finitely many \(\Phi _n,n=1,\ldots ,N,\) such that for any \(\Phi \in K\) there is some \(\Phi _n\) for which \(\Vert \Phi -\Phi _n\Vert <\epsilon .\) Then \(\Vert {\mathcal {L}}_\lambda ^{(a,b)} \Phi - {\mathcal {L}}_0^{(a,b)} \Phi \Vert \le \Vert {\mathcal {L}}_\lambda ^{(a,b)} \Phi _n- {\mathcal {L}}_0^{(a,b)} \Phi _n\Vert +2\sup _{\lambda \ge 0} \Vert {\mathcal {L}}_\lambda ^{(a,b)}\Vert \epsilon \). Since \(\sup _{\lambda \ge 0} \Vert {\mathcal {L}}_\lambda ^{(a,b)}\Vert <\infty \) by statement 1) of the lemma, the desired uniform convergence for \(\Phi \in K\) follows from the convergence for \(\Phi _n,n=1,\ldots ,N\).


Cite this article

Yarotsky, D. Universal Approximations of Invariant Maps by Neural Networks. Constr Approx 55, 407–474 (2022). https://doi.org/10.1007/s00365-021-09546-1

