Abstract
We describe generalizations of the universal approximation theorem for neural networks to maps invariant or equivariant with respect to linear representations of groups. Our goal is to establish network-like computational models that are both invariant/equivariant and provably complete in the sense of their ability to approximate any continuous invariant/equivariant map. Our contribution is threefold. First, in the general case of compact groups we propose a construction of a complete invariant/equivariant network using an intermediate polynomial layer. We invoke classical theorems of Hilbert and Weyl to justify and simplify this construction; in particular, we describe an explicit complete ansatz for the approximation of permutation-invariant maps. Second, we consider groups of translations and prove several versions of the universal approximation theorem for convolutional networks in the limit of continuous signals on Euclidean spaces. Finally, we consider 2D signal transformations equivariant with respect to the group SE(2) of rigid Euclidean motions. In this case we introduce the “charge-conserving convnet”, a convnet-like computational model based on the decomposition of the feature space into isotypic representations of SO(2). We prove this model to be a universal approximator for continuous SE(2)-equivariant signal transformations.
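The complete ansatz for permutation-invariant maps mentioned above rests on the classical fact that the power sums \(p_k({\mathbf {x}})=\sum _i x_i^k\), \(k=1,\ldots ,n\), generate all symmetric polynomials in \(x_1,\ldots ,x_n\). A minimal runnable sketch of this style of construction is below; the readout `f` is a hypothetical stand-in for a trained network, not the paper's specific construction.

```python
import numpy as np

def power_sums(x):
    """Complete set of permutation invariants of x in R^n: p_k = sum_i x_i^k, k=1..n."""
    n = len(x)
    return np.array([np.sum(x**k) for k in range(1, n + 1)])

def invariant_net(x, readout):
    # readout: any continuous map R^n -> R (e.g., an approximating MLP);
    # composing it with the power sums yields a permutation-invariant map.
    return readout(power_sums(x))

x = np.array([0.2, -1.0, 3.0, 0.5])
f = lambda s: np.tanh(s).sum()          # hypothetical stand-in for a trained readout
y1 = invariant_net(x, f)
y2 = invariant_net(np.random.permutation(x), f)
assert np.isclose(y1, y2)               # invariant under permutations by construction
```

Since the power sums separate orbits of the permutation group, any continuous permutation-invariant function can be approximated by choosing a suitable readout.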
Notes
Another approach to ensuring a well-defined value \({\varvec{\Phi }}({\mathbf {x}})\) is to work with shift-invariant reproducing kernel Hilbert spaces (RKHS) instead of \(L^2\) spaces. The definition of an RKHS requires the signal evaluation \({\varvec{\Phi }}\mapsto {\varvec{\Phi }}({\mathbf {x}})\) to be continuous in \({\varvec{\Phi }}\), and in particular well defined. An example of a shift-invariant RKHS is the space of band-limited signals with a particular bandwidth. We thank the anonymous reviewer for pointing out this approach.
References
Anselmi, F., Rosasco, L., Poggio, T.: On invariance and selectivity in representation learning. Inf. Inference 5(2), 134–158 (2016)
Bruna, J., Mallat, S.: Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1872–1886 (2013)
Burkhardt, H., Siggelkow, S.: Invariant features in pattern recognition: fundamentals and applications. In: Nonlinear Model-Based Image/Video Processing and Analysis, pp. 269–307 (2001)
Cohen, N., Shashua, A.: Convolutional rectifier networks as generalized tensor decompositions. In: International Conference on Machine Learning, pp. 955–963 (2016)
Cohen, N., Sharir, O., Levine, Y., Tamari, R., Yakira, D., Shashua, A.: Analysis and design of convolutional networks via hierarchical tensor decompositions (2017). arXiv preprint arXiv:1705.02302
Cohen, T., Welling, M.: Group equivariant convolutional networks. In: Proceedings of the 33rd International Conference on Machine Learning, pp. 2990–2999 (2016)
Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989)
Dieleman, S., De Fauw, J., Kavukcuoglu, K.: Exploiting cyclic symmetry in convolutional neural networks. In: Proceedings of the 33rd International Conference on International Conference on Machine Learning, vol. 48, pp. 1889–1898 (2016)
Esteves, C., Allen-Blanchette, C., Zhou, X., Daniilidis, K.: Polar transformer networks. In: International Conference on Learning Representations (2018)
Funahashi, K.-I.: On the approximate realization of continuous mappings by neural networks. Neural Netw. 2(3), 183–192 (1989)
Gens, R., Domingos, P.M.: Deep symmetry networks. In: Advances in Neural Information Processing Systems, pp. 2537–2545 (2014)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hedlund, G.A.: Endomorphisms and automorphisms of the shift dynamical system. Theory Comput. Syst. 3(4), 320–375 (1969)
Henriques, J.F., Vedaldi, A.: Warped convolutions: efficient invariance to spatial transformations. In: International Conference on Machine Learning, pp. 1461–1469 (2017)
Hilbert, D.: Über die Theorie der algebraischen Formen. Mathematische Annalen 36(4), 473–534 (1890)
Hilbert, D.: Über die vollen Invariantensysteme. Mathematische Annalen 42(3), 313–373 (1893)
Hornik, K.: Some new results on neural network approximation. Neural Netw. 6(8), 1069–1072 (1993)
Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)
Kondor, R., Trivedi, S.: On the generalization of equivariance and convolution in neural networks to the action of compact groups. In: International Conference on Machine Learning, pp. 2747–2755 (2018)
Kraft, H., Procesi, C.: Classical invariant theory, a primer. Lecture Notes (2000)
LeCun, Y.: Generalization and network design strategies. In: Connectionism in Perspective, pp. 143–155 (1989)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Leshno, M., Lin, V.Y., Pinkus, A., Schocken, S.: Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw. 6(6), 861–867 (1993)
Mallat, S.: Group invariant scattering. Commun. Pure Appl. Math. 65(10), 1331–1398 (2012)
Mallat, S.: Understanding deep convolutional networks. Philos. Trans. R. Soc. A 374(2065), 20150203 (2016)
Manay, S., Cremers, D., Hong, B.-W., Yezzi, A.J., Soatto, S.: Integral invariants for shape matching. IEEE Trans. Pattern Anal. Mach. Intell. 28(10), 1602–1618 (2006)
Marcos, D., Volpi, M., Komodakis, N., Tuia, D.: Rotation equivariant vector field networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5048–5057 (2017)
Mhaskar, H.N., Micchelli, C.A.: Approximation by superposition of sigmoidal and radial basis functions. Adv. Appl. Math. 13(3), 350–373 (1992)
Munkres, J.R.: Topology. Featured Titles for Topology Series. Prentice Hall, Upper Saddle River (2000)
Pinkus, A.: TDI-subspaces of \(C({\mathbb{R}}^d)\) and some density problems from neural networks. J. Approx. Theory 85(3), 269–287 (1996)
Pinkus, A.: Approximation theory of the MLP model in neural networks. Acta Numerica 8, 143–195 (1999)
Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., Liao, Q.: Why and when can deep-but not shallow-networks avoid the curse of dimensionality: a review. Int. J. Autom. Comput. 1–17 (2017)
Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017)
Reisert, M.: Group integration techniques in pattern analysis. Ph.D. thesis, Albert-Ludwigs-University (2008)
Schmid, B.J.: Finite groups and invariant theory. In: Topics in Invariant Theory, pp. 35–66. Springer (1991)
Schulz-Mirbach, H.: Invariant features for gray scale images. In: Mustererkennung 1995, pp. 1–14. Springer (1995)
Serre, J.-P.: Linear Representations of Finite Groups, vol. 42. Springer, Berlin (2012)
Sifre, L., Mallat, S.: Rigid-motion scattering for texture classification (2014). arXiv preprint arXiv:1403.1687
Simon, B.: Representations of Finite and Compact Groups. Graduate Studies in Mathematics, vol. 10. American Mathematical Society, Providence (1996)
Skibbe, H.: Spherical tensor algebra for biomedical image analysis. Ph.D. thesis, Albert-Ludwigs-University (2013)
Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net (2014). arXiv preprint arXiv:1412.6806
Thoma, M.: Analysis and optimization of convolutional neural network architectures. Master’s thesis, Karlsruhe Institute of Technology, Karlsruhe, Germany, June 2017. https://martin-thoma.com/msthesis/
Vinberg, E.B.: Linear Representations of Groups. Birkhäuser, Basel (2012)
Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K.J.: Phoneme recognition using time-delay neural networks. IEEE Trans. Acoust. Speech Signal Process. 37(3), 328–339 (1989)
Weyl, H.: The Classical Groups: Their Invariants and Representations. Princeton Mathematical Series, vol. 1. Princeton University Press, Princeton (1946)
Worfolk, P.A.: Zeros of equivariant vector fields: algorithms for an invariant approach. J. Symb. Comput. 17(6), 487–511 (1994)
Worrall, D.E., Garbin, S.J., Turmukhambetov, D., Brostow, G.J.: Harmonic networks: deep translation and rotation equivariance. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5028–5037 (2017)
Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R.R., Smola, A.J.: Deep sets. In: Advances in Neural Information Processing Systems, pp. 3391–3401 (2017)
Acknowledgements
The author thanks the anonymous reviewer for several helpful suggestions.
Communicated by Wolfgang Dahmen, Ronald A. Devore, and Philipp Grohs.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Proof of Lemma 4.1
The proof is a slight modification of the standard proof of the central limit theorem (CLT) via the Fourier transform (the CLT can be directly used to prove the lemma in the case \(a=b=0\), when \({\mathcal {L}}_{\lambda }^{(a,b)}\) only includes diffusion factors).
To simplify notation, assume without loss of generality that \(d_V=1\) (in the general case the proof is essentially identical). We will use the appropriately discretized version of the Fourier transform (i.e., the Fourier series expansion). Given a discretized signal \( \Phi :(\lambda {\mathbb {Z}})^2\rightarrow {\mathbb {C}}\), we define \({\mathcal {F}}_\lambda \Phi \) as a function on \([-\frac{\pi }{\lambda },\frac{\pi }{\lambda }]^2\) by
Then, \({\mathcal {F}}_\lambda : L^2((\lambda {\mathbb {Z}})^2,{\mathbb {C}})\rightarrow L^2([-\frac{\pi }{\lambda },\frac{\pi }{\lambda }]^2,{\mathbb {C}})\) is a unitary isomorphism, assuming that the scalar product in the input space is defined by \(\langle \Phi ,\Psi \rangle =\lambda ^2\sum _{\gamma \in (\lambda {\mathbb {Z}})^2}\overline{\Phi (\gamma )}\Psi (\gamma )\) and in the output space by \(\langle \Phi ,\Psi \rangle =\int _{[-\frac{\pi }{\lambda },\frac{\pi }{\lambda }]^2} \overline{\Phi ({\mathbf {p}})}\Psi ({\mathbf {p}}) \mathrm{d}^2{\mathbf {p}}\). Let \(P_\lambda \) be the discretization projector (3.6). It is easy to check that \({\mathcal {F}}_\lambda P_\lambda \) strongly converges to the standard Fourier transform as \(\lambda \rightarrow 0:\)
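As a quick numerical sanity check of this strong convergence, one can compare \({\mathcal {F}}_\lambda P_\lambda \Phi \) with the continuum Fourier transform on a sampled Gaussian. The normalization used below, \({\mathcal {F}}_\lambda \Phi ({\mathbf {p}})=\frac{\lambda ^2}{2\pi }\sum _{\gamma \in (\lambda {\mathbb {Z}})^2}e^{-i{\mathbf {p}}\cdot \gamma }\Phi (\gamma )\), is an assumption chosen to be unitary for the scalar products stated above (the paper's displayed definition may differ by a convention):

```python
import numpy as np

# Sampled 2D Gaussian on a truncated grid (lam*Z)^2
lam = 0.1
xs = lam * np.arange(-60, 61)
X, Y = np.meshgrid(xs, xs, indexing="ij")
Phi = np.exp(-(X**2 + Y**2) / 2)

def F_lam(px, py):
    """Discretized Fourier transform at momentum (px, py); assumed normalization
    F_lam Phi(p) = lam^2/(2*pi) * sum_gamma exp(-i p.gamma) Phi(gamma)."""
    return lam**2 / (2 * np.pi) * np.sum(np.exp(-1j * (px * X + py * Y)) * Phi)

# The unitary continuum Fourier transform of this Gaussian is exp(-|p|^2/2),
# so the discretization error should be small for small lam.
px, py = 0.5, -0.3
exact = np.exp(-(px**2 + py**2) / 2)
print(abs(F_lam(px, py) - exact))   # small Riemann-sum error for lam = 0.1
```

The sum is just a Riemann sum for the continuum integral, which is why \({\mathcal {F}}_\lambda P_\lambda \) converges strongly to the Fourier transform as \(\lambda \rightarrow 0\) on rapidly decaying signals.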
where
and where we naturally embed \(L^2([-\frac{\pi }{\lambda },\frac{\pi }{\lambda }]^2,{\mathbb {C}})\subset L^2({\mathbb {R}}^2,{\mathbb {C}})\). Conversely, let \(P_\lambda '\) denote the orthogonal projection onto the subspace \(L^2([-\frac{\pi }{\lambda },\frac{\pi }{\lambda }]^2,{\mathbb {C}})\) in \(L^2({\mathbb {R}}^2,{\mathbb {C}}):\)
Then
The Fourier transform gives us the spectral representation of the discrete differential operators (4.15), (4.16), (4.17) as operators of multiplication by a function:
where, denoting \({\mathbf {p}}=(p_x,p_y)\),
The operator \({\mathcal {L}}_\lambda ^{(a,b)}\) defined in (4.19) can then be written as
where the function \(\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}\) is given by
We can then write \({\mathcal {L}}_\lambda ^{(a,b)}\Phi \) as a convolution of \(P_\lambda \Phi \) with the kernel
on the grid \((\lambda {\mathbb {Z}})^2:\)
Now consider the operator \({\mathcal {L}}_0^{(a,b)}\) defined in (4.21). At each \({\mathbf {x}}\in {\mathbb {R}}^2\), the value \({\mathcal {L}}_0^{(a,b)} \Phi ({\mathbf {x}})\) can be written as a scalar product:
where \({\widetilde{\Phi }}({\mathbf {x}})={\overline{\Phi }}(-{\mathbf {x}})\), \(\Psi _{a,b}\) is defined by (4.20), and \(R_{{\mathbf {x}}}\) is our standard representation of the group \({\mathbb {R}}^2\), \(R_{{\mathbf {x}}}\Phi ({\mathbf {y}})=\Phi ({\mathbf {y}}-{\mathbf {x}})\). For \(\lambda >0\), we can write \({\mathcal {L}}_\lambda ^{(a,b)} \Phi ({\mathbf {x}})\) in a similar form. Indeed, using (A.3) and naturally extending the discretized signal \(\Psi _{a,b}^{(\lambda )}\) to the whole \({\mathbb {R}}^2\), we have
Then, for any \({\mathbf {x}}\in {\mathbb {R}}^2\) we can write
where \(-{\mathbf {x}}+\delta {\mathbf {x}}\) is the point of the grid \((\lambda {\mathbb {Z}})^2\) nearest to \(-{\mathbf {x}}\).
Now consider the formulas (A.4), (A.5) and observe that, by the Cauchy–Schwarz inequality and since \(R\) is norm-preserving, to prove statement 1) of the lemma we only need to show that the functions \(\Psi _{a,b},\Psi _{a,b}^{(\lambda )}\) have uniformly bounded \(L^2\)-norms. For \(\lambda >0\) we have
where we used the inequalities
Expression (A.6) provides a finite bound, uniform in \(\lambda \), for the squared norms \(\Vert \Psi _{a,b}^{(\lambda )}\Vert ^2\). This bound also holds for \(\Vert \Psi _{a,b}\Vert ^2\).
Next, observe that to establish the strong convergence in statement 2) of the lemma, it suffices to show that
Indeed, by (A.4), (A.5), we would then have
thanks to the unitarity of R, convergence \(\lim _{\delta {\mathbf {x}}\rightarrow 0}\Vert R_{\delta {\mathbf {x}}}{\widetilde{\Phi }}-{\widetilde{\Phi }}\Vert _2=0,\) uniform boundedness of \(\Vert \Psi _{a,b}^{(\lambda )}\Vert _2\) and convergence (A.7).
To establish (A.7), we write
where \(\Psi _{{\mathcal {L}}_0^{(a,b)}}=2\pi {\mathcal {F}}_\lambda \Psi _{a,b}.\) By definition (4.20) of \(\Psi _{a,b}\) and standard properties of Fourier transform, the explicit form of the function \(\Psi _{{\mathcal {L}}_0^{(a,b)}}\) is
Observe that the function \(\Psi _{{\mathcal {L}}_0^{(a,b)}}\) is the pointwise limit of the functions \(\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}\) as \(\lambda \rightarrow 0\). The functions \(|\Psi _{{\mathcal {L}}_\lambda ^{(a,b)}}|^2\) are bounded uniformly in \(\lambda \) by the integrable function appearing in the integral (A.6). Therefore we can use the dominated convergence theorem and conclude that
where \(P_\lambda '\) is the cut-off projector (A.1). We then have
by (A.8) and (A.2). We have thus proved (A.7).
It remains to show that the convergence \({\mathcal {L}}_\lambda ^{(a,b)} \Phi \rightarrow {\mathcal {L}}_0^{(a,b)} \Phi \) is uniform on compact sets \(K\subset V\). This follows by a standard compactness argument. For any \(\epsilon >0\), by compactness of \(K\) we can choose finitely many \(\Phi _n,n=1,\ldots ,N\) (an \(\epsilon \)-net), such that for any \(\Phi \in K\) there is some \(\Phi _n\) for which \(\Vert \Phi -\Phi _n\Vert <\epsilon .\) Then \(\Vert {\mathcal {L}}_\lambda ^{(a,b)} \Phi - {\mathcal {L}}_0^{(a,b)} \Phi \Vert \le \Vert {\mathcal {L}}_\lambda ^{(a,b)} \Phi _n- {\mathcal {L}}_0^{(a,b)} \Phi _n\Vert +2\sup _{\lambda \ge 0} \Vert {\mathcal {L}}_\lambda ^{(a,b)}\Vert \epsilon \). Since \(\sup _{\lambda \ge 0} \Vert {\mathcal {L}}_\lambda ^{(a,b)}\Vert <\infty \) by statement 1) of the lemma, the desired uniform convergence for \(\Phi \in K\) follows from the convergence for \(\Phi _n,n=1,\ldots ,N\).
Cite this article
Yarotsky, D. Universal Approximations of Invariant Maps by Neural Networks. Constr Approx 55, 407–474 (2022). https://doi.org/10.1007/s00365-021-09546-1