The Universal Approximation Property

Kratsios, Anastasis

doi:10.1007/s10472-020-09723-1


Characterization, Construction, Representation, and Existence

Open access
Published: 22 January 2021

Volume 89, pages 435–469, (2021)

Annals of Mathematics and Artificial Intelligence


Anastasis Kratsios ORCID: orcid.org/0000-0001-6791-3371¹


Abstract

The universal approximation property of various machine learning models is currently only understood on a case-by-case basis, limiting the rapid development of new theoretically justified neural network architectures and blurring our understanding of our current models’ potential. This paper works towards overcoming these challenges by presenting a characterization, a representation, a construction method, and an existence result, each of which applies to any universal approximator on most function spaces of practical interest. Our characterization result is used to describe which activation functions allow the feed-forward architecture to maintain its universal approximation capabilities when multiple constraints are imposed on its final layers and its remaining layers are only sparsely connected. These include a rescaled and shifted Leaky ReLU activation function but not the ReLU activation function. Our construction and representation result is used to exhibit a simple modification of the feed-forward architecture, which can approximate any continuous function with non-pathological growth, uniformly on the entire Euclidean input space. This improves the known capabilities of the feed-forward architecture.


References

McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943)
Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psych. Rev. 65(6), 386 (1958)
Hornik, K., Stinchcombe, M., White, H.: Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw. 3(5), 551–560 (1990)
Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989)
Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251–257 (1991)
Kolmogorov, A.N.: On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk SSSR 114, 953–956 (1957)
Webb, S.: Deep learning for biology. Nature 554(7693) (2018)
Eraslan, G., Avsec, Z., Gagneur, J., Theis, F.J.: Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20(7), 389–403 (2019)
Plis, S.M.: Deep learning for neuroimaging: a validation study. Front. Neurosci. 8, 229 (2014)
Zhang, W.E., Sheng, Q.Z., Alhazmi, A., Li, C.: Adversarial attacks on deep-learning models in natural language processing: A survey. ACM Trans. Intell. Syst. Technol. 11(3) (2020)
Buehler, H., Gonon, L., Teichmann, J., Wood, B.: Deep hedging. Quant. Finance 19(8), 1271–1291 (2019)
Becker, S., Cheridito, P., Jentzen, A.: Deep optimal stopping. J. Mach. Learn. Res. 20, Paper No. 74, 25 (2019)
Cuchiero, C., Khosrawi, W., Teichmann, J.: A generative adversarial network approach to calibration of local stochastic volatility models. Risks 8(4), 101 (2020)
Kratsios, A., Hyndman, C.: Deep arbitrage-free learning in a generalized HJM framework via arbitrage-regularization. Risks 8(2), 40 (2020)
Horvath, B., Muguruza, A., Tomas, M.: Deep learning volatility: a deep neural network perspective on pricing and calibration in (rough) volatility models. Quant. Finance 0(0), 1–17 (2020)
Leshno, M., Lin, V.Y., Pinkus, A., Schocken, S.: Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw. 6(6), 861–867 (1993)
Kidger, P., Lyons, T.: Universal approximation with deep narrow networks. In: Abernethy, J., Agarwal, S. (eds.) vol. 125, pp 2306–2327. PMLR, USA (2020)
Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)
Park, S., Yun, C., Lee, J., Shin, J.: Minimum width for universal approximation. ICLR (2021)
Hanin, B.: Universal function approximation by deep neural nets with bounded width and relu activations. Math. - MDPI 7(10) (2019)
Lu, Z., Pu, H., Wang, F., Hu, Z., Wang, L.: The expressive power of neural networks: A view from the width. In: Advances in Neural Information Processing Systems, vol. 30, pp 6231–6239. Curran Associates, Inc. (2017)
Fletcher, P.T., Venkatasubramanian, S., Joshi, S.: The geometric median on riemannian manifolds with application to robust atlas estimation. Neuroimage 45(1), S143–S152 (2009). Mathematics in Brain Imaging
Keller-Ressel, M., Nargang, S.: Hydra: a method for strain-minimizing hyperbolic embedding of network- and distance-based data. J. Complex Netw. 8(1), cnaa002, 18 (2020)
Ganea, O., Becigneul, G., Hofmann, T.: Hyperbolic neural networks. In: Bengio, S, Wallach, H, Larochelle, H, Grauman, K, Cesa-Bianchi, N, Garnett, R (eds.) Advances in Neural Information Processing Systems, vol. 31, pp 5345–5355. Curran Associates, Inc. (2018)
Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. In: International Conference on Machine Learning, pp 7354–7363. PMLR (2019)
Arens, R.F., Eells, J.: On embedding uniform and topological spaces. Pacific J. Math. 6, 397–403 (1956)
von Luxburg, U., Bousquet, O.: Distance-based classification with Lipschitz functions. J. Mach. Learn. Res. 5, 669–695 (2003/04)
Ambrosio, L., Puglisi, D.: Linear extension operators between spaces of Lipschitz maps and optimal transport. J. Reine Angew. Math. 764, 1–21 (2020)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks, pp. 214–223. PMLR, International Convention Centre, Sydney, Australia (2017)
Xu, T., Le, W., Munn, M., Acciaio, B.: Cot-gan: Generating sequential data via causal optimal transport. Advances in Neural Information Processing Systems 33 (2020)
Godefroy, G., Kalton, N.J.: Lipschitz-free Banach spaces. pp. 121–141. Dedicated to Professor Aleksander Pełczyński on the occasion of his 70th birthday (2003)
Weaver, N.: Lipschitz algebras. World Scientific Publishing Co. Pte. Ltd., Hackensack (2018)
Godefroy, G.: A survey on Lipschitz-free Banach spaces. Comment. Math. 55(2), 89–118 (2015)
Jost, J.: Riemannian Geometry and Geometric Analysis, 6th edn. Universitext, Springer, Heidelberg (2011)
Basso, G.: Extending and improving conical bicombings. preprint 2005.13941 (2020)
Nagata, J-: Modern general topology, revised. North-Holland Publishing Co., Amsterdam (1974). Wolters-Noordhoff Publishing, Groningen; American Elsevier Publishing Co., New York (1974). Bibliotheca Mathematica, Vol. VII
Munkres, J.R.: Topology, 2nd edn. Prentice Hall, Inc., Upper Saddle River (2000)
Micchelli, C.A., Xu, Y., Zhang, H.: Universal kernels. J. Mach. Learn. Res. 7, 2651–2667 (2006)
Kontorovich, L., Nadler, B.: Universal kernel-based learning with applications to regular languages. J. Mach. Learn. Res. 10, 1095–1129 (2009)
Caponnetto, A., Micchelli, C.A., Pontil, M., Ying, Y.: Universal multi-task kernels. J. Mach. Learn. Res. 9, 1615–1646 (2008)
Grigoryeva, L., Ortega, J-P: Differentiable reservoir computing. J. Mach. Learn. Res. 20, Paper No. 179, 62 (2019)
Cuchiero, C., Gonon, L., Grigoryeva, L., Ortega, J-P, Teichmann, J.: Discrete-time signatures and randomness in reservoir computing. pre-print 2010.14615 (2020)
Fletcher, P.T.: Geodesic regression and the theory of least squares on Riemannian manifolds. Int. J. Comput. Vis. 105(2), 171–185 (2013)
Kratsios, A., Bilokopytov, E.: Non-euclidean universal approximation (2020)
Osborne, M.S.: Locally convex spaces, Graduate Texts in Mathematics, vol. 269. Springer, Cham (2014)
Petersen, P., Raslan, M., Voigtlaender, F.: Topological properties of the set of functions generated by neural networks of fixed size. Found Comput Math. https://doi.org/10.1007/s10208-020-09461-0 (2020)
Gribonval, R., Kutyniok, G., Nielsen, M., Voigtlaender, F.: Approximation spaces of deep neural networks. Constr. Approx forthcoming (2020)
Goodfellow, I., Bengio, Y., Courville, A.: Deep learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge (2016)
Gelfand, I.: Normierte Ringe. Rec. Math. N. S. 9(51), 3–24 (1941)
Isbell, J.R.: Structure of categories. Bull. Amer. Math. Soc. 72, 619–655 (1966)
Dimov, G.D.: Some generalizations of the Stone duality theorem. Publ. Math. Debrecen 80(3-4), 255–293 (2012)
Tuitman, J.: A refinement of a mixed sparse effective Nullstellensatz. Int. Math. Res. Not. IMRN 7, 1560–1572 (2011)
Fletcher, P.T.: Geodesic regression and the theory of least squares on Riemannian manifolds. Int. J. Comput. Vis. 105(2), 171–185 (2013)
Meyer, G., Bonnabel, S., Sepulchre, R.: Regression on fixed-rank positive semidefinite matrices: a Riemannian approach. J. Mach. Learn. Res. 12, 593–625 (2011)
Baes, M., Herrera, C., Neufeld, A., Ruyssen, P.: Low-rank plus sparse decomposition of covariance matrices using neural network parametrization. pre-print 1908.00461 (2019)
Hummel, J., Biederman, I.: Dynamic binding in a neural network for shape recognition. Psych. Rev. 99, 480–517 (1992)
Bishop, C.M.: Mixture density networks (1994)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. ICLR (2017)
Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. Neural Netw. Learn Syst. 20(1), 61–80 (2009)
Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. ICLR (2018)
Pinkus, A.: Approximation theory of the MLP model in neural networks 8, 143–195 (1999)
Koopman, B.O.: Hamiltonian systems and transformation in hilbert space. Proc. Natl. Acad. Sci. 17(5), 315–318 (1931)
Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. ICML 30(1), 3 (2013)
Singh, R.K., Manhas, J.S.: Composition operators on function spaces, North-Holland Mathematics Studies, vol. 179. North-Holland Publishing Co., Amsterdam (1993)
Bengio, Y.: Deep learning of representations for unsupervised and transfer learning. In: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, vol. 27, pp 17–36. JMLR Workshop and Conference Proceedings (2012)
Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C.: A survey on deep transfer learning. In: Kůrková, V., Manolopoulos, Y, Hammer, B, Iliadis, L, Maglogiannis, I (eds.) Artificial Neural Networks and Machine Learning – ICANN 2018, pp 270–279. Springer (2018)
Chollet, F., et al.: Keras. https://keras.io/guides/transfer_learning/ (2015)
Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inform. Theory 39(3), 930–945 (1993)
Darken, C., Donahue, M., Gurvits, L., Sontag, E.: Rate of approximation results motivated by robust neural network learning. In: Proceedings of the Sixth Annual Conference on Computational Learning Theory, pp 303–309. Association for Computing Machinery, New York (1993)
Prolla, J.B.: Weighted spaces of vector-valued continuous functions. Ann. Mat. Pura Appl. (4) 89, 145–157 (1971)
Bourbaki, N.: Éléments de mathématique. Topologie générale. Chapitres 1 à 4. Hermann, Paris (1971)
Phelps, R.R.: Subreflexive normed linear spaces. Arch. Math. (Basel) 8, 444–450 (1957)
Kadec, M.I.: A proof of the topological equivalence of all separable infinite-dimensional Banach spaces. Funkcional. Anal. i Priložen. 1, 61–70 (1967)
Grosse-Erdmann, K.-G., Peris Manguillot, A.: Linear chaos. Universitext, Springer, London (2011)
Pérez Carreras, P., Bonet, J.: Barrelled locally convex spaces, North-Holland Mathematics Studies, vol. 131. North-Holland Publishing Co., Amsterdam. Notas de Matemática [Mathematical Notes], 113 (1987)
Kreyszig, E.: Introductory functional analysis with applications, Wiley Classics Library. Wiley, New York (1989)
Bourbaki, N.: Espaces vectoriels topologiques. Chapitres 1 à 5, New. Masson, Paris (1981). Éléments de mathématique
Kalmes, T.: Dynamics of weighted composition operators on function spaces defined by local properties. Studia Math. 249(3), 259–301 (2019)
Przestacki, A.: Dynamical properties of weighted composition operators on the space of smooth functions. J. Math. Anal. Appl. 445(1), 1097–1113 (2017)
Bayart, F., Darji, U.B., Pires, B.: Topological transitivity and mixing of composition operators. J. Math. Anal. Appl. 465(1), 125–139 (2018)
Hoffmann, H.: On the continuity of the inverses of strictly monotonic functions. Irish Math. Soc. Bull. (75), 45–57 (2015)
Behrends, E., Schmidt-Bichler, U.: M-structure and the Banach-Stone theorem. Studia Math. 69(1), 33–40 (1980/81)
Jarchow, H.: Locally convex spaces. B. G. Teubner, Stuttgart. Mathematische Leitfäden. [Mathematical Textbooks] (1981)
Dieudonné, J., Schwartz, L.: La dualité dans les espaces F et LF. Ann. Inst. Fourier (Grenoble) 1, 61–101 (1949)


Funding

Open access funding provided by Swiss Federal Institute of Technology Zurich.

Author information

Authors and Affiliations

(ETH) Eidgenössische Technische Hochschule Zürich, Rämistrasse 101, CH-8092, Zürich, Switzerland
Anastasis Kratsios


Corresponding author

Correspondence to Anastasis Kratsios.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proofs of Main Results

Theorem 1 is encompassed by the following broader but more technical result.

Lemma 2 (Characterization of the Universal Approximation Property)

Let $\mathcal {X}$ be a function space, let $E$ be an infinite-dimensional Fréchet space for which there exists some homeomorphism ${\Phi }:\mathcal {X}\rightarrow E$, and let $\left ({{\mathscr{F}}},\circlearrowleft \right )$ be an architecture on $\mathcal {X}$. Then the following are equivalent:

(i) UAP: $\left ({{\mathscr{F}}},\circlearrowleft \right )$ has the UAP.

(ii) Decomposition of UAP via Subspaces: There exist subspaces $\{\mathcal {X}_{i}\}_{i \in I}$ of $\mathcal {X}$ such that:

  (a) $\bigcup _{i \in I} \mathcal {X}_{i}$ is dense in $\mathcal {X}$,

  (b) For each $i \in I$, ${\Phi }(\mathcal {X}_{i})$ is a separable infinite-dimensional Fréchet subspace of $E$ and ${\Phi }\left ({\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\cap \mathcal {X}_{i}\right )$ contains a countable, dense, and linearly-independent subset of ${\Phi }(\mathcal {X}_{i})$,

  (c) For each $i \in I$, there exists a homeomorphism ${\Phi }_{i}:\mathcal {X}_{i} \rightarrow L^{2}({\mathbb {R}})$.

(iii) Decomposition of UAP via Topologically Transitive Dynamics: There exist subspaces $\{\mathcal {X}_{i}\}_{i \in I}$ of $\mathcal {X}$ and continuous functions $\{\phi _{i}\}_{i \in I}$ with $\phi _{i}:\mathcal {X}_{i}\rightarrow \mathcal {X}_{i}$ such that:

  (a) $\bigcup _{i \in I} \mathcal {X}_{i}$ is dense in $\mathcal {X}$,

  (b) For every pair of non-empty open subsets $U,V$ of $\mathcal {X}$ and every $i \in I$, there is some $N_{i,U,V}\in {\mathbb {N}}$ such that $\phi _{i}^{N_{i,U,V}}(U\cap \mathcal {X}_{i})\cap (V\cap \mathcal {X}_{i}) \neq \emptyset $,

  (c) For every $i \in I$, there is some $g_{i} \in {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\cap \mathcal {X}_{i}$ such that $\{{\phi _{i}^{n}}(g_{i})\}_{n \in {\mathbb {N}}}$ is a dense subset of ${\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\cap \mathcal {X}_{i}$, and in particular, it is a dense subset of $\mathcal {X}_{i}$,

  (d) For each $i \in I$, $\mathcal {X}_{i}$ is homeomorphic to $C({\mathbb {R}})$.

(iv) Parameterization of UAP on Subspaces: There are triples $\{(X_{i},{\Phi }_{i},\psi _{i})\}_{i \in I}$ of separable topological spaces $X_{i}$, non-constant continuous functions ${\Phi }_{i}:X_{i}\to \mathcal {X}$, and functions $\psi _{i}:X_{i}\rightarrow X_{i}$ satisfying the following:

  (a) $\bigcup _{i \in I} {\Phi }_{i}(X_{i})$ is dense in $\mathcal {X}$,

  (b) For every $i \in I$ and every pair of non-empty open subsets $U,V$ of $X_{i}$, there is some $N_{i,U,V}\in {\mathbb {N}}$ such that $\psi _{i}^{N_{i,U,V}}(U\cap X_{i})\cap (V\cap X_{i}) \neq \emptyset $,

  (c) For every $i \in I$, there is some $x_{i} \in {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\cap X_{i}$ such that $\{{\Phi }_{i}\circ {\psi _{i}^{n}}(x_{i})\}_{n \in {\mathbb {N}}}$ is a dense subset of ${\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\cap {\Phi }_{i}(X_{i})$, and in particular, it is a dense subset of ${\Phi }_{i}(X_{i})$.

Moreover, if $\mathcal {X}$ is separable, then I may be taken to be a singleton.
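
Conditions (iii.b) and (iii.c) ask for topologically transitive dynamics together with a dense orbit. As a minimal numerical sketch of those two notions only, the following Python snippet uses an irrational rotation of the circle $[0,1)$. This is a standard toy system and is not taken from the lemma; the names `ALPHA`, `rotate`, `orbit_hits_every_arc`, and `first_transit_time` are hypothetical. It checks that a single orbit meets every arc of a finite partition and that some iterate carries a point of one arc into another.

```python
import math

# Assumption: a toy dynamical system (irrational circle rotation), chosen only
# to illustrate dense orbits and topological transitivity. It is not the map
# phi_i constructed in the proof.
ALPHA = (math.sqrt(5) - 1) / 2  # irrational rotation angle


def rotate(x, n=1):
    """Apply the rotation x -> x + ALPHA (mod 1) n times."""
    return (x + n * ALPHA) % 1.0


def orbit_hits_every_arc(x0, num_arcs, max_steps):
    """Check whether the first max_steps iterates of x0 meet each of the arcs
    [k/num_arcs, (k+1)/num_arcs), a finite proxy for a dense orbit."""
    visited = set()
    for n in range(max_steps):
        x = rotate(x0, n)
        # Clamp the arc index to guard against rounding at the right endpoint.
        visited.add(min(int(x * num_arcs), num_arcs - 1))
        if len(visited) == num_arcs:
            return True
    return False


def first_transit_time(u, v, max_steps):
    """Return the smallest n < max_steps for which the n-th iterate of the
    midpoint of the arc u = (u0, u1) lies in the arc v = (v0, v1), or None.
    Such an n witnesses that the n-th image of u meets v."""
    midpoint = 0.5 * (u[0] + u[1])
    for n in range(max_steps):
        if v[0] < rotate(midpoint, n) < v[1]:
            return n
    return None


if __name__ == "__main__":
    print(orbit_hits_every_arc(0.0, num_arcs=50, max_steps=1000))
    print(first_transit_time((0.1, 0.2), (0.7, 0.75), max_steps=1000))
```

The finite partition stands in for a base of open sets: an orbit is dense exactly when it meets every member of a base, and the transit time plays the role of $N_{i,U,V}$ for one pair of arcs.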

Proof of Lemma 2

Suppose that (ii) holds. Since $\bigcup _{i \in I} \mathcal {X}_{i}$ is dense in $\mathcal {X}$ and since $\bigcup _{i \in I} {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\cap \mathcal {X}_{i}\subseteq {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}$, it is sufficient to show that $\bigcup _{i \in I} {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )} \cap \mathcal {X}_{i}$ is dense in $\bigcup _{i \in I} \mathcal {X}_{i}$ to conclude that it is dense in $\mathcal {X}$. Since each $\mathcal {X}_{i}$ is a subspace of $\mathcal {X}$ then, by restriction, each $\mathcal {X}_{i}$ is a subspace of $\bigcup _{i \in I} {\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )} \cap \mathcal {X}_{i}$ with its relative topology.

Let $\tilde {\mathcal {X}}$ denote the set $\bigcup _{i \in I} \mathcal {X}_{i}$ equipped with the finest topology making each $\mathcal {X}_{i}$ into a subspace; such a topology exists by [71, Proposition 2.6]. Since each $\mathcal {X}_{i}$ is also a subspace of $\bigcup _{i \in I} \mathcal {X}_{i}$ with its relative topology and since, by definition, that topology is no finer than the topology of $\tilde {\mathcal {X}}$, it is sufficient to show that $\bigcup _{i \in I} {\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )} \cap \mathcal {X}_{i}$ is dense in $\tilde {\mathcal {X}}$ to conclude that it is dense in $\bigcup _{i \in I} \mathcal {X}_{i}$ equipped with its relative topology.

Indeed, by [71, Proposition 2.7] the space $\tilde {\mathcal {X}}$ is given by the (topological) quotient of the disjoint union $\sqcup _{i \in I} \mathcal {X}_{i}$, in the sense of topological spaces (see [71, Example 3, Section 2.4]), under the equivalence relation $f_{i}\sim f_{j}$ if $f_{i} = f_{j}$ in $\mathcal {X}$. Denote the corresponding quotient map by $Q_{\tilde {\mathcal {X}}}$. Since a subset $U$ of the quotient space is open (see [71, Example 2, Section 2.4]) if and only if $Q_{\tilde {\mathcal {X}}}^{-1}[U]$ is an open subset of $\sqcup _{i \in I} \mathcal {X}_{i}$, and since a subset $V$ of $\sqcup _{i \in I} \mathcal {X}_{i}$ is open if and only if $V\cap \mathcal {X}_{i}$ is open in the topology of $\mathcal {X}_{i}$ for each $i \in I$, a subset $U\subseteq \tilde {\mathcal {X}}$ is open if and only if $Q_{\tilde {\mathcal {X}}}^{-1}[U] \cap \mathcal {X}_{i}$ is open for each $i \in I$. Since ${\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}\cap \mathcal {X}_{i}$ is dense in $\mathcal {X}_{i}$, for every non-empty open subset $U'\subseteq \mathcal {X}_{i}$

$$ \emptyset \neq U' \cap {\mathcal{NN}}^{\left( {\mathscr{F}},\circlearrowleft\right)}\cap \mathcal{X}_{i} \subseteq U' \cap \bigcup\limits_{i \in I} {\mathcal{NN}}^{\left( {\mathscr{F}},\circlearrowleft\right)}\cap \mathcal{X}_{i} . $$

(11)

In particular, (11) implies that for every open subset $U\subseteq \tilde {\mathcal {X}}$

$$ \emptyset \neq {\mathcal{NN}}^{\left( {\mathscr{F}},\circlearrowleft\right)}\cap \mathcal{X}_{i} \cap \left[Q_{\tilde{\mathcal{X}}}^{-1}[U]\cap \mathcal{X}_{i}\right] \subseteq U \cap \bigcup\limits_{i \in I} {\mathcal{NN}}^{\left( {\mathscr{F}},\circlearrowleft\right)}\cap \mathcal{X}_{i} . $$

(12)

Therefore, $\bigcup _{i \in I} {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\cap \mathcal {X}_{i}$ is dense in $\tilde {\mathcal {X}}$ and therefore it is dense in $\bigcup _{i \in I} \mathcal {X}_{i}$ equipped with its relative topology. Hence, ${{\mathscr{F}}}$ has the UAP and therefore (i) holds.

In the next portion of the proof, we denote the (linear algebraic) dimension of any vector space $V$ by dim($V$). Recall that this is the common cardinality of the Hamel bases of $V$. We follow the von Neumann convention and, whenever required by the context, we identify the natural number $n$ with the ordinal $\{1,\dots ,n\}$.

Assume that (i) holds. Since $\mathcal {X}$ is homeomorphic to some infinite-dimensional Fréchet space $E$, there exists a homeomorphism ${\Phi }:\mathcal {X}\to E$, and since ${\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}$ is dense in $\mathcal {X}$ by (i), ${\Phi }$ maps ${\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}$ onto a dense subset $D$ of $E$. For the first part of this proof, we show that $D$ contains a linearly independent and dense subset $D'$. We denote the metric on $E$ by $d$. By a consequence of [72, Theorem 3.1], discussed thereafter by the authors, the infinite-dimensional Fréchet space $E$ has a dense Hamel basis, which we denote by $\{b_{a}\}_{a \in A}$. By definition of the Hamel basis of $E$ we may assume that the cardinality of $A$, denoted by Card($A$), is equal to dim($E$). Next, we use $\{b_{a}\}_{a \in A}$ to produce a base of open sets for the topology of $E$ of cardinality equal to dim($E$).

Since $E$ is a metric space and $\{b_{a}\}_{a \in A}$ is dense in $E$, its topology is generated by the open sets $\{\operatorname {Ball}_{E}(b_{a},r)\}_{a \in A, r \in (0,\infty )}$, where $ \operatorname {Ball}_{E}(b_{a},r) \triangleq \left \{ x \in E : d(b_{a},x)<r \right \}. $ Since ${\mathbb {Q}}$ is dense in ${\mathbb {R}}$, for every $a \in A$ and $r \in (0,\infty )$ the basic open set $\operatorname {Ball}_{E}(b_{a},r)$ can be expressed as $ \operatorname {Ball}_{E}(b_{a},r) = \bigcup _{q \in {\mathbb {Q}}\cap (0,r)} \operatorname {Ball}_{E}(b_{a},q). $ Hence, $\{\operatorname {Ball}_{E}(b_{a},q)\}_{a \in A, q \in {\mathbb {Q}}\cap (0,\infty )}$ generates the topology on $E$. Moreover, the cardinality of the indexing set $A\times ({\mathbb {Q}}\cap (0,\infty ))$ is computed by

$$ Card(A\times ({\mathbb{Q}}\cap (0,\infty))) = \max\{Card(A),Card({\mathbb{Q}})\} = \max\{\textup{dim}(E),Card({\mathbb{Q}})\}=\textup{dim}(E), $$

since $E$ is infinite-dimensional and therefore dim($E$) is at least countable. Therefore, $\{\operatorname {Ball}_{E}(b_{a},q)\}_{a \in A, q \in {\mathbb {Q}}\cap (0,\infty )}$ is a base for the topology on $E$ of cardinality equal to dim($E$). Let $\omega$ be the smallest ordinal with $Card(\omega )=\textup {dim}(E)=Card(A\times ({\mathbb {Q}} \cap (0,\infty )))$. In particular, there exists a bijection $F:\omega \to A\times ({\mathbb {Q}} \cap (0,\infty ))$ which allows us to canonically order the open sets $\{\operatorname {Ball}_{E}(F(j)_{1},F(j)_{2})\}_{j < \omega }$, where for any $j < \omega$ we denote $F(j)_{1} \in A$ and $F(j)_{2} \in {\mathbb {Q}} \cap (0,\infty )$.
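
The bijection $F$ orders the basic balls through their index pairs $(a,q)$. As a hedged illustration of such an ordering in the simplest possible case, the sketch below assumes a countable index set $A=\{0,1,2,\dots\}$; in the proof, $A$ indexes a Hamel basis and $\omega$ may be an uncountable ordinal, so this is only an analogy. It lists the positive rationals without repetition using Newman's recurrence for the Calkin-Wilf sequence and interleaves them with $A$ along anti-diagonals. The names `positive_rationals` and `enumerate_index_set` are hypothetical.

```python
from fractions import Fraction
from itertools import count, islice


def positive_rationals():
    """Yield every positive rational exactly once (Calkin-Wilf order),
    via Newman's recurrence q -> 1 / (2*floor(q) - q + 1)."""
    q = Fraction(1)
    while True:
        yield q
        q = 1 / (2 * (q.numerator // q.denominator) - q + 1)


def enumerate_index_set():
    """Yield pairs (a, q) with a in {0, 1, 2, ...} and q a positive rational,
    each pair exactly once, by walking the anti-diagonals a + k = s, where k
    is the position of q in the Calkin-Wilf order."""
    rationals = []
    source = positive_rationals()
    for s in count():
        rationals.append(next(source))
        for a in range(s + 1):
            yield (a, rationals[s - a])


if __name__ == "__main__":
    # First few index pairs (center index, radius) in the induced order.
    for a, q in islice(enumerate_index_set(), 6):
        print(a, q)
```

Each pair $(a,q)$ appears at a finite position, which is the countable counterpart of ordering the balls by $j<\omega$.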

We construct $D'$ by transfinite induction using $\omega$. Indeed, since $1 < \omega$, since $D$ is dense in $E$, and since $\{\operatorname {Ball}_{E}(F(j)_{1},F(j)_{2})\}_{j < \omega }$ defines a base for the topology of $E$, there exists some $U_{1} \in \{\operatorname {Ball}_{E}(F(j)_{1},F(j)_{2})\}_{j < \omega }$ containing some $d_{1} \in D$. For the inductive step, suppose that for some $j < \omega$ we have constructed a linearly independent set $\{d_{i}\}_{i < j}$ with $d_{i} \in \operatorname {Ball}_{E}(F(i)_{1},F(i)_{2})$ for every $i < j$. Since $j < \omega$, since $\{d_{i}\}_{i < j}$ contains Card($j$) elements, and since $\{d_{i}\}_{i < j}$ is a Hamel basis of $\operatorname {span}(\{d_{i}\}_{i < j})$, we have $ \textup {dim}\left (\operatorname {span}(\{d_{i}\}_{i < j}) \right ) < \textup {dim}(E). $ Hence, $\operatorname {span}(\{d_{i}\}_{i < j})$ has empty interior and therefore it cannot contain $\operatorname {Ball}_{E}(F(j)_{1},F(j)_{2})$. In particular, there is a non-empty open subset $V'\subseteq \operatorname {Ball}_{E}(F(j)_{1},F(j)_{2}) \setminus \operatorname {span}(\{d_{i}\}_{i < j})$ and since $D$ was assumed to be dense in $E$ there must be some $d_{j} \in V'\subseteq \operatorname {Ball}_{E}(F(j)_{1},F(j)_{2})$. This completes the inductive step and therefore there is a linearly independent and dense subset $D'\triangleq \{d_{j}\}_{j < \omega }$ contained in $D$ of cardinality Card($\omega$) = dim($E$).
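
The inductive step can be mimicked in finite dimensions, with the caveat that the analogy stops after as many steps as the dimension. The following sketch, under the assumption of a toy setting in ${\mathbb {R}}^{3}$, takes a finite random sample as a stand-in for the dense set $D$ and, for each ball in a short list, selects a sample point inside the ball that lies outside the span of the points chosen so far. The names `orthogonal_residual`, `pick_independent`, `det3`, `POOL`, and `CENTERS` are hypothetical, and the tolerance `tol` is an arbitrary numerical choice.

```python
import math
import random


def orthogonal_residual(v, ortho_basis):
    """Return v minus its projection onto the span of an orthonormal list."""
    r = list(v)
    for e in ortho_basis:
        coeff = sum(ri * ei for ri, ei in zip(r, e))
        r = [ri - coeff * ei for ri, ei in zip(r, e)]
    return r


def pick_independent(centers, radius, pool, tol=1e-3):
    """For each ball B(center, radius), pick the first pool point that lies in
    the ball and outside the span of the points chosen so far (up to tol)."""
    chosen, ortho_basis = [], []
    for c in centers:
        for p in pool:
            if math.dist(p, c) >= radius:
                continue  # p is not in the current ball
            r = orthogonal_residual(p, ortho_basis)
            norm = math.sqrt(sum(x * x for x in r))
            if norm > tol:  # p is not (numerically) in the current span
                chosen.append(p)
                ortho_basis.append([x / norm for x in r])
                break
        else:
            raise ValueError("pool has no admissible point in this ball")
    return chosen


def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


# Assumption: a seeded random sample of the cube [-1, 1]^3 plays the role of D.
rng = random.Random(0)
POOL = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(5000)]
CENTERS = [(0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.2, 0.2, 0.2)]
CHOSEN = pick_independent(CENTERS, 0.3, POOL)
```

A non-zero value of `det3(*CHOSEN)` confirms linear independence of the three selected points. In ${\mathbb {R}}^{3}$ the selection must stop after three steps, whereas in the proof every span of fewer than dim($E$) vectors is a proper subspace with empty interior, so the selection never runs out of room.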

Next, let $I$ be the set of all countable sequences of distinct elements in $\omega$. For every $i \in I$, let $E_{i}\triangleq \overline {\operatorname {span}_{j \in i}(d_{j})}$, where $\overline {A}$ denotes the closure of a subset $A\subseteq E$ in the topology of $E$. Then, each $E_{i}$ is a linear subspace of $E$ with countable basis $\{d_{j}\}_{j \in i}$. Since any Fréchet space with a countable basis is separable, each $E_{i}$ is a separable Fréchet space. Moreover, by construction,

$$ D'\subseteq \bigcup\limits_{i \in I} E_{i} \subseteq E $$

(13)

and therefore $\bigcup _{i \in I} E_{i}$ is dense in $E$ since $D'$ is dense in $E$. Since ${\Phi }$ is a homeomorphism, ${\Phi }^{-1}:E\to \mathcal {X}$ is a continuous surjection, and since the image of a dense set under any continuous map is dense in the range of that map, ${\Phi }^{-1}(D')$ is dense in $\mathcal {X}$. Moreover, using the fact that inverse images commute with unions and the fact that ${\Phi }$ is a bijection, we compute that

$$ {\Phi}^{-1}(D')\subseteq {\Phi}^{-1}\left[\bigcup\limits_{i \in I} E_{i}\right] = \bigcup\limits_{i \in I} {\Phi}^{-1}\left[E_{i}\right]. $$

(14)

Since ${\Phi }$ is a bijection and $D$ was defined as the image of ${\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}$ in $E$ under ${\Phi }$, we have ${\Phi }^{-1}(D')\subseteq {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}$ and ${\Phi }^{-1}(D')$ is dense in $\mathcal {X}$. In particular, (14) implies that ${\Phi }^{-1}(D') \subseteq \bigcup _{i \in I} ({\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )} \cap {\Phi }^{-1}[E_{i}])$ and therefore $\bigcup _{i \in I} ({\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )} \cap {\Phi }^{-1}[E_{i}])$ is dense in $\mathcal {X}$. In particular, $\bigcup _{i \in I} {\Phi }^{-1}[E_{i}]$ is dense in $\mathcal {X}$, and for each $i \in I$, if we define $\mathcal {X}_{i}\triangleq {\Phi }^{-1}[E_{i}]$ then we obtain (ii.a).

Since ${\Phi }$ is a homeomorphism, it preserves dense sets. In particular, since $\{d_{j}\}_{j \in i}$ is a countable, dense, and linearly independent subset of $E_{i}$, its preimage ${\Phi }^{-1}[\{d_{j}\}_{j \in i}]$ is a countable dense subset of $\mathcal {X}_{i}$. Hence, each $\mathcal {X}_{i}$ is separable.

This gives (ii.b). Lastly, by [73] any two separable infinite-dimensional Fréchet spaces are homeomorphic. In particular, since $L^{2}({\mathbb {R}})$ is a separable infinite-dimensional Hilbert space, it is a separable infinite-dimensional Fréchet space. Therefore, for each $i \in I$, there is a homeomorphism ${\Phi }_{i}: E_{i} \to L^{2}({\mathbb {R}})$. In particular, ${\Phi }_{i}\circ {\Phi }:\mathcal {X}_{i}\to L^{2}({\mathbb {R}})$ must be a homeomorphism and therefore (ii.c) holds. Therefore, (i) implies (ii).

Suppose that (ii) holds. Then (iii.a) holds by (ii.a). For each $i \in I$, let $\{d_{n,i}\}_{n \in {\mathbb {N}}}$ be a countable dense subset of $\mathcal {X}_{i}\cap {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}$ for which ${\Phi }(\{d_{n,i}\}_{n \in {\mathbb {N}}})$ is linearly independent, and let $E_{i}=\overline {\operatorname {span}(\{d_{n,i}\}_{n \in {\mathbb {N}}})}$. Let $D\triangleq \bigcup _{i \in I} \{d_{n,i}\}_{n \in {\mathbb {N}}}$ and $D'\triangleq {\Phi }(D)$. Since, for every $i \in I$, $D'\cap E_{i}$ is a countably infinite, linearly independent, and dense subset of $E_{i}$, [74, Theorem 8.24] provides a continuous linear operator $T_{i} : D \cap E_{i} \to D \cap E_{i}$ satisfying

$$ {T_{i}^{n}}(d_{n,i})=d_{n+1,i}, $$

for each $n \in {\mathbb {N}}$ and each $i \in I$. In particular, $\left \{{T^{n}_{i}}(d_{0,i})\right \}_{n \in {\mathbb {N}}}$ is dense in $E_{i}$. For each $i \in I$, define $\phi _{i}\triangleq {\Phi }^{-1}\circ T_{i} \circ {\Phi }$ and $g_{i}\triangleq {\Phi }^{-1}(d_{0,i})$ and observe that for every $n \in {\mathbb {N}}$

$$ \begin{array}{@{}rcl@{}} {\phi^{n}_{i}}(g_{i}) &= & \underbrace{({\Phi}^{-1}\circ T_{i}\circ {\Phi})\circ {\dots} \circ ({\Phi}^{-1}\circ T_{i}\circ {\Phi})}_{n\text{ times}} ({\Phi}^{-1}(d_{0,i})) \\ &= & {\Phi}^{-1}\circ {T_{i}^{n}}(d_{0,i}) . \end{array} $$

(15)
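
The telescoping in (15) uses only that each inner ${\Phi }\circ {\Phi }^{-1}$ cancels. A minimal numerical sanity check of that identity is sketched below, assuming toy stand-ins on the real line: `Phi` is the homeomorphism $x\mapsto x^{3}$ and `T` is the linear map $y\mapsto -1.5\,y$. Neither is the map from the proof, and all function names are hypothetical.

```python
import math


# Assumption: toy stand-ins on the real line. Phi is a homeomorphism of R and
# T is a linear map; they only mimic the roles of Phi and T_i in (15).
def Phi(x):
    return x ** 3


def Phi_inv(y):
    """Real cube root, valid for negative inputs as well."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)


def T(y):
    return -1.5 * y


def phi(x):
    """The conjugated map phi = Phi^{-1} o T o Phi."""
    return Phi_inv(T(Phi(x)))


def iterate(f, n, x):
    """Apply f to x exactly n times."""
    for _ in range(n):
        x = f(x)
    return x


def conjugacy_gap(g, n):
    """Absolute difference between phi^n(g) and Phi^{-1}(T^n(d0)), where
    d0 = Phi(g), that is, g = Phi^{-1}(d0) as in the proof."""
    d0 = Phi(g)
    lhs = iterate(phi, n, g)
    rhs = Phi_inv(iterate(T, n, d0))
    return abs(lhs - rhs)


if __name__ == "__main__":
    print(conjugacy_gap(0.7, 5))  # expected to be at floating-point noise level
```

The same cancellation is what transports the dense orbit of $T_{i}$ in $E_{i}$ to a dense orbit of $\phi _{i}$ in $\mathcal {X}_{i}$.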

Since $\{{T_{i}^{n}}(d_{0,i})\}_{n \in {\mathbb {N}}}$ is dense in $E_{i}$ and ${\Phi }$ is a homeomorphism from $\mathcal {X}_{i}$ to $E_{i}$ then

$$ {\Phi}^{-1}\left[ \{{T_{i}^{n}}(d_{0,i})\}_{n \in {\mathbb{N}}} \right]= \left\{ {\phi_{i}^{n}}(g_{i}) \right\}_{n \in {\mathbb{N}}} $$

is dense in $\mathcal {X}_{i}$. Thus, (iii.c) holds. For any i ∈ I, define the map $\psi _{i}:L^{2}({\mathbb {R}})\to L^{2}({\mathbb {R}})$ by

$$ \psi_{i} \triangleq ({\Phi}_{i}\circ {\Phi}) \circ \phi_{i} \circ ({\Phi}_{i}\circ {\Phi})^{-1}, $$

and define the vector $\tilde {g}_{i} \in L^{2}({\mathbb {R}})$ by $\tilde {g}_{i}\triangleq {\Phi }_{i}\circ {\Phi }(g_{i})$. Since ${\Phi }$ and ${\Phi }_{i}$ are homeomorphisms and since $\phi _{i}$ is continuous, $\psi _{i}$ is well-defined and continuous. Moreover, analogously to (15) we compute that $ \left \{ {\psi _{i}^{n}}(\tilde {g}_{i}) \right \}_{n \in {\mathbb {N}}} $ is dense in $L^{2}({\mathbb {R}})$. Since $L^{2}({\mathbb {R}})$ is a complete separable metric space with no isolated points and $\psi _{i}$ is a continuous self-map of $L^{2}({\mathbb {R}})$ for which there is a vector $\tilde {g}_{i} \in L^{2}({\mathbb {R}})$ such that the set of iterates $\{{\psi _{i}^{n}}(\tilde {g}_{i})\}_{n \in {\mathbb {N}}}$ is dense in $L^{2}({\mathbb {R}})$, the Birkhoff Transitivity Theorem, see the formulation of [74, Theorem 1.16], implies that for every pair of non-empty open subsets $\tilde {U},\tilde {V}\subseteq L^{2}({\mathbb {R}})$ there is some $n_{\tilde {U},\tilde {V}} \in {\mathbb {N}}$ satisfying

$$ \psi_{i}^{n_{\tilde{U},\tilde{V}}}(\tilde{U})\cap \tilde{V} \neq \emptyset . $$

(16)

Since ${\Phi }_{i}\circ {\Phi }$ is a homeomorphism, [74, Proposition 1.13] and (16) imply that for every pair of non-empty open subsets $U',V'\subseteq \mathcal {X}_{i}$ there exists some $n_{U',V'} \in {\mathbb {N}}$ satisfying

$$ \phi_{i}^{n_{U',V'}}(U')\cap V' \neq \emptyset . $$

(17)

Since $\mathcal {X}_{i}$ is equipped with the subspace topology then every non-empty open subset $U'\subseteq \mathcal {X}_{i}$ is of the form $U\cap \mathcal {X}_{i}$ for some non-empty open subset $U\subseteq \mathcal {X}$. Therefore, (17) implies (iii.b). Since both $L^{2}({\mathbb {R}})$ and $C({\mathbb {R}})$ are separable infinite-dimensional Fréchet spaces then the [73, Anderson-Kadec Theorem] implies that there exists a homeomorphism ${\Psi }:L^{2}({\mathbb {R}})\rightarrow C({\mathbb {R}})$. Therefore, for each i ∈ I, ${\Psi }\circ {\Phi }_{i}\circ {\Phi }:\mathcal {X}\rightarrow C({\mathbb {R}})$ is a homeomorphism and thus (ii.c) implies (iii.d).

Suppose that (iii) holds. For every i ∈ I, set $X_{i}\triangleq \mathcal {X}_{i}$, let ${\Phi }_{i}\triangleq 1_{X_{i}}$ be the identity map on X_i, set $\psi _{i}\triangleq \phi _{i}$, and set $x_{i}\triangleq g_{i}$. Therefore, (iv) holds.

Suppose that (iv) holds. By (iv.c), for each i ∈ I, ${\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\cap \mathcal {X}_{i}$ is dense in $\mathcal {X}_{i}$. Therefore,

$$ \bigcup\limits_{i \in I} \mathcal{X}_{i} = \bigcup\limits_{i \in I} \overline{{\mathcal{NN}}^{\left( {\mathscr{F}},\circlearrowleft\right)} \cap \mathcal{X}_{i}} \subseteq \overline{\bigcup\limits_{i \in I} {\mathcal{NN}}^{\left( {\mathscr{F}},\circlearrowleft\right)} \cap \mathcal{X}_{i}} \subseteq \mathcal{X} . $$

(18)

By (iv.a), $\bigcup _{i \in I} \mathcal {X}_{i}$ is dense in $\mathcal {X}$; that is, its closure is $\mathcal {X}$, and so the smallest closed set containing $\bigcup _{i \in I}\mathcal {X}_{i}$ is $\mathcal {X}$ itself. Therefore, by (18), the smallest closed set containing $\bigcup _{i \in I} {\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )} \cap \mathcal {X}_{i}$ must be $\mathcal {X}$. Hence, ${\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}$ is dense in $\mathcal {X}$ and (i) holds. This concludes the proof. □

Proof of Theorem 2

By the [73, Anderson-Kadec Theorem] there is no loss of generality in assuming that m = n = 1, since $C(\mathbb {R}^{m},\mathbb {R}^{n})$ and $C({\mathbb {R}})$ are homeomorphic. Let $\mathcal {X}^{\prime }\triangleq \bigcup _{i \in I} {\Phi }_{i}(C({\mathbb {R}}))$. By (5), $\mathcal {X}^{\prime }$ is dense in $\mathcal {X}$ and since density is transitive, then it is enough to show that $\bigcup _{i \in I} {\Phi }_{i}({\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )})$ is dense in $\mathcal {X}^{\prime }$ to conclude that it is dense in $\mathcal {X}$. Since each Φ_i is continuous, then, the topology on $\mathcal {X}^{\prime }$ is no finer than the finest topology on $\bigcup _{i \in I} {\Phi }_{i}(C({\mathbb {R}}))$ making each Φ_i continuous and by [71, Proposition 2.6] such a topology exists. Let $\mathcal {X}^{\prime \prime }$ denote $\bigcup _{i \in I} {\Phi }_{i}(C({\mathbb {R}}))$ equipped with the finest topology making each ${\Phi }_{i}(C({\mathbb {R}}))$ into a subspace. By construction, if $U\subseteq \mathcal {X}^{\prime }$ is open then it is open in $\mathcal {X}^{\prime \prime }$ and therefore if $\bigcup _{i \in I} {\Phi }_{i}({\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}) $ intersects each non-empty open subset of $\mathcal {X}^{\prime \prime }$ then it must do the same for $\mathcal {X}^{\prime }$. Hence, it is enough to show that $\bigcup _{i \in I} {\Phi }_{i}({\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )})$ is dense in $\mathcal {X}^{\prime \prime }$ to conclude that it is dense in $\mathcal {X}^{\prime }$ and therefore, $\bigcup _{i \in I} {\Phi }_{i}({\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )})$ is dense in $\mathcal {X}$.

We proceed similarly to the proof of Lemma 2. Indeed, by [71, Proposition 2.7] the space $\mathcal {X}^{\prime \prime }$ is given by the (topological) quotient of the disjoint union $\sqcup _{i \in I} {\Phi }_{i}(C({\mathbb {R}}))$, in the sense of topological spaces (see [71, Example 3, Section 2.4]), under the equivalence relation $f_{i}\sim f_{j}$ if f_i = f_j in $\mathcal {X}$. Denote the corresponding quotient map by $Q_{\mathcal {X}^{\prime }}$. A subset U of the quotient topology is open (see [71, Example 2, Section 2.4]) if and only if $Q_{\mathcal {X}^{\prime }}^{-1}[U]$ is an open subset of $\sqcup _{i \in I} {\Phi }_{i}(C({\mathbb {R}}))$, and a subset V of $\sqcup _{i \in I} {\Phi }_{i}(C({\mathbb {R}}))$ is open if and only if $V\cap {\Phi }_{i}(C({\mathbb {R}}))$ is open for each i ∈ I in the topology of ${\Phi }_{i}(C({\mathbb {R}}))$. Hence, $U\subseteq \mathcal {X}^{\prime \prime }$ is open if and only if $Q_{\mathcal {X}^{\prime }}^{-1}[U] \cap {\Phi }_{i}(C({\mathbb {R}}))$ is open for each i ∈ I. Since ${\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}\cap {\Phi }_{i}(C({\mathbb {R}}))$ is dense in ${\Phi }_{i}(C({\mathbb {R}}))$ then for every non-empty open subset $U'\subseteq {\Phi }_{i}(C({\mathbb {R}}))$

$$ \emptyset \neq U' \cap {\mathcal{NN}}^{\left( {\mathscr{F}},\circlearrowleft\right)}\cap {\Phi}_{i}(C({\mathbb{R}})) \subseteq U' \cap \bigcup\limits_{i \in I} {\mathcal{NN}}^{\left( {\mathscr{F}},\circlearrowleft\right)}\cap {\Phi}_{i}(C({\mathbb{R}})) . $$

(19)

In particular, (19) implies that for every open subset $U\subseteq \mathcal {X}^{\prime \prime }$

$$ \emptyset \neq {\mathcal{NN}}^{\left( {\mathscr{F}},\circlearrowleft\right)}\cap {\Phi}_{i}(C({\mathbb{R}})) \cap \left[Q_{\mathcal{X}^{\prime}}^{-1}[U]\cap {\Phi}_{i}(C({\mathbb{R}}))\right] \subseteq U \cap \bigcup\limits_{i \in I} {\mathcal{NN}}^{\left( {\mathscr{F}},\circlearrowleft\right)}\cap {\Phi}_{i}(C({\mathbb{R}})) . $$

(20)

Therefore, $\bigcup _{i \in I} {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\cap {\Phi }_{i}(C({\mathbb {R}}))$ is dense in $\mathcal {X}^{\prime \prime }$ and therefore it is dense in $\bigcup _{i \in I} {\Phi }_{i}(C({\mathbb {R}}))$ equipped with its relative topology. Hence, ${({{\mathscr{F}}}_{\Phi },\circlearrowleft _{\Phi })}$ has the UAP on $\mathcal {X}^{\prime \prime }$ and therefore it has the UAP on $\mathcal {X}$ itself. □

Proof of Theorem 3

Let σ be a continuous and non-polynomial activation function. Then [61] implies that the architecture ${\left ({\mathscr{F}}_{0},\circlearrowleft _{0}\right )}$, as defined in Example 4, is a universal approximator on $C({\mathbb {R}})$.

By Theorem 1, since $\left ({{\mathscr{F}}},\circlearrowleft \right )$ has the UAP on $\mathcal {X}$ and since $\mathcal {X}$ is homeomorphic to an infinite-dimensional Fréchet space then there are homeomorphisms $\{{\Phi }_{i}\}_{i \in I}$ from $C({\mathbb {R}})$ onto a family of subspaces $\{\mathcal {X}_{i}\}_{i \in I}$ of $\mathcal {X}$ such that $\bigcup _{i \in I} \mathcal {X}_{i}$ is dense. Fix 𝜖 > 0 and $f \in \mathcal {X}$. Since $\bigcup _{i \in I} \mathcal {X}_{i}$ is dense in $\mathcal {X}$ there exists some i ∈ I and some $f_{i}\in \mathcal {X}_{i}$ such that

$$ d_{\mathcal{X}}(f,f_{i})<\frac{\epsilon}{2} . $$

(21)

Since Φ_i is a homeomorphism then it must map dense sets to dense sets. Since $\left ({\mathscr{F}}_{0},\circlearrowleft _{0}\right )$ has the UAP on $C({\mathbb {R}})$ then ${\mathcal {NN}}^{\left ({\mathscr{F}}_{0},\circlearrowleft _{0}\right )}$ is dense in $C({\mathbb {R}})$ and therefore, for each i ∈ I, ${\Phi }_{i}({\mathcal {NN}}^{\left ({\mathscr{F}}_{0},\circlearrowleft _{0}\right )})$ is dense in $\mathcal {X}_{i}$. Hence, there exists some $\tilde {g}_{\epsilon }\in {\Phi }_{i}({\mathcal {NN}}^{\left ({\mathscr{F}}_{0},\circlearrowleft _{0}\right )})$ such that $d_{\mathcal {X}}(f_{i},\tilde {g}_{\epsilon })<\frac {\epsilon }{2}$. Since Φ_i is a homeomorphism, it is a bijection; therefore there exists a unique $g_{\epsilon }\in {\mathcal {NN}}^{\left ({\mathscr{F}}_{0},\circlearrowleft _{0}\right )}$ with ${\Phi }_{i}(g_{\epsilon })=\tilde {g}_{\epsilon }$. Hence, the triangle inequality and (21) imply that

$$ d_{\mathcal{X}}\left( f,{\Phi}_{i}(g_{\epsilon}) \right) \leq d_{\mathcal{X}}\left( f,f_{i} \right) + d_{\mathcal{X}}\left( f_{i},{\Phi}_{i}(g_{\epsilon}) \right) < \epsilon . $$

(22)

This yields the first inequality in the Theorem’s statement.

By Theorem 1, since, for each i ∈ I, ${\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\cap \mathcal {X}_{i}$ is dense in $\mathcal {X}_{i}$ and since ${\Phi }_{i}^{-1}$ is a homeomorphism on $\mathcal {X}_{i}$ then ${\Phi }_{i}^{-1}\left ({\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}\cap \mathcal {X}_{i}\right )$ is dense in $C({\mathbb {R}})$. In particular, there exists some $\tilde {f}_{\epsilon } \in {\Phi }_{i}^{-1}\left ({\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}\cap \mathcal {X}_{i}\right )$ satisfying

$$ d_{ucc}\left( g_{\epsilon}(x) , \tilde{f}_{\epsilon}(x) \right) <\epsilon . $$

(23)

Since Φ_i is a bijection then there exists a unique $f_{\epsilon }\in {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}$ such that ${\Phi }_{i}^{-1}(f_{\epsilon })=\tilde {f}_{\epsilon }$. Therefore, (23) and the triangle inequality imply that

$$ d_{ucc}\left( g_{\epsilon}(x) , {\Phi}_{i}^{-1}(f_{\epsilon})(x) \right) <\epsilon . $$

Therefore the conclusion holds. □

Remark 1

By the [73, Anderson-Kadec Theorem], since both $L^{2}({\mathbb {R}})$ and $C({\mathbb {R}})$ are separable infinite-dimensional Fréchet spaces then there exists a homeomorphism ${\Phi }:L^{2}({\mathbb {R}})\rightarrow C({\mathbb {R}})$. Therefore, the proof of Corollary 3 holds (mutatis mutandis) with each Φ replaced by ${\Phi }_{i}\circ {\Phi }^{-1}$ and with $C({\mathbb {R}})$ in place of $L^{2}({\mathbb {R}})$.

The proof of the next result relies on some aspects of inductive limits of Banach spaces. Briefly, an inductive limit of Banach spaces is a locally convex space B for which there exist a pre-ordered set I and a set of Banach subspaces $\{B_{i}\}_{i \in I}$ with $B_{i}\subseteq B_{j}$ if i ≤ j. The inductive limit of this direct system is the subset $\bigcup _{i \in I} B_{i}$ equipped with the finest topology which simultaneously makes each B_i into a subspace and makes $\bigcup _{i \in I} B_{i}$ into a locally convex space. Spaces constructed in this way are called ultrabornological spaces and more details about them can be found in [75, Chapter 6].

Proof of Theorem 4

Since $B(\mathcal {X}_{0})$ and B(X) are both infinite-dimensional Banach spaces, they are infinite-dimensional ultrabornological spaces, in the sense of [75, Definition 6.1.1]. Since X is separable, then as observed in [33], B(X) is separable. Therefore, [75, Theorem 6.5.8] applies; hence, there exist a directed set I with pre-order ≤, a collection of Banach subspaces $\{B_{i}\}_{i \in I}$ satisfying (i) and (ii), and a collection of continuous linear isomorphisms ${\Phi }_{i}:B(X)\rightarrow B_{i}$. Furthermore, the topology on B is coarser than the inductive limit topology $\varinjlim _{i \in I} B_{i}$. Since B(X) and each B_i are Banach spaces, and in particular normed linear spaces, then by the results of [76, Section 2.7] the maps Φ_i are bounded linear isomorphisms.

Let i ∈ I, and fix any $x_{i} \in X \setminus \{0_{X}\}$. Since $\delta ^{X}:X\rightarrow B(X)$ is base-point preserving, $\delta ^{X}_{x_{i}}\neq 0$ and therefore there exists a linearly independent subset ${\mathscr{B}}_{x_{i}}$ of B(X) containing $\delta ^{X}_{x_{i}}$. Since B(X) is separable then ${\mathscr{B}}_{x_{i}}$ is countably infinite and therefore, by [74, Theorem 8.24], there exists a bounded linear map $\phi _{i}:B(X)\rightarrow B(X)$ such that $\{{\phi _{i}^{n}}(\delta ^{X}_{x_{i}})\}_{n \in {\mathbb {N}}^{+}}$ is a dense subset of B(X).

Since Φ_i is a continuous linear isomorphism then it is in particular a surjective continuous map from B(X) onto B_i. Since the image of a dense set under a continuous surjection is itself dense then $\left \{{\Phi }_{i}\circ {\phi _{i}^{n}}(\delta _{x_{i}})\right \}_{n \in {\mathbb {N}}^{+}}$ is a dense subset of B_i. Moreover, this holds for each i ∈ I.

By definition, the topology on $\varinjlim _{i \in I} B_{i}$ is at least as fine as the Banach space topology on $B(\mathcal {X}_{0})$, since each B_i is a linear subspace of $B(\mathcal {X}_{0})$. Moreover, the topology on $\varinjlim _{i \in I} B_{i}$ is no finer than the finest topology on $\bigcup _{i \in I} B_{i}$ making each B_i into a topological subspace (but not requiring that $\bigcup _{i \in I} B_{i}$ be locally convex), which exists by [77, Proposition 6]. Denote this latter space by $\tilde {B}$. Therefore, if

$$ \bigcup\limits_{i \in I; n \in {\mathbb{N}}^{+}} \left\{{\Phi}_{i}\circ {\phi_{i}^{n}}(\delta_{x_{i}})\right\} , $$

(24)

is dense in $\tilde {B}$ then it is dense in $\varinjlim _{i \in I} B_{i}$ and in $B(\mathcal {X}_{0})$. Hence, we show that (24) is dense in $\tilde {B}$. That is, it is enough to show that every open subset of $\tilde {B}$ contains an element of (24).

By [71, Proposition 2.7] the space $\tilde {B}$ is given by the topological quotient of the disjoint union $\sqcup _{i \in I} B_{i}$, in the sense of topological spaces (see [71, Example 3, Section 2.4]), under the equivalence relation $x_{i}\sim x_{j}$ for any i ≤ j if x_i = x_j in B_j. Denote the corresponding quotient map by $Q_{\tilde {B}}$. A subset U of the quotient topology is open (see [71, Example 2, Section 2.4]) if and only if $Q_{\tilde {B}}^{-1}[U]$ is an open subset of $\sqcup _{i \in I} B_{i}$, and a subset V of $\sqcup _{i \in I} B_{i}$ is open if and only if V ∩ B_i is open for each i ∈ I in the topology of B_i. Hence, $U\subseteq \tilde {B}$ is open if and only if $Q_{\tilde {B}}^{-1}[U] \cap B_{i}$ is open for each i ∈ I. Since $\{{\Phi }_{i}\circ {\phi _{i}^{n}}(\delta _{x_{i}})\}_{n \in {\mathbb {N}}^{+}}$ is dense in B_i then for every non-empty open subset $U'\subseteq B_{i}$

$$ \emptyset \neq U' \cap \{{\Phi}_{i}\circ {\phi_{i}^{n}}(\delta_{x_{i}})\}_{n \in {\mathbb{N}}^{+}} \subseteq U' \cap \bigcup\limits_{i \in I; n \in {\mathbb{N}}^{+}} \left\{{\Phi}_{i}\circ {\phi_{i}^{n}}(\delta_{x_{i}})\right\} . $$

(25)

In particular, (25) implies that for every open subset $U\subseteq \tilde {B}$

$$ \emptyset \neq \{{\Phi}_{i}\circ {\phi_{i}^{n}}(\delta_{x_{i}})\}_{n \in {\mathbb{N}}^{+}} \cap \left[Q_{\tilde{B}}^{-1}[U]\cap B_{i}\right] \subseteq \bigcup\limits_{i \in I; n \in {\mathbb{N}}^{+}} \left\{{\Phi}_{i}\circ {\phi_{i}^{n}}(\delta_{x_{i}})\right\} \cap U . $$

(26)

Therefore, (24) is dense in $\tilde {B}$ and, in particular, it is dense in $B(\mathcal {X}_{0})$.

Since $\mathcal {X}_{0}$ was barycentric, then there exists a continuous linear map $\rho :B(\mathcal {X}_{0})\rightarrow \mathcal {X}_{0}$ which is a left-inverse of $\delta ^{\mathcal {X}_{0}}$. Thus, for every $f \in \mathcal {X}_{0}$, $\rho \circ \delta ^{\mathcal {X}_{0}}_{f} = f$ and therefore ρ is a continuous surjection. Since the image of a dense set under a continuous surjection is dense and since (24) is dense then

$$ \bigcup\limits_{i \in I; n \in {\mathbb{N}}^{+}} \left\{\rho\circ {\Phi}_{i}\circ {\phi_{i}^{n}}(\delta_{x_{i}})\right\} , $$

(27)

is a dense subset of $\mathcal {X}_{0}$. Since $\mathcal {X}_{0}$ was assumed to be dense in $\mathcal {X}$ and since density is transitive then (27) is dense in $\mathcal {X}$. This concludes the main portion of the proof.

The final remark follows from the fact that if $X=\mathcal {X}_{0}$ then the identity map $1_{X}:X\rightarrow \mathcal {X}_{0}$ is an isometry and therefore the universal property of B(X) described in [32, Theorem 3.6] implies that 1_X uniquely extends to a bounded linear isomorphism L between B(X) and $B(\mathcal {X}_{0})$ satisfying

$$ L\circ \delta^{X} = \delta^{\mathcal{X}_{0}}\circ 1_{X} = \delta^{\mathcal{X}_{0}} \text{ and } L^{-1}\circ \delta^{\mathcal{X}_{0}} = \delta^{X}\circ 1_{X}^{-1} = \delta^{X} . $$

Hence L must be the identity on B(X). □

Appendix B: Proof of Applications of Main Results

Lemma 3

Fix some $b \in {{{{\mathbb {R}}}^{m}}}$, and let $\sigma :{\mathbb {R}}\to {\mathbb {R}}$ be a continuous activation function. Then Φ_A,b is a well-defined and continuous linear map from $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ to itself and the following are equivalent:

(i)
For each δ > 0, 𝜖 > 0 and each $f,g\in C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ there is some $N_{U,V}\in {\mathbb {N}}^{+}$ such that
$$ \left\{{\Phi}_{A,b}^{N_{U,V}}(\tilde{g}): d_{ucc}(\tilde{g},g)<\delta\right\} \cap \left\{ \tilde{f}: d_{ucc}(\tilde{f},f)<\epsilon \right\} \neq \emptyset , $$
(ii)
σ is injective, A is of full rank, and for every compact subset $K\subseteq {\mathbb {R}}^{m}$ there is some $N_{K}\in {\mathbb {N}}^{+}$ such that
$$ S^{N_{K}}(K)\cap K = \emptyset, $$
where S(x) = σ ∙ (Ax + b).

If A is the m × m-identity matrix I_m and b_i > 0 for $i=1,\dots ,m$ then (i) and (ii) are equivalent to

(iii)
σ is injective and has no fixed-points.

If A is the m × m-identity matrix I_m and b_i > 0 for $i=1,\dots ,m$ then (iii) is equivalent to

(iv)
Either σ(x) > x or σ(x) < x for every $x \in {\mathbb {R}}$.
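
Conditions (iii) and (iv) are easy to probe numerically for a candidate activation. The following is a minimal sketch under stated assumptions: `shifted_leaky_relu` is a hypothetical example with slope `alpha1 < 1` on the negative axis and an upward shift `alpha2 > 0`, and the finite `grid` only gives numerical evidence, not a proof. It contrasts this activation with the ReLU, which fixes every x ≥ 0 and is constant on x ≤ 0.

```python
def relu(x):
    """Standard ReLU; it fixes every x >= 0 and is not injective on x <= 0."""
    return max(x, 0.0)

def shifted_leaky_relu(x, alpha1=0.5, alpha2=1.0):
    """Hypothetical example: slope alpha1 < 1 for x < 0, slope 1 for x >= 0,
    shifted upwards by alpha2 > 0."""
    return (alpha1 * x if x < 0 else x) + alpha2

def strictly_above_diagonal(sigma, grid):
    """Numerical check of sigma(x) > x at every grid point, as in (iv)."""
    return all(sigma(x) > x for x in grid)

def strictly_increasing_on(sigma, grid):
    """Numerical check that sigma is strictly increasing along the sorted grid,
    which implies injectivity on the grid."""
    values = [sigma(x) for x in grid]
    return all(lo < hi for lo, hi in zip(values, values[1:]))

grid = [k / 10 for k in range(-100, 101)]  # sample points in [-10, 10]

leaky_ok = strictly_above_diagonal(shifted_leaky_relu, grid) and \
    strictly_increasing_on(shifted_leaky_relu, grid)
relu_ok = strictly_above_diagonal(relu, grid) and strictly_increasing_on(relu, grid)
```

On this grid `leaky_ok` is true while `relu_ok` is false, which matches the distinction between the two activations drawn in the abstract.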

Proof of Lemma 3

By [37, Theorem 46.8] the topology of uniform convergence on compacts is the compact-open topology on $C(\mathbb {R}^{m},\mathbb {R}^{n})$ and by [37, Theorem 46.11] composition is a continuous operation in the compact-open topology. Therefore, Φ_A,b is a well-defined and continuous map. Its linearity follows from the fact that

$$ {\Phi}_{A,b}(af+g) = (af+g)\circ S = a(f\circ S) + g\circ S. $$
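
The linearity identity can be checked pointwise in a few lines. This is a minimal sketch for the scalar case m = n = 1 with A = 1; the helper names `make_S` and `compose_with`, the activation `sigma`, and the test functions `f` and `g` are assumptions introduced for illustration only.

```python
import math

def make_S(sigma, b):
    """Return the scalar map S(x) = sigma(x + b), i.e. the case m = 1, A = 1."""
    return lambda x: sigma(x + b)

def compose_with(S):
    """Return the operator Phi: f -> f o S acting on real-valued functions."""
    return lambda f: (lambda x: f(S(x)))

# Hypothetical ingredients, chosen only for illustration.
sigma = lambda x: (0.5 * x if x < 0 else x) + 1.0
S = make_S(sigma, b=0.5)
Phi = compose_with(S)

f, g, a = math.sin, (lambda x: x * x), 2.5
combo = lambda x: a * f(x) + g(x)  # the function a*f + g

# Largest pointwise gap between Phi(a*f + g) and a*Phi(f) + Phi(g).
points = [-3.0, -0.7, 0.0, 1.2, 4.0]
max_gap = max(abs(Phi(combo)(x) - (a * Phi(f)(x) + Phi(g)(x))) for x in points)
```

The gap vanishes up to rounding because precomposition with S never touches the output side of f, which is where the linear structure lives.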

Since the topology of uniform convergence on compacts is a metric topology, with metric d_ucc, then $\left \{U_{f,{\epsilon }}:f \in C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}}), {\epsilon } >0\right \}$ defines a base for this topology, where $U_{f,\epsilon }\triangleq \left \{g \in C(\mathbb {R}^{m},\mathbb {R}^{n}): d_{ucc}(f,g)<\epsilon \right \}$. Therefore, Lemma 3 (i) is equivalent to the statement: for each pair of non-empty open subsets $U,V \subseteq C(\mathbb {R}^{m},\mathbb {R}^{n})$ there is some $N_{U,V}\in {\mathbb {N}}^+$ such that $ {\Phi }_{A,b}^{N_{U,V}}(U)\cap V \neq \emptyset . $ Without loss of generality, we prove this formulation instead.

Next, by [78, Corollary 4.1] Φ_A,b satisfies Theorem 1 (ii.b) if and only if $S(x)\triangleq \sigma (Ax+b)$ is injective and for every compact subset $K\subseteq \mathbb {R}^{m}$ there exists some $N_K \in {\mathbb {N}}^+$ such that

$$ S^{N_{K}}(K)\cap K = \emptyset . $$

(28)

Therefore, A must be injective which is only possible if A is of full-rank. This gives the equivalence between (i) and (ii).

We consider the equivalence between (ii) and (iii) in the case where A is the identity matrix and b_i > 0 for $i=1,\dots ,m$. Since $S(x)=(\sigma (x_1+b_1),\dots ,\sigma (x_m+b_m))$ acts componentwise, it is sufficient to verify condition (28) in the case where m = 1. Since b_i > 0 for $i=1,\dots ,m$ then it is clear that S is injective and has no fixed points if and only if σ is injective and has no fixed points. It therefore remains to show that S is injective and has no fixed points if and only if (ii) holds.

From here, we proceed analogously to the proof of [79, Lemma 4.1]. If S has a fixed point x then, for every $N \in {\mathbb {N}}^+$, S^N({x}) = {x}, which is a non-empty compact subset of ${\mathbb {R}}$. Therefore, (28) cannot hold. Conversely, suppose that S has no fixed points. The intermediate-value theorem and the fact that S has no fixed points imply that either S(x) > x for every $x \in {\mathbb {R}}$ or S(x) < x for every $x \in {\mathbb {R}}$. Mutatis mutandis, we proceed with the first case. Since σ is continuous and injective, S is strictly monotone; a continuous strictly decreasing self-map of ${\mathbb {R}}$ always has a fixed point, so S must be a strictly increasing function; thus S([a,b]) = [S(a),S(b)] for every a < b.

Let K be a non-empty compact subset of ${\mathbb {R}}$. By the Heine-Borel theorem K is closed and bounded, thus it is contained in some [a,b] with a < b. Therefore, it is sufficient to show the result in the case where K = [a,b]. Since S(a) > a and S is increasing, the sequence $\{S^n(a)\}_{n \in {\mathbb {N}}}$ satisfies $S^{n}(a) < S^{n+1}(a)$ for every $n \in {\mathbb {N}}$. If this sequence were bounded then, being increasing, it would converge to some $a_0 \in {\mathbb {R}}$, that is, $a_0= \lim \limits _{n \to \infty } S^n(a)$. Therefore, by the continuity of S we would find that

$$ a_{0} = \lim\limits_{n \to \infty} S^{n}(a) = \lim\limits_{n \to \infty} S^{n+1}(a) = \lim\limits_{n \to \infty} S(S^{n}(a)) = S\left( \lim\limits_{n \to \infty} S^{n}(a) \right) = S(a_{0}), $$

but since S has no fixed points there cannot exist such an a₀, since otherwise a₀ = S(a₀). Therefore, $\{S^n(a)\}_{n \in {\mathbb {N}}}$ is unbounded, so that $S^{N}(a) > b$ for some N; since S is increasing, $S^{N}([a,b]) = [S^{N}(a),S^{N}(b)]$ then lies strictly to the right of b. Hence, for every a < b there exists some $N_{[a,b]}\in {\mathbb {N}}^+$ such that

$$ S^{N_{[a,b]}}([a,b])\cap [a,b] = \emptyset. $$

Thus, (ii) and (iii) are equivalent when A = I_m.
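
The escape argument above is constructive and can be traced numerically. The sketch below assumes a hypothetical activation `sigma` (a shifted leaky ReLU with σ(x) > x everywhere) and the shift 0.5 in place of b; the helper `escape_time` is an illustrative name. It returns the first N with S^N(a) > b, which for an increasing S forces S^N([a,b]) ∩ [a,b] = ∅.

```python
def sigma(x, alpha1=0.5, alpha2=1.0):
    """Hypothetical injective activation without fixed points (sigma(x) > x)."""
    return (alpha1 * x if x < 0 else x) + alpha2

def escape_time(S, a, b, max_iter=1000):
    """Smallest N >= 1 with S^N(a) > b, or None if not reached in max_iter.
    For an increasing S with S(x) > x this means S^N([a, b]) misses [a, b]."""
    x = a
    for n in range(1, max_iter + 1):
        x = S(x)
        if x > b:
            return n
    return None

S = lambda x: sigma(x + 0.5)      # S(x) = sigma(x + b) with the assumed b = 0.5
N = escape_time(S, -3.0, 2.0)     # orbit of a = -3: -0.25, 1.25, 2.75
```

Here the orbit of a = -3 is -0.25, 1.25, 2.75, so the interval [-3, 2] is left behind after three iterations.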

Next, assume that any of (i) to (iii) hold, that $\mathcal {X}$ is a non-empty subset of $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$, and that $\left ({{\mathscr{F}}},\circlearrowleft \right )$ has the UAP on $\mathcal {X}$. Then for any other non-empty open subset $U\subseteq C(\mathbb {R}^{m},\mathbb {R}^{n})$ there exists some $N_{\mathcal {X},U}\in {\mathbb {N}}$ such that

$$ {\Phi}_{A,b}^{N_{\mathcal{X},U}}[\mathcal{X}] \cap U \neq \emptyset . $$

(29)

Since Φ_A,b is continuous then so is ${\Phi }_{A,b}^N$ and therefore $({\Phi }_{A,b}^{N_{\mathcal {X},U}})^{-1}[U]$ is a non-empty open subset of $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$. Since the finite intersection of open sets is again open, then we have that

$$ ({\Phi}_{A,b}^{N_{\mathcal{X},U}})^{-1}\left[ {\Phi}_{A,b}^{N_{\mathcal{X},U}}[\mathcal{X}] \cap U \right] = \mathcal{X} \cap {\Phi}_{A,b}^{N_{\mathcal{X},U}}[U] . $$

(30)

This implies that $\mathcal {X} \cap {\Phi }_{A,b}^{N_{\mathcal {X},U}}[U]$ is a non-empty open subset of $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ contained in $\mathcal {X}$. Since $\left ({{\mathscr{F}}},\circlearrowleft \right )$ has the UAP on $\mathcal {X}$, then there exists some $f \in {\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )} \cap [\mathcal {X} \cap {\Phi }_{A,b}^{N_{\mathcal {X},U}}[U]]$. Thus, ${\Phi }_{A,b}^{N_{\mathcal {X},U}}(f)\in U$ and, by definition, ${\Phi }_{A,b}^{N_{\mathcal {X},U}}(f)\in {\mathcal {NN}}^{\left ({\mathscr{F}}_{\sigma ;deep},\circlearrowleft _{\sigma ;deep}\right )}$.

Thus, for each U in

$$ \left\{ \left\{ g \in C({{{{\mathbb{R}}}^{m}}},{{{{\mathbb{R}}}^{n}}}) : d_{ucc}(g ,f)<\epsilon \right\} \right\}_{f \in C({{{{\mathbb{R}}}^{m}}},{{{{\mathbb{R}}}^{n}}}), \epsilon>0} , $$

(31)

there exists some $N_U \in {\mathbb {N}}^+$ and some $f_U \in {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}$ such that ${\Phi }_{A,b}^{N_U}(f_U)\in U$. In particular, since (31) is a base for the topology on $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$, every non-empty open subset U of $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ contains an element of (31) which, in turn, contains an element of the form ${\Phi }_{A,b}^{N_U}(f_U)$. Thus, ${\mathcal {NN}}^{\left ({\mathscr{F}}_{\sigma ;deep},\circlearrowleft _{\sigma ;deep}\right )} \cap U\neq \emptyset $.

Hence, ${\mathcal {NN}}^{\left ({\mathscr{F}}_{\sigma ;deep},\circlearrowleft _{\sigma ;deep}\right )}$ has the UAP on $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$. □

Proof of Theorem 5

The equivalence between (i), (ii), and (iv) follows from Lemma 3. The equivalence between (iii) and (iv) follows from the formulation of Birkhoff’s transitivity theorem described in [74, Theorem 2.19]. □

Proof of Proposition 1

Since α₁ < 1 then σ(x) > x for every x < 0. Since 0 < α₂ then σ(0) = 0 < α₂. Lastly, since $\tilde {\sigma }$ is monotone increasing then for every x > 0 we have that

$$ \sigma(x) > x + \alpha_{2} >x. $$

Therefore, σ cannot have a fixed point. Moreover, since $\tilde {\sigma }$ is strictly increasing it must be injective, since if x < y then σ(x) < σ(y) and therefore σ(x)≠σ(y) if x≠y. Hence, σ is injective. Moreover, since the sum of continuous functions is again continuous, then σ is continuous.

Since α₁x + α₂ is affine then it is continuously differentiable. Thus σ is continuously differentiable at every x < 0. Lastly, setting α₂ not equal to $\tilde {\sigma }'(0)-1$ ensures that σ is not differentiable at 0 and therefore it cannot be polynomial. In particular, it cannot be affine. □

For convenience, we denote the collection of set-functions from ${{{{\mathbb {R}}}^{m}}}$ to ${{{{\mathbb {R}}}^{n}}}$ by $[{{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}}]$.

Proof of Corollary 4

Since d_ucc is a metric on $[{{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}}]$ and since $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\subseteq [{{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}}]$, then the map $F:C(\mathbb {R}^{m},\mathbb {R}^{n})\rightarrow [0,\infty )$ defined by $F(g)\triangleq d_{ucc}(\tilde {f}_0,g)$ is continuous. Therefore, the set $F^{-1}\left [(-\infty ,\delta )\right ]$ is an open subset of $C(\mathbb {R}^{m},\mathbb {R}^{n})$. In particular, (7) guarantees that it is non-empty. Since σ is non-affine and continuously differentiable at at least one point with non-zero derivative at that point then [17, Theorem 3.2] applies, whence the set $\mathcal {X}_0$ of continuous functions $h:\mathbb {R}^{m}\rightarrow \mathbb {R}^{n}$ with representation

$$ h(x)= W_{J}\circ \sigma \bullet {\dots} \circ \sigma \bullet W_{1}, $$

where $W_j:{ {{\mathbb {R}}^{d_j} }}\rightarrow { {{\mathbb {R}}^{d_{j+1}} }}$, for $j=1,\dots ,J-1$, are affine, with $n+m+2 \geq d_j$ if $j\notin \{1,J\}$, $d_1 = m$, and $d_J = n$, is dense in $C({\mathbb {R}^{m}},{\mathbb {R}^{n}})$. Therefore, since $F^{-1}\left [(-\infty ,\delta )\right ]$ is an open subset of $C(\mathbb {R}^{m},\mathbb {R}^{n})$ then $\mathcal {X}_0\cap F^{-1}\left [(-\infty ,\delta )\right ]$ is dense in $F^{-1}\left [(-\infty ,\delta )\right ]$.

Fix some $b \in {{{{\mathbb {R}}}^{m}}}$ with b_i > 0 for $i=1,\dots ,m$. Since σ is continuous, injective, and has no fixed points, Lemma 3 implies that $ \mathcal {X}_1 \triangleq \{{\Phi }_{I_m,b}^N(f): f \in F^{-1}[(-\infty ,\delta )] \cap \mathcal {X}_0, N \in {\mathbb {N}}^+\} $ is a dense subset of $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$. This gives (i). Moreover, by construction, every $g \in \mathcal {X}_1$ admits a representation satisfying (iii) and (iv). Furthermore, since $W_{J}\circ \sigma \bullet {\dots } \circ \sigma \bullet W_1 \in \mathcal {X}_2$ and by construction there exists some $g \in \mathcal {X}_1$ for which $ d_{ucc}\left (W_{J}\circ \sigma \bullet {\dots } \circ \sigma \bullet W_1 ,g \right )<\delta $, then (ii) holds. □

Proof of Corollary 5

Since each F_n, for $n=1,\dots ,N$, is a continuous function from $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ to $[0,\infty ]$ then each $F_n^{-1}\left [[0,C_n)\right ]$ is an open subset of $C(\mathbb {R}^{m},\mathbb {R}^{n})$. Since the finite intersection of open sets is itself open, then $U \triangleq \cap _{n=1}^N F_n^{-1}\left [[0,C_n)\right ]$ is an open subset of $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$. Since there exists some $f_0\in C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ satisfying (8) then U is non-empty. Since $\left ({\mathscr{F}},\circlearrowleft \right )$ has the UAP on $C(\mathbb {R}^{m},\mathbb {R}^{n})$ then ${\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )} \cap U$ is dense in U.

Fix $b \in {{{{\mathbb {R}}}^{m}}}$ with b_i > 0 for $i=1,\dots ,m$ and set A = I_m.

Since σ is a transitive activation function then Corollary 1 applies and therefore the set $ \left \{{\Phi }^N_{I_m,b}(f): f \in {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )} \cap U\right \} $ is dense in $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$. Therefore (i)-(iv) hold. □

Proof of Corollary 2

Let S(x) = σ ∙ (x + b) and let $B\triangleq \left \{x \in {{{{\mathbb {R}}}^{m}}}: \sigma (x)>x\right \}$. By hypothesis B is Borel and μ(B) > 0. For each $i=1,\dots ,m$ we compute σ ∙ (x_i + b_i) > x_i + b_i ≥ x_i. Therefore, for μ-a.e. x ∈ B, every $N \in {\mathbb {N}}$, and each $i=1,\dots ,m$

$$ S^{N}(x)_{i} \geq x_{i} + Nb_{i}. $$
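
The drift bound can be sanity-checked on a concrete orbit. The following sketch assumes a hypothetical activation `sigma` with σ(y) > y for every y, so that the bound applies at every point; the starting point `x0` and the shift vector `b` are arbitrary illustrative choices.

```python
def sigma(y, alpha1=0.5, alpha2=1.0):
    """Hypothetical activation with sigma(y) > y for every real y."""
    return (alpha1 * y if y < 0 else y) + alpha2

def S(x, b):
    """Componentwise map S(x)_i = sigma(x_i + b_i)."""
    return [sigma(xi + bi) for xi, bi in zip(x, b)]

def iterate(x, b, N):
    """Return S^N(x), the N-fold composition of S applied to x."""
    for _ in range(N):
        x = S(x, b)
    return x

x0 = [-4.0, 0.3, 2.0]   # assumed starting point
b = [0.1, 0.5, 1.0]     # assumed shift with b_i > 0

# Check S^N(x)_i >= x_i + N * b_i for N = 0, ..., 49 and every coordinate i.
bound_holds = all(
    yi >= xi + N * bi
    for N in range(50)
    for yi, xi, bi in zip(iterate(x0, b, N), x0, b)
)
```

Each application of S adds at least b_i to the i-th coordinate, which is exactly what pushes the orbit off to infinity.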

Since b_i > 0 then $\lim \limits _{N \to \infty } S^N(x)=\infty $. Therefore, the condition [80, Corollary 1.3 (C2)] is met, and by the discussion following the result on [80, page 127], condition [80, Corollary 1.3 (C1)] holds; i.e.: for every non-empty open subset $U,V\subseteq L^1_{\mu }(\mathbb {R}^{m},\mathbb {R}^{n})$ there exists some $N_{U,V}\in {\mathbb {N}}$ such that

$$ {\Phi}_{I_{m},b}^{N_{U,V}}(U)\cap V \neq \emptyset . $$

(32)

By Lemma 1, the map ${\Phi }_{I_m,b}$ and therefore the map ${\Phi }_{I_m,b}^{N_{U,V}}$ is continuous. Thus, $({\Phi }_{I_m,b}^{N_{U,V}})^{-1}[V]$ is a non-empty open subset of $L^1_{\mu }(\mathbb {R}^{m},\mathbb {R}^{n})$ and therefore $U \cap ({\Phi }_{I_m,b}^{N_{U,V}})^{-1}[V]$ is a non-empty open subset of U. Taking $U=\operatorname {Ball}_{L^1_{\mu }({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})}(g,\delta )$ and $V=\operatorname {Ball}_{L^1_{\mu }(\mathbb {R}^{m},\mathbb {R}^{n})}(f,\epsilon )$ we obtain the conclusion. □

Proof of Corollary 3

By Proposition 1 and the observation in its proof that σ(x) > x we only need to verify that σ is Borel bi-measurable. Indeed, since σ is continuous and injective then by [81, Proposition 2.1], $\sigma ^{-1}$ exists and is continuous on the image of σ. Since σ was assumed to be surjective then $\sigma ^{-1}$ exists on all of ${\mathbb {R}}$ and is continuous thereon. Hence, $\sigma ^{-1}$ and σ are measurable since any continuous function is measurable. □

Proof of Theorem 6

Fix A = I_m and $b\in {{{{\mathbb {R}}}^{m}}}$ with b_i > 0 for $i=1,\dots ,m$. Since $\operatorname {int}(\text {co}({{\mathscr{F}}}))$ is a non-empty open set then there exists some $f \in \operatorname {int}(\text {co}({{\mathscr{F}}}))$ and some δ > 0 for which

$$ \text{Ball}_{L^{1}_{\mu}({{{{\mathbb{R}}}^{m}}})}(f,\delta)\triangleq \left\{g \in L^{1}_{\mu}({{{{\mathbb{R}}}^{m}}}): {\int}_{x \in {{{{\mathbb{R}}}^{m}}}} \|f(x)-g(x)\|d\mu(x)<\delta\right\} $$

is an open subset of $\operatorname {int}(\text {co}({{\mathscr{F}}}))$. Since $\text {co}({{\mathscr{F}}})\cap \operatorname {int}(\text {co}({{\mathscr{F}}}))$ is dense in $\operatorname {int}(\text {co}({{\mathscr{F}}}))$ then its intersection with any non-empty open subset thereof is also dense; in particular, $\text {co}({{\mathscr{F}}})\cap \operatorname {Ball}_{L^1_{\mu }(\mathbb {R}^{m})}(f,\delta )$ is dense in $\operatorname {Ball}_{L^1_{\mu }(\mathbb {R}^{m})}(f,\delta )$. Since σ is L¹-transitive then (iii) follows from Corollary 2.

Since $L^1_{\mu }$ is a metric space then $\left \{\operatorname {Ball}_{L^1_{\mu }({{{{\mathbb {R}}}^{m}}})}(g,\delta ): g \in L^1_{\mu }({{{{\mathbb {R}}}^{m}}}), \delta >0\right \}$ is a base for the topology thereon. Therefore, Corollary 2 implies that for any two non-empty open subsets $U,V \subseteq L^1_{\mu }(\mathbb {R}^{m})$ there exists some $N_{U,V}\in {\mathbb {N}}$ satisfying ${\Phi }^{N_{U,V}}_{I_m,b}(U)\cap V \neq \emptyset $. Hence, ${\Phi }_{I_m,b}$ is topologically transitive on $L^1_{\mu }(\mathbb {R}^{m})$, in the sense of [74, Definition 1.38]. Moreover, since ${\Phi }_{I_m,b}$ is a continuous linear map then Birkhoff’s transitivity theorem, as formulated in [74, Theorem 2.19], applies and therefore ${\Phi }_{I_m,b}$ is a hypercyclic operator on $L^1_{\mu }(\mathbb {R}^{m})$. Therefore, [74, Proposition 5.8] implies that $\|{\Phi }_{I_m,b}\|_{op}>1$. Setting $\kappa \triangleq \|{\Phi }_{I_m,b}\|_{op}$ yields (ii).

It remains to show the approximation bound described by (i). Fix $f \in L^1_{\mu }({{{{\mathbb {R}}}^{m}}})$. Since $L^1_{\mu }({{{{\mathbb {R}}}^{m}}})$ is a Banach space then it has no isolated points and since ${\Phi }_{I_m,b}$ is a hypercyclic operator then Birkhoff’s transitivity theorem, as formulated in [74, Theorem 2.19], implies that there exists a dense G_δ-subset $HC({\Phi }_{I_m,b})\subseteq L^1_{\mu }(\mathbb {R}^{m})$ such that for every $g \in HC({\Phi }_{I_m,b})$ the set $\{{\Phi }^N_{I_m,b}(g)\}_{N \in {\mathbb {N}}}$ is dense in $L^1_{\mu }(\mathbb {R}^{m})$. Therefore, every non-empty open subset of $L^1_{\mu }(\mathbb {R}^{m})$ contains some element of $HC({\Phi }_{I_m,b})$. In particular, there is some $g \in HC({\Phi }_{I_m,b})\cap \operatorname {int}(\text {co}({{\mathscr{F}}}))$ since $\operatorname {int}(\text {co}({{\mathscr{F}}}))$ is a non-empty open subset of $L^1_{\mu }(\mathbb {R}^{m})$.

Since $\text {co}({{\mathscr{F}}})\cap \operatorname {int}(\text {co}({{\mathscr{F}}}))$ is dense in $\operatorname {int}(\text {co}({{\mathscr{F}}}))$ then, in particular, $g \in \overline {\operatorname {int}(\text {co}({{\mathscr{F}}}))}$. Therefore, the conditions of [69, Theorem 2] and [69, Equation (23)] are met; hence, for each $n \in {\mathbb {N}}^+$ the following approximation bound holds

$$ \inf_{f_{i} \in {\mathscr{F}}, {\sum}_{i=1}^{n} \alpha_{i}=1, \alpha_{i} \in [0,1]} {\int}_{x \in {{{{\mathbb{R}}}^{m}}}} \left\| \sum\limits_{i=1}^{n} \alpha_{i} f_{i}(x)-g(x) \right\|d\mu(x) \leq \frac{\sqrt{2\mu({{{{\mathbb{R}}}^{m}}})}}{\sqrt{n}} . $$

(33)

Since $\{{\Phi }^N_{I_m,b}(g)\}_{N \in {\mathbb {N}}}$ is dense in $L^1_{\mu }({{{{\mathbb {R}}}^{m}}})$ then there exists some $N \in {\mathbb {N}}$ for which ${\Phi }^N_{I_m,b}(g) \in \operatorname {Ball}_{L^1_{\mu }({{{{\mathbb {R}}}^{m}}})}\left (f,\frac 1{\sqrt {n}}\right )$. Thus, the following bound holds

$$ {\int}_{x \in {{{{\mathbb{R}}}^{m}}}} \|f(x)-{\Phi}^{N}_{I_{m},b}(g)(x)\|d\mu(x) \leq \frac1{\sqrt{n}} , $$

(34)

Since ${\Phi }_{I_m,b}$ is a continuous linear map from the Banach space $L^1_{\mu }({{{{\mathbb {R}}}^{m}}})$ to itself then it is Lipschitz with constant $\|{\Phi }_{I_m,b}\|_{op}$, where ∥⋅∥_op denotes the operator norm, and by [64, Corollary 2.1.2] we have

$$ \|{\Phi}_{I_{m},b}\|^{N}_{op}= \left\| \frac{d(\sigma \bullet(\cdot + b))_{\#}\mu}{d\mu_{M}} \right\|_{\infty}^{N} . $$

(35)

Moreover, by Lemma 1, we know that the right-hand side of (35) is finite. Therefore (34) implies that for every $f_1,\dots ,f_n \in {{\mathscr{F}}}$, $\alpha _1,\dots ,\alpha _n\in [0,1]$ with ${\sum }_{i=1}^n \alpha _i=1$, the following holds

$$ \begin{aligned} & {\int}_{x \in {{{{\mathbb{R}}}^{m}}}} \left\| {\Phi}_{I_{m},b}^{N}\left( \sum\limits_{i=1}^{n} \alpha_{i} f_{i}\right)(x) - f(x) \right\|d\mu(x) \\ \leq & {\int}_{x \in {{{{\mathbb{R}}}^{m}}}} \left\| {\Phi}^{N}_{I_{m},b}\left(\sum\limits_{i=1}^{n} \alpha_{i} f_{i}\right)(x) - {\Phi}^{N}_{I_{m},b}\left( g\right)(x) \right\|d\mu(x) \\ & + {\int}_{x \in {{{{\mathbb{R}}}^{m}}}} \left\| f(x) - {\Phi}^{N}_{I_{m},b}\left( g\right)(x) \right\|d\mu(x) \\ &\leq \left\| {\Phi}^{N}_{I_{m},b}\right\|_{op} \left( {\int}_{x \in {{{{\mathbb{R}}}^{m}}}} \left\| \sum\limits_{i=1}^{n} \alpha_{i} f_{i}(x) - g(x) \right\|d\mu(x) \right) \\ & + {\int}_{x \in {{{{\mathbb{R}}}^{m}}}} \left\| {\Phi}^{N}_{I_{m},b}\left( g\right)(x) - f(x) \right\|d\mu(x) \\ & \leq \left\| \frac{d(\sigma\bullet(\cdot + b))_{\#}\mu}{d\mu_{M}} \right\|_{\infty}^{N} \left( {\int}_{x \in {{{{\mathbb{R}}}^{m}}}} \left\| \sum\limits_{i=1}^{n} \alpha_{i} f_{i}(x) - g(x) \right\|d\mu(x) \right) + \frac1{\sqrt{n}} . \end{aligned} $$

(36)

Combining the estimates (33)–(36) we obtain

$$ \begin{aligned} \inf_{f_{i} \in {\mathscr{F}}, {\sum}_{i=1}^{n} \alpha_{i}=1, \alpha_{i} \in [0,1]} & {\int}_{x \in {{{{\mathbb{R}}}^{m}}}} \left\| {\Phi}^{N}_{I_{m},b}\left( \sum\limits_{i=1}^{n} \alpha_{i} f_{i}\right)(x) - f(x) \right\|d\mu(x) \\ \leq & \left\| \frac{d(\sigma\bullet(\cdot +b))_{\#}\mu}{d\mu_{M}} \right\|_{\infty}^{N} \left( {\int}_{x \in {{{{\mathbb{R}}}^{m}}}} \left\| \sum\limits_{i=1}^{n} \alpha_{i} f_{i}(x) - g(x) \right\|d\mu(x) \right) + \frac1{\sqrt{n}} \\ \leq & \left\| \frac{d(\sigma \bullet (\cdot +b))_{\#}\mu}{d\mu_{M}} \right\|_{\infty}^{N} \frac{\sqrt{2\mu({{{{\mathbb{R}}}^{m}}})}}{\sqrt{n}} + \frac1{\sqrt{n}} \\ =& \frac1{\sqrt{n}}\left( 1 + \sqrt{2\mu({{{{\mathbb{R}}}^{m}}})} \right) . \end{aligned} $$

(37)

Since ${\Phi }^N_{I_m,b}$ is linear, it can be moved inside the convex combination on the left-hand side of (37), and we obtain the following estimate

$$ \begin{aligned} \inf_{f_{i} \in {\mathscr{F}}, {\sum}_{i=1}^{n} \alpha_{i}=1, \alpha_{i} \in [0,1]} & {\int}_{x \in {{{{\mathbb{R}}}^{m}}}} \left\| \sum\limits_{i=1}^{n} \alpha_{i} {\Phi}^{N}_{I_{m},b}\left( f_{i}\right) (x) - f(x) \right\|d\mu(x) \!\leq\! \frac1{\sqrt{n}}\left( \! 1 + \sqrt{2\mu({{{{\mathbb{R}}}^{m}}})} \right) . \end{aligned} $$

(38)

Therefore, the estimate in (i) holds. □
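The $n^{-1/2}$ rate appearing in (33) is of the kind produced by the classical sampling argument usually attributed to Maurey. The following is a minimal numerical sketch of that argument under stated assumptions, and it is not the proof of [69, Theorem 2]: the dictionary of cosines, the choice of $\mu$ as Lebesgue measure on $[0,1]$, and all function names are hypothetical and chosen only for illustration. The target $g$ is a convex combination of $K$ atoms, and an equal-weight average of $n$ atoms drawn at random is itself a convex combination of $n$ elements of the dictionary.

```python
import math
import random

K = 20  # size of the illustrative dictionary (assumption)
GRID = [(j + 0.5) / 200 for j in range(200)]  # midpoints of [0, 1]

def atom(i, x):
    """Hypothetical dictionary element f_i(x) = cos(i * x), bounded by 1."""
    return math.cos(i * x)

def target(x):
    """g = uniform convex combination of all K atoms, so g lies in co(F)."""
    return sum(atom(i, x) for i in range(1, K + 1)) / K

TARGET_ON_GRID = [target(x) for x in GRID]

def l1_error(indices):
    """Midpoint-rule L1([0,1]) distance between g and the equal-weight
    average of the atoms listed in `indices` (repetitions allowed)."""
    n = len(indices)
    total = 0.0
    for x, gx in zip(GRID, TARGET_ON_GRID):
        approx = sum(atom(i, x) for i in indices) / n
        total += abs(approx - gx)
    return total / len(GRID)

def mean_error(n, trials=20, seed=0):
    """Average L1 error over several random draws of n atoms."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        indices = [rng.randint(1, K) for _ in range(n)]
        errors.append(l1_error(indices))
    return sum(errors) / trials

if __name__ == "__main__":
    for n in (25, 100, 400):
        err = mean_error(n)
        # err * sqrt(n) should stay roughly constant if the rate is n^(-1/2)
        print(n, err, err * math.sqrt(n))
```

If the rate is indeed of order $n^{-1/2}$, the last printed column should stay roughly constant as $n$ grows; the sketch only probes the inner convex-approximation step and does not involve the operator ${\Phi }_{I_m,b}$.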

The statement of the next lemma concerns the Banach space of functions vanishing at infinity. Denoted by $C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$, this is the set of continuous functions $f$ from $\mathbb {R}^{m}$ to $\mathbb {R}^{n}$ such that, given any $\epsilon > 0$, there exists some compact subset $K_{\epsilon }\subseteq \mathbb {R}^{m}$ for which $ \sup _{x \notin K_{\epsilon }}\|f(x)\|<\epsilon . $ As discussed in [82, VII], $C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ is made into a Banach space by equipping it with the supremum norm $\|f\|_{\infty }\triangleq \sup _{x \in {{{{\mathbb {R}}}^{m}}}} \|f(x)\|$.

Lemma 4 (Uniform Approximation of Functions Vanishing at Infinity)

Suppose that $\left ({{\mathscr{F}}},\circlearrowleft \right )$ is a universal approximator on $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$. Then for every $f\in C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ and every $\epsilon > 0$ there exists $f_{\epsilon }\in C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ with representation

$$ f_{\epsilon}(\cdot) = \left( g_{\epsilon} e^{-\frac{b}{b - \|\cdot\|^{2}}} + a\right)I_{\|\cdot\|< b} + \left( ae^{- \left| g_{\epsilon}(\cdot) \right|(\|\cdot\|-b)}\right)I_{\|\cdot\|\geq b} , $$

(39)

where the absolute value $\left |\cdot \right |$ is applied component-wise, $g_{\epsilon }\in {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}$, and $a,b > 0$, and which satisfies the uniform approximation bound

$$ \left\| f - f_{\epsilon} \right\|_{\infty} <\epsilon . $$

Proof of Lemma 4

Let $\left ({{\mathscr{F}}},\circlearrowleft \right )$ be a universal approximator on $C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$, let $f \in C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$, and let $\epsilon > 0$. Since $f$ vanishes at infinity, there exists some non-empty compact $K_{\epsilon ,f}\subseteq \mathbb {R}^{m}$ for which $\|f(x)\|\leq \epsilon 2^{-1}$ for every $x \notin K_{\epsilon ,f}$. By the Heine-Borel theorem, $K_{\epsilon ,f}$ is bounded and therefore there exists some $b^{\star } > 0$ such that $K_{\epsilon ,f}\subseteq \operatorname {Ball}_{\mathbb {R}^{m}}(0,b^{\star })\triangleq \left \{ x \in \mathbb {R}^{m}: \|x\|< b^{\star } \right \}$. Therefore,

$$ \sup_{x \in {{{{\mathbb{R}}}^{m}}} - \text{Ball}_{{{{{\mathbb{R}}}^{m}}}}(0,b^{\star})} \left\| f(x) \right\| <\epsilon 2^{-1} . $$

(40)

Since the bump function $x\mapsto e^{-\frac {1}{1-x^2}}I_{|x|<1}$ is continuous, affine functions are continuous, $f\in C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$, and compositions and products of continuous functions are again continuous, the function $x\mapsto \left [f(x)-\epsilon 2^{-1}\right ]e^{\frac {b^{\star }}{b^{\star }-\|x\|^2}}I_{\|x\|<b^{\star }}$ is itself continuous. Observe also that the set $\overline {\text{Ball}(0,b^{\star})}= \left \{x \in \mathbb{R}^{m}: \|x\|\leq b^{\star }\right \}$ is closed and bounded, thus it is compact by the Heine-Borel theorem. Since $\left ({\mathscr{F}},\circlearrowleft \right )$ is a universal approximator on $C(\mathbb {R}^{m},\mathbb {R}^{n})$ for the topology of uniform convergence on compacts, there exists some $g_{\epsilon }\in {\mathcal{N}\mathcal{N}}^{\left ({\mathscr{F}},\circlearrowleft \right )}$ satisfying

$$ \sup_{x \in \overline{\text{Ball}(0,b^{\star})}} \left\| g_{\epsilon}(x) - \left[f(x)-\epsilon 2^{-1}\right]e^{\frac{b^{\star}}{b^{\star}-\|x\|^{2}}}I_{\|x\|<b^{\star}} \right\| <\epsilon 2^{-1} . $$

(41)

Since $0\leq e^{-\frac {b^{\star }}{b^{\star }-\|x\|^2}} \leq 1$ for every $x \in {{{{\mathbb {R}}}^{m}}}$, then from (41) we compute

$$ \begin{aligned} & \sup_{x \in \text{Ball}(0,b^{\star})} \left\| g_{\epsilon}(x)e^{-\frac{b^{\star}}{b^{\star} - \|x\|^{2}}} I_{\|x\|<b^{\star}} + \epsilon 2^{-1} I_{\|x\|<b^{\star}} - f(x) \right\|\\ \leq & \sup_{x \in \overline{\text{Ball}(0,b^{\star})} } \left\| g_{\epsilon}(x)e^{-\frac{b^{\star}}{b^{\star} - \|x\|^{2}}} + \epsilon 2^{-1} - f(x) \right\|\\ = & \sup_{x \in \overline{\text{Ball}(0,b^{\star})} } \left\| g_{\epsilon}(x) e^{-\frac{b^{\star}}{b^{\star}-\|x\|^{2}}} - \left( f(x) - \epsilon 2^{-1}\right)e^{\frac{b^{\star}}{b^{\star}-\|x\|^{2}}}e^{-\frac{b^{\star}}{b^{\star}-\|x\|^{2}}} \right\|\\ = & \sup_{x \in \overline{\text{Ball}(0,b^{\star})} } e^{-\frac{b^{\star}}{b^{\star}-\|x\|^{2}}} \left\| g_{\epsilon}(x) - \left( f(x) - \epsilon 2^{-1}\right)e^{\frac{b^{\star}}{b^{\star}-\|x\|^{2}}} \right\|\\ \leq & \sup_{x \in \overline{\text{Ball}(0,b^{\star})} } \left\| g_{\epsilon}(x) - \left( f(x) - \epsilon 2^{-1}\right)e^{\frac{b^{\star}}{b^{\star}-\|x\|^{2}}} \right\| \\ \leq & \frac{\epsilon}{2} . \end{aligned} $$

(42)

Observe that, for every $x \in {{{{\mathbb {R}}}^{m}}}-\operatorname {Ball}(0,b^{\star })$ we have $\|x\|-b^{\star }\geq 0$ and $-|g_{\epsilon }(x)|\leq 0$, and therefore

$$ 0 \leq \epsilon 2^{-1} e^{-|g_{\epsilon}(x)| (\|x\|-b^{\star})} \leq \epsilon 2^{-1} . $$

(43)

Combining (40), (42), and (43) we compute the following bound

$$ \begin{aligned} &\sup_{x \in {{{{\mathbb{R}}}^{m}}}} \left\| \left( g_{\epsilon}(x) e^{-\frac{b^{\star}}{b^{\star} - \|x\|^{2}}} +\epsilon 2^{-1}\right)I_{\|x\|<b^{\star}} +\epsilon 2^{-1} e^{-|g_{\epsilon}(x)|(\|x\|-b^{\star})}I_{\|x\|\geq b^{\star}} - f(x) \right\| \\ \leq & \max \left\{ \sup_{x \in \text{Ball}(0,b^{\star}) } \left\| g_{\epsilon}(x)e^{-\frac{b^{\star}}{b^{\star} - \|x\|^{2}}} I_{\|x\|<b^{\star}} + \epsilon 2^{-1} I_{\|x\|<b^{\star}} - f(x) \right\| \right. ,\\ & \left. \sup_{x \in {{{{\mathbb{R}}}^{m}}}-\text{Ball}(0,b^{\star}) } \left\| g_{\epsilon}(x)e^{-\frac{b^{\star}}{b^{\star} - \|x\|^{2}}} I_{\|x\|<b^{\star}} + \epsilon 2^{-1} e^{-|g_{\epsilon}(x)|(\|x\|-b^{\star})}I_{\|x\|\geq b^{\star}} - f(x) \right\| \right\} \\ \leq & \max \left\{ \epsilon , \sup_{x \in {{{{\mathbb{R}}}^{m}}}-\text{Ball}(0,b^{\star}) } \left\| g_{\epsilon}(x)e^{-\frac{b^{\star}}{b^{\star} - \|x\|^{2}}} I_{\|x\|<b^{\star}} + \epsilon 2^{-1} e^{-|g_{\epsilon}(x)|(\|x\|-b^{\star})}I_{\|x\|\geq b^{\star}} - f(x) \right\| \right\} \\ = & \max \left\{ \epsilon , \sup_{x \in {{{{\mathbb{R}}}^{m}}}-\text{Ball}(0,b^{\star}) } \left\| \epsilon 2^{-1} e^{-|g_{\epsilon}(x)|(\|x\|-b^{\star})} - f(x) \right\| \right\} \\ \leq & \max \left\{ \epsilon , \sup_{x \in {{{{\mathbb{R}}}^{m}}}-\text{Ball}(0,b^{\star}) } \left\| \epsilon 2^{-1} e^{-|g_{\epsilon}(x)|(\|x\|-b^{\star})} \right\| + \sup_{x \in {{{{\mathbb{R}}}^{m}}}-\text{Ball}(0,b^{\star}) } \left\| f(x) \right\| \right\}\\ \leq & \max\{\epsilon, \epsilon 2^{-1} + \epsilon 2^{-1}\} = \epsilon . \end{aligned} $$

(44)

Thus, the result holds. □
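To make the shape of the representation (39) concrete, the following minimal sketch evaluates its right-hand side in the scalar case $m=n=1$. Everything specific here is an assumption made for illustration: the function name `f_eps`, the stand-in `g_standin` used in place of a network $g_{\epsilon }$ (it is not a trained model), and the choice $b=1$, for which the conditions $\|x\|<b$ and $\|x\|^2<b$ coincide.

```python
import math

B = 1.0  # radius b in (39); fixed to 1 for this illustration (assumption)

def g_standin(x):
    """Hypothetical stand-in for the network g_eps; not a trained model."""
    return 1.0 + x * x

def f_eps(x, g, a):
    """Right-hand side of (39) for scalar x with b = B."""
    r = abs(x)
    if r < B:
        # inner branch: bump-damped g plus the offset a
        return g(x) * math.exp(-B / (B - r * r)) + a
    # outer branch: offset a, decaying at a rate set by |g(x)|
    return a * math.exp(-abs(g(x)) * (r - B))

if __name__ == "__main__":
    a = 0.05
    for x in (0.0, 0.5, 0.999, 1.0, 1.5, 3.0):
        print(x, f_eps(x, g_standin, a))
```

In this sketch the two branches agree at $\|x\|=b$, where both equal $a$, because the bump factor tends to zero as $\|x\|\uparrow b$; outside the ball the value decays from $a$ towards zero whenever $|g|$ stays bounded away from zero.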

Proof of Theorem 6

For each ω ∈Ω, define the map ${\Phi }_{\omega }:C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\rightarrow C_{\omega }({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ by ${\Phi }_{\omega }(f)\triangleq \left (\omega (\|\cdot \|)+1\right )f$. For each $f,g \in C_0(\mathbb {R}^{m},\mathbb {R}^{n})$ we compute

$$ \begin{aligned} \left\| {\Phi}_{\omega}(f) - {\Phi}_{\omega}(g) \right\|_{\omega,\infty} = & \sup_{x \in {{{{\mathbb{R}}}^{m}}}} \frac{ \left\| {\Phi}_{\omega}(f)(x) - {\Phi}_{\omega}(g)(x) \right\| }{ \omega(\|x\|)+1 } \\ = & \sup_{x \in {{{{\mathbb{R}}}^{m}}}} \frac{ \left\| \left( \omega(\|x\|)+1\right) f(x) - \left( \omega(\|x\|)+1\right) g(x) \right\| }{ \omega(\|x\|)+1 } \\ = & \sup_{x \in {{{{\mathbb{R}}}^{m}}}} \frac{ \left( \omega(\|x\|)+1\right) \left\| f(x) - g(x) \right\| }{ \omega(\|x\|)+1 } \\ = & \|f-g\|_{\infty} . \end{aligned} $$

(45)

Therefore, for each ω ∈ Ω, the map Φ_ω is an isometry. For each ω ∈ Ω, define the map ${\Psi }_{\omega }:C_{\omega }({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\rightarrow C_0(\mathbb {R}^{m},\mathbb {R}^{n})$ by ${\Psi }_{\omega }(\tilde {f})\triangleq \frac 1{\omega (\|\cdot \|)+1} \tilde {f}$. For each $\tilde {f}\in C_{\omega }(\mathbb {R}^{m},\mathbb {R}^{n})$ we compute

$$ {\Phi}_{\omega}\circ {\Psi}_{\omega}(\tilde{f}) = {\Phi}_{\omega} \left( \frac1{\omega(\|\cdot\|)+1} \tilde{f} \right) = \left( \omega(\|\cdot\|)+1\right)\frac1{\omega(\|\cdot\|)+1} \tilde{f} = \tilde{f}. $$

(46)

Hence, Ψ_ω is a right-inverse of Φ_ω and, in particular, Φ_ω is surjective. Since every isometry is a homeomorphism onto its image and since Φ_ω is a surjective isometry, Φ_ω defines a homeomorphism from $C_0(\mathbb {R}^{m},\mathbb {R}^{n})$ onto $C_{\omega }(\mathbb {R}^{m},\mathbb {R}^{n})$. In particular, ${\Phi }_{\omega }\left (C_0(\mathbb {R}^{m},\mathbb {R}^{n})\right )=C_{\omega }(\mathbb {R}^{m},\mathbb {R}^{n})$. Therefore,

$$ C_{\Omega}({{{{\mathbb{R}}}^{m}}},{{{{\mathbb{R}}}^{n}}}) = \bigcup_{\omega \in {\Omega}} C_{\omega}({{{{\mathbb{R}}}^{m}}},{{{{\mathbb{R}}}^{n}}}) = \bigcup_{\omega \in {\Omega}} {\Phi}_{\omega}\left( C_{0}({{{{\mathbb{R}}}^{m}}},{{{{\mathbb{R}}}^{n}}})\right). $$

Hence, condition (5) holds.
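The isometry computation in (45) and the right-inverse identity in (46) can be checked numerically on a grid. The sketch below does this in the scalar case $m=n=1$; the weight `omega`, the two test functions, and the grid are hypothetical choices made only for illustration, and a finite grid can of course only approximate the suprema.

```python
import math

def omega(t):
    """Hypothetical non-negative weight omega(t) = t**2 (assumption)."""
    return t * t

def phi(f):
    """Phi_omega(f) = (omega(|.|) + 1) * f."""
    return lambda x: (omega(abs(x)) + 1.0) * f(x)

def psi(h):
    """Psi_omega(h) = h / (omega(|.|) + 1)."""
    return lambda x: h(x) / (omega(abs(x)) + 1.0)

def sup_norm(f, grid):
    """Grid approximation of the supremum norm."""
    return max(abs(f(x)) for x in grid)

def weighted_sup_norm(h, grid):
    """Grid approximation of the weighted norm used in (45)."""
    return max(abs(h(x)) / (omega(abs(x)) + 1.0) for x in grid)

GRID = [-10.0 + 0.01 * j for j in range(2001)]

def f(x):
    return math.exp(-x * x)

def g(x):
    return 1.0 / (1.0 + x * x)

if __name__ == "__main__":
    lhs = weighted_sup_norm(lambda x: phi(f)(x) - phi(g)(x), GRID)
    rhs = sup_norm(lambda x: f(x) - g(x), GRID)
    print(lhs, rhs)  # the two values should agree up to rounding
```

Up to floating-point rounding, the weighted distance between the images agrees with the uniform distance between the originals, and composing `phi` after `psi` returns the input function.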

Since it was assumed that $\sup _{x \in {{{{\mathbb {R}}}^{m}}}} \|f(x)\|e^{-\|x\|}<\infty $ holds, then Lemma 4 applies, whence,

$$ \left\{ \left( fe^{-\frac{b}{b - \|\cdot\|^{2}}} + a\right)I_{\|\cdot\|< b} + \left( ae^{- \left| f(\cdot) \right|(\|\cdot\|-b)}\right)I_{\|\cdot\|\geq b}: 0<b,a, f \in {\mathcal{NN}}^{\left( {\mathscr{F}},\circlearrowleft\right)} \right\} $$

is dense in $C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$. Therefore, the conditions for Theorem 2 are met. Hence,

$$ \bigcup\limits_{\omega \in {\Omega}} {\Phi}_{\omega}\left( \left\{ \left( fe^{-\frac{b}{b - \|\cdot\|^{2}}} + a\right)I_{\|\cdot\|< b} + \left( ae^{- \left| f(\cdot) \right|(\|\cdot\|-b)}\right)I_{\|\cdot\|\geq b}: 0<b,a, f \in {\mathcal{NN}}^{\left( {\mathscr{F}},\circlearrowleft\right)} \right\} \right) $$

(47)

is dense in $C_{\Omega }({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$. By definition, (47) is a subset of ${\mathcal {NN}}^{\left ({{\mathscr{F}}}_{\Omega },\circlearrowleft _{\Omega }\right )}$ and therefore ${\mathcal {NN}}^{\left ({{\mathscr{F}}}_{\Omega },\circlearrowleft _{\Omega }\right )}$ is dense in $C_{\Omega }(\mathbb {R}^{m},\mathbb {R}^{n})$. Hence, $\left ({{\mathscr{F}}}_{\Omega },\circlearrowleft _{\Omega }\right )$ is a universal approximator on $C_{\Omega }(\mathbb {R}^{m},\mathbb {R}^{n})$. □

Proof of Proposition 2

For each $k,m\in {\mathbb {N}}$ with $k \leq m$, we have that $\exp (-k t)\geq \exp (-mt)$ for every $t \in [0,\infty )$. Thus,

$$ C_{\exp(-k \cdot)}({{{{\mathbb{R}}}^{m}}},{{{{\mathbb{R}}}^{n}}})\subseteq C_{\exp(-m \cdot)}({{{{\mathbb{R}}}^{m}}},{{{{\mathbb{R}}}^{n}}}) , $$

(48)

and the inclusion is strict if $k < m$. Moreover, for $k \leq m$, the inclusion $i^k_m$ of $C_{\exp (-k \cdot )}({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ into $C_{\exp (-m \cdot )}({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ is continuous. Thus, $\left \{C_{\exp (-k \cdot )}(\mathbb {R}^{m},\mathbb {R}^{n}),i^k_m\right \}_{k \in {\mathbb {N}}}$ is a strict inductive system of Banach spaces. Therefore, by [83, Proposition 4.5.1] there exists a finest topology on $\bigcup _{k \in {\mathbb {N}}} C_{\exp (-k \cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})$ both making it into a locally-convex space and ensuring that each $C_{\exp (-k \cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})$ is a subspace. Denote $\bigcup _{k \in {\mathbb {N}}} C_{\exp (-k \cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})$ equipped with this topology by $C_{\Omega }^{LCS}(\mathbb {R}^{m},\mathbb {R}^{n})$.

If $f \in C_{\Omega }^{LCS}({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ then by construction there must exist some $K \in {\mathbb {N}}$ such that $f \in C_{\exp (-K\cdot )}({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$. By [84, Propositions 2 and 4], a sequence $\{f_t\}_{t \in {\mathbb {N}}}$ converges to some $f$ if and only if there exist some $K \in {\mathbb {N}}$ and some $N_K \in {\mathbb {N}}^+$ such that $f_t \in C_{\exp (-K\cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})$ for every $t \geq N_K$ and the sub-sequence $\{f_t\}_{t\geq N_K}$ converges to $f$ in the Banach topology of $C_{\exp (-K\cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})$. In particular, since $C_{\exp (-0\cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})=C_0(\mathbb {R}^{m},\mathbb {R}^{n})$, the function $f(x)\triangleq (\exp (-|x|),\dots ,\exp (-|x|))$ belongs to $C_{\exp (-0 \cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})$. Since each $f \in {\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}$ is either constant or satisfies $\sup _{x \in \mathbb {R}^{m}} \|f(x)\|=\infty $, for any sequence $\{f_t\}_{t \in {\mathbb {N}}} \subseteq {\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}$ there exists some $N_0 \in {\mathbb {N}}^+$ for which the sub-sequence $\{f_t\}_{t \geq N_0}$ lies in $C_{\exp (-0\cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})=C_0(\mathbb {R}^{m},\mathbb {R}^{n})$ if and only if for each $t \geq N_0$ the map $f_t$ is constant. Therefore, for each $t \geq N_0$ we compute that

$$ \|f - f_{t}\|_{\exp(-0\cdot ), \infty} = \|f - f_{t}\|_{\infty} \geq \inf_{c \in {\mathbb{R}}} \sup_{x \in {{{{\mathbb{R}}}^{m}}}} |\exp(-|x|)- c| \geq \frac1{2}. $$

Hence, f_t cannot converge to f in $C_{\Omega }({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$ and therefore $\left ({{\mathscr{F}}},\circlearrowleft \right )$ does not have the UAP on $C_{\Omega }({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})$. □
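The constant-approximation bound used at the end of the proof of Proposition 2 can be probed numerically. The sketch below is a grid search under stated assumptions: the radial variable $|x|$ is truncated to $[0,50]$, candidate constants $c$ are restricted to $[0,1]$ (constants outside the range of $\exp(-|x|)$ can only do worse), and the names `sup_dist` and `best_constant` are hypothetical.

```python
import math

# exp(-|x|) depends only on r = |x|; sample r in [0, 50] (truncation assumption)
VALUES = [math.exp(-0.01 * j) for j in range(5001)]

def sup_dist(c):
    """Grid approximation of sup_x |exp(-|x|) - c|."""
    return max(abs(v - c) for v in VALUES)

def best_constant(steps=201):
    """Search c over an even grid in [0, 1]; return (smallest sup-distance, c)."""
    candidates = [j / (steps - 1) for j in range(steps)]
    return min((sup_dist(c), c) for c in candidates)

if __name__ == "__main__":
    value, c = best_constant()
    print(value, c)
```

On this grid the search selects a constant close to $c = 1/2$ with a sup-distance close to $1/2$, which is consistent with the distance from $\exp(-|x|)$ to the constants staying bounded away from zero.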

Proof of Corollary 7

Let $X\triangleq {\mathbb {R}}$ and $\mathcal {X}_0\triangleq \mathcal {X}\triangleq L^{\infty }({\mathbb {R}})$. Since every Banach space is a pointed metric space whose reference point is its zero vector, and since ${\mathbb {R}}$ is separable, Theorem 4 applies. We only need to verify the form of η and of ρ. Indeed, the identification of $B({\mathbb {R}})$ with $L^1({\mathbb {R}})$ and the explicit description of η are given in [32, Example 3.11]. The fact that $L^{\infty }({\mathbb {R}})$ is barycentric follows from the fact that it is a Banach space and from [31, Lemma 2.4]. □

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kratsios, A. The Universal Approximation Property. Ann Math Artif Intell 89, 435–469 (2021). https://doi.org/10.1007/s10472-020-09723-1

Accepted: 27 November 2020
Published: 22 January 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s10472-020-09723-1
