Abstract
The universal approximation property of various machine learning models is currently only understood on a case-by-case basis, limiting the rapid development of new theoretically justified neural network architectures and blurring our understanding of our current models’ potential. This paper works towards overcoming these challenges by presenting a characterization, a representation, a construction method, and an existence result, each of which applies to any universal approximator on most function spaces of practical interest. Our characterization result is used to describe which activation functions allow the feed-forward architecture to maintain its universal approximation capabilities when multiple constraints are imposed on its final layers and its remaining layers are only sparsely connected. These include a rescaled and shifted Leaky ReLU activation function but not the ReLU activation function. Our construction and representation result is used to exhibit a simple modification of the feed-forward architecture, which can approximate any continuous function with non-pathological growth, uniformly on the entire Euclidean input space. This improves the known capabilities of the feed-forward architecture.
References
McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943)
Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psych. Rev. 65(6), 386 (1958)
Hornik, K., Stinchcombe, M., White, H.: Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw. 3(5), 551–560 (1990)
Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989)
Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251–257 (1991)
Kolmogorov, A.N.: On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk SSSR 114, 953–956 (1957)
Webb, S.: Deep learning for biology. Nature 554(7693) (2018)
Eraslan, G., Avsec, Z., Gagneur, J., Theis, F.J.: Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20(7), 389–403 (2019)
Plis, S.M.: Deep learning for neuroimaging: a validation study. Front. Neurosci. 8, 229 (2014)
Zhang, W.E., Sheng, Q.Z., Alhazmi, A., Li, C.: Adversarial attacks on deep-learning models in natural language processing: A survey. ACM Trans. Intell. Syst. Technol. 11(3) (2020)
Buehler, H., Gonon, L., Teichmann, J., Wood, B.: Deep hedging. Quant. Finance 19(8), 1271–1291 (2019)
Becker, S., Cheridito, P., Jentzen, A.: Deep optimal stopping. J. Mach. Learn. Res. 20, Paper No. 74, 25 (2019)
Cuchiero, C., Khosrawi, W., Teichmann, J.: A generative adversarial network approach to calibration of local stochastic volatility models. Risks 8(4), 101 (2020)
Kratsios, A., Hyndman, C.: Deep arbitrage-free learning in a generalized HJM framework via arbitrage-regularization. Risks 8(2), 40 (2020)
Horvath, B., Muguruza, A., Tomas, M.: Deep learning volatility: a deep neural network perspective on pricing and calibration in (rough) volatility models. Quant. Finance 0(0), 1–17 (2020)
Leshno, M., Lin, V.Y., Pinkus, A., Schocken, S.: Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw. 6(6), 861–867 (1993)
Kidger, P., Lyons, T.: Universal approximation with deep narrow networks. In: Abernethy, J., Agarwal, S. (eds.) vol. 125, pp 2306–2327. PMLR, USA (2020)
Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)
Park, S., Yun, C., Lee, J., Shin, J.: Minimum width for universal approximation. ICLR (2021)
Hanin, B.: Universal function approximation by deep neural nets with bounded width and relu activations. Math. - MDPI 7(10) (2019)
Lu, Z., Pu, H., Wang, F., Hu, Z., Wang, L.: The expressive power of neural networks: A view from the width. In: Advances in Neural Information Processing Systems, vol. 30, pp 6231–6239. Curran Associates, Inc. (2017)
Fletcher, P.T., Venkatasubramanian, S., Joshi, S.: The geometric median on riemannian manifolds with application to robust atlas estimation. Neuroimage 45(1), S143–S152 (2009). Mathematics in Brain Imaging
Keller-Ressel, M., Nargang, S.: Hydra: a method for strain-minimizing hyperbolic embedding of network- and distance-based data. J. Complex Netw. 8(1), cnaa002, 18 (2020)
Ganea, O., Becigneul, G., Hofmann, T.: Hyperbolic neural networks. In: Bengio, S, Wallach, H, Larochelle, H, Grauman, K, Cesa-Bianchi, N, Garnett, R (eds.) Advances in Neural Information Processing Systems, vol. 31, pp 5345–5355. Curran Associates, Inc. (2018)
Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. In: International Conference on Machine Learning, pp 7354–7363. PMLR (2019)
Arens, R.F., Eells, J.: On embedding uniform and topological spaces. Pacific J. Math. 6, 397–403 (1956)
von Luxburg, U., Bousquet, O.: Distance-based classification with Lipschitz functions. J. Mach. Learn. Res. 5, 669–695 (2003/04)
Ambrosio, L., Puglisi, D.: Linear extension operators between spaces of Lipschitz maps and optimal transport. J. Reine Angew. Math. 764, 1–21 (2020)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks, pp. 214–223. PMLR, International Convention Centre, Sydney, Australia (2017)
Xu, T., Le, W., Munn, M., Acciaio, B.: Cot-gan: Generating sequential data via causal optimal transport. Advances in Neural Information Processing Systems 33 (2020)
Godefroy, G., Kalton, N.J.: Lipschitz-free Banach spaces. pp. 121–141. Dedicated to Professor Aleksander Pełczyński on the occasion of his 70th birthday (2003)
Weaver, N.: Lipschitz algebras. World Scientific Publishing Co. Pte. Ltd., Hackensack (2018)
Godefroy, G.: A survey on Lipschitz-free Banach spaces. Comment. Math. 55(2), 89–118 (2015)
Jost, J.: Riemannian Geometry and Geometric Analysis, 6th edn. Universitext, Springer, Heidelberg (2011)
Basso, G.: Extending and improving conical bicombings. preprint 2005.13941 (2020)
Nagata, J-: Modern general topology, revised. North-Holland Publishing Co., Amsterdam (1974). Wolters-Noordhoff Publishing, Groningen; American Elsevier Publishing Co., New York (1974). Bibliotheca Mathematica, Vol. VII
Munkres, J.R.: Topology, 2nd edn. Prentice Hall, Inc., Upper Saddle River (2000)
Micchelli, C.A., Xu, Y., Zhang, H.: Universal kernels. J. Mach. Learn. Res. 7, 2651–2667 (2006)
Kontorovich, L., Nadler, B.: Universal kernel-based learning with applications to regular languages. J. Mach. Learn. Res. 10, 1095–1129 (2009)
Caponnetto, A., Micchelli, C.A., Pontil, M., Ying, Y.: Universal multi-task kernels. J. Mach. Learn. Res. 9, 1615–1646 (2008)
Grigoryeva, L., Ortega, J-P: Differentiable reservoir computing. J. Mach. Learn. Res. 20, Paper No. 179, 62 (2019)
Cuchiero, C., Gonon, L., Grigoryeva, L., Ortega, J-P, Teichmann, J.: Discrete-time signatures and randomness in reservoir computing. pre-print 2010.14615 (2020)
Fletcher, P.T.: Geodesic regression and the theory of least squares on Riemannian manifolds. Int. J. Comput. Vis. 105(2), 171–185 (2013)
Kratsios, A., Bilokopytov, E.: Non-euclidean universal approximation (2020)
Osborne, M.S.: Locally convex spaces, Graduate Texts in Mathematics, vol. 269. Springer, Cham (2014)
Petersen, P., Raslan, M., Voigtlaender, F.: Topological properties of the set of functions generated by neural networks of fixed size. Found Comput Math. https://doi.org/10.1007/s10208-020-09461-0 (2020)
Gribonval, R., Kutyniok, G., Nielsen, M., Voigtlaender, F.: Approximation spaces of deep neural networks. Constr. Approx forthcoming (2020)
Goodfellow, I., Bengio, Y., Courville, A.: Deep learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge (2016)
Gelfand, I.: Normierte Ringe. Rec. Math. N. S. 9(51), 3–24 (1941)
Isbell, J.R.: Structure of categories. Bull. Amer. Math. Soc. 72, 619–655 (1966)
Dimov, G.D.: Some generalizations of the Stone duality theorem. Publ. Math. Debrecen 80(3-4), 255–293 (2012)
Tuitman, J.: A refinement of a mixed sparse effective Nullstellensatz. Int. Math. Res. Not. IMRN 7, 1560–1572 (2011)
Fletcher, P.T.: Geodesic regression and the theory of least squares on Riemannian manifolds. Int. J. Comput. Vis. 105(2), 171–185 (2013)
Meyer, G., Bonnabel, S., Sepulchre, R.: Regression on fixed-rank positive semidefinite matrices: a Riemannian approach. J. Mach. Learn. Res. 12, 593–625 (2011)
Baes, M., Herrera, C., Neufeld, A., Ruyssen, P.: Low-rank plus sparse decomposition of covariance matrices using neural network parametrization. pre-print 1908.00461 (2019)
Hummel, J., Biederman, I.: Dynamic binding in a neural network for shape recognition. Psych. Rev. 99, 480–517 (1992)
Bishop, C.M.: Mixture density networks (1994)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. ICLR (2017)
Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. Neural Netw. Learn Syst. 20(1), 61–80 (2009)
Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. ICLR (2018)
Pinkus, A.: Approximation theory of the MLP model in neural networks. Acta Numer. 8, 143–195 (1999)
Koopman, B.O.: Hamiltonian systems and transformation in hilbert space. Proc. Natl. Acad. Sci. 17(5), 315–318 (1931)
Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. ICML 30(1), 3 (2013)
Singh, R.K., Manhas, J.S.: Composition operators on function spaces, North-Holland Mathematics Studies, vol. 179. North-Holland Publishing Co., Amsterdam (1993)
Bengio, Y.: Deep learning of representations for unsupervised and transfer learning. In: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, vol. 27, pp 17–36. JMLR Workshop and Conference Proceedings (2012)
Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C.: A survey on deep transfer learning. In: Kůrková, V., Manolopoulos, Y, Hammer, B, Iliadis, L, Maglogiannis, I (eds.) Artificial Neural Networks and Machine Learning – ICANN 2018, pp 270–279. Springer (2018)
Chollet, F., et al.: Keras. https://keras.io/guides/transfer_learning/ (2015)
Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inform. Theory 39(3), 930–945 (1993)
Darken, C., Donahue, M., Gurvits, L., Sontag, E.: Rate of approximation results motivated by robust neural network learning. In: Proceedings of the Sixth Annual Conference on Computational Learning Theory, pp 303–309. Association for Computing Machinery, New York (1993)
Prolla, J.B.: Weighted spaces of vector-valued continuous functions. Ann. Mat. Pura Appl. (4) 89, 145–157 (1971)
Bourbaki, N.: Éléments de mathématique. Topologie générale. Chapitres 1 à 4. Hermann, Paris (1971)
Phelps, R.R.: Subreflexive normed linear spaces. Arch. Math. (Basel) 8, 444–450 (1957)
Kadec, M.I.: A proof of the topological equivalence of all separable infinite-dimensional Banach spaces. Funkcional. Anal. i Priložen. 1, 61–70 (1967)
Grosse-Erdmann, K.-G., Peris Manguillot, A.: Linear chaos. Universitext, Springer, London (2011)
Pérez Carreras, P., Bonet, J.: Barrelled locally convex spaces, North-Holland Mathematics Studies, vol. 131. North-Holland Publishing Co., Amsterdam. Notas de Matemática [Mathematical Notes], 113 (1987)
Kreyszig, E.: Introductory functional analysis with applications, Wiley Classics Library. Wiley, New York (1989)
Bourbaki, N.: Espaces vectoriels topologiques. Chapitres 1 à 5, New. Masson, Paris (1981). Éléments de mathématique
Kalmes, T.: Dynamics of weighted composition operators on function spaces defined by local properties. Studia Math. 249(3), 259–301 (2019)
Przestacki, A.: Dynamical properties of weighted composition operators on the space of smooth functions. J. Math. Anal. Appl. 445(1), 1097–1113 (2017)
Bayart, F., Darji, U.B., Pires, B.: Topological transitivity and mixing of composition operators. J. Math. Anal. Appl. 465(1), 125–139 (2018)
Hoffmann, H.: On the continuity of the inverses of strictly monotonic functions. Irish Math. Soc. Bull. (75), 45–57 (2015)
Behrends, E., Schmidt-Bichler, U.: M-structure and the Banach-Stone theorem. Studia Math. 69(1), 33–40 (1980/81)
Jarchow, H.: Locally convex spaces. B. G. Teubner, Stuttgart. Mathematische Leitfäden. [Mathematical Textbooks] (1981)
Dieudonné, J., Schwartz, L.: La dualité dans les espaces F et LF. Ann. Inst. Fourier (Grenoble) 1, 61–101 (1949)
Funding
Open access funding provided by Swiss Federal Institute of Technology Zurich.
Appendices
Appendix A: Proofs of Main Results
Theorem 1 is encompassed by the following broader but more technical result.
Lemma 2 (Characterization of the Universal Approximation Property)
Let \(\mathcal{X}\) be a function space, let \(E\) be an infinite-dimensional Fréchet space for which there exists some homeomorphism \(\Phi:\mathcal{X}\rightarrow E\), and let \(\left(\mathscr{F},\circlearrowleft\right)\) be an architecture on \(\mathcal{X}\). Then the following are equivalent:
(i) UAP: \(\left(\mathscr{F},\circlearrowleft\right)\) has the UAP,
(ii) Decomposition of UAP via Subspaces: There exist subspaces \(\{\mathcal{X}_{i}\}_{i\in I}\) of \(\mathcal{X}\) such that:
  (a) \(\bigcup_{i\in I}\mathcal{X}_{i}\) is dense in \(\mathcal{X}\),
  (b) For each \(i\in I\), \(\Phi(\mathcal{X}_{i})\) is a separable infinite-dimensional Fréchet subspace of \(E\) and \(\Phi\left(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\mathcal{X}_{i}\right)\) contains a countable, dense, and linearly-independent subset of \(\Phi(\mathcal{X}_{i})\),
  (c) For each \(i\in I\), there exists a homeomorphism \(\Phi_{i}:\mathcal{X}_{i}\rightarrow L^{2}(\mathbb{R})\).
(iii) Decomposition of UAP via Topologically Transitive Dynamics: There exist subspaces \(\{\mathcal{X}_{i}\}_{i\in I}\) of \(\mathcal{X}\) and continuous functions \(\{\phi_{i}\}_{i\in I}\) with \(\phi_{i}:\mathcal{X}_{i}\rightarrow\mathcal{X}_{i}\) such that:
  (a) \(\bigcup_{i\in I}\mathcal{X}_{i}\) is dense in \(\mathcal{X}\),
  (b) For every pair of non-empty open subsets \(U,V\) of \(\mathcal{X}\) and every \(i\in I\), there is some \(N_{i,U,V}\in\mathbb{N}\) such that \(\phi_{i}^{N_{i,U,V}}(U\cap\mathcal{X}_{i})\cap(V\cap\mathcal{X}_{i})\neq\emptyset\),
  (c) For every \(i\in I\), there is some \(g_{i}\in\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\mathcal{X}_{i}\) such that \(\{\phi_{i}^{n}(g_{i})\}_{n\in\mathbb{N}}\) is a dense subset of \(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\mathcal{X}_{i}\), and in particular, it is a dense subset of \(\mathcal{X}_{i}\),
  (d) For each \(i\in I\), \(\mathcal{X}_{i}\) is homeomorphic to \(C(\mathbb{R})\).
(iv) Parameterization of UAP on Subspaces: There are triples \(\{(X_{i},\Phi_{i},\psi_{i})\}_{i\in I}\) of separable topological spaces \(X_{i}\), non-constant continuous functions \(\Phi_{i}:X_{i}\to\mathcal{X}\), and functions \(\psi_{i}:X_{i}\rightarrow X_{i}\) satisfying the following:
  (a) \(\bigcup_{i\in I}\Phi_{i}(X_{i})\) is dense in \(\mathcal{X}\),
  (b) For every \(i\in I\) and every pair of non-empty open subsets \(U,V\) of \(X_{i}\), there is some \(N_{i,U,V}\in\mathbb{N}\) such that \(\psi_{i}^{N_{i,U,V}}(U\cap X_{i})\cap(V\cap X_{i})\neq\emptyset\),
  (c) For every \(i\in I\), there is some \(x_{i}\in\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap X_{i}\) such that \(\{\Phi_{i}\circ\psi_{i}^{n}(x_{i})\}_{n\in\mathbb{N}}\) is a dense subset of \(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\Phi_{i}(X_{i})\), and in particular, it is a dense subset of \(\Phi_{i}(X_{i})\).
Moreover, if \(\mathcal{X}\) is separable, then \(I\) may be taken to be a singleton.
Proof of Lemma 2
Suppose that (ii) holds. Since \(\bigcup_{i\in I}\mathcal{X}_{i}\) is dense in \(\mathcal{X}\) and since \(\bigcup_{i\in I}\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\mathcal{X}_{i}\subseteq\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\), it is sufficient to show that \(\bigcup_{i\in I}\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\mathcal{X}_{i}\) is dense in \(\bigcup_{i\in I}\mathcal{X}_{i}\) to conclude that it is dense in \(\mathcal{X}\). Since each \(\mathcal{X}_{i}\) is a subspace of \(\mathcal{X}\), by restriction, each \(\mathcal{X}_{i}\) is a subspace of \(\bigcup_{i\in I}\mathcal{X}_{i}\) with its relative topology.
Let \(\tilde{\mathcal{X}}\) denote the set \(\bigcup_{i\in I}\mathcal{X}_{i}\) equipped with the finest topology making each \(\mathcal{X}_{i}\) into a subspace; such a topology exists by [71, Proposition 2.6]. Since each \(\mathcal{X}_{i}\) is also a subspace of \(\bigcup_{i\in I}\mathcal{X}_{i}\) with its relative topology and since, by definition, that topology is no finer than the topology of \(\tilde{\mathcal{X}}\), it is sufficient to show that \(\bigcup_{i\in I}\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\mathcal{X}_{i}\) is dense in \(\tilde{\mathcal{X}}\) to conclude that it is dense in \(\bigcup_{i\in I}\mathcal{X}_{i}\) equipped with its relative topology.
Indeed, by [71, Proposition 2.7] the space \(\tilde{\mathcal{X}}\) is given by the (topological) quotient of the disjoint union \(\sqcup_{i\in I}\mathcal{X}_{i}\), in the sense of topological spaces (see [71, Example 3, Section 2.4]), under the equivalence relation \(f_{i}\sim f_{j}\) if \(f_{i}=f_{j}\) in \(\mathcal{X}\). Denote the corresponding quotient map by \(Q_{\tilde{\mathcal{X}}}\). A subset \(U\) of the quotient is open (see [71, Example 2, Section 2.4]) if and only if \(Q_{\tilde{\mathcal{X}}}^{-1}[U]\) is an open subset of \(\sqcup_{i\in I}\mathcal{X}_{i}\), and a subset \(V\) of \(\sqcup_{i\in I}\mathcal{X}_{i}\) is open if and only if \(V\cap\mathcal{X}_{i}\) is open in the topology of \(\mathcal{X}_{i}\) for each \(i\in I\); hence \(U\subseteq\tilde{\mathcal{X}}\) is open if and only if \(Q_{\tilde{\mathcal{X}}}^{-1}[U]\cap\mathcal{X}_{i}\) is open for each \(i\in I\). Since, by (ii.b), \(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\mathcal{X}_{i}\) is dense in \(\mathcal{X}_{i}\), for every non-empty open subset \(U'\subseteq\mathcal{X}_{i}\)
\[
\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap U'\neq\emptyset. \tag{11}
\]
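In particular, (11) implies that for every non-empty open subset \(U\subseteq\tilde{\mathcal{X}}\)
\[
U\cap\bigcup_{i\in I}\left(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\mathcal{X}_{i}\right)\neq\emptyset. \tag{12}
\]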
Therefore, \(\bigcup _{i \in I} {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\cap \mathcal {X}_{i}\) is dense in \(\tilde {\mathcal {X}}\) and therefore it is dense in \(\bigcup _{i \in I} \mathcal {X}_{i}\) equipped with its relative topology. Hence, \({{\mathscr{F}}}\) has the UAP and therefore (i) holds.
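The passage from \(\tilde{\mathcal{X}}\) to the relative topology rests on a general fact about comparable topologies. A short sketch, in notation introduced only for this remark (\(\tau_{1}\subseteq\tau_{2}\) are two topologies on a set \(Y\), and \(A\subseteq Y\)):
\[
A \text{ dense in } (Y,\tau_{2})
\;\Longrightarrow\;
U\cap A\neq\emptyset \ \text{ for all } U\in\tau_{2}\setminus\{\emptyset\}
\;\Longrightarrow\;
U\cap A\neq\emptyset \ \text{ for all } U\in\tau_{1}\setminus\{\emptyset\}
\;\Longrightarrow\;
A \text{ dense in } (Y,\tau_{1}).
\]
The middle implication only uses \(\tau_{1}\subseteq\tau_{2}\). In the argument above one would take \(Y=\bigcup_{i\in I}\mathcal{X}_{i}\), \(\tau_{2}\) the topology of \(\tilde{\mathcal{X}}\), \(\tau_{1}\) the relative topology inherited from \(\mathcal{X}\), and \(A=\bigcup_{i\in I}\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\mathcal{X}_{i}\).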
In the next portion of the proof, we denote the (linear algebraic) dimension of any vector space \(V\) by \(\textup{dim}(V)\). Recall that this is the cardinality of any Hamel basis of \(V\). We follow the von Neumann convention and, whenever required by the context, we identify the natural number \(n\) with the ordinal \(\{1,\dots,n\}\).
Assume that (i) holds. Since \(\mathcal{X}\) is homeomorphic to some infinite-dimensional Fréchet space \(E\), there exists a homeomorphism \(\Phi:\mathcal{X}\to E\) mapping \(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\) to a dense subset \(D\) of \(E\). For the first part of this proof, we show that \(D\) contains a linearly independent and dense subset \(D'\). We denote the metric on \(E\) by \(d\). A consequence of [72, Theorem 3.1], discussed thereafter by the authors, is that, since \(E\) is an infinite-dimensional Fréchet space, it has a dense Hamel basis, which we denote by \(\{b_{a}\}_{a\in A}\). By definition of a Hamel basis, the cardinality of \(A\), denoted by \(Card(A)\), is equal to \(\textup{dim}(E)\). Next, we use \(\{b_{a}\}_{a\in A}\) to produce a base of open sets for the topology of \(E\) of cardinality equal to \(\textup{dim}(E)\).
Since \(E\) is a metric space and \(\{b_{a}\}_{a\in A}\) is dense in \(E\), its topology is generated by the open sets \(\{\operatorname{Ball}_{E}(b_{a},r)\}_{a\in A,\, r\in(0,\infty)}\), where \(\operatorname{Ball}_{E}(b_{a},r)\triangleq\left\{x\in E:\, d(b_{a},x)<r\right\}\). Moreover, since \(\mathbb{Q}\) is dense in \(\mathbb{R}\), for every \(a\in A\) and \(r\in(0,\infty)\) the basic open set \(\operatorname{Ball}_{E}(b_{a},r)\) can be expressed as \(\operatorname{Ball}_{E}(b_{a},r)=\bigcup_{q\in\mathbb{Q}\cap(0,r)}\operatorname{Ball}_{E}(b_{a},q)\). Hence, \(\{\operatorname{Ball}_{E}(b_{a},q)\}_{a\in A,\, q\in\mathbb{Q}\cap(0,\infty)}\) generates the topology on \(E\). Moreover, the cardinality of the indexing set \(A\times(\mathbb{Q}\cap(0,\infty))\) is computed by
\[
Card\left(A\times(\mathbb{Q}\cap(0,\infty))\right)=\max\left\{Card(A),Card(\mathbb{Q})\right\}=Card(A)=\textup{dim}(E),
\]
since \(E\) is infinite-dimensional and therefore \(A\) is at least countably infinite. Therefore, \(\{\operatorname{Ball}_{E}(b_{a},q)\}_{a\in A,\, q\in\mathbb{Q}\cap(0,\infty)}\) is a base for the topology on \(E\) of cardinality equal to \(\textup{dim}(E)\). Let \(\omega\) be the smallest ordinal with \(Card(\omega)=\textup{dim}(E)=Card(A\times(\mathbb{Q}\cap(0,\infty)))\). In particular, there exists a bijection \(F:\omega\to A\times(\mathbb{Q}\cap(0,\infty))\) which allows us to canonically order the open sets \(\{\operatorname{Ball}_{E}(F(j)_{1},F(j)_{2})\}_{j\leq\omega}\), where for any \(j<\omega\) we denote \(F(j)_{1}\in A\) and \(F(j)_{2}\in\mathbb{Q}\cap(0,\infty)\).
We construct \(D'\) by transfinite induction using \(\omega\). Indeed, since \(1<\omega\), since \(D\) is dense in \(E\), and since \(\{\operatorname{Ball}_{E}(F(j)_{1},F(j)_{2})\}_{j\leq\omega}\) defines a base for the topology of \(E\), there exists some \(U_{1}\in\{\operatorname{Ball}_{E}(F(j)_{1},F(j)_{2})\}_{j\leq\omega}\) containing some \(d_{1}\in D\). For the inductive step, suppose that, for some \(j<\omega\), we have constructed a linearly independent set \(\{d_{i}\}_{i<j}\) with \(d_{i}\in\operatorname{Ball}_{E}(F(i)_{1},F(i)_{2})\) for every \(i<j\). Since \(j<\omega\) and \(\omega\) is the smallest ordinal of cardinality \(\textup{dim}(E)\), since \(\{d_{i}\}_{i<j}\) contains \(Card(j)\) elements, and since \(\{d_{i}\}_{i<j}\) is a Hamel basis of \(\operatorname{span}(\{d_{i}\}_{i<j})\), we have \(\textup{dim}\left(\operatorname{span}(\{d_{i}\}_{i<j})\right)<\textup{dim}(E)\). Hence, \(\operatorname{span}(\{d_{i}\}_{i<j})\) has empty interior and therefore it cannot contain any of the balls \(\{\operatorname{Ball}_{E}(F(j)_{1},F(j)_{2})\}_{j\leq\omega}\). In particular, there is a non-empty open subset \(V'\subseteq\operatorname{Ball}_{E}(F(j)_{1},F(j)_{2})\setminus\operatorname{span}(\{d_{i}\}_{i<j})\) and, since \(D\) was assumed to be dense in \(E\), there must be some \(d_{j}\in D\cap V'\subseteq\operatorname{Ball}_{E}(F(j)_{1},F(j)_{2})\). Since \(d_{j}\notin\operatorname{span}(\{d_{i}\}_{i<j})\), the set \(\{d_{i}\}_{i\leq j}\) is again linearly independent. This completes the inductive step and therefore there is a linearly independent and dense subset \(D'\triangleq\{d_{j}\}_{j\leq\omega}\) contained in \(D\) of cardinality \(Card(\omega)=\textup{dim}(E)\).
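The claim that a span of strictly smaller dimension has empty interior can be justified by a standard argument. A sketch, assuming only that \(E\) is a topological vector space and that \(S\subseteq E\) is a linear subspace (the symbols \(S\), \(x\), and \(W\) are introduced only for this remark): if \(S\) contained a non-empty open set, there would be some \(x\in S\) and an open neighbourhood \(W\) of \(0\) with \(x+W\subseteq S\), and then
\[
W=(x+W)-x\subseteq S-x=S,
\qquad
E=\bigcup_{n\in\mathbb{N}^{+}} nW\subseteq S,
\]
where the second relation uses that every neighbourhood of \(0\) in a topological vector space is absorbing. Hence \(S=E\). Applied to the span appearing in the inductive step, whose dimension is strictly smaller than \(\textup{dim}(E)\), this shows that the span is a proper subspace and so has empty interior.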
Next, let \(I\) be the set of all countable sequences of distinct elements in \(\omega\). For every \(i\in I\), let \(E_{i}\triangleq\overline{\operatorname{span}_{j\in i}(d_{j})}\), where \(\overline{A}\) denotes the closure of a subset \(A\subseteq E\) in the topology of \(E\). Then, each \(E_{i}\) is a linear subspace of \(E\) with countable basis \(\{d_{j}\}_{j\in i}\). Since any Fréchet space with a countable basis is separable, each \(E_{i}\) is a separable Fréchet space. Moreover, by construction,
\[
D'\subseteq\bigcup_{i\in I}E_{i},
\]
and therefore \(\bigcup_{i\in I}E_{i}\) is dense in \(E\) since \(D'\) is dense in \(E\). Since \(\Phi\) is a homeomorphism, \(\Phi^{-1}:E\to\mathcal{X}\) is a continuous surjection, and since the image of a dense set under any continuous map is dense in the range of that map, \(\Phi^{-1}(D')\) is dense in \(\mathcal{X}\). Moreover, using the fact that inverse images commute with unions and the fact that \(\Phi\) is a bijection, we compute that
\[
\Phi^{-1}\left[\bigcup_{i\in I}E_{i}\right]=\bigcup_{i\in I}\Phi^{-1}[E_{i}]. \tag{14}
\]
Since \(\Phi\) is a bijection and \(D\) was defined as the image of \(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\) in \(E\) under \(\Phi\), we have \(\Phi^{-1}[D']\subseteq\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\) and \(\Phi^{-1}[D']\) is dense in \(\mathcal{X}\). In particular, (14) implies that \(\Phi^{-1}[D']\subseteq\bigcup_{i\in I}(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\Phi^{-1}[E_{i}])\) and therefore \(\bigcup_{i\in I}(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\Phi^{-1}[E_{i}])\) is dense in \(\mathcal{X}\). In particular, \(\bigcup_{i\in I}\Phi^{-1}[E_{i}]\) is dense in \(\mathcal{X}\), and for each \(i\in I\), if we define \(\mathcal{X}_{i}\triangleq\Phi^{-1}[E_{i}]\) then we obtain (ii.a).
Since \(\Phi\) is a homeomorphism, it preserves dense sets; in particular, since \(\{d_{j}\}_{j\in i}\) is a countable, dense, and linearly independent subset of \(E_{i}\), its preimage \(\Phi^{-1}[\{d_{j}\}_{j\in i}]\) is a countable dense subset of \(\mathcal{X}_{i}\). Hence, each \(\mathcal{X}_{i}\) is separable.
This gives (ii.b). Lastly, by [73] any two separable infinite-dimensional Fréchet spaces are homeomorphic. In particular, since \(L^{2}(\mathbb{R})\) is a separable infinite-dimensional Hilbert space, it is a separable infinite-dimensional Fréchet space. Therefore, for each \(i\in I\), there is a homeomorphism \(\Phi_{i}:E_{i}\to L^{2}(\mathbb{R})\). In particular, \(\Phi_{i}\circ\Phi:\mathcal{X}_{i}\to L^{2}(\mathbb{R})\) must be a homeomorphism and therefore (ii.c) holds. Therefore, (i) implies (ii).
Suppose that (ii) holds. Then, (iii.a) holds by (ii.a). For each \(i\in I\), let \(\{d_{n,i}\}_{n\in\mathbb{N}}\) be a countable dense subset of \(\mathcal{X}_{i}\cap\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\) for which \(\Phi(\{d_{n,i}\}_{n\in\mathbb{N}})\) is linearly independent, and let \(E_{i}=\overline{\operatorname{span}(\{d_{n,i}\}_{n\in\mathbb{N}})}\). Let \(D\triangleq\bigcup_{i\in I}\{d_{n,i}\}_{n\in\mathbb{N}}\) and \(D'\triangleq\Phi(D)\). Thus, for every \(i\in I\), \(D'\cap E_{i}\) is a countably infinite, linearly independent, and dense subset of \(E_{i}\); hence, by [74, Theorem 8.24], there exists a continuous linear operator \(T_{i}:D\cap E_{i}\to D\cap E_{i}\) satisfying
\[
T_{i}(d_{n,i})=d_{n+1,i}
\]
for each \(n\in\mathbb{N}\) and each \(i\in I\). In particular, \(\left\{T_{i}^{n}(d_{0,i})\right\}_{n\in\mathbb{N}}\) is dense in \(E_{i}\). For each \(i\in I\), define \(\phi_{i}\triangleq\Phi^{-1}\circ T_{i}\circ\Phi\) and \(g_{i}\triangleq\Phi^{-1}(d_{0,i})\) and observe that for every \(n\in\mathbb{N}\)
\[
\phi_{i}^{n}(g_{i})=\Phi^{-1}\circ T_{i}^{n}\circ\Phi(g_{i})=\Phi^{-1}\circ T_{i}^{n}(d_{0,i}). \tag{15}
\]
Since \(\{T_{i}^{n}(d_{0,i})\}_{n\in\mathbb{N}}\) is dense in \(E_{i}\) and \(\Phi\) is a homeomorphism from \(\mathcal{X}_{i}\) to \(E_{i}\), the set
\[
\left\{\phi_{i}^{n}(g_{i})\right\}_{n\in\mathbb{N}}=\left\{\Phi^{-1}\circ T_{i}^{n}(d_{0,i})\right\}_{n\in\mathbb{N}}
\]
is dense in \(\mathcal{X}_{i}\). Thus, (iii.c) holds. For any \(i\in I\), define the map \(\psi_{i}:L^{2}(\mathbb{R})\to L^{2}(\mathbb{R})\) by
\[
\psi_{i}\triangleq\Phi_{i}\circ\Phi\circ\phi_{i}\circ\Phi^{-1}\circ\Phi_{i}^{-1}
\]
and define the vector \(\tilde{g}_{i}\in L^{2}(\mathbb{R})\) by \(\tilde{g}_{i}\triangleq\Phi_{i}\circ\Phi(g_{i})\). Since \(\Phi\) and \(\Phi_{i}\) are homeomorphisms and since \(\phi_{i}\) is continuous, \(\psi_{i}\) is well-defined and continuous. Moreover, analogously to (15) we compute that \(\left\{\psi_{i}^{n}(\tilde{g}_{i})\right\}_{n\in\mathbb{N}}\) is dense in \(L^{2}(\mathbb{R})\). Since \(L^{2}(\mathbb{R})\) is a complete separable metric space with no isolated points and \(\psi_{i}\) is a continuous self-map of \(L^{2}(\mathbb{R})\) for which there is a vector \(\tilde{g}_{i}\in L^{2}(\mathbb{R})\) such that the set of iterates \(\{\psi_{i}^{n}(\tilde{g}_{i})\}_{n\in\mathbb{N}}\) is dense in \(L^{2}(\mathbb{R})\), the Birkhoff Transitivity Theorem, see the formulation of [74, Theorem 1.16], implies that for every pair of non-empty open subsets \(\tilde{U},\tilde{V}\subseteq L^{2}(\mathbb{R})\) there is some \(n_{\tilde{U},\tilde{V}}\in\mathbb{N}\) satisfying
\[
\psi_{i}^{n_{\tilde{U},\tilde{V}}}(\tilde{U})\cap\tilde{V}\neq\emptyset. \tag{16}
\]
Since \(\Phi_{i}\circ\Phi\) is a homeomorphism, [74, Proposition 1.13] and (16) imply that for every pair of non-empty open subsets \(U',V'\subseteq\mathcal{X}_{i}\) there exists some \(n_{U',V'}\in\mathbb{N}\) satisfying
\[
\phi_{i}^{n_{U',V'}}(U')\cap V'\neq\emptyset. \tag{17}
\]
Since \(\mathcal{X}_{i}\) is equipped with the subspace topology, every non-empty open subset \(U'\subseteq\mathcal{X}_{i}\) is of the form \(U\cap\mathcal{X}_{i}\) for some non-empty open subset \(U\subseteq\mathcal{X}\). Therefore, (17) implies (iii.b). Since both \(L^{2}(\mathbb{R})\) and \(C(\mathbb{R})\) are separable infinite-dimensional Fréchet spaces, the Anderson-Kadec Theorem [73] implies that there exists a homeomorphism \(\Psi:L^{2}(\mathbb{R})\rightarrow C(\mathbb{R})\). Therefore, for each \(i\in I\), \(\Psi\circ\Phi_{i}\circ\Phi:\mathcal{X}_{i}\rightarrow C(\mathbb{R})\) is a homeomorphism and thus (ii.c) implies (iii.d).
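The transfer of topological transitivity from \(\psi_{i}\) back to \(\phi_{i}\) can be made explicit. A sketch, assuming that \(\psi_{i}\) is the conjugate of \(\phi_{i}\) by the homeomorphism \(h_{i}\triangleq\Phi_{i}\circ\Phi:\mathcal{X}_{i}\to L^{2}(\mathbb{R})\) (the symbol \(h_{i}\) is shorthand introduced only here):
\[
\psi_{i}=h_{i}\circ\phi_{i}\circ h_{i}^{-1}
\;\Longrightarrow\;
\psi_{i}^{n}=h_{i}\circ\phi_{i}^{n}\circ h_{i}^{-1}
\quad\text{for every } n\in\mathbb{N},
\]
\[
\psi_{i}^{n}\left(h_{i}(U')\right)\cap h_{i}(V')
= h_{i}\left(\phi_{i}^{n}(U')\right)\cap h_{i}(V')
= h_{i}\left(\phi_{i}^{n}(U')\cap V'\right).
\]
The last equality uses that \(h_{i}\) is injective. Consequently the left-hand side is non-empty exactly when \(\phi_{i}^{n}(U')\cap V'\) is non-empty, and since \(h_{i}(U')\) and \(h_{i}(V')\) are non-empty open subsets of \(L^{2}(\mathbb{R})\) whenever \(U'\) and \(V'\) are non-empty open subsets of \(\mathcal{X}_{i}\), transitivity of \(\psi_{i}\) yields transitivity of \(\phi_{i}\).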
Suppose that (iii) holds. For every i ∈ I, set \(X_{i}\triangleq \mathcal {X}_{i}\), let \({\Phi }_{i}\triangleq 1_{X_{i}}\) be the identity map on Xi, set \(\psi _{i}\triangleq \phi _{i}\), and set \(x_{i}\triangleq g_{i}\). Therefore, (iv) holds.
Suppose that (iv) holds and, for each \(i\in I\), write \(\mathcal{X}_{i}\triangleq\Phi_{i}(X_{i})\). By (iv.c), for each \(i\in I\), \(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\mathcal{X}_{i}\) is dense in \(\mathcal{X}_{i}\). Therefore,
\[
\bigcup_{i\in I}\mathcal{X}_{i}\subseteq\overline{\bigcup_{i\in I}\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\mathcal{X}_{i}}. \tag{18}
\]
By (iv.a), \(\bigcup_{i\in I}\mathcal{X}_{i}\) is dense in \(\mathcal{X}\); that is, its closure is \(\mathcal{X}\), and so the smallest, and thus only, closed set containing \(\bigcup_{i\in I}\mathcal{X}_{i}\) is \(\mathcal{X}\) itself. Therefore, by (18), the smallest closed set containing \(\bigcup_{i\in I}\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\mathcal{X}_{i}\) must be \(\mathcal{X}\). Therefore, \(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\) is dense in \(\mathcal{X}\) and (i) holds. This concludes the proof. □
Proof of Theorem 2
By the Anderson-Kadec Theorem [73] there is no loss of generality in assuming that \(m=n=1\), since \(C(\mathbb{R}^{m},\mathbb{R}^{n})\) and \(C(\mathbb{R})\) are homeomorphic. Let \(\mathcal{X}^{\prime}\triangleq\bigcup_{i\in I}\Phi_{i}(C(\mathbb{R}))\). By (5), \(\mathcal{X}^{\prime}\) is dense in \(\mathcal{X}\) and, since density is transitive, it is enough to show that \(\bigcup_{i\in I}\Phi_{i}(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)})\) is dense in \(\mathcal{X}^{\prime}\) to conclude that it is dense in \(\mathcal{X}\). Since each \(\Phi_{i}\) is continuous, the topology on \(\mathcal{X}^{\prime}\) is no finer than the finest topology on \(\bigcup_{i\in I}\Phi_{i}(C(\mathbb{R}))\) making each \(\Phi_{i}\) continuous, and by [71, Proposition 2.6] such a topology exists. Let \(\mathcal{X}^{\prime\prime}\) denote \(\bigcup_{i\in I}\Phi_{i}(C(\mathbb{R}))\) equipped with the finest topology making each \(\Phi_{i}(C(\mathbb{R}))\) into a subspace. By construction, if \(U\subseteq\mathcal{X}^{\prime}\) is open then it is open in \(\mathcal{X}^{\prime\prime}\), and therefore if \(\bigcup_{i\in I}\Phi_{i}(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)})\) intersects each non-empty open subset of \(\mathcal{X}^{\prime\prime}\) then it must do the same for \(\mathcal{X}^{\prime}\). Hence, it is enough to show that \(\bigcup_{i\in I}\Phi_{i}(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)})\) is dense in \(\mathcal{X}^{\prime\prime}\) to conclude that it is dense in \(\mathcal{X}^{\prime}\) and therefore in \(\mathcal{X}\).
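The phrase "density is transitive" abbreviates the following elementary step. A sketch, in generic notation introduced only for this remark (\(A\subseteq B\subseteq Y\) with \(Y\) a topological space, \(B\) carrying its relative topology, and closures taken in \(Y\)): the closure of \(A\) in \(B\) equals \(\overline{A}\cap B\), so \(A\) is dense in \(B\) exactly when \(B\subseteq\overline{A}\), and then
\[
B\subseteq\overline{A}
\quad\text{and}\quad
\overline{B}=Y
\;\Longrightarrow\;
Y=\overline{B}\subseteq\overline{\overline{A}}=\overline{A}.
\]
In the argument above one would take \(A=\bigcup_{i\in I}\Phi_{i}(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)})\), \(B=\mathcal{X}^{\prime}\), and \(Y=\mathcal{X}\).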
We proceed similarly to the proof of Lemma 2. Indeed, by [71, Proposition 2.7] the space \(\mathcal{X}^{\prime\prime}\) is given by the (topological) quotient of the disjoint union \(\sqcup_{i\in I}\Phi_{i}(C(\mathbb{R}))\), in the sense of topological spaces (see [71, Example 3, Section 2.4]), under the equivalence relation \(f_{i}\sim f_{j}\) if \(f_{i}=f_{j}\) in \(\mathcal{X}\). Denote the corresponding quotient map by \(Q_{\mathcal{X}^{\prime}}\). A subset \(U\) of the quotient is open (see [71, Example 2, Section 2.4]) if and only if \(Q_{\mathcal{X}^{\prime}}^{-1}[U]\) is an open subset of \(\sqcup_{i\in I}\Phi_{i}(C(\mathbb{R}))\), and a subset \(V\) of \(\sqcup_{i\in I}\Phi_{i}(C(\mathbb{R}))\) is open if and only if \(V\cap\Phi_{i}(C(\mathbb{R}))\) is open in the topology of \(\Phi_{i}(C(\mathbb{R}))\) for each \(i\in I\); hence \(U\subseteq\mathcal{X}^{\prime\prime}\) is open if and only if \(Q_{\mathcal{X}^{\prime}}^{-1}[U]\cap\Phi_{i}(C(\mathbb{R}))\) is open for each \(i\in I\). Since \(\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap\Phi_{i}(C(\mathbb{R}))\) is dense in \(\Phi_{i}(C(\mathbb{R}))\), for every non-empty open subset \(U'\subseteq\Phi_{i}(C(\mathbb{R}))\)
\[
\mathcal{NN}^{\left(\mathscr{F},\circlearrowleft\right)}\cap U'\neq\emptyset. \tag{19}
\]
In particular, (19) implies that for every open subset \(U\subseteq \mathcal {X}^{\prime \prime }\)
Therefore, \(\bigcup _{i \in I} {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\cap {\Phi }_{i}(C({\mathbb {R}}))\) is dense in \(\mathcal {X}^{\prime \prime }\) and therefore it is dense in \(\bigcup _{i \in I} {\Phi }_{i}(C({\mathbb {R}}))\) equipped with its relative topology. Hence, \({({{\mathscr{F}}}_{\Phi },\circlearrowleft _{\Phi })}\) has the UAP on \(\mathcal {X}^{\prime \prime }\) and therefore it has the UAP on \(\mathcal {X}\) itself. □
Proof of Theorem 3
Let σ be a continuous and non-polynomial activation function. Then [61] implies that the architecture \({\left ({\mathscr{F}}_{0},\circlearrowleft _{0}\right )}\), as defined in Example 4, is a universal approximator on \(C({\mathbb {R}})\).
By Theorem 1, since \(\left ({{\mathscr{F}}},\circlearrowleft \right )\) has the UAP on \(\mathcal {X}\) and since \(\mathcal {X}\) is homeomorphic to an infinite-dimensional Fréchet space then there are homeomorphisms {Φi}i∈I from \(C({\mathbb {R}})\) onto a family of subspaces \(\{\mathcal {X}_{i}\}_{i \in I}\) of \(\mathcal {X}\) such that \(\bigcup _{i \in I} \mathcal {X}_{i}\) is dense. Fix 𝜖 > 0 and \(f \in \mathcal {X}\). Since \(\bigcup _{i \in I} \mathcal {X}_{i}\) is dense in \(\mathcal {X}\) there exists some i ∈ I and some \(f_{i}\in \mathcal {X}_{i}\) such that
Since Φi is a homeomorphism then it must map dense sets to dense sets. Since \(\left ({{\mathscr{F}}}_{0},\circlearrowleft _{0}\right )\) has the UAP on \(C({\mathbb {R}})\) then \({\mathcal {NN}}^{\left ({{\mathscr{F}}}_{0},\circlearrowleft _{0}\right )}\) is dense in \(C({\mathbb {R}})\) and therefore, for each i ∈ I, \({\Phi }_{i}({\mathcal {NN}}^{\left ({{\mathscr{F}}}_{0},\circlearrowleft _{0}\right )})\) is dense in \(\mathcal {X}_{i}\). Hence, there exists some \(\tilde {g}_{\epsilon }\in {\Phi }_{i}({\mathcal {NN}}^{\left ({{\mathscr{F}}}_{0},\circlearrowleft _{0}\right )})\) such that \(d_{\mathcal {X}}(f_{i},\tilde {g}_{\epsilon })<\frac {\epsilon }{2}\). Since Φi is a homeomorphism, it is a bijection, therefore there exists a unique \(g_{\epsilon }\in {\mathcal {NN}}^{\left ({{\mathscr{F}}}_{0},\circlearrowleft _{0}\right )}\) with \({\Phi }_{i}(g_{\epsilon })=\tilde {g}_{\epsilon }\). Hence, the triangle inequality and (21) imply that
This yields the first inequality in the Theorem’s statement.
By Theorem 1 since, for each i ∈ I, \({\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\cap \mathcal {X}_{i}\) is dense in \(\mathcal {X}_{i}\) and since \({\Phi }_{i}^{-1}\) is a homeomorphism on \(\mathcal {X}_{i}\) then \({\Phi }_{i}^{-1}\left ({\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}\cap \mathcal {X}_{i}\right )\) is dense in \(C({\mathbb {R}})\). In particular, there exists some \(\tilde {f}_{\epsilon } \in {\Phi }_{i}^{-1}\left ({\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}\cap \mathcal {X}_{i}\right )\) satisfying
Since Φi is a bijection then there exists a unique \(f_{\epsilon }\in {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\) such that \({\Phi }_{i}^{-1}(f_{\epsilon })=\tilde {f}_{\epsilon }\). Therefore, (23) and the triangle inequality imply that
Therefore the conclusion holds. □
Remark 1
By the [73, Anderson-Kadec Theorem], since both \(L^{2}({\mathbb {R}})\) and \(C({\mathbb {R}})\) are separable infinite-dimensional Fréchet spaces then there exists a homeomorphism \({\Phi }:L^{2}({\mathbb {R}})\rightarrow C({\mathbb {R}})\). Therefore, the proof of Corollary 3 holds (mutatis mutandis) with each Φ replaced by \({\Phi }_{i}\circ {\Phi }^{-1}\) and with \(C({\mathbb {R}})\) in place of \(L^{2}({\mathbb {R}})\).
The proof of the next result relies on some aspects of inductive limits of Banach spaces. Briefly, an inductive limit of Banach spaces is a locally convex space B for which there exist a pre-ordered set I and a set of Banach sub-spaces {Bi}i∈I with \(B_{i}\subseteq B_{j}\) if i ≤ j. The inductive limit of this direct system is the subset \(\bigcup _{i \in I} B_{i}\) equipped with the finest topology which simultaneously makes each Bi into a subspace and makes \(\bigcup _{i \in I} B_{i}\) into a locally-convex space. Spaces constructed in this way are called ultrabornological spaces and more details about them can be found in [75, Chapter 6].
Proof of Theorem 4
Since \(B(\mathcal {X}_{0})\) and B(X) are both infinite-dimensional Banach spaces, then they are infinite-dimensional ultrabornological spaces, in the sense of [75, Definition 6.1.1]. Since X is separable, then as observed in [33], B(X) is separable. Therefore, [75, Theorem 6.5.8] applies; hence, there exists a directed set I with pre-order ≤, a collection of Banach subspaces {Bi}i∈I satisfying (i) and (ii), and a collection of continuous linear isomorphisms \({\Phi }_{i}:B(X)\rightarrow B_{i}\). Furthermore, the topology on B is coarser than the inductive limit topology \(\varinjlim _{i \in I} B_{i}\). Since B(X) and each Bi are Banach spaces, and in particular normed linear spaces, then by the results of [76, Section 2.7] the maps Φi are bounded linear isomorphisms.
Let i ∈ I, and fix any xi ∈ X −{0X}. Since \(\delta ^{X}:X\rightarrow B(X)\) is base-point preserving, \(\delta ^{X}_{x_{i}}\neq 0\) and therefore there exists a linearly independent subset \({\mathscr{B}}_{x_{i}}\) of B(X) containing \(\delta ^{X}_{x_{i}}\). Since B(X) is separable then \({\mathscr{B}}_{x_{i}}\) is countably infinite and therefore, by [74, Theorem 8.24], there exists a bounded linear map \(\phi _{i}:B(X)\rightarrow B(X)\) such that \(\{{\phi _{i}^{n}}(\delta ^{X}_{x_{i}})\}_{n \in {\mathbb {N}}^{+}}\) is a dense subset of B(X).
Since Φi is a continuous linear isomorphism then it is in particular a surjective continuous map from B(X) onto Bi. Since the image of a dense set under a continuous surjection is itself dense then \(\left \{{\Phi }_{i}\circ {\phi _{i}^{n}}(\delta _{x_{i}})\right \}_{n \in {\mathbb {N}}^{+}}\) is a dense subset of Bi. Moreover, this holds for each i ∈ I.
By definition, the topology on \(\varinjlim _{i \in I} B_{i}\) is at least as fine as the Banach space topology on \(B(\mathcal {X}_{0})\), since each Bi is a linear subspace of \(B(\mathcal {X}_{0})\). Moreover, the topology on \(\varinjlim _{i \in I} B_{i}\) is no finer than the finest topology on \(\bigcup _{i \in I} B_{i}\) making each Bi into a topological space (but not requiring that \(\bigcup _{i \in I} B_{i}\) be locally-convex), which exists by [77, Proposition 6]. Denote this latter space by \(\tilde {B}\). Therefore, if
is dense in \(\tilde {B}\) then it is dense in \(\varinjlim _{i \in I} B_{i}\) and in \(B(\mathcal {X}_{0})\). Hence, we show that (24) is dense in \(\tilde {B}\). That is, it is enough to show that every open subset of \(\tilde {B}\) contains an element of (24).
By [71, Proposition 2.7] the space \(\tilde {B}\) is given by the topological quotient of the disjoint union ⊔i∈IBi, in the sense of topological spaces (see [71, Example 3, Section 2.4]), under the equivalence relation \(x_{i}\sim x_{j}\) for any i ≤ j if xi = xj in Bj. Denote the corresponding quotient map by \(Q_{\tilde {B}}\). Since a subset U of the quotient topology is open (see [71, Example 2, Section 2.4]) if and only if \(Q_{\tilde {B}}^{-1}[U]\) is an open subset of ⊔i∈IBi and since a subset V of ⊔i∈IBi is open if and only if V ∩ Bi is open for each i ∈ I in the topology of Bi then \(U\subseteq \tilde {B}\) is open if and only if \(Q_{\tilde {B}}^{-1}[U] \cap B_{i}\) is open for each i ∈ I. Since \(\{{\Phi }_{i}\circ {\phi _{i}^{n}}(x_{i})\}_{n \in {\mathbb {N}}^{+}}\) is dense in Bi then for every open subset \(U'\subseteq B_{i}\)
In particular, (25) implies that for every open subset \(U\subseteq \tilde {B}\)
Therefore, (24) is dense in \(\tilde {B}\) and, in particular, it is dense in \(B(\mathcal {X}_{0})\).
Since \(\mathcal {X}_{0}\) was barycentric, then there exists a continuous linear map \(\rho :B(\mathcal {X}_{0})\rightarrow \mathcal {X}_{0}\) which is a left-inverse of \(\delta ^{\mathcal {X}_{0}}\). Thus, for every \(f \in \mathcal {X}_{0}\), \(\rho \circ \delta ^{\mathcal {X}_{0}}_{f} = f\) and therefore ρ is a continuous surjection. Since the image of a dense set under a continuous surjection is dense and since (24) is dense then
is a dense subset of \(\mathcal {X}_{0}\). Since \(\mathcal {X}_{0}\) was assumed to be dense in \(\mathcal {X}\) and since density is transitive then (27) is dense in \(\mathcal {X}\). This concludes the main portion of the proof.
The final remark follows from the fact that if \(X=\mathcal {X}_{0}\) then the identity map \(1_{X}:X\rightarrow \mathcal {X}_{0}\) is an isometry and therefore the universal property of B(X) described in [32, Theorem 3.6] implies that 1X uniquely extends to a bounded linear isomorphism L between B(X) and \(B(\mathcal {X}_{0})\) satisfying
Hence L must be the identity on B(X). □
Appendix B: Proof of Applications of Main Results
Lemma 3
Fix some \(b \in {{{{\mathbb {R}}}^{m}}}\), and let \(\sigma :{\mathbb {R}}\to {\mathbb {R}}\) be a continuous activation function. Then ΦA,b is a well-defined and continuous linear map from \(C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\) to itself and the following are equivalent:
(i) For each δ > 0, 𝜖 > 0 and each \(f,g\in C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\) there is some \(N_{U,V}\in {\mathbb {N}}^{+}\) such that
$$ \left\{{\Phi}^{N_{U,V}}(\tilde{g}): d_{ucc}(\tilde{g},g)<\delta\right\} \cap \left\{ \tilde{f}: d_{ucc}(\tilde{f},f)<\epsilon \right\} \neq \emptyset , $$
(ii) σ is injective, A is of full-rank, and for every compact subset \(K\subseteq [a,b]\) there is some \(N_{K}\in {\mathbb {N}}^{+}\) such that
$$ S^{N_{K}}(K)\cap K = \emptyset, $$
where S(x) = σ ∙ (Ax + b).
If A is the m × m-identity matrix Im and bi > 0 for \(i=1,\dots ,m\) then (i) and (ii) are equivalent to
(iii) σ is injective and has no fixed-points.
If A is the m × m-identity matrix Im and bi > 0 for \(i=1,\dots ,m\) then (iii) is equivalent to
(iv) Either σ(x) > x for every \(x \in {\mathbb {R}}\), or σ(x) < x for every \(x \in {\mathbb {R}}\).
Proof of Lemma 3
By [37, Theorem 46.8] the topology of uniform convergence on compacts is the compact-open topology on \(C(\mathbb {R}^{m},\mathbb {R}^{n})\) and by [37, Theorem 46.11] composition is a continuous operation in the compact-open topology. Therefore, ΦA,b is a well-defined and continuous map. Its linearity follows from the fact that
Since the topology of uniform convergence on compacts is a metric topology, with metric ducc, then \(\left \{U_{f,{\epsilon }}:f \in C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}}), {\epsilon } >0\right \}\) defines a base for this topology, where \(U_{f,\epsilon }\triangleq \left \{g \in C(\mathbb {R}^{m},\mathbb {R}^{n}): d_{ucc}(f,g)<\epsilon \right \}\). Therefore, Lemma 3 (i) is equivalent to the statement: for each pair of non-empty open subsets \(U,V \subseteq C(\mathbb {R}^{m},\mathbb {R}^{n})\) there is some \(N_{U,V}\in {\mathbb {N}}^+\) such that \( {\Phi }_{A,b}^{N_{U,V}}(U)\cap V \neq \emptyset . \) Without loss of generality, we prove this formulation instead.
Next, by [78, Corollary 4.1] ΦA,b satisfies Theorem 1 (ii.b) if and only if \(S(x)\triangleq \sigma (Ax+b)\) is injective and for every compact subset \(K\subseteq \mathbb {R}^{m}\) there exists some \(N_K \in {\mathbb {N}}^+\) such that
Therefore, A must be injective which is only possible if A is of full-rank. This gives the equivalence between (i) and (ii).
We consider the equivalence between (ii) and (iii) in the case where A is the identity matrix and bi > 0 for \(i=1,\dots ,m\). Since \(S(x)=(\sigma (x+b_1),\dots ,\sigma (x+b_m))\) it is sufficient to verify condition (28) in the case where m = 1. Since bi > 0 for \(i=1,\dots ,m\) then it is clear that S is injective and has no fixed points if and only if σ is injective and has no fixed points. Hence, it remains to show that S is injective and has no fixed points if and only if (ii) holds.
From here, we proceed analogously to the proof of [79, Lemma 4.1]. If S has a fixed point x then, for every \(N \in {\mathbb {N}}^+\), \(S^N(\{x\}) = \{x\}\), and {x} is a non-empty compact subset of \({\mathbb {R}}\). Therefore, (28) cannot hold. Conversely, suppose that S has no fixed points. The intermediate-value theorem and the fact that S has no fixed points imply that either S(x) < x for every \(x \in {\mathbb {R}}\) or S(x) > x for every \(x \in {\mathbb {R}}\). We proceed with the second case; the first is handled mutatis mutandis. Since σ is injective and S has no fixed points then S must be a strictly increasing function; thus S([a,b]) = [S(a),S(b)] for every a < b.
Let K be a non-empty compact subset of \({\mathbb {R}}\). By the Heine-Borel theorem K is closed and bounded, thus it is contained in some [a,b] for a < b. Therefore, it is sufficient to show the result for the case where K = [a,b]. Since S is increasing then the sequence \(\{S^n(a)\}_{n \in {\mathbb {N}}}\) satisfies \(S^n(a) < S^{n+1}(a)\) for every \(n \in {\mathbb {N}}\). If this sequence were bounded then, being monotone, it would converge, and so there would exist some \(a_0 \in {\mathbb {R}}\) such that \(a_0= \lim \limits _{n \to \infty } S^n(a)\). Therefore, by the continuity of S we would find that
but since S has no fixed points there cannot exist such an a0, since otherwise a0 = S(a0). Therefore, a0 does not exist and thus \(\{S^n(a)\}_{n \in {\mathbb {N}}}\) is unbounded. Hence, for every a < b there exists some \(N_{[a,b]}\in {\mathbb {N}}^+\) such that
Thus, (ii) and (iii) are equivalent when A = Im.
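The escape argument above can be checked numerically in the one-dimensional case. The following is a minimal sketch, not part of the proof: the activation `leaky_shifted`, its slope and shift, and the helper `escape_time` are hypothetical names and values chosen only for illustration. It assumes that S(x) = σ(x + b) is increasing and satisfies S(x) > x, so that the image of [lo, hi] under the N-th iterate of S misses [lo, hi] exactly when the N-th iterate of lo exceeds hi.

```python
def leaky_shifted(x, slope=0.5, shift=1.0):
    """Hypothetical activation: leaky ReLU with slope `slope` on x < 0, shifted up by `shift`."""
    return max(x, slope * x) + shift


def escape_time(sigma, bias, lo, hi, max_iter=10_000):
    """Smallest N >= 1 with S^N(lo) > hi, where S(x) = sigma(x + bias).

    For an increasing S with S(x) > x, S^N([lo, hi]) = [S^N(lo), S^N(hi)],
    so this is the first N for which S^N([lo, hi]) is disjoint from [lo, hi].
    Returns None if no such N is found within max_iter iterations.
    """
    x = lo
    for n in range(1, max_iter + 1):
        x = sigma(x + bias)  # one application of S
        if x > hi:
            return n
    return None


if __name__ == "__main__":
    # Illustrative compact set K = [-3, 2] and bias b = 0.5.
    print(escape_time(leaky_shifted, bias=0.5, lo=-3.0, hi=2.0))
```

A map with a fixed point, such as the identity with zero bias, never escapes, and the helper then returns `None`; this mirrors the first half of the argument.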
Next, assume that any of (i) to (iii) hold, that \(\mathcal {X}\) is a non-empty subset of \(C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\), and that \(\left ({{\mathscr{F}}},\circlearrowleft \right )\) has the UAP on \(\mathcal {X}\). Then for any other non-empty open subset \(U\subseteq C(\mathbb {R}^{m},\mathbb {R}^{n})\) there exists some \(N_{\mathcal {X},U}\in {\mathbb {N}}\) such that
Since ΦA,b is continuous then so is \({\Phi }_{A,b}^N\) and therefore \(({\Phi }_{A,b}^{N_{\mathcal {X},U}})^{-1}[U]\) is a non-empty open subset of \(C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\). Since the finite intersection of open sets is again open, then we have that
This implies that \(\mathcal {X} \cap {\Phi }_{I_m,b}^{N_{\mathcal {X},U}}[U]\) is a non-empty open subset of \(C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\) contained in \(\mathcal {X}\). Since \(\left ({{\mathscr{F}}},\circlearrowleft \right )\) has the UAP on \(\mathcal {X}\), then there exists some \(f \in {\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )} \cap [\mathcal {X} \cap {\Phi }_{A,b}^{N_{\mathcal {X},U}}[U]]\). Thus, \({\Phi }^{N_{\mathcal {X},U}}(f)\in U\) and, by definition, \({\Phi }^{N_{\mathcal {X},U}}(f)\in {\mathcal {NN}}^{\left ({{\mathscr{F}}}_{\sigma ;deep},\circlearrowleft _{\sigma ;deep}\right )}\).
Thus, for each U in
there exists some \(N_U \in {\mathbb {N}}^+\) and some \(f_U \in {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\) such that \({\Phi }^{N_U}(f_U)\in U\). In particular, since (31) is a base for the topology on \(C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\) and since the intersection of open sets is again open, then every non-empty open subset U contains an element of (31) which, in turn, contains an element of the form \({\Phi }^{N_U}(f_U)\). Thus, \({\mathcal {NN}}^{\left ({{\mathscr{F}}}_{\sigma ;deep},\circlearrowleft _{\sigma ;deep}\right )} \cap U\neq \emptyset \).
Hence, \({\mathcal {NN}}^{\left ({{\mathscr{F}}}_{\sigma ;deep},\circlearrowleft _{\sigma ;deep}\right )}\) has the UAP on \(C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\). □
Proof of Theorem 5
The equivalence between (i), (ii), and (iv) follows from Lemma 3. The equivalence between (iii) and (iv) follows from the formulation of Birkhoff’s transitivity theorem described in [74, Theorem 2.19]. □
Proof of Proposition 1
Since α1 < 1 then σ(x) > x for every x < 0. Since 0 < α2 then σ(0) = 0 < α2. Lastly, since \(\tilde {\sigma }\) is monotone increasing then for every x > 0 we have that
Therefore, σ cannot have a fixed point. Moreover, since \(\tilde {\sigma }\) is strictly increasing it must be injective, since if x < y then σ(x) < σ(y) and therefore σ(x)≠σ(y) if x≠y. Hence, σ is injective. Moreover, since the sum of continuous functions is again continuous, then σ is continuous.
Since α1x + α2 is affine then it is continuously differentiable. Thus σ is continuously differentiable at every x < 0. Lastly, setting α2 not equal to \(\tilde {\sigma }'(0)-1\) ensures that σ is not differentiable at 0 and therefore it cannot be polynomial. In particular, it cannot be affine. □
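The two properties established in this proof, injectivity and the absence of fixed points, can be probed on a finite grid. The sketch below is only a sanity check under stated assumptions: `leaky_shifted` is a hypothetical rescaled and shifted leaky ReLU, not the exact parametrization of Proposition 1, and `check_on_grid` is an illustrative helper. A grid check can refute either property but cannot prove it.

```python
def leaky_shifted(x, slope=0.5, shift=1.0):
    """Hypothetical rescaled and shifted leaky ReLU (illustrative parameters)."""
    return max(x, slope * x) + shift


def relu(x):
    """Standard ReLU, included for contrast."""
    return max(x, 0.0)


def check_on_grid(sigma, grid):
    """Return (no_fixed_point, strictly_increasing) for sigma sampled on a sorted grid.

    no_fixed_point holds when sigma(x) - x has one strict sign on the whole grid,
    which is the grid analogue of condition (iv) of Lemma 3.
    """
    values = [sigma(x) for x in grid]
    gaps = [v - x for v, x in zip(values, grid)]
    no_fixed_point = all(g > 0 for g in gaps) or all(g < 0 for g in gaps)
    strictly_increasing = all(v1 < v2 for v1, v2 in zip(values, values[1:]))
    return no_fixed_point, strictly_increasing


# Sorted grid on [-10, 10] with step 0.01.
grid = [-10.0 + 0.01 * k for k in range(2001)]

if __name__ == "__main__":
    print("leaky_shifted:", check_on_grid(leaky_shifted, grid))
    print("relu:", check_on_grid(relu, grid))
```

On this grid the ReLU fails both checks, since it is constant on the negative half-line and fixes every non-negative point, which is consistent with its exclusion from the admissible activations.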
For convenience, we denote the collection of set-functions from \({{{{\mathbb {R}}}^{m}}}\) to \({{{{\mathbb {R}}}^{n}}}\) by \([{{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}}]\).
Proof of Corollary 4
Since ducc is a metric on \([{{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}}]\) and since \(C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\subseteq [{{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}}]\), then the map \(F:C(\mathbb {R}^{m},\mathbb {R}^{n})\rightarrow [0,\infty )\) defined by \(F(g)\triangleq d_{ucc}(\tilde {f}_0,g)\) is continuous. Therefore, the set \(F^{-1}\left [(-\infty ,\delta )\right ]\) is an open subset of \(C(\mathbb {R}^{m},\mathbb {R}^{n})\). In particular, (7) guarantees that it is non-empty. Since σ is non-affine and continuously differentiable at least at one point with non-zero derivative at that point then [17, Theorem 3.2] applies, whence the set \(\mathcal {X}_0\) of continuous functions \(h:\mathbb {R}^{m}\rightarrow \mathbb {R}^{n}\) with representation
where \(W_j:{ {{\mathbb {R}}^{d_j} }}\rightarrow { {{\mathbb {R}}^{d_{j+1}} }}\), for \(j=1,\dots ,J-1\), are affine and nm + 2 ≥ dj if j∉{1,J} and d1 = m, and dJ = n, is dense in \(C({\mathbb {R}^{m}},{\mathbb {R}^{n}})\). Therefore, since \(F^{-1}\left [(-\infty ,\delta )\right ]\) is an open subset of \(C(\mathbb {R}^{m},\mathbb {R}^{n})\) then \(\mathcal {X}_0\cap F^{-1}\left [(-\infty ,\delta )\right ]\) is dense in \(F^{-1}\left [(-\infty ,\delta )\right ]\).
Fix some \(b \in {{{{\mathbb {R}}}^{m}}}\) with bi > 0 for \(i=1,\dots ,m\). Since σ is continuous, injective, and has no fixed-points then applying Lemma 3 implies that \( \mathcal {X}_1 \triangleq \{{\Phi }_{I_m,b}^N(f): f \in F^{-1}[(-\infty ,\delta )] \cap \mathcal {X}_0, N \in {\mathbb {N}}^+\} \) is a dense subset of \(C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\). This gives (i). Moreover, by construction, every \(g \in \mathcal {X}_1\) admits a representation satisfying (iii) and (iv). Furthermore, since \(W_{J}\circ \sigma \bullet {\dots } \circ \sigma \bullet W_1 \in \mathcal {X}_2\) and by construction there exists some \(g \in \mathcal {X}_1\) for which \( d_{ucc}\left (W_{J}\circ \sigma \bullet {\dots } \circ \sigma \bullet W_1 ,g \right )<\delta \), then (ii) holds. □
Proof of Corollary 5
Since each Fn, for \(n=1,\dots ,N\), is a continuous function from \(C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\) to \([0,\infty ]\) then each \(F_n^{-1}\left [[0,C_n)\right ]\) is an open subset of \(C(\mathbb {R}^{m},\mathbb {R}^{n})\). Since the finite intersection of open sets is itself open, then \(\cap _{n=1}^N F_n^{-1}\left [[0,C_n)\right ]\) is an open subset of \(C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\), which we denote by U. Since there exists some \(f_0\in C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\) satisfying (8) then U is non-empty. Since \(\left ({\mathscr{F}},\circlearrowleft \right )\) has the UAP on \(C(\mathbb {R}^{m},\mathbb {R}^{n})\) then \({\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )} \cap U\) is dense in U.
Fix \(b \in {{{{\mathbb {R}}}^{m}}}\) with bi > 0 for \(i=1,\dots ,m\) and set A = Im.
Since σ is a transitive activation function then Corollary 1 applies and therefore the set \( \left \{{\Phi }^N_{I_m,b}(f): f \in {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )} \cap U\right \} \) is dense in \(C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\). Therefore (i)-(iv) hold. □
Proof of Corollary 2
Let S(x) = σ ∙ (x + b) and let \(B\triangleq \left \{x \in {{{{\mathbb {R}}}^{m}}}: \sigma (x)>x\right \}\). By hypothesis B is Borel and μ(B) > 0. For each \(i=1,\dots ,m\) we compute σ ∙ (xi + bi) > xi + bi ≥ xi. Therefore, for μ-a.e. x ∈ B, each \(N \in {\mathbb {N}}\) and each \(i=1,\dots ,m\)
Since bi > 0 then \(\lim \limits _{N \to \infty } S^N(x)=\infty \). Therefore, the condition [80, Corollary 1.3 (C2)] is met, and by the discussion following the result on [80, page 127], condition [80, Corollary 1.3 (C1)] holds; i.e.: for every pair of non-empty open subsets \(U,V\subseteq L^1_{\mu }(\mathbb {R}^{m},\mathbb {R}^{n})\) there exists some \(N_{U,V}\in {\mathbb {N}}\) such that
By Lemma 1, the map \({\Phi }_{I_m,b}\) and therefore the map \({\Phi }_{I_m,b}^{N_{U,V}}\) is continuous. Thus, \(({\Phi }_{I_m,b}^{N_{U,V}})^{-1}[V]\) is a non-empty open subset of \(L^1_{\mu }(\mathbb {R}^{m},\mathbb {R}^{n})\) and therefore \(U \cap ({\Phi }_{I_m,b}^{N_{\mathcal {X},U}})^{-1}[V]\) is a non-empty open subset of U. Taking \(U=\operatorname {Ball}_{L^1_{\mu }({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})}(g,\delta )\) and \(V=\operatorname {Ball}_{L^1_{\mu }(\mathbb {R}^{m},\mathbb {R}^{n})}(f,\epsilon )\) we obtain the conclusion. □
Proof of Corollary 3
By Proposition 1 and the observation in its proof that σ(x) > x we only need to verify that σ is Borel bi-measurable. Indeed, since σ is continuous and injective then by [81, Proposition 2.1], σ− 1 exists and is continuous on the image of σ. Since σ was assumed to be surjective then σ− 1 exists on all of \({\mathbb {R}}\) and is continuous thereon. Hence, σ− 1 and σ are measurable since any continuous function is measurable. □
Proof of Theorem 6
Fix A = Im and \(b\in {{{{\mathbb {R}}}^{m}}}\) with bi > 0 for \(i=1,\dots ,m\). Since \(\operatorname {int}(\text {co}({{\mathscr{F}}}))\) is a non-empty open set then there exists some \(f \in \operatorname {int}(\text {co}({{\mathscr{F}}}))\) and some δ > 0 for which
is an open subset of \(\operatorname {int}(\text {co}({{\mathscr{F}}}))\). Since \(\text {co}({{\mathscr{F}}})\cap \operatorname {int}(\text {co}({{\mathscr{F}}}))\) is dense in \(\operatorname {int}(\text {co}({{\mathscr{F}}}))\) then its intersection with any non-empty open subset thereof is also dense; in particular, \(\text {co}({{\mathscr{F}}})\cap \operatorname {Ball}_{L^1_{\mu }(\mathbb {R}^{m})}(f,\delta )\) is dense in \(\operatorname {Ball}_{L^1_{\mu }(\mathbb {R}^{m})}(f,\delta )\). Since σ is L1-transitive then (iii) follows from Corollary 2.
Since \(L^1_{\mu }\) is a metric space then \(\left \{\operatorname {Ball}_{L^1_{\mu }({{{{\mathbb {R}}}^{m}}})}(g,\delta ): g \in L^1_{\mu }({{{{\mathbb {R}}}^{m}}}), \delta >0\right \}\) is a base for the topology thereon. Therefore, Corollary 2 implies that for any two non-empty open subsets \(U,V \subseteq L^1_{\mu }(\mathbb {R}^{m})\) there exists some \(N_{U,V}\in {\mathbb {N}}\) satisfying \({\Phi }^{N_{U,V}}_{I_m,b}(U)\cap V \neq \emptyset \). Hence, \({\Phi }_{I_m,b}\) is topologically transitive on \(L^1_{\mu }(\mathbb {R}^{m})\), in the sense of [74, Definition 1.38]. Moreover, since \({\Phi }_{I_m,b}\) is a continuous linear map then Birkhoff’s transitivity theorem, as formulated in [74, Theorem 2.19], applies and therefore \({\Phi }_{I_m,b}\) is a hypercyclic operator on \(L^1_{\mu }(\mathbb {R}^{m})\). Therefore, [74, Proposition 5.8] implies that \(\|{\Phi }_{I_m,b}\|_{op}>1\). Setting \(\kappa \triangleq \|{\Phi }_{I_m,b}\|_{op}\) yields (ii).
It remains to show the approximation bound described by (i). Fix \(f \in L^1_{\mu }({{{{\mathbb {R}}}^{m}}})\). Since \(L^1_{\mu }({{{{\mathbb {R}}}^{m}}})\) is a Banach space then it has no isolated points and since \({\Phi }_{I_m,b}\) is a hypercyclic operator then Birkhoff’s transitivity theorem, as formulated in [74, Theorem 2.19], implies that there exists a dense Gδ-subset \(HC({\Phi }_{I_m,b})\subseteq L^1_{\mu }(\mathbb {R}^{m})\) such that for every \(g \in HC({\Phi }_{I_m,b})\) the set \(\{{\Phi }^N_{I_m,b}(g)\}_{N \in {\mathbb {N}}}\) is dense in \(L^1_{\mu }(\mathbb {R}^{m})\). Therefore, every non-empty open subset of \(L^1_{\mu }(\mathbb {R}^{m})\) contains some element of \(HC({\Phi }_{I_m,b})\). In particular, there is some \(g \in HC({\Phi }_{I_m,b})\cap \operatorname {int}(\text {co}({{\mathscr{F}}}))\) since \(\operatorname {int}(\text {co}({{\mathscr{F}}}))\) is a non-empty open subset of \(L^1_{\mu }(\mathbb {R}^{m})\).
Since \(\text {co}({{\mathscr{F}}})\cap \operatorname {int}(\text {co}({{\mathscr{F}}}))\) is dense in \(\operatorname {int}(\text {co}({{\mathscr{F}}}))\) then, in particular, \(g \in \overline {\operatorname {int}(\text {co}({{\mathscr{F}}}))}\). Therefore, the conditions of [69, Theorem 2] and [69, Equation (23)] are met, hence, for each \(n \in {\mathbb {N}}^+\) the following approximation bound holds
Since \(\{{\Phi }^N_{I_m,b}(g)\}_{N \in {\mathbb {N}}}\) is dense in \(L^1_{\mu }({{{{\mathbb {R}}}^{m}}})\) then there exists some \(N \in {\mathbb {N}}\) for which \({\Phi }^N_{I_m,b}(g) \in \operatorname {Ball}_{L^1_{\mu }({{{{\mathbb {R}}}^{m}}})}\left (f,\frac 1{\sqrt {n}}\right )\). Thus, the following bound holds
Since \({\Phi }_{I_m,b}\) is a continuous linear map from the Banach space \(L^1_{\mu }({{{{\mathbb {R}}}^{m}}})\) to itself then it is Lipschitz with constant \(\|{\Phi }_{I_m,b}\|_{op}\), where ∥⋅∥op denotes the operator norm, and by [64, Corollary 2.1.2] we have
Moreover, by Lemma 1, we know that the right-hand side of (35) is finite. Therefore (34) implies that for every \(f_1,\dots ,f_n \in {{\mathscr{F}}}\), \(\alpha _1,\dots ,\alpha _n\in [0,1]\) with \({\sum }_{i=1}^n \alpha _i=1\), the following holds
Combining the estimates (33)–(36) we obtain
Since \({\Phi }^N_{I_m,b}\) is linear, then the right-hand side of (37) reduces and we obtain the following estimate
Therefore, the estimate in (i) holds. □
The statement of the next lemma concerns the Banach space of functions vanishing at infinity. Denoted by \(C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\), this is the set of continuous functions f from \(\mathbb {R}^{m}\) to \(\mathbb {R}^{n}\) such that, given any 𝜖 > 0 there exists some compact subset \(K_{\epsilon }\subseteq \mathbb {R}^{m}\) for which \( \sup _{x \notin K_{\epsilon }}\|f(x)\|<\epsilon . \) As discussed in [82, VII], \(C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\) is made into a Banach space by equipping it with the supremum norm \(\|f\|_{\infty }\triangleq \sup _{x \in {{{{\mathbb {R}}}^{m}}}} \|f(x)\|\).
Lemma 4 (Uniform Approximation of Functions Vanishing at Infinity)
Suppose that \(\left ({{\mathscr{F}}},\circlearrowleft \right )\) is a universal approximator on \(C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\), then for every \(f\in C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\) and every 𝜖 > 0 there exists \(g_{\epsilon }\in C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\) with representation
the absolute value \(\left |\cdot \right |\) is applied component-wise, \(g_{\epsilon }\in {\mathcal {NN}}^{\left ({{\mathscr{F}}},\circlearrowleft \right )}\), and a,b > 0, and satisfying the uniform approximation bound
Proof of Lemma 4
Let \(\left ({{\mathscr{F}}},\circlearrowleft \right )\) be a universal approximator on \(C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\), let \(f \in C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\), and 𝜖 > 0. Since f vanishes at infinity then there exists some non-empty compact \(K_{\epsilon ,f}\subseteq \mathbb {R}^{m}\) for which \(\|f(x)\|\leq \epsilon 2^{-1}\) for every \(x \notin K_{\epsilon ,f}\). By the Heine-Borel theorem \(K_{\epsilon ,f}\) is bounded and therefore there exists some \(b^{\star }>0\) such that \(K_{\epsilon ,f}\subseteq \operatorname {Ball}_{\mathbb {R}^{m}}(0,b^{\star })\triangleq \left \{ x \in \mathbb {R}^{m}: \|x\|< b^{\star } \right \}\). Therefore,
Since the bump function \(x\mapsto e^{-\frac {1}{1-x^2}}I_{|x|<1}\) is continuous, affine functions are continuous, \(f\in C({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\), and the composition and multiplication of continuous functions are again continuous then the function \(x\mapsto \left [f(x)-\epsilon 2^{-1}\right ]e^{-\frac {b^{\star }}{b^{\star }-\|x\|^2}}I_{\|x\|<b^{\star }}\) is itself continuous. Observe also that the set \(\overline {\text{Ball}(0,\text{b}^{\star})}= \left \{x \in \mathbb{R}^{m}: \|x\|\leq b^{\star }\right \}\) is closed and bounded, thus it is compact by the Heine-Borel theorem. Since \(\left ({\mathscr{F}},\circlearrowleft \right )\) is a universal approximator on \(C(\mathbb {R}^{m},\mathbb {R}^{n})\) for the topology of uniform convergence on compacts then there exists some \(g_{\epsilon }\in {\mathcal{N}\mathcal{N}}^{\left ({\mathscr{F}},\circlearrowleft \right )}\) satisfying
Since \(0\leq e^{-\frac {b^{\star }}{b^{\star }-\|x\|^2}} \leq 1\) for every \(x \in {{{{\mathbb {R}}}^{m}}}\), then from (41) we compute
Observe that, for every \(x \in {{{{\mathbb {R}}}^{m}}}-\overline {\operatorname {Ball}(0,b^{\star })}\) we have \(\|x\|-b^{\star }\geq 0\), \(-|g_{\epsilon }(x)|\leq 0\) and therefore
Combining (40), (42), and (43) we compute the following bound
Thus, the result holds. □
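The role of the bump function in this proof is to cut a function off outside a bounded ball while keeping it continuous. The following one-dimensional sketch illustrates that mechanism only; it is not the representation stated in Lemma 4. The helper `cutoff` and its rescaling `x / radius` are assumptions made for illustration, and the proof's own cutoff uses the exponent written in terms of b⋆ instead.

```python
import math


def bump(x):
    """Bump function x -> exp(-1 / (1 - x^2)) for |x| < 1, and 0 otherwise."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def cutoff(f, radius):
    """Hypothetical helper: x -> f(x) * bump(x / radius), which is 0 for |x| >= radius."""
    return lambda x: f(x) * bump(x / radius)


if __name__ == "__main__":
    g = cutoff(math.cos, radius=3.0)
    for x in (0.0, 1.5, 2.9, 3.0, 10.0):
        # The product agrees with a rescaled cos inside the ball and is 0 outside it.
        print(x, g(x))
```

Since the bump takes values in [0, 1], multiplying by it never increases the norm of the function being cut off, which is the fact used in the estimate following (41).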
Proof of Theorem 6
For each ω ∈Ω, define the map \({\Phi }_{\omega }:C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\rightarrow C_{\omega }({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\) by \({\Phi }_{\omega }(f)\triangleq \left (\omega (\|\cdot \|)+1\right )f\). For each \(f,g \in C_0(\mathbb {R}^{m},\mathbb {R}^{n})\) we compute
Therefore, for each ω ∈Ω, the map Φω is an isometry. For each ω ∈Ω, define the map \({\Psi }_{\omega }:C_{\omega }({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\rightarrow C_0(\mathbb {R}^{m},\mathbb {R}^{n})\) by \({\Psi }_{\omega }(\tilde {f})\triangleq \frac 1{\omega (\|\cdot \|)+1} \tilde {f}\). For each \(\tilde {f}\in C_{\omega }(\mathbb {R}^{m},\mathbb {R}^{n})\) we compute
Hence, Ψω is a right-inverse of Φω. Since every isometry is a homeomorphism onto its image and since Φω is a surjective isometry then Φω defines a homeomorphism from \(C_0(\mathbb {R}^{m},\mathbb {R}^{n})\) onto \(C_{\omega }(\mathbb {R}^{m},\mathbb {R}^{n})\). In particular, \({\Phi }_{\omega }\left (C_0(\mathbb {R}^{m},\mathbb {R}^{n})\right )=C_{\omega }(\mathbb {R}^{m},\mathbb {R}^{n})\). Therefore,
Hence, condition (5) holds.
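The maps Φω and Ψω defined above are pointwise multiplication and division by the weight ω(‖·‖) + 1, so their inverse relationship can be sanity-checked numerically. The sketch below works in the scalar case m = n = 1; the weight `omega` and the test function `f` are hypothetical choices, and the only assumption used is ω ≥ 0, which keeps the denominator at least 1.

```python
import math


def phi(omega, f):
    """Phi_omega(f)(x) = (omega(|x|) + 1) * f(x)."""
    return lambda x: (omega(abs(x)) + 1.0) * f(x)


def psi(omega, g):
    """Psi_omega(g)(x) = g(x) / (omega(|x|) + 1)."""
    return lambda x: g(x) / (omega(abs(x)) + 1.0)


def omega(t):
    """Hypothetical non-negative weight, chosen only for illustration."""
    return t * t


def f(x):
    """A function vanishing at infinity."""
    return math.exp(-x * x)


if __name__ == "__main__":
    round_trip = psi(omega, phi(omega, f))
    for x in (-3.0, -0.5, 0.0, 1.0, 4.0):
        # The two printed values should agree up to floating-point error.
        print(x, f(x), round_trip(x))
```

The round trip in the other order, Φω after Ψω, is the composition computed in the proof and can be checked in the same way.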
Since it was assumed that \(\sup _{x \in {{{{\mathbb {R}}}^{m}}}} \|f(x)\|e^{-\|x\|}<\infty \) holds, then Lemma 4 applies, whence,
is dense in \(C_0({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\). Therefore, the conditions for Theorem 2 are met. Hence,
is dense in \(C_{\Omega }({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\). By definition, (47) is a subset of \({\mathcal {NN}}^{\left ({{\mathscr{F}}}_{\Omega },\circlearrowleft _{\Omega }\right )}\) and therefore \({\mathcal {NN}}^{\left ({{\mathscr{F}}}_{\Omega },\circlearrowleft _{\Omega }\right )}\) is dense in \(C_{\Omega }(\mathbb {R}^{m},\mathbb {R}^{n})\). Hence, \(\left ({{\mathscr{F}}}_{\Omega },\circlearrowleft _{\Omega }\right )\) is a universal approximator on \(C_{\Omega }(\mathbb {R}^{m},\mathbb {R}^{n})\). □
Proof of Proposition 2
For each \(k,m\in {\mathbb {N}}\) with k ≤ m, we have that \(\exp (-k t)\geq \exp (-mt)\) for every \(t \in [0,\infty )\). Thus,
and the inclusion is strict if k < m. Moreover, for k ≤ m, the inclusion \(i^k_m\) of \(C_{\exp (-k \cdot )}({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\) into \(C_{\exp (-m \cdot )}({{{{\mathbb {R}}}^{m}}},{{{{\mathbb {R}}}^{n}}})\) is continuous. Thus, \(\left \{C_{\exp (-k \cdot )}(\mathbb {R}^{m},\mathbb {R}^{n}),i^k_m\right \}_{k \in {\mathbb {N}}}\) is a strict inductive system of Banach spaces. Therefore, by [83, Proposition 4.5.1] there exists a finest topology on \(\bigcup _{k \in {\mathbb {N}}} C_{\exp (-k \cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})\) both making it into a locally-convex space and ensuring that each \(C_{\exp (-k \cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})\) is a subspace. Denote \(\bigcup _{k \in {\mathbb {N}}} C_{\exp (-k \cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})\) equipped with this topology by \(C_{\Omega }^{LCS}(\mathbb {R}^{m},\mathbb {R}^{n})\).
If \(f \in C_{\Omega }^{LCS}(\mathbb {R}^{m},\mathbb {R}^{n})\) then by construction there must exist some \(K \in {\mathbb {N}}\) such that \(f \in C_{\exp (-K\cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})\). By [84, Propositions 2 and 4], a sequence \(\{f_t\}_{t \in {\mathbb {N}}}\) converges to some f if and only if there exist some \(K \in {\mathbb {N}}\) and some \(N_K \in {\mathbb {N}}^+\) such that \(f_t \in C_{\exp (-K\cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})\) for every \(t \geq N_K\) and the sub-sequence \(\{f_t\}_{t\geq N_K}\) converges to f in the Banach topology of \(C_{\exp (-K\cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})\). In particular, since \(C_{\exp (-0\cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})=C_0(\mathbb {R}^{m},\mathbb {R}^{n})\), the function \(f(x)\triangleq (\exp (-|x|),\dots ,\exp (-|x|))\) belongs to \(C_{\exp (-0 \cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})\). Since each \(g \in {\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}\) is either constant or satisfies \(\sup _{x \in \mathbb {R}^{m}} \|g(x)\|=\infty \), for any sequence \(\{f_t\}_{t \in {\mathbb {N}}}\) in \({\mathcal {NN}}^{\left ({\mathscr{F}},\circlearrowleft \right )}\) there exists some \(N_0 \in {\mathbb {N}}^+\) for which the sub-sequence \(\{f_t\}_{t \geq N_0}\) lies in \(C_{\exp (-0\cdot )}(\mathbb {R}^{m},\mathbb {R}^{n})=C_0(\mathbb {R}^{m},\mathbb {R}^{n})\) if and only if the map \(f_t\) is constant for each \(t \geq N_0\). Therefore, for each \(t \geq N_0\) we compute that
Hence, \(f_t\) cannot converge to f in \(C_{\Omega }(\mathbb {R}^{m},\mathbb {R}^{n})\) and therefore \(\left ({{\mathscr{F}}},\circlearrowleft \right )\) does not have the UAP on \(C_{\Omega }(\mathbb {R}^{m},\mathbb {R}^{n})\). □
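The obstruction behind this argument can be illustrated by a short calculation. It is a sketch under two assumptions that are not stated in the proof: distances are measured in the uniform norm, and the norm on \(\mathbb {R}^{n}\) dominates the absolute value of the first coordinate (as the Euclidean and maximum norms do). The computation used in the proof itself may differ. Here \(c=(c_1,\dots ,c_n)\in \mathbb {R}^{n}\) denotes the value of a constant map, a symbol introduced only for this illustration, and \(f(x)=(\exp (-|x|),\dots ,\exp (-|x|))\) is the function from the proof.

```latex
\begin{align*}
\sup_{x \in \mathbb{R}^{m}} \|c - f(x)\|
  &\geq \sup_{x \in \mathbb{R}^{m}} \left| c_1 - \exp(-|x|) \right| \\
  &\geq \max\left\{ |c_1 - 1|,\ |c_1| \right\} \\
  &\geq \tfrac{1}{2}\left( |1 - c_1| + |c_1| \right)
   \geq \tfrac{1}{2} .
\end{align*}
```

The second line evaluates at \(x=0\) and lets \(|x|\to \infty \), and the last step is the triangle inequality \(1\leq |1-c_1|+|c_1|\). Under these assumptions, no sequence of constant maps can approach f uniformly to within less than 1/2.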
Proof of Corollary 7
Let \(X\triangleq {\mathbb {R}}\) and \(\mathcal {X}_0\triangleq \mathcal {X}\triangleq L^{\infty }({\mathbb {R}})\). Since every Banach space is a pointed metric space whose reference point is its zero vector, and since \({\mathbb {R}}\) is separable, Theorem 4 applies. We only need to verify the form of η and of ρ. Indeed, the identification of \(B({\mathbb {R}})\) with \(L^1({\mathbb {R}})\) and the explicit description of η are constructed in [32, Example 3.11]. The fact that \(L^{\infty }({\mathbb {R}})\) is barycentric follows from the fact that it is a Banach space and from [31, Lemma 2.4]. □
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kratsios, A. The Universal Approximation Property. Ann Math Artif Intell 89, 435–469 (2021). https://doi.org/10.1007/s10472-020-09723-1
DOI: https://doi.org/10.1007/s10472-020-09723-1
Keywords
- Universal approximation
- Constrained approximation
- Uniform approximation
- Deep learning
- Topological transitivity
- Composition operators