1 Introduction

The application of deep neural networks (DNNs) as an approximation architecture in numerical solution methods for partial differential equations (PDEs), possibly on high-dimensional parameter and state spaces, has attracted increasing attention in recent years. An incomplete list of recently proposed algorithmic approaches is [11, 45, 46, 52, 54] and the references therein. In these works, DNN-based approaches for the numerical approximation of solutions of elliptic and parabolic boundary value problems are proposed. Two key ingredients in these approaches are: (a) the use of DNNs as an approximation architecture for the numerical approximation of solutions (thus using DNNs in place of, e.g., finite element, finite volume, or finite difference methods), and (b) the incorporation of a suitable weak form of the PDE of interest into the loss function of the DNN training. Proposed loss functions include, for example, weak residuals, least squares functionals, and, for variational formulations from continuum mechanics, the total potential energies in variational principles [11].

In the study of NNs as numerical methods for solving PDEs, three types of errors are usually identified. After fixing a NN architecture and activation function, the approximation error indicates how well the PDE solution can be approximated by NNs with that architecture. An additional error is incurred when the NN must be trained on only a finite amount of possibly corrupted data about the PDE solution. This contribution to the overall error, relevant in particular where the given data are uninformative, is the generalization error; it comes in addition to further errors caused by the training algorithm, which can be called the optimization error. In this paper, we study the approximation error of deep ReLU neural networks.

One condition for good performance of these computational approaches is that the DNNs achieve a high rate of approximation uniformly over the solution set associated with the PDE under consideration. This is analogous to what has been found in the mathematical convergence rate analysis of, e.g., finite element methods: convergence rate bounds are well known to be related, via stability and quasi-optimality, to the approximability of solution sets of PDEs from the finite element spaces under consideration. Since numerical solutions are (generally oblique) projections of the unknown solution onto finite-dimensional subspaces, the convergence rates are naturally determined by the approximation rates of the subspace families under consideration within the regularity classes of the PDE. For elliptic boundary value and eigenvalue problems, function classes of (weighted) Sobolev or Besov type are well known to describe both solution regularity and approximation rates.

For functions belonging to a smoothness space of finite differentiation order, such as continuously differentiable, Sobolev-regular, or Besov-regular functions on a bounded domain, upper bounds for algebraic approximation rates by NNs were established, for example, in [9, 10, 16, 32, 55, 57, 58]. Here, we mention only results that use the ReLU activation function. Moreover, for PDEs, in particular those posed on high-dimensional domains, approximation rates for the solution that go beyond classical smoothness-based results were established in [5, 12, 26, 29, 51]. Again, we confine the list to publications with approximation rates for NNs with the ReLU activation function (referred to as ReLU NNs below).

In the present paper, we prove that exponential approximation rates are achieved by deep ReLU NNs for weighted, analytic solution classes of linear and nonlinear elliptic source and eigenvalue problems on polygonal and polyhedral domains. Mathematical results on weighted analytic regularity [2, 6, 8, 17–20, 24, 35, 38, 39] imply that these classes consist of functions that are analytic with possible corner, edge, and corner-edge singularities.

In contrast to the previously mentioned approximation results for ReLU NNs, the function class studied here is special in the sense that it admits very high regularity in most of the domain, except at designated locations, i.e., the edges and corners of the domain, where the regularity is allowed to be very low. An approximation scheme realizing the exponential approximation rates associated with analytic regularity therefore hinges on a successful resolution of these singularities. We will see that, in addition to emulating local polynomial approximation, the presented scheme is strongly adapted to the potentially complex geometries of the underlying domains.

Our analysis provides, for the aforementioned functions, approximation errors in Sobolev norms that decay exponentially in terms of the number of parameters M of the ReLU NNs.

1.1 Contribution

The principal contribution of this work is threefold:

  1.

    We prove, in Theorem 4.3, a general result on the approximation by ReLU NNs of weighted analytic function classes on \(Q {:}{=}(0,1)^d\), where \(d = 2,3\). The analytic regularity of solutions is quantified via countably normed, analytic classes, based on weighted Sobolev spaces of Kondrat’ev type in Q, which admit corner and, in space dimension \(d=3\), also edge singularities. Such classes were introduced, e.g., in [2, 6, 8, 17–19] and the references therein. We prove exponential expression rates by ReLU NNs in the sense that, for a number M of free parameters of the NNs, the approximation error is bounded, in the \(H^1\)-norm, by \(C\exp (-bM^{1/(2d+1)})\) for constants \(b,C > 0\).

  2.

    Based on the ReLU NN approximation rate bound of Theorem 4.3, we establish, in Sect. 5, approximation results for solutions of different types of PDEs by NNs with ReLU activation. Concretely, in Sect. 5.1.1, we study the approximation of solutions of nonlinear Schrödinger equations with singular potentials in space dimension \(d=2,3\). We prove that for solutions which are contained in weighted, analytic classes in \(\varOmega = {\mathbb {R}}^d / (2{\mathbb {Z}})^d\), ReLU NNs (whose realizations are continuous, piecewise affine) with at most M free parameters yield an approximation with accuracy of the order \(\exp (-bM^{1/(2d+1)})\) for some \(b>0\). Importantly, this convergence holds in the \(H^1(\varOmega )\)-norm. In Sect. 5.1.2, we establish the same exponential approximation rates for the eigenstates of the Hartree–Fock model with singular potential in \({\mathbb {R}}^3\). This result constitutes, to our knowledge, the first mathematical underpinning of the recently reported high efficiency of various NN-based approaches in variational electron structure computations, e.g., [21, 25, 44]. In Sect. 5.2, we demonstrate the same approximation rates for elliptic boundary value problems with analytic coefficients and analytic right-hand sides in plane, polygonal domains \(\varOmega \). The approximation error of the NNs is, again, bounded in the \(H^1(\varOmega )\)-norm. We also infer an exponential NN expression rate bound for the corresponding traces in \(H^{1/2}(\partial \varOmega )\) and for viscous, incompressible flow. Finally, in Sect. 5.3, we obtain the same asymptotic exponential rates for the approximation of solutions to elliptic boundary value problems, with analytic data, on so-called Fichera-type domains \(\varOmega \subset {{\mathbb {R}}}^3\) (roughly speaking, finite unions of tensorized hexahedra). These solutions exhibit corner, edge, and corner-edge singularities.

  3.

    The exponential approximation rates of the ReLU NNs established here are based on emulating corresponding variable grid and degree (“hp”) piecewise polynomial approximations. In particular, our construction comprises novel tensor product hp-approximations on Cartesian products of geometric partitions of intervals. In Theorem A.25, we establish novel tensor product hp-approximation results for weighted analytic functions on Q of the form \(\Vert u - u_{{\mathsf {h}}{\mathsf {p}}} \Vert _{H^1(Q)} \le C \exp (-b\root 2d \of {N})\) for \(d=1,2,3\), where N is the number of degrees of freedom in the representation of \(u_{{\mathsf {h}}{\mathsf {p}}}\) and \(C,b>0\) are independent of N (but depend on u). The tensor-product structure of the piecewise polynomial approximations is essential to facilitating the construction of deep ReLU neural networks: our constructive proofs exploit approximate tensorization of ReLU NNs in order to emulate the corresponding piecewise polynomial constructions. The geometric partitions employed in Q and the architectures of the constructed ReLU NNs are of tensor product structure, and generalize to space dimension \(d>3\). Note that hp-approximations based on non-tensor-product, geometric partitions of Q into hexahedra have been studied before, e.g., in [47, 48] in space dimension \(d=3\). There, the rate of \(\Vert u - u_{{\mathsf {h}}{\mathsf {p}}} \Vert _{H^1(Q)} \lesssim \exp (-b\root 5 \of {N})\) was proved. Being based on tensorization, the present construction of exponentially convergent, tensorized hp-approximations in Appendix A does not invoke the rather involved polynomial trace liftings in [47, 48] and is interesting in its own right: the geometric and mathematical simplification comes at the expense of a slightly smaller (still exponential) rate of approximation. 
We expect that this construction of \(u_{{\mathsf {h}}{\mathsf {p}}}\) will allow a rather direct derivation of rank bounds for tensor structured function approximation of u in Q, generalizing results in [27, 28] and extending [37] from point to edge and corner-edge singularities.

1.2 Neural Network Approximation of Weighted Analytic Function Classes

Deriving exponential approximation rates for weighted analytic functions on general domains requires the combination of three arguments: first, a novel approximation result for weighted analytic functions on cubes \((0,1)^d\) with corner and/or edge singularities in \(H^1((0,1)^d)\) by tensor product hp-finite elements; second, a reapproximation scheme for high-dimensional hp-finite elements in \(W^{1,q}\)-norms, for \(q \in [1,\infty ]\), by ReLU NNs; third, a ReLU NN-based approximation scheme on polyhedral domains via a localization method that uses a ReLU NN implementation of a domain-adapted partition of unity.

First, we specifically design tensorized hp-approximations so that they can be emulated by NNs via the reapproximation strategy that we outline below. We then prove exponential convergence of the approximation of weighted analytic functions by the tensorized hp-piecewise polynomials we constructed. Furthermore, in order to estimate the size of the resulting NNs, we need to bound the norms of the coefficients of the hp-projections. Such bounds are usually not a concern when dealing with hp-finite element methods, but they are necessary for our analysis of ReLU NNs. The construction of the hp-projections, the convergence analysis, and the bounds on the coefficients are presented in Theorem 2.1 and developed in Appendix A.

We describe the NN emulation of hp-finite element interpolants and their lifting to domains in more detail: the emulation of hp-finite element approximation by ReLU NNs is fundamentally based on the approximate multiplication network formalized in [57]. Based on the approximate multiplication operation and an extension thereof to errors measured with respect to \(W^{1,q}\)-norms, for \(q \in [1,\infty ]\), we already established in [41] a reapproximation theorem for univariate splines of order \(p\in {\mathbb {N}}\) on arbitrary meshes with \(N\in {\mathbb {N}}\) cells. There, we observed that there exists a NN that reapproximates a variable-order, free-knot spline u in the \(H^1\)-norm up to an error of \(\epsilon >0\), with a number of free parameters that scales logarithmically in \(\epsilon \) and \(|u|_{H^1}\), linearly in N, and quadratically in p. We recall this result in Proposition 3.7 below.

From this, it is apparent by the triangle inequality that, in univariate approximation problems where hp-finite elements yield exponential approximation rates, also ReLU NNs achieve exponential approximation rates (albeit with a possibly smaller exponent, because of the quadratic dependence on p, see [41, Theorem 5.12]).
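For the reader's orientation, the sawtooth-based square approximation underlying the approximate multiplication network of [57] can be sketched in a few lines. The code below is an illustration under our own naming conventions (`g`, `approx_square`, `approx_mult` are not names from the cited works), not the exact construction used in the proofs: it realizes \(f_m(x) = x - \sum_{s=1}^m g^{\circ s}(x)/4^s\), the piecewise linear interpolant of \(x^2\) on \(2^m+1\) uniform nodes with uniform error \(2^{-2m-2}\), and an approximate product via \(xy = 2((x+y)/2)^2 - x^2/2 - y^2/2\).

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)

def g(x):
    """Sawtooth 'hat' on [0,1]: g(x) = 2x for x <= 1/2, 2(1-x) for x >= 1/2.
    On [0,1] it is exactly the two-ReLU expression below."""
    return 2 * relu(x) - 4 * relu(x - 0.5)

def approx_square(x, m):
    """f_m(x) = x - sum_{s=1}^m g^{circ s}(x)/4^s: the piecewise linear
    interpolant of x^2 at the 2^m + 1 uniform nodes of [0,1], hence with
    uniform error at most 2^(-2m-2)."""
    out = np.asarray(x, dtype=float)
    gs = out
    for s in range(1, m + 1):
        gs = g(gs)                      # s-fold composition of g
        out = out - gs / 4.0**s
    return out

def approx_mult(x, y, m):
    """Approximate product on [0,1]^2 via the polarization identity
    xy = 2*((x+y)/2)^2 - x^2/2 - y^2/2, each square replaced by f_m."""
    return (2 * approx_square((x + y) / 2, m)
            - 0.5 * approx_square(x, m) - 0.5 * approx_square(y, m))

xs = np.linspace(0.0, 1.0, 257)
assert np.max(np.abs(approx_square(xs, 6) - xs**2)) <= 2.0**-14 + 1e-12
```

Each evaluation of `g` corresponds exactly to one small ReLU layer, so \(f_m\) is realized by a ReLU NN of depth \(O(m)\) with accuracy \(2^{-2m-2}\); this depth-versus-accuracy trade is the source of the logarithmic parameter scaling in \(\epsilon \).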

The extension of this result to higher dimensions for high-order finite elements that are built from univariate finite elements by tensorization is based on the underlying compositionality of NNs. Because of that, it is possible to compose a NN implementing a multiplication of d inputs with d approximations of univariate finite elements. We introduce a formal framework describing these operations in Sect. 3. We can then prove, in Theorem 4.3, approximation rates by ReLU NNs for weighted analytic function classes in cubes.

With Theorem 4.3 established, the next step is to extend the approximation result from cubes to general domains. The ReLU NNs of Theorem 4.3 are continuous functions on \({\mathbb {R}}^d\), and we have little control over the behavior of these functions outside of the cubes. This implies that even on unions of disjoint cubes the approximation results of Theorem 4.3 do not directly transfer by taking sums of the local approximations.

Instead, we first extend Theorem 4.3 to weighted analytic functions defined on Fichera-type domains \((-1,1)^d \setminus (-1,0]^d\) for \(d = 2,3\) by, again, reapproximating a quasi-interpolant on this domain. To extend the results to general polygonal domains for \(d = 2\), we construct an overlapping cover of the domain by affinely transformed cubes or affinely transformed Fichera-type domains, together with a partition of unity subordinate to this cover (Lemma 5.5). We demonstrate that this partition of unity can be exactly represented by ReLU NNs. Localization by this partition of unity reduces the approximation problem locally to one of the previously described approximations on either an affinely transformed cube or an affinely transformed Fichera-type domain. This yields Theorem 5.6, which shows that weighted analytic functions on polygonal domains can be approximated with exponential accuracy with respect to the number of parameters of the underlying neural network.
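As a one-dimensional illustration of why such partitions of unity are exactly (not approximately) representable (the helper name `hat` is ours; Lemma 5.5 concerns the multivariate case), continuous piecewise linear hat functions are ReLU networks with one hidden layer:

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)

def hat(x, a, m, b):
    """Continuous piecewise linear hat with nodes a < m < b: value 1 at m,
    0 outside (a, b). Written exactly as a one-hidden-layer ReLU network
    with slopes s1 = 1/(m-a) and s2 = -1/(b-m)."""
    s1, s2 = 1.0 / (m - a), -1.0 / (b - m)
    return s1 * relu(x - a) + (s2 - s1) * relu(x - m) - s2 * relu(x - b)

# Hats centered at the uniform nodes 0,1,2,3 (the end hats overhang the
# interval) form a partition of unity on [0,3]:
x = np.linspace(0.0, 3.0, 301)
pou = (hat(x, -1, 0, 1) + hat(x, 0, 1, 2)
       + hat(x, 1, 2, 3) + hat(x, 2, 3, 4))
assert np.allclose(pou, 1.0)
```

Summing local approximations weighted by such exactly represented hats is the mechanism that transfers the cube results of Theorem 4.3 to general domains.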

1.3 Outline

The manuscript is structured as follows: in Sect. 2, in particular Sect. 2.2, we review the weighted function spaces which will be used to describe the weighted analytic function classes in polytopes \(\varOmega \) that underlie our approximation results. In Sect. 2.3, we present an approximation result by tensor-product hp-finite elements for functions from the weighted analytic class. A proof of this result is provided in Appendix A. In Sect. 3, we review definitions of NNs and a “ReLU calculus” from [12, 43], whose operations will be required in the ensuing NN approximation results.

In Sect. 4, we state and prove the key results of the present paper. In Sect. 5, we illustrate our results by deducing novel NN expression rate bounds for solution classes of several concrete examples of elliptic boundary-value and eigenvalue problems where solutions belong to the weighted analytic function classes introduced in Sect. 2. Some of the more technical proofs of Sect. 5 are deferred to Appendix B. In Sect. 6, we briefly recapitulate the principal mathematical results of this paper and indicate possible consequences and further directions.

2 Setting and Functional Spaces

We start by recalling some general notation that will be used throughout the paper. We also introduce some tools that are required to describe two- and three-dimensional domains as well as the associated weighted Sobolev spaces.

2.1 Notation

For \(\alpha \in {\mathbb {N}}^d_0\), define \({|\alpha |}{:}{=}\alpha _1+\dots +\alpha _d\) and \({|\alpha |_\infty }{:}{=}\max \{\alpha _1, \dots , \alpha _d\}\). When we indicate a relation on \({|\alpha |}\) or \({|\alpha |_\infty }\) in the subscript of a sum, we mean the sum over all multi-indices that fulfill that relation: e.g., for a \(k\in {\mathbb {N}}_0\)

$$\begin{aligned} \sum _{{|\alpha |}\le k} = \sum _{\alpha \in {\mathbb {N}}^d_0:{|\alpha |}\le k}. \end{aligned}$$
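For illustration, the multi-index sets entering such sums can be enumerated directly (the helper name is ours):

```python
from itertools import product
from math import comb

def multi_indices(d, k):
    """All alpha in N_0^d with |alpha| = alpha_1 + ... + alpha_d <= k."""
    return [a for a in product(range(k + 1), repeat=d) if sum(a) <= k]

# |alpha| and |alpha|_inf for one multi-index:
alpha = (2, 0, 1)
assert sum(alpha) == 3 and max(alpha) == 2

# A sum over |alpha| <= k has binomial(k + d, d) terms:
d, k = 3, 4
assert len(multi_indices(d, k)) == comb(k + d, d)
```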

For a domain \(\varOmega \subset {\mathbb {R}}^d\), \(k\in {\mathbb {N}}_0\) and for \(1\le p\le \infty \), we indicate by \(W^{k,p}(\varOmega )\) the classical \(L^p(\varOmega )\)-based Sobolev space of order k. We write \(H^k(\varOmega ) = W^{k,2}(\varOmega )\). We introduce the norms \(\Vert \cdot \Vert _{W_{\mathrm {mix}}^{1,p}(\varOmega )}\) as

$$\begin{aligned} \Vert v\Vert _{W_{\mathrm {mix}}^{1,p}(\varOmega )}^{p} {:}{=}\sum _{{|\alpha |_\infty }\le 1} \Vert \partial ^\alpha v\Vert ^p_{L^p(\varOmega )}, \end{aligned}$$

with associated spaces

$$\begin{aligned} W_{\mathrm {mix}}^{1,p}(\varOmega ) {:}{=}\left\{ v\in L^p(\varOmega ): \Vert v\Vert _{W_{\mathrm {mix}}^{1,p}(\varOmega )} < \infty \right\} . \end{aligned}$$

We denote \(H_{\mathrm {mix}}^1(\varOmega ) = W_{\mathrm {mix}}^{1,2}(\varOmega )\). For \(\varOmega = I_1\times \dots \times I_d\), with bounded intervals \(I_j\subset {\mathbb {R}}\), \(j=1, \dots , d\), \(H_{\mathrm {mix}}^{1}(\varOmega ) = H^1(I_1)\otimes \dots \otimes H^1(I_d)\) with Hilbertian tensor products. Throughout, C will denote a generic positive constant whose value may change at each appearance, even within an equation.

The \(\ell ^p\)-norm, \(1\le p\le \infty \), on \({\mathbb {R}}^n\) is denoted by \(\left\| x \right\| _{p}\). The number of nonzero entries of a vector or matrix x is denoted by \(\Vert x\Vert _0\).

Three-dimensional domain. Let \(\varOmega \subset {\mathbb {R}}^3\) be a bounded, polyhedral domain. Let \({\mathcal {C}}\) denote a set of isolated points, situated either at the corners of \(\varOmega \) or in the interior of \(\varOmega \) (that we refer to as the singular corners in either case, for simplicity), and let \({\mathcal {E}}\) be a subset of the edges of \(\varOmega \) (the singular edges). Furthermore, denote by \({\mathcal {E}}_c\subset {\mathcal {E}}\) the set of singular edges abutting at a corner \(c\). For each \(c\in {\mathcal {C}}\) and each \(e\in {\mathcal {E}}\), we introduce the following weights:

$$\begin{aligned} r_c(x) {:}{=}|x-c| = {{\,\mathrm{dist}\,}}(x, c),\qquad r_e(x) {:}{=}{{\,\mathrm{dist}\,}}(x, e),\qquad \rho _{ce}(x) {:}{=}\frac{r_e(x)}{r_c(x)}\quad \text { for }x \in \varOmega . \end{aligned}$$

For \(\varepsilon >0\), we define edge-, corner-, and corner-edge neighborhoods:

$$\begin{aligned} \varOmega _{e}^\varepsilon&{:}{=}\bigg \{ x\in \varOmega : r_e(x)< \varepsilon \text { and }r_c(x)>\varepsilon , \forall c\in {\mathcal {C}}\bigg \},\\ \varOmega _c^\varepsilon&{:}{=}\bigg \{ x\in \varOmega : r_c(x)< \varepsilon \text { and }\rho _{ce}(x)>\varepsilon , \forall e\in {\mathcal {E}}\bigg \}, \\ \varOmega _{ce}^\varepsilon&{:}{=}\bigg \{ x\in \varOmega : r_c(x)< \varepsilon \text { and }\rho _{ce}(x)<\varepsilon \bigg \}. \end{aligned}$$

We fix a value \({{\hat{\varepsilon }}}>0\) small enough so that \(\varOmega _c^{{{\hat{\varepsilon }}}}\cap \varOmega ^{{{\hat{\varepsilon }}}}_{c'} = \emptyset \) for all \(c\ne c'\in {\mathcal {C}}\) and \(\varOmega ^{{\hat{\varepsilon }}}_{ce} \cap \varOmega ^{{\hat{\varepsilon }}}_{ce'} = \varOmega ^{{\hat{\varepsilon }}}_e \cap \varOmega ^{{\hat{\varepsilon }}}_{e'}=\emptyset \) for all singular edges \(e\ne e'\). In the sequel, we omit the dependence of the neighborhoods on \({{\hat{\varepsilon }}}\). Let also

$$\begin{aligned} \varOmega _{{\mathcal {C}}}{:}{=}\bigcup _{c\in {\mathcal {C}}}\varOmega _c,\qquad \varOmega _{{\mathcal {E}}}{:}{=}\bigcup _{e\in {\mathcal {E}}}\varOmega _e,\qquad \varOmega _{{\mathcal {C}}{\mathcal {E}}} {:}{=}\bigcup _{c\in {\mathcal {C}}}\bigcup _{e\in {\mathcal {E}}_c}\varOmega _{ce}, \end{aligned}$$

and

$$\begin{aligned} \varOmega _0 {:}{=}\varOmega \setminus \overline{(\varOmega _{{\mathcal {C}}}\cup \varOmega _{{\mathcal {E}}}\cup \varOmega _{{\mathcal {C}}{\mathcal {E}}})}. \end{aligned}$$

In each subdomain \(\varOmega _{ce}\) and \(\varOmega _e\), for any multi-index \(\alpha \in {\mathbb {N}}_0^3\), we denote by \({\alpha _\parallel }\) the multi-index whose component in the coordinate direction parallel to e is equal to the component of \(\alpha \) in the same direction, and which is zero in every other component. Moreover, we set \({\alpha _\bot }{:}{=}\alpha -{\alpha _\parallel }\).

Two-dimensional domain. Let \(\varOmega \subset {\mathbb {R}}^2\) be a polygon. We adopt the convention that \({\mathcal {E}}{:}{=}\emptyset \). For \(c\in {\mathcal {C}}\), we define

$$\begin{aligned} \varOmega _c^\varepsilon {:}{=}\bigg \{ x\in \varOmega : r_c(x)< \varepsilon \bigg \} \;. \end{aligned}$$

As in the three-dimensional case, we fix a sufficiently small \({{\hat{\varepsilon }}}>0\) so that \(\varOmega ^{{{\hat{\varepsilon }}}}_{c}\cap \varOmega ^{{{\hat{\varepsilon }}}}_{c'}=\emptyset \) for \(c\ne c'\in {\mathcal {C}}\) and write \(\varOmega _c= \varOmega _c^{{\hat{\varepsilon }}}\). Furthermore, \(\varOmega _{{\mathcal {C}}}\) is defined as for \(d=3\), and \(\varOmega _0 {:}{=}\varOmega \setminus \overline{\varOmega _{{\mathcal {C}}}}\).

2.2 Weighted Spaces with Nonhomogeneous Norms

We introduce classes of weighted, analytic functions in space dimension \(d = 3\), as they arise in analytic regularity theory for linear, elliptic boundary value problems in polyhedra, in the particular form introduced in [8]. There, the structure of the weights is given in terms of Cartesian coordinates, which is particularly suited to the presently adopted, tensorized approximation architectures.

The definition of the corresponding classes when \(d=2\) is analogous. For a weight exponent vector \({\underline{\gamma }}= \{\gamma _c, \gamma _e : \, c\in {\mathcal {C}}, e\in {\mathcal {E}}\}\), we introduce the nonhomogeneous, weighted Sobolev norms

$$\begin{aligned} \Vert v\Vert _{{\mathcal {J}}^{k}_{\underline{\gamma }}(\varOmega )}&{:}{=}\sum _{{|\alpha |}\le k}\Vert \partial ^\alpha v\Vert _{L^2(\varOmega _0)} + \sum _{c\in {\mathcal {C}}}\sum _{{|\alpha |}\le k}\Vert r_c^{({|\alpha |}-\gamma _c)_+}\partial ^\alpha v \Vert _{L^2(\varOmega _c)}\\&\quad + \sum _{e\in {\mathcal {E}}}\sum _{{|\alpha |}\le k}\Vert r_e^{({|\alpha _\bot |}-\gamma _e)_+}\partial ^\alpha v \Vert _{L^2(\varOmega _e)}\\&\quad + \sum _{c\in {\mathcal {C}}}\sum _{e\in {\mathcal {E}}_c} \sum _{{|\alpha |}\le k}\Vert r_c^{({|\alpha |}-\gamma _c)_+}\rho _{ce}^{({|\alpha _\bot |}-\gamma _e)_+}\partial ^\alpha v\Vert _{L^2(\varOmega _{ce})} \end{aligned}$$

where \((x)_+ = \max \{0, x\}\). Moreover, we define the associated function space by

$$\begin{aligned} {\mathcal {J}}^k_{\underline{\gamma }}(\varOmega ; {\mathcal {C}}, {\mathcal {E}}) {:}{=}\bigg \{ v\in L^2(\varOmega ): \Vert v\Vert _{{\mathcal {J}}^k_{\underline{\gamma }}(\varOmega )}< \infty \bigg \}. \end{aligned}$$

Furthermore,

$$\begin{aligned} {\mathcal {J}}^\infty _{\underline{\gamma }}(\varOmega ;{\mathcal {C}}, {\mathcal {E}}) = \bigcap _{k\in {\mathbb {N}}_0} {\mathcal {J}}^k_{\underline{\gamma }}(\varOmega ;{\mathcal {C}}, {\mathcal {E}}). \end{aligned}$$

For \(A, C>0\), we define the space of weighted analytic functions with nonhomogeneous norm as

$$\begin{aligned}&{\mathcal {J}}^{\varpi }_{\underline{\gamma }}(\varOmega ;{\mathcal {C}}, {\mathcal {E}};C, A)\nonumber \\&\quad \begin{aligned} {:}{=}\bigg \{v\in {\mathcal {J}}^\infty _{\underline{\gamma }}(\varOmega ;{\mathcal {C}}, {\mathcal {E}}): {}&\sum _{{|\alpha |}=k}\Vert \partial ^\alpha v\Vert _{L^2(\varOmega _0)}\le CA^kk!,\\&\sum _{{|\alpha |}=k}\Vert r_c^{({|\alpha |}-\gamma _c)_+}\partial ^\alpha v \Vert _{L^2(\varOmega _c)}\le CA^kk!\quad \forall c\in {\mathcal {C}},\\&\sum _{{|\alpha |}=k}\Vert r_e^{({|\alpha _\bot |}-\gamma _e)_+}\partial ^\alpha v \Vert _{L^2(\varOmega _e)}\le CA^kk!\quad \forall e\in {\mathcal {E}},\\&\sum _{{|\alpha |}=k}\Vert r_c^{({|\alpha |}-\gamma _c)_+}\rho _{ce}^{({|\alpha _\bot |}-\gamma _e)_+}\partial ^\alpha v\Vert _{L^2(\varOmega _{ce})}\le CA^kk!\quad \\&\forall c\in {\mathcal {C}}\text { and }\forall e\in {\mathcal {E}}_c, \forall k\in {\mathbb {N}}_0 \bigg \}. \end{aligned}\nonumber \\ \end{aligned}$$
(2.1)

Finally, we denote

$$\begin{aligned} {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ;{\mathcal {C}}, {\mathcal {E}}) {:}{=}\bigcup _{C, A>0} {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ;{\mathcal {C}}, {\mathcal {E}}; C, A). \end{aligned}$$

2.3 Approximation of Weighted Analytic Functions on Tensor Product Geometric Meshes

The approximation result of weighted analytic functions via NNs that we present below is based on emulating an approximation strategy of tensor product hp-finite elements. In this section, we present this hp-finite element approximation. Let \(I \subset {\mathbb {R}}\) be an interval. A partition of I into \(N \in {\mathbb {N}}\) intervals is a set \({\mathcal {G}}\) such that \(|{\mathcal {G}}|= N\), all elements of \({\mathcal {G}}\) are disjoint, connected, and open subsets of I, and

$$\begin{aligned} \bigcup _{U \in {\mathcal {G}}} {\overline{U}} = {\overline{I}}. \end{aligned}$$

We denote, for all \(p\in {\mathbb {N}}_0\), by \({\mathbb {Q}}_p({\mathcal {G}})\) the piecewise polynomials of degree p on \({\mathcal {G}}\).

One specific partition of \(I= (0,1)\) is given by the one-dimensional geometrically graded grid, which for \(\sigma \in (0, 1/2]\) and \(\ell \in {\mathbb {N}}\), is given by

$$\begin{aligned}&{\mathcal {G}}^\ell _{1} {:}{=}\left\{ J^\ell _k, \, k=0, \dots , \ell \right\} ,\quad \text {where } J_0^\ell {:}{=}(0, \sigma ^\ell )\quad \text {and} \quad \nonumber \\&J_k^{\ell } {:}{=}(\sigma ^{\ell -k+1}, \sigma ^{\ell -k}), \, k=1, \dots , \ell . \end{aligned}$$
(2.2)
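For concreteness, the elements of \({\mathcal {G}}^\ell _{1}\) in (2.2) can be generated as follows (the helper name is ours):

```python
def geometric_grid(sigma, ell):
    """Elements of the geometric partition G^ell_1 of (0,1) in (2.2):
    J^ell_0 = (0, sigma^ell), J^ell_k = (sigma^(ell-k+1), sigma^(ell-k))."""
    assert 0.0 < sigma <= 0.5 and ell >= 1
    nodes = [0.0] + [sigma ** (ell - j) for j in range(ell + 1)]
    return list(zip(nodes[:-1], nodes[1:]))

# ell + 1 elements, geometrically refined toward the origin:
assert geometric_grid(0.5, 3) == [(0.0, 0.125), (0.125, 0.25),
                                  (0.25, 0.5), (0.5, 1.0)]
```

Note the geometric refinement toward \(0\): the element sizes shrink like \(\sigma ^{\ell }\) near the singular point while only \(\ell +1\) elements are used in total.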

Theorem 2.1

Let \(d \in \{2,3\}\) and \(Q {:}{=}(0,1)^d\). Let \({\mathcal {C}}=\{c\}\), where \(c\) is one of the corners of Q, and let \({\mathcal {E}}= {\mathcal {E}}_c\) contain the edges adjacent to c when \(d=3\), while \({\mathcal {E}}=\emptyset \) when \(d=2\). Assume furthermore given constants \(C_f, A_f>0\), and

$$\begin{aligned} \begin{array}{lll} {\underline{\gamma }}= \{\gamma _c: c\in {\mathcal {C}}\}, &{} \text {with } \gamma _c>1,\; \text {for all } c\in {\mathcal {C}}&{} \text { if } d = 2,\\ {\underline{\gamma }}= \{\gamma _c, \gamma _e: c\in {\mathcal {C}}, e\in {\mathcal {E}}\}, &{} \text {with } \gamma _c>3/2\text { and } \gamma _e>1,\; \text {for all }c\in {\mathcal {C}}\text { and }e\in {\mathcal {E}}&{} \text { if } d = 3. \end{array} \end{aligned}$$

Then, there exist \(C_p>0\), \(C_L>0\) such that, for every \(0< \epsilon <1\), there exist \(p, L \in {\mathbb {N}}\) with

$$\begin{aligned} p \le C_p(1+\left| \log (\epsilon )\right| ),\qquad L \le C_L (1+\left| \log (\epsilon )\right| ), \end{aligned}$$

so that there exist piecewise polynomials on \(I = (0,1)\)

$$\begin{aligned} v_{i}\in {\mathbb {Q}}_p({\mathcal {G}}^L_1)\cap H^1(I),\qquad i=1, \dots , N_{\mathrm {1d}}, \end{aligned}$$

with \(N_{\mathrm {1d}}= (L+1)p + 1\), and, for all \(f\in {\mathcal {J}}^{\varpi }_{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}};C_f,A_f)\) there exists a d-dimensional array of coefficients

$$\begin{aligned} c = \left\{ c_{{i_1\dots i_d}}: ({i_1,\dots , i_d}) \in \{1, \dots , N_{\mathrm {1d}}\}^d\right\} \end{aligned}$$

such that

  1.

    For every \(i = 1, \dots , N_{\mathrm {1d}}\), \({{\,\mathrm{supp}\,}}(v_{i})\) intersects either a single interval or two neighboring subintervals of \({\mathcal {G}}^L_1\). Furthermore, there exist constants \(C_v\), \(b_v\), depending only on \(C_f\), \(A_f\), and \(\sigma \), such that

    $$\begin{aligned} \Vert v_{ i}\Vert _{L^\infty (I)} \le 1, \qquad \Vert v_{i}\Vert _{H^1(I)} \le C_v \epsilon ^{-b_v}, \qquad \forall i=1, \dots , N_{\mathrm {1d}}. \end{aligned}$$
    (2.3)
  2.

    It holds that

    $$\begin{aligned}&\left\| f - \sum _{{i_1,\dots , i_d}= 1}^{N_{\mathrm {1d}}} c_{{i_1\dots i_d}} \phi _{{i_1\dots i_d}}\right\| _{H^1(Q)} \le \epsilon \qquad \text {with}\nonumber \\&\phi _{{i_1\dots i_d}} = \bigotimes _{j=1}^d v_{ i_j} ,\,\forall {i_1,\dots , i_d}=1, \dots , N_{\mathrm {1d}}. \end{aligned}$$
    (2.4)
  3.

    \(\Vert c\Vert _\infty \le C_2 (1+\left| \log (\epsilon ) \right| )^d\) and \(\Vert c\Vert _1 \le C_c (1+\left| \log (\epsilon ) \right| )^{2d}\), for \(C_2, C_c>0\) independent of p, L, \(\epsilon \).

We present the proof in Sect. A.9.3, after developing an appropriate framework of hp-approximation in Appendix A.

3 Basic ReLU Neural Network Calculus

In the sequel, we distinguish between a neural network, as a collection of weights, and the associated realization of the NN, which is the function determined by the weights and an activation function. In this paper, we only consider the so-called ReLU activation:

$$\begin{aligned} \varrho : {\mathbb {R}}\rightarrow {\mathbb {R}}: x \mapsto \max \{0, x\}. \end{aligned}$$

Definition 3.1

( [43, Definition 2.1]) Let \(d, L\in {\mathbb {N}}\). A neural network \(\varPhi \) with input dimension d and L layers is a sequence of matrix-vector tuples

$$\begin{aligned} \varPhi = \big ((A_1,b_1), (A_2,b_2), \dots , (A_L, b_L)\big ), \end{aligned}$$

where \(N_0 {:}{=}d\) and \(N_1, \dots , N_{L} \in {\mathbb {N}}\), and where \(A_\ell \in {\mathbb {R}}^{N_\ell \times N_{\ell -1}}\) and \(b_\ell \in {\mathbb {R}}^{N_\ell }\) for \(\ell = 1, \dots , L\).

For a NN \(\varPhi \), we define the associated realization of the NN \(\varPhi \) as

$$\begin{aligned} \mathrm {R}(\varPhi ): {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{N_L} : x\mapsto x_L {=}{:}\mathrm {R}(\varPhi )(x), \end{aligned}$$

where the output \(x_L \in {\mathbb {R}}^{N_L}\) results from

$$\begin{aligned} \begin{aligned} x_0&{:}{=}x, \\ x_{\ell }&{:}{=}\varrho (A_{\ell } \, x_{\ell -1} + b_\ell ), \qquad \text { for } \ell = 1, \dots , L-1,\\ x_L&{:}{=}A_{L} \, x_{L-1} + b_{L}. \end{aligned} \end{aligned}$$
(3.1)

Here, \(\varrho \) is understood to act component-wise on vector-valued inputs, i.e., for \(y = (y^1, \dots , y^m) \in {\mathbb {R}}^m\), \(\varrho (y) {:}{=} (\varrho (y^1), \dots , \varrho (y^m))\). We call \(N(\varPhi ) {:}{=}d + \sum _{j = 1}^L N_j\) the number of neurons of the NN \(\varPhi \), \({{\,\mathrm{L}\,}}(\varPhi ){:}{=}L\) the number of layers or depth, \({{\,\mathrm{M}\,}}_j(\varPhi ){:}{=}\Vert A_j\Vert _{0} + \Vert b_j \Vert _{0}\) the number of nonzero weights in the j-th layer, and \({{\,\mathrm{M}\,}}(\varPhi ) {:}{=}\sum _{j=1}^L {{\,\mathrm{M}\,}}_j(\varPhi )\) the number of nonzero weights of \(\varPhi \), also referred to as its size. We refer to \(N_L\) as the dimension of the output layer of \(\varPhi \).
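For illustration, Definition 3.1 and the recursion (3.1) translate directly into code; the following minimal sketch (our naming) also computes the size \({{\,\mathrm{M}\,}}(\varPhi )\):

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)

def realize(phi, x):
    """R(phi)(x) for phi = [(A_1, b_1), ..., (A_L, b_L)], following (3.1):
    ReLU after every layer except the last, which stays affine."""
    x = np.asarray(x, dtype=float)
    for A, b in phi[:-1]:
        x = relu(A @ x + b)
    A, b = phi[-1]
    return A @ x + b

def size(phi):
    """M(phi): total number of nonzero weights."""
    return sum(np.count_nonzero(A) + np.count_nonzero(b) for A, b in phi)

# Example: a depth-2 NN realizing |x| = relu(x) + relu(-x)
phi_abs = [(np.array([[1.0], [-1.0]]), np.zeros(2)),
           (np.array([[1.0, 1.0]]), np.zeros(1))]
assert realize(phi_abs, [-3.0])[0] == 3.0
assert size(phi_abs) == 4
```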

3.1 Concatenation, Parallelization, Emulation of Identity

An essential component in the ensuing proofs is to construct NNs out of simpler building blocks. For instance, given two NNs, we would like to identify another NN so that the realization of it equals the sum or the composition of the first two NNs. To describe these operations precisely, we introduce a formalism of operations on NNs below. The first of these operations is the concatenation.

Proposition 3.2

(NN concatenation, [43, Remark 2.6]) Let \(L_1, L_2 \in {\mathbb {N}}\), and let \(\varPhi ^1, \varPhi ^2\) be two NNs of respective depths \(L_1\) and \(L_2\) such that \(N^1_0 = N^2_{L_2}{=}{:}d\), i.e., the input layer of \(\varPhi ^1\) has the same dimension as the output layer of \(\varPhi ^2\).

Then, there exists a NN \(\varPhi ^1 \odot \varPhi ^2\), called the sparse concatenation of \(\varPhi ^1\) and \(\varPhi ^2\), such that \(\varPhi ^1 \odot \varPhi ^2\) has \(L_1+L_2\) layers, \(\mathrm {R}(\varPhi ^1 \odot \varPhi ^2) = \mathrm {R}(\varPhi ^1) \circ \mathrm {R}(\varPhi ^2)\) and \({{\,\mathrm{M}\,}}\left( \varPhi ^1 \odot \varPhi ^2\right) \le 2{{\,\mathrm{M}\,}}\left( \varPhi ^1\right) + 2{{\,\mathrm{M}\,}}\left( \varPhi ^2\right) \).
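One standard realization of the sparse concatenation, and the source of the factor 2 in the size bound, encodes the output z of \(\varPhi ^2\) as \((\varrho (z), \varrho (-z))\) and lets the first layer of \(\varPhi ^1\) decode it via \(z = \varrho (z)-\varrho (-z)\). A sketch under that construction (our naming; the `realize` helper is included to keep the snippet self-contained):

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)

def realize(phi, x):
    x = np.asarray(x, dtype=float)
    for A, b in phi[:-1]:
        x = relu(A @ x + b)
    A, b = phi[-1]
    return A @ x + b

def sparse_concat(phi1, phi2):
    """phi1 ⊙ phi2: depth L1 + L2, R(phi1 ⊙ phi2) = R(phi1) ∘ R(phi2).
    The output z of phi2 is passed on as (relu(z), relu(-z)) and decoded
    by the modified first layer of phi1, since z = relu(z) - relu(-z);
    duplicating the weight matrices explains the factor 2 in the bound."""
    A2, b2 = phi2[-1]          # affine output layer of phi2
    A1, b1 = phi1[0]           # first layer of phi1
    encode = (np.vstack([A2, -A2]), np.concatenate([b2, -b2]))
    decode = (np.hstack([A1, -A1]), b1)
    return phi2[:-1] + [encode, decode] + phi1[1:]

# check: phi2 realizes x -> |x|, phi1 realizes y -> 2y - 1
phi2 = [(np.array([[1.0], [-1.0]]), np.zeros(2)),
        (np.array([[1.0, 1.0]]), np.zeros(1))]
phi1 = [(np.array([[2.0]]), np.array([-1.0]))]
x = np.array([-0.75])
assert np.allclose(realize(sparse_concat(phi1, phi2), x),
                   realize(phi1, realize(phi2, x)))
```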

The second fundamental operation on NNs is parallelization, achieved with the following construction.

Proposition 3.3

(NN parallelization, [43, Definition 2.7]) Let \(L, d \in {\mathbb {N}}\) and let \(\varPhi ^1, \varPhi ^2\) be two NNs with L layers and with d-dimensional input each. Then, there exists a NN \(\mathrm {P}(\varPhi ^1, \varPhi ^2)\) with d-dimensional input and L layers, which we call the parallelization of \(\varPhi ^1\) and \(\varPhi ^2\), such that

$$\begin{aligned} \mathrm {R}\left( \mathrm {P}\left( \varPhi ^1,\varPhi ^2\right) \right) (x) = \left( \mathrm {R}\left( \varPhi ^1\right) (x), \mathrm {R}\left( \varPhi ^2\right) (x)\right) , \qquad \text { for all } x \in {\mathbb {R}}^d \end{aligned}$$

and \({{\,\mathrm{M}\,}}(\mathrm {P}(\varPhi ^1, \varPhi ^2)) = {{\,\mathrm{M}\,}}(\varPhi ^1) + {{\,\mathrm{M}\,}}(\varPhi ^2)\).
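In the same representation (a sketch of our own), parallelization stacks the first layers, which share the input, and places all deeper layers on a block diagonal; no weight is duplicated, which is why \({{\,\mathrm{M}\,}}(\mathrm {P}(\varPhi ^1, \varPhi ^2)) = {{\,\mathrm{M}\,}}(\varPhi ^1) + {{\,\mathrm{M}\,}}(\varPhi ^2)\).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realize(phi, x):
    for A, b in phi[:-1]:
        x = relu(A @ x + b)
    A, b = phi[-1]
    return A @ x + b

def blkdiag(A, B):
    """Block-diagonal matrix diag(A, B)."""
    Z1 = np.zeros((A.shape[0], B.shape[1]))
    Z2 = np.zeros((B.shape[0], A.shape[1]))
    return np.block([[A, Z1], [Z2, B]])

def parallelize(phi1, phi2):
    """P(phi1, phi2): same depth L, shared d-dimensional input.
    First layers are stacked vertically; deeper layers are block-diagonal."""
    assert len(phi1) == len(phi2), "both NNs must have the same depth"
    assert phi1[0][0].shape[1] == phi2[0][0].shape[1], "same input dimension"
    A1, b1 = phi1[0]
    A2, b2 = phi2[0]
    layers = [(np.vstack([A1, A2]), np.concatenate([b1, b2]))]
    for (Aa, ba), (Ab, bb) in zip(phi1[1:], phi2[1:]):
        layers.append((blkdiag(Aa, Ab), np.concatenate([ba, bb])))
    return layers
```

For instance, parallelizing a two-layer NN realizing \(x \mapsto |x|\) with a two-layer NN realizing \(x \mapsto x\) yields a NN realizing \(x \mapsto (|x|, x)\) with the sum of the two sizes.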

Proposition 3.3 requires the two NNs to have the same depth. If two NNs have different depths, we can artificially deepen the shallower one by concatenating it with a NN that implements the identity. One possible construction of such a NN is presented next.

Proposition 3.4

(NN emulation of \(\mathrm {Id}\), [43, Remark 2.4]) For every \(d,L\in {\mathbb {N}}\) there exists a NN \(\varPhi ^{\mathrm {Id}}_{d,L}\) with \({{\,\mathrm{L}\,}}(\varPhi ^{\mathrm {Id}}_{d,L}) = L\) and \({{\,\mathrm{M}\,}}(\varPhi ^{\mathrm {Id}}_{d,L}) \le 2 d L\), such that \(\mathrm {R} (\varPhi ^{\mathrm {Id}}_{d,L}) = \mathrm {Id}_{{\mathbb {R}}^d}\).
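One standard realization of \(\varPhi ^{\mathrm {Id}}_{d,L}\) (a sketch of ours) splits the input via \(x = \varrho(x) - \varrho(-x)\), carries the resulting \(2d\) nonnegative channels through \(L-2\) layers on which ReLU acts as the identity, and recombines at the output; counting nonzeros gives exactly the bound \(2dL\).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realize(phi, x):
    for A, b in phi[:-1]:
        x = relu(A @ x + b)
    A, b = phi[-1]
    return A @ x + b

def identity_net(d, L):
    """Phi^Id_{d,L}: depth L, realization Id on R^d, at most 2dL nonzeros.
    Based on x = relu(x) - relu(-x); after the first layer all channels are
    nonnegative, so the intermediate ReLUs act as the identity."""
    I = np.eye(d)
    if L == 1:
        return [(I, np.zeros(d))]
    first = (np.vstack([I, -I]), np.zeros(2 * d))
    hidden = (np.eye(2 * d), np.zeros(2 * d))
    last = (np.hstack([I, -I]), np.zeros(d))
    return [first] + [hidden] * (L - 2) + [last]
```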

Finally, we sometimes require a parallelization of NNs that do not share inputs.

Proposition 3.5

(Full parallelization of NNs with distinct inputs, [12, Setting 5.2]) Let \(L \in {\mathbb {N}}\) and let

$$\begin{aligned} \varPhi ^1 = \left( \left( A_1^1,b_1^1\right) , \dots , \left( A_{L}^1,b_{L}^1\right) \right) , \qquad \varPhi ^2 = \left( \left( A_1^2,b_1^2\right) , \dots , \left( A_{L}^2,b_{L}^2\right) \right) \end{aligned}$$

be two NNs with L layers each and with input dimensions \(N^1_0=d_1\) and \(N^2_0=d_2\), respectively.

Then, there exists a NN, denoted by \(\mathrm {FP}(\varPhi ^1, \varPhi ^2)\), with d-dimensional input where \(d = (d_1+d_2)\) and L layers, which we call the full parallelization of \(\varPhi ^1\) and \(\varPhi ^2\), such that for all \(x = (x_1,x_2) \in {\mathbb {R}}^d\) with \(x_i \in {\mathbb {R}}^{d_i}, i = 1,2\)

$$\begin{aligned} \mathrm {R}\left( \mathrm {FP}\left( \varPhi ^1,\varPhi ^2\right) \right) (x_1,x_2) = \left( \mathrm {R}\left( \varPhi ^1\right) (x_1), \mathrm {R}\left( \varPhi ^2\right) (x_2)\right) \end{aligned}$$

and \({{\,\mathrm{M}\,}}(\mathrm {FP}(\varPhi ^1, \varPhi ^2)) = {{\,\mathrm{M}\,}}(\varPhi ^1) + {{\,\mathrm{M}\,}}(\varPhi ^2)\).

Proof

Set \(\mathrm {FP}\left( \varPhi ^1,\varPhi ^2\right) {:}{=}\left( \left( A_1^3, b_1^3\right) , \dots , \left( A_L^3, b_L^3\right) \right) \) where, for \(j = 1, \dots , L\), we define

$$\begin{aligned} A_{j}^3 {:}{=}\left( \begin{array}{cc} A_j^1 &{} 0 \\ 0 &{} A_j^2 \end{array}\right) \qquad \text { and }\qquad b_{j}^3 {:}{=}\left( \begin{array}{c} b_j^1 \\ b_j^2 \end{array}\right) . \end{aligned}$$

All properties of \(\mathrm {FP}\left( \varPhi ^1,\varPhi ^2\right) \) claimed in the statement of the proposition follow immediately from the construction. \(\square \)
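The proof's block-diagonal construction translates directly into code (again a sketch of ours in the list-of-\((A_j,b_j)\) representation):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realize(phi, x):
    for A, b in phi[:-1]:
        x = relu(A @ x + b)
    A, b = phi[-1]
    return A @ x + b

def blkdiag(A, B):
    """Block-diagonal matrix diag(A, B)."""
    Z1 = np.zeros((A.shape[0], B.shape[1]))
    Z2 = np.zeros((B.shape[0], A.shape[1]))
    return np.block([[A, Z1], [Z2, B]])

def full_parallelize(phi1, phi2):
    """FP(phi1, phi2): A_j^3 = diag(A_j^1, A_j^2), b_j^3 = (b_j^1; b_j^2).
    The distinct inputs are concatenated, and so are the outputs."""
    assert len(phi1) == len(phi2), "both NNs must have L layers"
    return [(blkdiag(A1, A2), np.concatenate([b1, b2]))
            for (A1, b1), (A2, b2) in zip(phi1, phi2)]
```

Since the zero blocks contribute no nonzero weights, \({{\,\mathrm{M}\,}}(\mathrm {FP}(\varPhi ^1, \varPhi ^2)) = {{\,\mathrm{M}\,}}(\varPhi ^1) + {{\,\mathrm{M}\,}}(\varPhi ^2)\) holds by construction.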

3.2 Emulation of Multiplication and Piecewise Polynomials

In addition to the basic operations above, we use two types of functions that can be approximated particularly efficiently by NNs: the multiplication function in arbitrary dimension and univariate, continuous, piecewise polynomials. We first state a result on the NN emulation of multiplication in arbitrary dimension.

Proposition 3.6

([16, Lemma C.5], [42, Proposition 2.6]) There exists a constant \(C>0\) such that, for every \(0<\epsilon < 1\), \(d \in {\mathbb {N}}\), and \(M \ge 1\), there is a NN \(\varPi _{\epsilon , M}^{d}\) with d-dimensional input and one-dimensional output, so that

$$\begin{aligned}&\left| \prod _{\ell = 1}^d x_\ell - {{\,\mathrm{R}\,}}(\varPi _{\epsilon , M}^{d})(x)\right| \le \epsilon , \qquad \text { for all } x=(x_1, \dots , x_d) \in [-M,M]^d, \\&\left| \frac{\partial }{\partial x_j} \prod _{\ell = 1}^d x_\ell - \frac{\partial }{\partial x_j} {{\,\mathrm{R}\,}}(\varPi _{\epsilon , M}^{d})(x)\right| \le \epsilon , \qquad {\begin{aligned}\text { for almost every }x =(x_1, \dots , x_d) \in [-M,M]^{d} \\ \text { and all } j = 1, \dots , d,\end{aligned}} \end{aligned}$$

and \({{\,\mathrm{R}\,}}(\varPi _{\epsilon , M}^{d})(x) = 0\) if \(\prod _{\ell =1}^dx_\ell = 0\), for all \(x = (x_1, \dots , x_d)\in {\mathbb {R}}^d\). Additionally, \(\varPi _{\epsilon , M}^{d}\) satisfies

$$\begin{aligned} \max \left\{ {{\,\mathrm{L}\,}}\left( \varPi _{\epsilon , M}^{d}\right) , {{\,\mathrm{M}\,}}\left( \varPi _{\epsilon , M}^{d}\right) \right\} \le C \left( 1+ d \log (d M^d/\epsilon )\right) . \end{aligned}$$
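The networks \(\varPi ^{d}_{\epsilon , M}\) of [16, 42] are built from ReLU approximations of the squaring function combined through the polarization identity \(xy = ((x+y)/2)^2 - ((x-y)/2)^2\). The sketch below is our own illustration of the underlying mechanism (the classical sawtooth construction, not the exact networks of the cited references): m composed hat functions approximate \(x^2\) on \([0,1]\) with error \(2^{-2m-2}\), which explains the logarithmic dependence of the size on \(1/\epsilon\) in the bound above.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sawtooth(x):
    """Hat function g on [0,1] written with ReLUs:
    g(x) = 2x on [0, 1/2] and 2 - 2x on [1/2, 1]."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def square_approx(x, m):
    """Approximate x^2 on [0,1] by x - sum_{s=1}^m g^(s)(x) / 4^s,
    where g^(s) is the s-fold composition; error at most 2^(-2m-2)."""
    y = np.asarray(x, dtype=float).copy()
    g = np.asarray(x, dtype=float).copy()
    for s in range(1, m + 1):
        g = sawtooth(g)
        y = y - g / 4.0**s
    return y

def mult_approx(x, y, m):
    """Approximate x*y on [0,1]^2 via polarization:
    xy = ((x+y)/2)^2 - ((x-y)/2)^2; error at most 2^(-2m-1)."""
    return square_approx((x + y) / 2.0, m) - square_approx(np.abs(x - y) / 2.0, m)
```

The depth grows linearly in m, i.e., logarithmically in the target accuracy; a d-variate product can then be assembled by a binary tree of pairwise multiplications, in line with the \(d \log (d M^d/\epsilon )\) scaling of Proposition 3.6.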

In addition to the high-dimensional multiplication, we can efficiently approximate univariate continuous, piecewise polynomial functions by realizations of NNs with the ReLU activation function.

Proposition 3.7

([41, Proposition 5.1]) There exists a constant \(C>0\) such that, for all \(N_{\mathrm {int}}\in {\mathbb {N}}\) and \({\varvec{p}} = (p_i)_{i\in \{1,\ldots ,{N_{\mathrm {int}}}\}} \subset {\mathbb {N}}\), for all partitions \({\mathcal {T}}\) of \(I=(0,1)\) into \({N_{\mathrm {int}}}\) open, disjoint, connected subintervals \(I_i\), \(i=1,\ldots ,N_{\mathrm {int}}\), for all \(v\in {S_{{\varvec{p}}} (I,{\mathcal {T}})} {:}{=}\{v\in H^1(I): v|_{I_i} \in {\mathbb {P}}_{p_i}(I_i), i=1,\ldots ,N_{\mathrm {int}}\}\), and for every \(0<\varepsilon < 1\), there exist NNs \(\{\varPhi ^{v,{\mathcal {T}},{\varvec{p}}}_{\varepsilon }\}_{\varepsilon \in (0,1)}\) such that for all \(1\le q'\le \infty \) it holds that

$$\begin{aligned} \left\| v-\mathrm {R}\left( \varPhi ^{v,{\mathcal {T}},{\varvec{p}}}_{\varepsilon }\right) \right\| _{W^{1,q'}(I)} \le&\, \varepsilon \left| v \right| _{W^{1,q'}(I)}, \\ {{\,\mathrm{L}\,}}\left( \varPhi ^{v,{\mathcal {T}},{\varvec{p}}}_{\varepsilon }\right) \le&\, C (1+\log (p_{\max })) \left( p_{\max } + \left| \log \varepsilon \right| \right) , \\ {{\,\mathrm{M}\,}}\left( \varPhi ^{v,{\mathcal {T}},{\varvec{p}}}_{\varepsilon }\right) \le&\, C{N_{\mathrm {int}}} (1+\log (p_{\max })) \left( p_{\max } + \left| \log \varepsilon \right| \right) \\&\quad + C \sum _{i=1}^{N_{\mathrm {int}}} p_i\left( p_i + |\log \varepsilon | \right) , \end{aligned}$$

where \(p_{\max } {:}{=}\max \{p_i :i = 1, \dots , N_{\mathrm {int}}\}\). In addition, \(\mathrm {R}\left( \varPhi ^{v,{\mathcal {T}},{\varvec{p}}}_{\varepsilon } \right) (x_j)=v(x_j)\) for all \(j\in \{0,\ldots ,{N_{\mathrm {int}}}\}\), where \(\{x_j\}_{j=0}^{N_{\mathrm {int}}}\) are the nodes of \({\mathcal {T}}\).

Remark 3.8

It is not hard to see that the result also holds for \(I = (a,b)\), where \(a<b\in {\mathbb {R}}\), with \(C>0\) depending on \((b-a)\). Indeed, for any \(v \in H^1((a,b))\), the composition of v with the inverse of the invertible, affine map \(T :x \mapsto (x-a)/(b-a)\) satisfies \(v \circ T^{-1} \in H^1((0,1))\). Applying Proposition 3.7 yields NNs \(\{\varPhi ^{v,{\mathcal {T}},{\varvec{p}}}_{\varepsilon }\}_{\varepsilon \in (0,1)}\) approximating \(v \circ T^{-1}\) to an appropriate accuracy. Concatenating these networks with the one-layer NN \((A_1,b_1)\), where \(A_1x + b_1 = Tx\), yields the result. The explicit dependence of \(C>0\) on \((b-a)\) can be deduced from the error bounds on (0, 1) by affine transformation.

4 Exponential Approximation Rates by Realizations of NNs

We now establish several technical results on the exponentially consistent approximation of univariate and multivariate tensorized polynomials by realizations of NNs with ReLU activation. These results will be used to establish Theorem 4.3, which yields exponential approximation rates of NNs for functions in the weighted, analytic classes introduced in Sect. 2.2. They are also of independent interest, as they imply that spectral and pseudospectral methods can, in principle, be emulated by realizations of NNs with ReLU activation.

4.1 NN-based Approximation of Univariate, Piecewise Polynomial Functions

We start with the following corollary to Proposition 3.7. It quantifies stability and consistency of realizations of NNs with ReLU activation for the emulation of the univariate, piecewise polynomial basis functions in Theorem 2.1.

Corollary 4.1

Let \(I=(a,b)\subset {\mathbb {R}}\) be a bounded interval. Fix \(C_p>0\), \(C_v>0\), and \(b_v>0\). Let \(0<\epsilon _{{\mathsf {h}}{\mathsf {p}}} < 1\) and \(p, N_{\mathrm {1d}}, N_{\mathrm {int}}\in {\mathbb {N}}\) be such that \(p \le C_p(1+\left| \log \epsilon _{{\mathsf {h}}{\mathsf {p}}} \right| )\) and let \({\mathcal {G}}_{\mathrm {1d}}\) be a partition of I into \(N_{\mathrm {int}}\) open, disjoint, connected subintervals and, for \(i\in \{1, \dots , N_{\mathrm {1d}}\}\), let \(v_i\in {\mathbb {Q}}_p({\mathcal {G}}_{{\mathrm {1d}}}) \cap H^1(I)\) be such that \({{\,\mathrm{supp}\,}}(v_i)\) intersects either a single interval or two adjacent intervals in \({\mathcal {G}}_{\mathrm {1d}}\) and \( \Vert v_i\Vert _{H^1(I)}\le C_v \epsilon _{{\mathsf {h}}{\mathsf {p}}}^{-b_v}\), for all \(i\in \{1, \dots , N_{\mathrm {1d}}\}\).

Then, for every \(0 < \epsilon _1 \le \epsilon _{{\mathsf {h}}{\mathsf {p}}}\), and for every \(i\in \{1, \dots , N_{\mathrm {1d}}\}\), there exists a NN \(\varPhi ^{v_{i}}_{\epsilon _1}\) such that

$$\begin{aligned} \left\| v_{i}-{{\,\mathrm{R}\,}}\left( \varPhi ^{v_{i}}_{\epsilon _1}\right) \right\| _{H^1(I)} \le {}&\epsilon _1 |v_i|_{H^1(I)} , \end{aligned}$$
(4.1)
$$\begin{aligned} {{\,\mathrm{L}\,}}\left( \varPhi ^{v_{i}}_{\epsilon _1}\right) \le {}&C_4 (1 + \left| \log (\epsilon _1)\right| )(1 + \log (1+\left| \log (\epsilon _1)\right| )) , \end{aligned}$$
(4.2)
$$\begin{aligned} {{\,\mathrm{M}\,}}\left( \varPhi ^{v_{i}}_{\epsilon _1}\right) \le {}&C_5 (1 + \left| \log (\epsilon _1)\right| ^2) , \end{aligned}$$
(4.3)

for constants \(C_4, C_5>0\) depending only on \(C_p\), \(C_v\), \(b_v\), and \((b-a)\). In addition, \({{\,\mathrm{R}\,}}\left( \varPhi ^{v_i}_{\epsilon _1} \right) (x_j)=v_i(x_j)\) for all \(i\in \{1,\ldots ,N_{\mathrm {1d}}\}\) and \(j\in \{0,\ldots ,{N_{\mathrm {int}}}\}\), where \(\{x_j\}_{j=0}^{N_{\mathrm {int}}}\) are the nodes of \({\mathcal {G}}_{{\mathrm {1d}}}\).

Proof

Let \(i=1, \dots , N_{\mathrm {1d}}\). For \(v_{i}\) as in the assumptions of the corollary, either \({{\,\mathrm{supp}\,}}(v_{i}) = {\overline{J}}\) for a unique \(J\in {\mathcal {G}}_{\mathrm {1d}}\) or \({{\,\mathrm{supp}\,}}(v_{i}) =\overline{J \cup J'}\) for two neighboring intervals \(J, J'\in {\mathcal {G}}_{\mathrm {1d}}\). Hence, there exists a partition \({\mathcal {T}}_{i}\) of I into at most four subintervals so that \(v_{i} \in S_{{\varvec{p}}} (I,{\mathcal {T}}_{i})\), where \({\varvec{p}} = (p_i)_{i\in \{1,\ldots ,4\}}\) with \(p_i = p\) for all i.

Because of this, an application of Proposition 3.7 with \(q' = 2\) and Remark 3.8 yields that for every \(0<\epsilon _1 \le \epsilon _{{\mathsf {h}}{\mathsf {p}}}< 1\) there exists a NN \(\varPhi ^{v_{i}}_{\epsilon _1} {:}{=} \varPhi ^{v_{i},{\mathcal {T}}_{i},{\varvec{p}}}_{\epsilon _1}\) such that (4.1) holds. In addition, by invoking \(p \lesssim 1+|\log (\epsilon _{\mathsf {hp}})|\le 1+\left| \log (\epsilon _1) \right| \), we observe that

$$\begin{aligned} {{\,\mathrm{L}\,}}\left( \varPhi ^{v_{i}}_{\epsilon _1}\right)&\le \, C (1+\log (p)) \left( p + \left| \log \left( {\epsilon _1}\right) \right| \right) \lesssim (1 + \left| \log (\epsilon _1)\right| )(1 + \log (1+\left| \log (\epsilon _1)\right| )). \end{aligned}$$

Therefore, there exists \(C_4 >0\) such that (4.2) holds. Furthermore,

$$\begin{aligned} {{\,\mathrm{M}\,}}\left( \varPhi ^{v_{i}}_{\epsilon _1}\right) \le&\, 4C(1+\log (p)) \left( p + \left| \log \left( \epsilon _1\right) \right| \right) +C \sum _{i=1}^{4} p(p+\left| \log \left( {\epsilon _1}\right) \right| )\\ \lesssim&\, p^2 + \left| \log \left( {\epsilon _1}\right) \right| p + (1+\log (p)) \left( p + \left| \log \left( {\epsilon _1}\right) \right| \right) . \end{aligned}$$

We use \(p \lesssim 1+\left| \log (\epsilon _1)\right| \) and obtain that there exists \(C_5 >0\) such that (4.3) holds. \(\square \)

4.2 Emulation of Functions with Singularities in Cubic Domains by NNs

Below we state a result describing the efficiency of re-approximating continuous, piecewise tensor product polynomial functions in a cubic domain, as introduced in Theorem 2.1, by realizations of NNs with the ReLU activation function.

Theorem 4.2

Let \(d\in \{2,3\}\), let \(I = (a,b)\subset {\mathbb {R}}\) be a bounded interval, and let \(Q=I^d\). Suppose that there exist constants \(C_p>0\), \(C_{N_{\mathrm {1d}}}>0\), \(C_v>0\), \(C_c>0\), \(b_v>0\), and, for \(0< \epsilon \le 1\), assume there exist \(p, N_{\mathrm {1d}}, N_{\mathrm {int}}\in {\mathbb {N}}\), and \(c\in {\mathbb {R}}^{N_{\mathrm {1d}}\times \dots \times N_{\mathrm {1d}}}\), such that

$$\begin{aligned} N_{\mathrm {1d}}\le C_{N_{\mathrm {1d}}}(1+\left| \log \epsilon \right| ^2),\qquad \Vert c\Vert _{1} \le C_{c}(1+\left| \log \epsilon \right| ^{2d}),\qquad p \le C_p(1+\left| \log \epsilon \right| ). \end{aligned}$$

Further, let \({\mathcal {G}}_{\mathrm {1d}}\) be a partition of I into \(N_{\mathrm {int}}\) open, disjoint, connected subintervals and let, for all \(i\in \{1, \dots , N_{\mathrm {1d}}\}\), \(v_i\in {\mathbb {Q}}_p({\mathcal {G}}_{{\mathrm {1d}}}) \cap H^1(I)\) be such that \({{\,\mathrm{supp}\,}}(v_i)\) intersects either a single interval or two neighboring subintervals of \({\mathcal {G}}_{\mathrm {1d}}\) and

$$\begin{aligned} \Vert v_i\Vert _{H^1(I)}\le C_v \epsilon ^{-b_v}, \qquad \Vert v_i\Vert _{L^\infty (I)}\le 1,\qquad \forall i\in \{1, \dots , N_{\mathrm {1d}}\}. \end{aligned}$$

Then, there exists a NN \(\varPhi _{\epsilon , c}\) such that

$$\begin{aligned} \left\| \sum _{{i_1,\dots , i_d}=1}^{N_{\mathrm {1d}}} c_{{i_1\dots i_d}}\bigotimes _{j=1}^dv_{i_j} - {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , c}\right) \right\| _{H^1(Q)} \le \epsilon . \end{aligned}$$
(4.4)

Furthermore, it holds that \( \left\| {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , c}\right) \right\| _{L^\infty (Q)} \le (2^d+1)C_c (1 + \left| \log \epsilon \right| ^{2d}), \)

$$\begin{aligned} {{\,\mathrm{M}\,}}(\varPhi _{\epsilon , c}) \le C (1+\left| \log \epsilon \right| ^{2d+1}), \qquad {{\,\mathrm{L}\,}}(\varPhi _{\epsilon , c}) \le C (1+ \left| \log \epsilon \right| \log (\left| \log \epsilon \right| )), \end{aligned}$$

where \(C>0\) depends on \(C_p\), \(C_{N_{\mathrm {1d}}}\), \(C_v\), \(C_c\), \(b_v\), d, and \((b-a)\) only.

Proof

Assume \(I \ne \emptyset \) as otherwise there is nothing to show. Let \(C_I\ge 1\) be such that \(C_I^{-1}\le (b-a) \le C_I\). Let \(c_{v, \mathrm {max}} {:}{=}\max \{\Vert v_i\Vert _{H^1(I)}:i \in \{1, \dots , N_{\mathrm {1d}}\}\} \le C_v \epsilon ^{-b_v}\), let \(\epsilon _1 {:}{=}\min \{\epsilon / (2 \cdot d \cdot (c_{v, \mathrm {max}}+1)^{d} \cdot \Vert c\Vert _1), 1/2, C_I^{-1/2}C_v^{-1}\epsilon ^{b_v}\}\), and let \(\epsilon _2 {:}{=}\min \{\epsilon /(2 \cdot (\sqrt{d}+1) \cdot C_I^{d/2} \cdot (c_{v, \mathrm {max}}+1) \cdot \Vert c\Vert _1), 1/2 \}\).

Construction of the neural network. Invoking Corollary 4.1 we choose, for \(i=1, \dots , N_{\mathrm {1d}}\), NNs \(\varPhi _{\epsilon _1}^{v_{i}}\) so that

$$\begin{aligned} \left\| {{\,\mathrm{R}\,}}(\varPhi _{\epsilon _1}^{v_{i}}) - v_{i} \right\| _{H^1(I)} \le \epsilon _1 |v_i|_{H^1(I)} \le \epsilon _1 c_{v, \mathrm {max}} \le C_v \epsilon _1 \epsilon ^{-b_v} \le 1. \end{aligned}$$

It follows that for all \(i \in \{1, \dots , N_{\mathrm {1d}}\}\)

$$\begin{aligned} \left\| {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i}}\right) \right\| _{H^1(I)} \le&\left\| {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i}}\right) - v_{i}\right\| _{H^1(I)} + \left\| v_{i}\right\| _{H^1(I)} \le 1+c_{v,\max } \end{aligned}$$
(4.5)

and that, by Sobolev imbedding,

$$\begin{aligned} \begin{aligned} \left\| {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i}}\right) \right\| _{L^\infty (I)}&\le \left\| {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i}}\right) - v_{i}\right\| _{L^\infty (I)} + \left\| v_{i}\right\| _{L^\infty (I)}\\&\le C_I^{1/2} \left\| {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i}}\right) - v_{i}\right\| _{H^1(I)} + 1 \\&\le C_I^{1/2}C_v \epsilon _1 \epsilon ^{-b_v} + 1 \le 2. \end{aligned} \end{aligned}$$
(4.6)

Then, let \(\varPhi _{\mathrm {basis}}\) be the NN defined as

$$\begin{aligned} \varPhi _{\mathrm {basis}}{:}{=}{{\,\mathrm{FP}\,}}\left( {{\,\mathrm{P}\,}}(\varPhi ^{v_{ 1}}_{\epsilon _1}, \dots , \varPhi ^{v_{ N_{\mathrm {1d}}}}_{\epsilon _1}), \dots , {{\,\mathrm{P}\,}}(\varPhi ^{v_{ 1}}_{\epsilon _1}, \dots , \varPhi ^{v_{N_{\mathrm {1d}}}}_{\epsilon _1}) \right) , \end{aligned}$$
(4.7)

where the full parallelization is of d copies of \({{\,\mathrm{P}\,}}(\varPhi ^{v_{ 1}}_{\epsilon _1}, \dots , \varPhi ^{v_{ N_{\mathrm {1d}}}}_{\epsilon _1})\). Note that \(\varPhi _{\mathrm {basis}}\) is a NN with d-dimensional input and \(dN_{\mathrm {1d}}\)-dimensional output. Subsequently, we introduce the \(N_{\mathrm {1d}}^d\) matrices \(E^{(i_1, \dots , i_d)} \in \{0,1\}^{d\times dN_{\mathrm {1d}}}\) such that, for all \((i_1, \dots , i_d)\in \{1, \dots ,N_{\mathrm {1d}}\}^d\),

$$\begin{aligned} E^{(i_1, \dots , i_d)} a = \left( a_{(j-1)N_{\mathrm {1d}}+i_j}\right) _{j=1}^{d}, \quad \text {for all }a = (a_1, \dots , a_{dN_{\mathrm {1d}}})\in {\mathbb {R}}^{dN_{\mathrm {1d}}}. \end{aligned}$$

Note that, for all \((i_1, \dots , i_d)\in \{1, \dots , N_{\mathrm {1d}}\}^d\),

$$\begin{aligned} {{\,\mathrm{R}\,}}(((E^{(i_1, \dots , i_d)}, 0))\odot \varPhi _{\mathrm {basis}}) : (x_1, \dots , x_d)\mapsto \left( {{\,\mathrm{R}\,}}(\varPhi ^{v_{i_j}}_{\epsilon _1} )(x_j)\right) _{j=1}^{d}. \end{aligned}$$
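In code, each \(E^{(i_1, \dots , i_d)}\) is a 0-1 selection matrix with exactly one nonzero entry per row, so that \(\Vert E^{(i_1, \dots , i_d)}\Vert _0 = d\). A small sketch of ours, with 1-based indices \(i_j\) as in the text:

```python
import numpy as np

def E_select(indices, N1d):
    """Selection matrix E^{(i_1,...,i_d)} in {0,1}^{d x d*N1d}:
    row j (1-based) picks entry a_{(j-1)*N1d + i_j} of the
    parallelized basis output."""
    d = len(indices)
    E = np.zeros((d, d * N1d))
    for j, ij in enumerate(indices, start=1):
        E[j - 1, (j - 1) * N1d + (ij - 1)] = 1.0
    return E
```

For \(d = 2\), \(N_{\mathrm{1d}} = 3\), and \((i_1, i_2) = (2, 1)\), the matrix picks entries \(a_2\) and \(a_4\) out of \(a \in {\mathbb {R}}^{6}\).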

Then, we set

$$\begin{aligned} \varPhi _{\epsilon } {:}{=}{{\,\mathrm{P}\,}}\left( \varPi _{\epsilon _2, 2}^d \odot ((E^{(i_1, \dots , i_d)}, 0)) :({i_1,\dots , i_d})\in \{1, \dots , N_{\mathrm {1d}}\}^d \right) \odot \varPhi _{\mathrm {basis}}, \end{aligned}$$
(4.8)

where \(\varPi _{\epsilon _2, 2}^d\) is according to Proposition 3.6. Note that, by (4.6), the inputs of \(\varPi _{\epsilon _2, 2}^d\) are bounded in absolute value by 2. Finally, we define

$$\begin{aligned} \varPhi _{\epsilon , c} {:}{=}(( {{\,\mathrm{vec}\,}}( c )^\top , 0 )) \odot \varPhi _\epsilon , \end{aligned}$$

where \({{\,\mathrm{vec}\,}}(c) \in {\mathbb {R}}^{N_{\mathrm {1d}}^d}\) is the reshaping defined by, for all \(({i_1,\dots , i_d})\in \{1, \dots , N_{\mathrm {1d}}\}^d\),

$$\begin{aligned} ( {{\,\mathrm{vec}\,}}(c))_{i} = c_{{i_1\dots i_d}}, \qquad \text {with } i = 1+\sum _{j=1}^d (i_j-1) N_{\mathrm {1d}}^{j-1}. \end{aligned}$$
(4.9)
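The index map (4.9) is the column-major (first index varying fastest) vectorization of c. A one-line sketch of ours, together with a consistency check against the Fortran-order flattening of a d-dimensional array:

```python
import numpy as np

def vec_index(indices, N1d):
    """1-based position of c_{i_1...i_d} in vec(c), as in (4.9):
    i = 1 + sum_{j=1}^d (i_j - 1) * N1d^(j-1) (the code uses 0-based j)."""
    return 1 + sum((ij - 1) * N1d**j for j, ij in enumerate(indices))

def vec(c):
    """vec(c): column-major (Fortran-order) flattening realizes (4.9)."""
    return np.asarray(c).ravel(order="F")
```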

See Fig. 1 for a schematic representation of the NN \(\varPhi _{\epsilon , c}\).

Fig. 1
figure 1

Schematic representation of the neural network \(\varPhi _{\epsilon , c}\), for the case \(d=2\) constructed in the proof of Theorem 4.2. The circles represent subnetworks (i.e., the neural networks \(\varPhi ^{v_{ i}}_{\epsilon _1}\), \(\varPi ^d_{\epsilon _2, 2}\), and \((({{\,\mathrm{vec}\,}}(c)^\top ,0))\)). Along some branches, we indicate \(\varPhi _{i, k}(x_1, x_2) = {{\,\mathrm{R}\,}}\left( \varPi ^2_{\epsilon _2,2}\odot ((E^{(i,k)}, 0)) \odot \varPhi _{\mathrm {basis}}\right) (x_1, x_2)\)

Approximation accuracy. We now verify that \(\varPhi _{\epsilon , c}\) achieves the asserted approximation accuracy. Define, for all \(({i_1,\dots , i_d})\in \{1, \dots , N_{\mathrm {1d}}\}^d\),

$$\begin{aligned} \phi _{i_1\dots i_d} = \bigotimes _{j = 1}^d v_{ i_j} . \end{aligned}$$

Furthermore, for each \(({i_1,\dots , i_d})\in \{1, \dots , N_{\mathrm {1d}}\}^d\), let \(\varPhi _{{i_1\dots i_d}}\) denote the NN

$$\begin{aligned} \varPhi _{{i_1\dots i_d}} = \varPi _{\epsilon _2, 2}^d \odot ((E^{(i_1, \dots , i_d)}, 0)) \odot \varPhi _{\mathrm {basis}}. \end{aligned}$$

We estimate by the triangle inequality that

$$\begin{aligned} \begin{aligned}&\left\| \sum _{i_1, \dots , i_d=1}^{N_{\mathrm {1d}}}c_{i_1\dots i_d}\phi _{i_1\dots i_d}- {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , c}) \right\| _{H^1(Q)}\\&\quad = \left\| \sum _{i_1, \dots , i_d=1}^{N_{\mathrm {1d}}}c_{i_1\dots i_d}\phi _{i_1\dots i_d}- \sum _{i_1, \dots , i_d=1}^{N_{\mathrm {1d}}}c_{i_1\dots i_d}{{\,\mathrm{R}\,}}(\varPhi _{{i_1\dots i_d}}) \right\| _{H^1(Q)} \\&\quad \le \sum _{i_1, \dots , i_d=1}^{N_{\mathrm {1d}}}|c_{i_1\dots i_d}| \left\| \phi _{i_1\dots i_d}- {{\,\mathrm{R}\,}}(\varPhi _{{i_1\dots i_d}}) \right\| _{H^1(Q)}. \end{aligned} \end{aligned}$$
(4.10)

We have that

$$\begin{aligned}&\left\| \phi _{i_1\dots i_d}- {{\,\mathrm{R}\,}}(\varPhi _{{i_1\dots i_d}}) \right\| _{H^1(Q)}\\&\quad = \left\| \bigotimes _{j = 1}^d v_{ i_j} - {{\,\mathrm{R}\,}}\left( \varPi _{\epsilon _2, 2}^d\right) \circ \left[ {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_1}}\right) , \dots , {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_d}}\right) \right] \right\| _{H^1(Q)} \end{aligned}$$

and, by another application of the triangle inequality, we have that

$$\begin{aligned}&\left\| \phi _{i_1\dots i_d}- {{\,\mathrm{R}\,}}\left( \varPhi _{{i_1\dots i_d}}\right) \right\| _{H^1(Q)} \nonumber \\&\quad \le \left\| \bigotimes _{j = 1}^d v_{ i_j} - \bigotimes _{j = 1}^d {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_j}}\right) \right\| _{H^1(Q)} \nonumber \\&\quad \quad + \left\| \bigotimes _{j = 1}^d {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_j}}\right) -{{\,\mathrm{R}\,}}\left( \varPi _{\epsilon _2, 2}^d\right) \circ \left[ {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_1}}\right) , \dots , {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_d}}\right) \right] \right\| _{H^1(Q)} \nonumber \\&\quad \le \left\| \bigotimes _{j = 1}^d v_{i_j} - \bigotimes _{j = 1}^d {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_j}}\right) \right\| _{H^1(Q)} + (\sqrt{d}+1)\epsilon _2 C_I^{d/2} (c_{v,\max }+1), \end{aligned}$$
(4.11)

where the last estimate follows from Proposition 3.6 and the chain rule:

$$\begin{aligned} \left\| \bigotimes _{j = 1}^d {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_j}}\right) -{{\,\mathrm{R}\,}}\left( \varPi _{\epsilon _2, 2}^d\right) \circ \left[ {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_1}}\right) , \dots , {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_d}}\right) \right] \right\| _{L^2(Q)} \le \epsilon _2 C_I^{d/2} \end{aligned}$$

and

$$\begin{aligned}&\left| \bigotimes _{j = 1}^d {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_j}}\right) -{{\,\mathrm{R}\,}}\left( \varPi _{\epsilon _2, 2}^d\right) \circ \left[ {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_1}}\right) , \dots , {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_d}}\right) \right] \right| _{H^1(Q)}^2 \\&\quad = \sum _{k=1}^d \left\| \frac{\partial }{\partial x_k} \bigotimes _{j = 1}^d {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_j}}\right) - \frac{\partial }{\partial x_k} {{\,\mathrm{R}\,}}\left( \varPi _{\epsilon _2, 2}^d\right) \circ \left[ {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_1}}\right) , \dots , {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_d}}\right) \right] \right\| _{L^2(Q)}^2 \\&\quad = \sum _{k=1}^d \left\| \left( \bigotimes _{\begin{array}{c} j = 1 \\ j\ne k \end{array}}^d {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_j}}\right) - \left( \frac{\partial }{\partial x_k} {{\,\mathrm{R}\,}}\left( \varPi _{\epsilon _2, 2}^d\right) \right) \circ \left[ {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_1}}\right) , \dots , {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_d}}\right) \right] \right) \right. \\&\qquad \left. \times \left( \frac{\partial }{\partial x} {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_k}} \right) \right) \right\| _{L^2(Q)}^2 \\&\quad \le \sum _{k=1}^d \epsilon _2^2 C_I^{d-1} \left\| \frac{\partial }{\partial x} {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_k}} \right) \right\| _{L^2(I)}^2 \le d \epsilon _2^2 C_I^{d-1} (c_{v,\max }+1)^2, \end{aligned}$$

where we used (4.5) and the fact that \({{\,\mathrm{R}\,}}(\varPhi _{\epsilon _1}^{v_{i_k}} )\) depends only on \(x_k\). Integrating over the \(d-1\) other coordinates in Q gives the factor \(C_I^{d-1}\). We now use (4.6) to bound the first term in (4.11): for \(d = 3\), we have that, for all \(({i_1,\dots , i_d}) \in \{1, \dots , N_{\mathrm {1d}}\}^d\),

$$\begin{aligned}&\left\| \bigotimes _{j = 1}^d v_{ i_j} - \bigotimes _{j = 1}^d {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_j}}\right) \right\| _{H^1(Q)}\\&\quad \le \left\| (v_{ i_1} - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon _1}^{v_{i_1}}))\otimes \bigotimes _{j = 2}^d v_{i_j}\right\| _{H^1(Q)}\\&\qquad + \left\| {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_1}}\right) \otimes \left( v_{ i_2} - {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i_2}}\right) \right) \otimes v_{ i_d}\right\| _{H^1(Q)}\\&\qquad + \left\| \bigotimes _{j = 1}^{d-1}{{\,\mathrm{R}\,}}(\varPhi _{\epsilon _1}^{v_{i_j}}) \otimes (v_{ i_d} - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon _1}^{v_{i_d}}))\right\| _{H^1(Q)} {=}{:}\mathrm {(I)}. \end{aligned}$$

For \(d = 2\), we end up with a similar estimate with only two terms. By the tensor product structure, it is clear that \(\mathrm {(I)} \le d \epsilon _1 (c_{v, \mathrm {max}}+1)^{d}\). We have from (4.10) and the considerations above that

$$\begin{aligned}&\left\| \sum _{i_1, \dots , i_d=1}^{N_{\mathrm {1d}}}c_{i_1\dots i_d}\phi _{i_1\dots i_d}- {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , c}\right) \right\| _{H^1(Q)}\\&\quad \le \Vert c\Vert _1 \left( d \epsilon _1 (c_{v, \mathrm {max}}+1)^{d} + (\sqrt{d}+1) \epsilon _2 C_I^{d/2} (c_{v,\max }+1) \right) \le \epsilon . \end{aligned}$$

This yields (4.4).

Bound on the \(L^\infty \)-norm of the neural network. As we have already shown, \(\left\| {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon _1}^{v_{i}}\right) \right\| _{L^\infty (I)} \le 2\). Therefore, by Proposition 3.6, \( \left\| {{\,\mathrm{R}\,}}\left( \varPhi _\epsilon \right) \right\| _{L^\infty (Q)} \le 2^d + \epsilon _2\). It follows that

$$\begin{aligned} \left\| {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , c}\right) \right\| _{L^\infty (Q)} \le \Vert c\Vert _1\left( 2^d + \epsilon _2 \right) \le (2^d+1)C_c (1 + \left| \log \epsilon \right| ^{2d}). \end{aligned}$$

Size of the neural network. Bounds on the size and depth of \(\varPhi _{\epsilon , c}\) follow from Proposition 3.6 and Corollary 4.1. Specifically, we start by remarking that there exists a constant \(C_1 > 0\) depending on \(C_v\), \(b_v\), \(C_c\), \(C_I\) and d only, such that \(\left| \log (\epsilon _1)\right| \le C_1 (1+\left| \log \epsilon \right| )\) and \(\left| \log (\epsilon _2)\right| \le C_1 (1+\left| \log \epsilon \right| )\). Then, by Corollary 4.1, there exist constants \(C_{4}\), \(C_{5}>0\) depending on \(C_p, C_v, b_v, C_c, (b-a),\) and d only such that for all \(i=1, \dots , N_{\mathrm {1d}}\),

$$\begin{aligned}&{{\,\mathrm{L}\,}}\left( \varPhi ^{v_{i}}_{\epsilon _1}\right) \le {} C_{4} (1 + \left| \log \epsilon \right| )(1 + \log (1+\left| \log \epsilon \right| )) \quad \text { and }\\&{{\,\mathrm{M}\,}}\left( \varPhi ^{v_{i}}_{\epsilon _1}\right) \le {} C_{5} (1 + \left| \log \epsilon \right| ^2). \end{aligned}$$

Hence, by Propositions 3.5 and 3.3, there exist \(C_{6}\), \(C_{7}>0\) depending on \(C_p, C_v, b_v, C_c, (b-a),\) and d only such that

$$\begin{aligned}&{{\,\mathrm{L}\,}}(\varPhi _{\mathrm {basis}})\le C_{6} (1 + \left| \log \epsilon \right| )(1 + \log (1+\left| \log \epsilon \right| )) \quad \text { and } \\&{{\,\mathrm{M}\,}}(\varPhi _{\mathrm {basis}})\le C_{7} d N_{\mathrm {1d}}(1 + \left| \log \epsilon \right| ^2). \end{aligned}$$

Then, remarking that \(\Vert E^{({i_1,\dots , i_d})}\Vert _0 =d\) for all \(({i_1,\dots , i_d})\in \{1, \dots , N_{\mathrm {1d}}\}^d\), we obtain by Propositions 3.2, 3.6, and 3.3 that

$$\begin{aligned}&{{\,\mathrm{L}\,}}(\varPhi _{\epsilon })\le C_{8} (1 + \left| \log \epsilon \right| )(1 + \log (1+\left| \log \epsilon \right| )) , \\&{{\,\mathrm{M}\,}}(\varPhi _{\epsilon })\le C_{9}\left( N_{\mathrm {1d}}^d(1+\left| \log \epsilon \right| ) + {{\,\mathrm{M}\,}}(\varPhi _{\mathrm {basis}})\right) . \end{aligned}$$

where \(C_{8}, C_{9}>0\) depend on \(C_p, C_v, b_v, C_c, (b-a)\), and d only. Finally, we conclude that there exists a constant \(C_{10}>0\) depending on \(C_p, C_v, b_v, C_c, (b-a)\), and d only, such that

$$\begin{aligned} {{\,\mathrm{L}\,}}( \varPhi _{\epsilon , c} ) \le C_{10}(1 + \left| \log \epsilon \right| )(1 + \log (1+\left| \log \epsilon \right| )). \end{aligned}$$

Using also the fact that \(N_{\mathrm {1d}}\le C_{N_{\mathrm {1d}}} (1+\left| \log \epsilon \right| ^2)\) and that \(d\ge 2\), we obtain

$$\begin{aligned} {{\,\mathrm{M}\,}}(\varPhi _{\epsilon , c})\le C_{11} (1+\left| \log \epsilon \right| )^{2d+1}, \end{aligned}$$

for a constant \(C_{11}>0\) depending on \(C_p, C_{N_{\mathrm {1d}}}, C_v, b_v, C_c, (b-a)\) and d only. \(\square \)

Next, we state our main approximation result, which describes the approximation of singular functions in \((0,1)^d\) by realizations of NNs.

Theorem 4.3

Let \(d \in \{2,3\}\) and \(Q {:}{=}(0,1)^d\). Let \({\mathcal {C}}=\{c\}\) where \(c\) is one of the corners of Q and let \({\mathcal {E}}= {\mathcal {E}}_c\) contain the edges adjacent to c when \(d=3\), \({\mathcal {E}}=\emptyset \) when \(d=2\). Assume furthermore that \(C_f, A_f>0\), and

$$\begin{aligned} \begin{array}{lll} {\underline{\gamma }}= \{\gamma _c: c\in {\mathcal {C}}\}, &{} \text {with } \gamma _c>1,\; \text {for all } c\in {\mathcal {C}}&{}\text { if } d = 2,\\ {\underline{\gamma }}= \{\gamma _c, \gamma _e: c\in {\mathcal {C}}, e\in {\mathcal {E}}\}, &{}\text {with } \gamma _c>3/2\text { and } \gamma _e>1,\; \text {for all }c\in {\mathcal {C}}\text { and }e\in {\mathcal {E}}&{} \text { if } d = 3. \end{array} \end{aligned}$$

Then, for every \(f\in {\mathcal {J}}^{\varpi }_{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}};C_f,A_f)\) and every \(0< \epsilon <1\), there exists a NN \(\varPhi _{\epsilon , f}\) so that

$$\begin{aligned} \left\| f - {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , f}\right) \right\| _{H^1(Q)} \le \epsilon . \end{aligned}$$
(4.12)

In addition, \(\Vert {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , f}\right) \Vert _{L^\infty (Q)} = {\mathcal {O}}(\left| \log \epsilon \right| ^{2d})\) for \(\epsilon \rightarrow 0\). Also, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , f}) = {\mathcal {O}}(\left| \log \epsilon \right| ^{2d+1})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , f}) = {\mathcal {O}}(\left| \log \epsilon \right| \log (\left| \log \epsilon \right| ))\), for \(\epsilon \rightarrow 0\).

Proof

Denote \(I{:}{=}(0,1)\) and let \(f\in {\mathcal {J}}^{\varpi }_{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}}; C_f,A_f)\) and \(0<\epsilon <1\). Then, by Theorem 2.1 (applied with \(\epsilon /2\) instead of \(\epsilon \)) there exists \(N_{\mathrm {1d}}\in {\mathbb {N}}\) so that \(N_{\mathrm {1d}}= {\mathcal {O}}((1+\left| \log \epsilon \right| )^{2})\), \(c \in {\mathbb {R}}^{N_{\mathrm {1d}}\times \dots \times N_{\mathrm {1d}}}\) with \(\Vert c\Vert _1 \le C_c (1+\left| \log \epsilon \right| ^{2d})\), and, for all \((i_1, \dots , i_d)\in \{1, \dots , N_{\mathrm {1d}}\}^d\),

$$\begin{aligned} \phi _{i_1\dots i_d} = \bigotimes _{j = 1}^d v_{ i_j} , \end{aligned}$$

such that the hypotheses of Theorem 4.2 are met, and

$$\begin{aligned} \left\| f - \sum _{i_1, \dots i_d=1}^{N_{\mathrm {1d}}} c_{i_1\dots i_d} \phi _{i_1\dots i_d} \right\| _{H^1(Q)} \le \frac{\epsilon }{2}. \end{aligned}$$

We have, by Theorem 2.1 and the triangle inequality, that for \(\varPhi _{\epsilon , f} {:}{=} \varPhi _{\epsilon /2, c}\)

$$\begin{aligned} \left\| f - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , f}) \right\| _{H^1(Q)} \le \frac{\epsilon }{2} + \left\| \sum _{i_1, \dots , i_d=1}^{N_{\mathrm {1d}}}c_{i_1\dots i_d}\phi _{i_1\dots i_d}- {{\,\mathrm{R}\,}}(\varPhi _{\epsilon /2, c}) \right\| _{H^1(Q)}. \end{aligned}$$

Then, the application of Theorem 4.2 (with \(\epsilon /2\) instead of \(\epsilon \)) concludes the proof of (4.12). Finally, the bounds on \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , f}) = {{\,\mathrm{L}\,}}(\varPhi _{\epsilon /2, c})\), \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , f}) = {{\,\mathrm{M}\,}}(\varPhi _{\epsilon /2, c})\), and on \(\Vert {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , f})\Vert _{L^\infty (Q)} = \Vert {{\,\mathrm{R}\,}}(\varPhi _{\epsilon /2, c})\Vert _{L^\infty (Q)} \) follow from the corresponding estimates of Theorem 4.2. \(\square \)

Theorem 4.3 admits a straightforward generalization to functions with multivariate output, such that each coordinate is a weighted analytic function with the same regularity. Here, for a NN \(\varPhi \) with N-dimensional output, \(N\in {\mathbb {N}}\), we denote by \({{\,\mathrm{R}\,}}(\varPhi )_n\) the n-th component of the output (where \(n\in \{1, \dots , N\}\)).

Corollary 4.4

Let \(d \in \{2,3\}\) and \(Q {:}{=}(0,1)^d\). Let \({\mathcal {C}}=\{c\}\) where \(c\) is one of the corners of Q and let \({\mathcal {E}}= {\mathcal {E}}_c\) contain the edges adjacent to \(c\) when \(d=3\); \({\mathcal {E}}=\emptyset \) when \(d=2\). Let \(N_f\in {\mathbb {N}}\). Further assume that \(C_f, A_f>0\), and

$$\begin{aligned} \begin{array}{lll} {\underline{\gamma }}= \{\gamma _c: c\in {\mathcal {C}}\}, &{}\text {with } \gamma _c>1,\; \text {for all } c\in {\mathcal {C}}&{}\text { if } d = 2,\\ {\underline{\gamma }}= \{\gamma _c, \gamma _e: c\in {\mathcal {C}}, e\in {\mathcal {E}}\}, &{}\text {with } \gamma _c>3/2\text { and } \gamma _e>1,\; \text {for all }c\in {\mathcal {C}}\text { and }e\in {\mathcal {E}}\quad &{}\text { if } d = 3. \end{array} \end{aligned}$$

Then, for all \({\varvec{f}}= (f_1, \dots , f_{N_f}) \in \left[ {\mathcal {J}}^{\varpi }_{\underline{\gamma }}(Q;{\mathcal {C}}, {\mathcal {E}};C_f,A_f) \right] ^{N_f}\) and every \(0< \epsilon <1\), there exists a NN \(\varPhi _{\epsilon , {\varvec{f}}}\) with d-dimensional input and \(N_f\)-dimensional output such that, for all \( n=1, \dots , N_f\),

$$\begin{aligned} \left\| f_n - {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , {\varvec{f}}}\right) _n \right\| _{H^1(Q)} \le \epsilon . \end{aligned}$$
(4.13)

In addition, \(\Vert {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , {\varvec{f}}})_n\Vert _{L^\infty (Q)} = {\mathcal {O}}(\left| \log \epsilon \right| ^{2d})\) for every \(n \in \{1, \dots , N_f\}\), \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , {\varvec{f}}}) = {\mathcal {O}}(\left| \log \epsilon \right| ^{2d+1} + N_f\left| \log \epsilon \right| ^{2d})\), and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , {\varvec{f}}}) = {\mathcal {O}}(\left| \log \epsilon \right| \log (\left| \log \epsilon \right| ))\), for \(\epsilon \rightarrow 0\).

Proof

Let \(\varPhi _\epsilon \) be as in (4.8) and let \(c^{(n)} \in {\mathbb {R}}^{N_{\mathrm {1d}}\times \cdots \times N_{\mathrm {1d}}}\), \(n=1, \dots , N_f\) be the matrices of coefficients such that, in the notation of the proof of Theorems 4.2 and 4.3, for all \(n\in \{1, \dots , N_f\}\),

$$\begin{aligned} \left\| f_n - \sum _{i_1, \dots i_d=1}^{N_{\mathrm {1d}}} c^{(n)}_{i_1\dots i_d} \phi _{i_1\dots i_d} \right\| _{H^1(Q)} \le \frac{\epsilon }{2}. \end{aligned}$$

We define, for \({{\,\mathrm{vec}\,}}\) as defined in (4.9), the NN \(\varPhi _{\epsilon , {\varvec{f}}}\) as

$$\begin{aligned} \varPhi _{\epsilon , {\varvec{f}}} {:}{=}{{\,\mathrm{P}\,}}\left( (({{\,\mathrm{vec}\,}}(c^{(1)} )^\top , 0)), \dots , (({{\,\mathrm{vec}\,}}(c^{(N_f)} )^\top ,0))\right) \odot \varPhi _\epsilon . \end{aligned}$$

The estimate (4.13) and the \(L^\infty \)-bound then follow from Theorem 4.2. The bound on \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , {\varvec{f}}})\) follows directly from Theorem 4.2 and Proposition 3.2. Finally, the bound on \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , {\varvec{f}}})\) follows by Theorem 4.2 and Proposition 3.2, as well as from the observation that

$$\begin{aligned}&{{\,\mathrm{M}\,}}\left( {{\,\mathrm{P}\,}}\left( (({{\,\mathrm{vec}\,}}(c^{(1)} )^\top ,0)), \dots , (({{\,\mathrm{vec}\,}}(c^{(N_f)} )^\top , 0 ))\right) \right) \\&\quad \le N_f N_{\mathrm {1d}}^{d}\le C N_f (1+\left| \log \epsilon \right| ^{2d}), \end{aligned}$$

for a constant \(C>0\) independent of \(N_f\) and \(\epsilon \). \(\square \)

5 Exponential Expression Rates for Weighted Analytic Solution Classes of PDEs

In this section, we develop Theorem 4.3 into several exponential expression rate bounds for the approximation, by realizations of NNs with ReLU activation, of solution classes of elliptic PDEs with singular data (such as singular coefficients or domains with nonsmooth boundary). In particular, we consider elliptic PDEs in two-dimensional general polygonal domains, in three-dimensional domains that are a union of cubes, and elliptic eigenvalue problems with isolated point singularities in the potential, which arise in models of electron structure in quantum mechanics.

In each class of examples, the solution sets belong to the class of weighted analytic functions introduced in Sect. 2.2. However, the approximation rates established in Sect. 4 only hold on tensor product domains with singularities on the boundary. Therefore, we will first extend the exponential NN approximation rates to functions which exhibit singularities on a set of isolated points internal to the domain, arising from singular potentials of nonlinear Schrödinger operators. In Sect. 5.2, we demonstrate, using an argument based on a partition of unity, that the approximation problem on general polygonal domains can be reduced to that on tensor product domains and Fichera-type domains, and establish exponential NN expression rates for linear elliptic source and eigenvalue problems. In Sect. 5.3, we show exponential NN expression rates for classes of weighted analytic functions on two- and three-dimensional Fichera-type domains.

5.1 Nonlinear Eigenvalue Problems with Isolated Point Singularities

Point singularities emerge in the solutions of elliptic eigenvalue problems as they arise, for example, in models of electrostatic interactions between charged particles, which are modeled mathematically as point sources in \({\mathbb {R}}^3\). Other problems that exhibit point singularities appear in general relativity and in electron structure models in quantum mechanics. We concentrate here on the expression rate of “ab initio” NN approximation of the electron density near isolated singularities of the nuclear potential. Via a ReLU-based partition of unity argument, an exponential approximation rate bound for a single, isolated point singularity in Theorem 5.1 is extended in Corollary 5.4 to electron densities corresponding to potentials with multiple point singularities at a priori known locations, modeling (static) molecules.

The numerical approximation in ab initio electron structure computations with NNs has recently been reported to be competitive with other established methodologies (e.g., [25, 44] and the references there). The exponential ReLU expression rate bounds obtained here can, in part, underpin the competitive performance of NNs in (static) electron structure computations.

We recall that all NNs are realized with the ReLU activation function, see (3.1).

5.1.1 Nonlinear Schrödinger Equations

Let \(\varOmega = {\mathbb {R}}^d/(2{\mathbb {Z}})^d\), where \(d \in \{2,3\}\), be a flat torus and let \(V:\varOmega \rightarrow {\mathbb {R}}\) be a potential such that \(V(x)\ge V_0>0\) for all \(x\in \varOmega \) and there exist \(\delta >0\) and \(A_V>0\) such that

$$\begin{aligned} \Vert r^{2+{|\alpha |}-\delta } \partial ^\alpha V\Vert _{L^\infty (\varOmega )} \le A_V^{{|\alpha |}+1}{|\alpha |}!,\qquad \forall \alpha \in {\mathbb {N}}_0^d, \end{aligned}$$
(5.1)

where \(r(x) = {{\,\mathrm{dist}\,}}(x, (0, \dots , 0))\). For \(k \in \{0, 1, 2\}\), we introduce the Schrödinger eigenproblem that consists in finding the smallest eigenvalue \(\uplambda \in {\mathbb {R}}\) and an associated eigenfunction \(u \in H^1(\varOmega )\) such that

$$\begin{aligned} \begin{aligned} (-\varDelta +V +|u|^k) u = \uplambda u \quad \text {in }\varOmega , \quad \Vert u\Vert _{L^2(\varOmega )} = 1. \end{aligned} \end{aligned}$$
(5.2)
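As a numerical illustration only (not used in the proofs), the linear case \(k=0\) of (5.2) can be approximated by a periodic finite-difference discretization of \(-\varDelta + V\) on the torus for \(d=2\). The grid size, the values \(V_0=1\) and \(\delta =3/2\), and the smoothing of the potential at \(r=0\) are all assumptions of the sketch:

```python
import numpy as np
from scipy.sparse import diags, identity, kron
from scipy.sparse.linalg import eigsh

# Periodic grid on the torus (R/2Z)^2, identified with [-1, 1)^2.
n = 32
h = 2.0 / n
x = -1.0 + h * np.arange(n)
X, Y = np.meshgrid(x, x, indexing="ij")

# 1D periodic Laplacian with stencil [-1, 2, -1] / h^2.
L1 = diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)],
           [-1, 0, 1]).tolil()
L1[0, -1] = -1.0   # periodic wrap-around
L1[-1, 0] = -1.0
L1 = L1.tocsr() / h**2
I = identity(n, format="csr")
Lap = kron(L1, I) + kron(I, L1)            # 2D Laplacian

# Singular potential V ~ V0 + r^{delta - 2} with V0 = 1, delta = 3/2,
# smoothed near r = 0 (illustrative choices consistent with (5.1)).
r2 = X**2 + Y**2
V = 1.0 + (r2 + 1e-6) ** (-0.25)
H = (Lap + diags(V.ravel())).tocsc()

# Smallest eigenpair (k = 0, i.e., the linear Schroedinger operator).
lam, u = eigsh(H, k=1, sigma=0, which="LM")
u = u[:, 0] / (np.linalg.norm(u[:, 0]) * h)   # enforce ||u||_{L^2} = 1
```

Since \(V > V_0 = 1\) and the discrete Laplacian is positive semidefinite, the computed smallest eigenvalue exceeds 1, in line with the Rayleigh-quotient bound \(\uplambda \ge V_0\).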

The following approximation result holds.

Theorem 5.1

Let \(k \in \{0,1,2\}\) and \((\uplambda , u)\in {\mathbb {R}}\times H^1(\varOmega )\backslash \{ 0 \}\) be a solution of the eigenvalue problem (5.2) with minimal \(\uplambda \), where V satisfies (5.1).

Then, for every \(0< \epsilon \le 1\) there exists a NN \(\varPhi _{\epsilon , u}\) such that

$$\begin{aligned} \left\| u - {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , u}\right) \right\| _{H^1(\varOmega )} \le \epsilon . \end{aligned}$$
(5.3)

In addition, as \(\epsilon \rightarrow 0\),

$$\begin{aligned} {{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(|\log (\epsilon )|^{2d+1}), \qquad {{\,\mathrm{L}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(|\log (\epsilon )|\log (|\log (\epsilon )|)) \;. \end{aligned}$$

Proof

Let \({\mathcal {C}}= \{(0, \dots , 0)\}\) and \({\mathcal {E}}=\emptyset \). The regularity of u is a consequence of [34, Theorem 2] (see also [35, Corollary 3.2] for the linear case \(k=0\)): there exist \(\gamma _c> d/2\) and \(C_u, A_u>0\) such that \(u\in {\mathcal {J}}^\varpi _{\gamma _c}(\varOmega ; {\mathcal {C}}, {\mathcal {E}}; C_u, A_u)\). Here, \(\gamma _c\) and the constants \(C_u\) and \(A_u\) depend only on \(V_0\), \(A_V\), and \(\delta \) in (5.1), and on k in (5.2).

Then, for all \(0 < \epsilon \le 1\), by Theorems 4.2 and A.25, there exists a NN \(\varPhi _{\epsilon , u}\) such that (5.3) holds. Furthermore, there exist constants \(C_1\), \(C_2 > 0\) dependent only on \(V_0\), \(A_V\), \(\delta \), and k, such that

$$\begin{aligned}&{{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u}) \le C_1(1 + |\log (\epsilon )|^{2d+1}) \qquad \text { and } \\&{{\,\mathrm{L}\,}}(\varPhi _{\epsilon , u}) \le C_2 \big (1+|\log (\epsilon )|\big )\big (1+\log (1+|\log (\epsilon )|)\big ). \end{aligned}$$

\(\square \)

5.1.2 Hartree–Fock Model

The Hartree–Fock model is an approximation of the full many-body representation of a quantum system under the Born–Oppenheimer approximation, where the many-body wave function is replaced by a single Slater determinant. Under this hypothesis, for \(M, N \in {\mathbb {N}}\), the Hartree–Fock energy of a system with N electrons and M nuclei with positive charges \(Z_i\) at isolated locations \(R_i\in {\mathbb {R}}^3\) reads

$$\begin{aligned} E_{\mathrm {HF}}= & {} \inf \bigg \{ \sum _{i=1}^N\int _{{\mathbb {R}}^3}\left( |\nabla \varphi _i|^2 + V|\varphi _i|^2 \right) \nonumber \\&\quad +\frac{1}{2} \int _{{\mathbb {R}}^3}\int _{{\mathbb {R}}^3} \frac{\rho (x)\rho (y)}{\left\| x-y \right\| _{2}}dxdy -\frac{1}{2} \int _{{\mathbb {R}}^3}\int _{{\mathbb {R}}^3} \frac{\tau (x,y)^2}{\left\| x-y \right\| _{2}}dxdy : \nonumber \\&\quad (\varphi _1, \dots , \varphi _N)\in H^1({\mathbb {R}}^3)^N\text { such that }\int _{{\mathbb {R}}^3} \varphi _i\varphi _j = \delta _{ij} \bigg \}, \end{aligned}$$
(5.4)

where \(\delta _{ij}\) is the Kronecker delta, \(V(x) = -\sum _{i=1}^{M} Z_i/\left\| x-R_i \right\| _{2}\), \(\tau (x, y) = \sum _{i=1}^N\varphi _i(x)\varphi _i(y)\), and \(\rho (x) = \tau (x,x)\), see, e.g., [30, 31]. The Euler–Lagrange equations of (5.4) read

$$\begin{aligned}&(-\varDelta +V(x))\varphi _i(x) + \int _{{\mathbb {R}}^3}\frac{\rho (y)}{\left\| x-y \right\| _{2}}dy\varphi _i(x) - \int _{{\mathbb {R}}^3} \frac{\tau (x, y)}{\left\| x-y \right\| _{2}}\varphi _i(y) dy= \uplambda _i \varphi _i(x), \nonumber \\&\quad i=1, \dots , N, \, \text {and } x\in {\mathbb {R}}^3 \end{aligned}$$
(5.5)

with \(\int _{{\mathbb {R}}^3}\varphi _i\varphi _j=\delta _{ij}\).

Remark 5.2

It has been shown in [30] that, if \(\sum _{k=1}^MZ_k>N-1\), there exists a ground state \(\varphi _1,\dots , \varphi _N \) of (5.4), solution to (5.5).

The following statement gives exponential expression rate bounds of the NN-based approximation of electronic wave functions in the vicinity of one singularity (corresponding to the location of a nucleus) of the potential.

Theorem 5.3

Assume that (5.5) has N real eigenvalues \(\uplambda _1, \dots , \uplambda _N\) with associated eigenfunctions \(\varphi _1, \dots , \varphi _N\), such that \(\int _{{\mathbb {R}}^3}\varphi _i\varphi _j = \delta _{ij}\). Fix \(k\in \{1, \dots , M\}\), let \(R_k\) be one of the singularities of V, and let \(a>0\) be such that \(\left\| R_j-R_k \right\| _{\infty }>2a\) for all \(j\in \{1, \dots , M\}\setminus \{k\}\). Let \(\varOmega _k\) be the cube \(\varOmega _k = \left\{ x\in {\mathbb {R}}^3:\Vert x - R_k\Vert _{\infty }\le a \right\} \).

Then, for every \(0<\epsilon <1\), there exists a NN \(\varPhi _{\epsilon , \varphi }\) such that \({{\,\mathrm{R}\,}}(\varPhi _{\epsilon , \varphi }) : {\mathbb {R}}^3\rightarrow {\mathbb {R}}^N\) satisfies

$$\begin{aligned} \left\| \varphi _i - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , \varphi })_i \right\| _{H^1(\varOmega _k)} \le \epsilon ,\qquad \forall i\in \{1, \dots , N\}. \end{aligned}$$
(5.6)

In addition, as \(\epsilon \rightarrow 0\), \(\Vert {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , \varphi })_i\Vert _{L^\infty (\varOmega _k)} = {\mathcal {O}}(\left| \log \epsilon \right| ^{6})\) for every \(i \in \{1, \dots , N\}\),

$$\begin{aligned} {{\,\mathrm{M}\,}}(\varPhi _{\epsilon , \varphi }) = {\mathcal {O}}(\left| \log (\epsilon )\right| ^{7} + N\left| \log (\epsilon )\right| ^{6}), \qquad {{\,\mathrm{L}\,}}(\varPhi _{\epsilon , \varphi }) = {\mathcal {O}}(\left| \log (\epsilon )\right| \log (\left| \log (\epsilon )\right| )). \end{aligned}$$

Proof

Let \({\mathcal {C}}= \{(0, 0,0)\}\) and \({\mathcal {E}}= \emptyset \), and fix \(k\in \{1 ,\dots , M\}\). From the regularity result in [36, Corollary 1], see also [13, 14], there exist \(C_\varphi \), \(A_\varphi \), and \(\gamma _c>3/2\) such that \((\varphi _1, \dots , \varphi _N) \in \left[ {\mathcal {J}}^\varpi _{\gamma _c}(\varOmega _k; {\mathcal {C}}, {\mathcal {E}}; C_\varphi , A_\varphi )\right] ^N\). Then, (5.6), the \(L^\infty \) bound, and the depth and size bounds on the NN \(\varPhi _{\epsilon , \varphi }\) follow from the hp approximation result in Theorem A.25 (centered at \(R_k\) by translation) and from Theorem 4.2, as in Corollary 4.4. \(\square \)

The arguments in the preceding subsections applied to wave functions for a single, isolated nucleus with interaction modeled by the singular potential V as in (5.1) can then be extended to give upper bounds on the approximation rates achieved by realizations of NNs of the wave functions in a bounded, sufficiently large domain containing all singularities of the nuclear potential in (5.4).

Corollary 5.4

Assume that (5.5) has N real eigenvalues \(\uplambda _1, \ldots , \uplambda _N\) with associated eigenfunctions \(\varphi _1, \ldots , \varphi _N\), such that \(\int _{{\mathbb {R}}^3}\varphi _i\varphi _j = \delta _{ij}\). Let \(a_i, b_i\in {\mathbb {R}}\), \(i=1,2,3\), and let \(\varOmega {:}{=}(a_1,b_1)\times (a_2,b_2)\times (a_3,b_3)\) be such that \(\{R_j\}_{j=1}^M\subset \varOmega \). Then, for every \(0< \epsilon <1\), there exists a NN \(\varPhi _{\epsilon , \varphi }\) such that \({{\,\mathrm{R}\,}}(\varPhi _{\epsilon , \varphi }) : {\mathbb {R}}^3\rightarrow {\mathbb {R}}^N\) and

$$\begin{aligned} \left\| \varphi _i - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , \varphi })_i \right\| _{H^1(\varOmega )} \le \epsilon ,\qquad \forall i=1, \dots , N. \end{aligned}$$
(5.7)

Furthermore, as \(\epsilon \rightarrow 0\), \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , \varphi }) = {\mathcal {O}}(\left| \log (\epsilon )\right| ^{7} + N\left| \log (\epsilon )\right| ^{6})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , \varphi }) = {\mathcal {O}}(\left| \log (\epsilon )\right| \log (\left| \log (\epsilon )\right| ))\).

Proof

The proof is based on a partition of unity argument. We only sketch it at this point, and develop it in detail in the proof of Theorem 5.6. Let \({\mathcal {T}}\) be a tetrahedral, regular triangulation of \(\varOmega \), and let \(\{\kappa _k \}_{k=1}^{N_\kappa }\) be the hat-basis functions associated with it. We suppose that the triangulation is sufficiently refined to ensure that, for all \(k\in \{1, \dots , N_\kappa \}\), there exists a cube \({\widetilde{\varOmega }}_{k}\subset \varOmega \) such that \({{\,\mathrm{supp}\,}}(\kappa _k) \subset {\widetilde{\varOmega }}_{k}\) and that there exists at most one \(j\in \{1, \dots , M\}\) such that \(R_j \in {\overline{{\widetilde{\varOmega }}}}_k\).

For all \(k\in \{1, \dots , N_\kappa \}\), by [23, Theorem 5.2], which is based on [56], there exists a NN \(\varPhi ^{\kappa _k}\) such that

$$\begin{aligned} {{\,\mathrm{R}\,}}(\varPhi ^{\kappa _k}) (x) = \kappa _k(x), \qquad \forall x\in \varOmega . \end{aligned}$$
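In one dimension, for instance, a hat function is realized exactly by a one-hidden-layer ReLU NN; the knots \(a<m<b\) below are illustrative:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x, a, m, b):
    """CPwL hat with support [a, b] and peak 1 at m, written exactly as a
    one-hidden-layer ReLU NN (three hidden units), as for the kappa_k."""
    return (relu(x - a) / (m - a)
            - relu(x - m) * ((b - a) / ((m - a) * (b - m)))
            + relu(x - b) / (b - m))
```

The three slopes are chosen so the function rises with slope \(1/(m-a)\), breaks to slope \(-1/(b-m)\) at \(m\), and flattens to zero beyond \(b\).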

For all \(0<\epsilon <1\), let

$$\begin{aligned} \epsilon _1 {:}{=}\frac{\epsilon }{ 2 N_\kappa \left( \max _{k\in \{1, \dots , N_\kappa \}}\Vert \kappa _k \Vert _{W^{1,\infty }(\varOmega )} \right) }. \end{aligned}$$

For all \(k\in \{1, \dots , N_\kappa \}\) and \(i\in \{1,\ldots ,N\}\), it holds that \(\varphi _i|_{{\widetilde{\varOmega }}_k} \in {\mathcal {J}}^{\varpi }_\gamma ({\widetilde{\varOmega }}_k; \{R_1, \dots , R_M\}\cap {\overline{{\widetilde{\varOmega }}}}_k, \emptyset )\). Then, there exists a NN \(\varPhi _{\epsilon _1, \varphi }^{k}\), as defined in Theorem 5.3, such that

$$\begin{aligned} \Vert \varphi _i - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon _1, \varphi }^k)_i \Vert _{H^1({\widetilde{\varOmega }}_k)} \le \epsilon _1,\qquad \forall i\in \{1, \dots , N\}. \end{aligned}$$
(5.8)

Let

$$\begin{aligned} C_\infty {:}{=}\max _{k \in \{1, \dots , N_\kappa \}} \sup _{{\hat{\epsilon }} \in (0,1)} \frac{\Vert {{\,\mathrm{R}\,}}(\varPhi ^{k}_{{\hat{\epsilon }}, \varphi }) \Vert _{L^\infty ({\widetilde{\varOmega }}_k)}}{1+\left| \log {\hat{\epsilon }} \right| ^6} < \infty \end{aligned}$$

where the finiteness is due to Theorem 5.3. Then, we denote

$$\begin{aligned} \epsilon _{\times }{:}{=}\frac{\epsilon }{2N_\kappa ({|\varOmega |^{1/2}+} 1+\max _{i=1, \dots , N}|\varphi _i|_{H^1(\varOmega )} + \max _{k=1, \dots , N_\kappa }\Vert \kappa _k\Vert _{W^{1, \infty }(\varOmega )}|\varOmega |^{1/2})} \end{aligned}$$

and \(M_{\times }(\epsilon _1) {:}{=}C_\infty (1+\left| \log \epsilon _1 \right| ^6)\). As detailed in the proof of Theorem 5.6 below, after concatenating with identity NNs and possibly after increasing the constants, we assume that \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon _1, \varphi }^k)\) is independent of k and that the bound on \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon _1, \varphi }^k)\) is independent of k, and that the same holds for \(\varPhi ^{\kappa _k}\), \(k=1,\ldots ,N_\kappa \).

Let now, for \(i\in \{1, \dots , N\}\), \(E_i \in \{0,1\}^{2\times (N+1)}\) be the matrices such that, for all \(x = (x_1, \dots , x_{N+1})\), \(E_i x = (x_i, x_{N+1})\). Let also \(A = ({{\,\mathrm{Id}\,}}_{N\times N},\ldots ,{{\,\mathrm{Id}\,}}_{N\times N}) \in {\mathbb {R}}^{N\times N_\kappa N}\) be the block matrix comprising \(N_\kappa \) copies of the identity matrix \({{\,\mathrm{Id}\,}}_{N\times N}\in {\mathbb {R}}^{N\times N}\). Then, we introduce the NN

$$\begin{aligned} \varPhi _{\epsilon , \varphi }= & {} (A, 0)\nonumber \\&\odot&{{\,\mathrm{P}\,}}\left( \left\{ {{\,\mathrm{P}\,}}\left( \left\{ \varPi ^2_{\epsilon _{\times }, M_{\times }(\epsilon _1)} \odot (E_i, 0) \right\} _{i=1}^N\right) \odot {{\,\mathrm{P}\,}}(\varPhi ^{k}_{\epsilon _1, \varphi }, \varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L} \odot \varPhi ^{\kappa _k})\right\} _{k=1}^{N_\kappa } \right) ,\nonumber \\ \end{aligned}$$
(5.9)

where \(L\in {\mathbb {N}}\) is such that \({{\,\mathrm{L}\,}}(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L} \odot \varPhi ^{\kappa _k}) = {{\,\mathrm{L}\,}}(\varPhi ^{k}_{\epsilon _1, \varphi })\), from which it follows that \({{\,\mathrm{M}\,}}(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L}) \le C{{\,\mathrm{L}\,}}(\varPhi ^{k}_{\epsilon _1, \varphi })\). It holds, for all \(i\in \{1, \dots , N\}\), that

$$\begin{aligned} {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , \varphi })(x)_i = \sum _{k=1}^{N_\kappa } {{\,\mathrm{R}\,}}(\varPi ^2_{\epsilon _{\times }, M_{\times }(\epsilon _1) }) ({{\,\mathrm{R}\,}}(\varPhi ^k_{\epsilon _1, \varphi })(x)_i, \kappa _k(x)), \qquad \forall x\in \varOmega . \end{aligned}$$

By the triangle inequality, [40, Theorem 2.1], (5.8), and Proposition 3.6, for all \(i\in \{1, \dots , N\}\),

$$\begin{aligned}&\Vert \varphi _i - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , \varphi })_i \Vert _{H^1(\varOmega )} \\&\quad \le \Vert \varphi _i - \sum _{k=1}^{N_\kappa }\kappa _k {{\,\mathrm{R}\,}}(\varPhi ^{k}_{\epsilon _1, \varphi })_i \Vert _{H^1(\varOmega )} \\&\qquad + \sum _{k=1}^{N_\kappa } \Vert {{\,\mathrm{R}\,}}(\varPi ^2_{\epsilon _{\times }, M_{\times }(\epsilon _1)}) \left( {{\,\mathrm{R}\,}}(\varPhi ^{k}_{\epsilon _1, \varphi })_i, \kappa _k \right) - \kappa _k{{\,\mathrm{R}\,}}(\varPhi ^{k}_{\epsilon _1, \varphi })_i \Vert _{H^1({\widetilde{\varOmega }}_k)} \\&\quad \le N_\kappa \left( \max _{k\in \{1, \dots , N_\kappa \}}\Vert \kappa _k\Vert _{W^{1, \infty }(\varOmega )} \right) \epsilon _1\\&\qquad + N_\kappa (|\varOmega |^{1/2}+ 1+\max _{i=1, \dots , N}|\varphi _i|_{H^1(\varOmega )} + \max _{k=1, \dots , N_\kappa }\Vert \kappa _k \Vert _{W^{1, \infty }(\varOmega )}|\varOmega |^{1/2}) \epsilon _{\times }\\&\quad \le \epsilon . \end{aligned}$$

The asymptotic bounds on the size and depth of \(\varPhi _{\epsilon , \varphi }\) can then be derived from (5.9), using Theorem 5.3, as developed in more detail in the proof of Theorem 5.6 below. \(\square \)
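The partition-of-unity reassembly used in this proof can be illustrated by a one-dimensional sketch, in which local Taylor approximants play the role of the patchwise NN approximants; the target function, the knots, and the approximants are all illustrative:

```python
import numpy as np

# 1D analogue of the reassembly: hat functions kappa_k summing to one
# blend patch-local approximants u_k into a global approximation.
knots = np.linspace(0.0, 1.0, 6)
x = np.linspace(0.0, 1.0, 501)
E = np.eye(len(knots))
# kappa_k: piecewise-linear hat equal to 1 at knots[k], 0 at the others.
kappas = [np.interp(x, knots, E[k]) for k in range(len(knots))]
assert np.allclose(sum(kappas), 1.0)   # partition of unity on [0, 1]

f = np.sin(np.pi * x)                  # illustrative target
# Patch-local approximant u_k: first-order Taylor expansion at knots[k].
locals_ = [np.sin(np.pi * t) + np.pi * np.cos(np.pi * t) * (x - t)
           for t in knots]
blend = sum(k_ * u_ for k_, u_ in zip(kappas, locals_))
err = np.max(np.abs(f - blend))        # global error of the blend
```

Each point lies in at most two patch supports, so the blended error is a convex combination of two local Taylor errors and stays small globally, while any single local approximant is only accurate near its own knot.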

5.2 Elliptic PDEs in Polygonal Domains

We establish exponential expressivity for realizations of NNs with ReLU activation of solution classes to elliptic PDEs in polygonal domains \(\varOmega \), the boundaries \(\partial \varOmega \) of which are Lipschitz and consist of a finite number of straight line segments. Notably, \(\varOmega \subset {\mathbb {R}}^2\) need not be a finite union of axiparallel rectangles. In the following lemma, we construct a partition of unity in \(\varOmega \) subordinate to an open covering, of which each element is the affine image of one out of three canonical patches. Note that we admit corners whose associated interior angle has aperture \(\pi \); this will be instrumental, in Corollaries 5.10 and 5.11, for the imposition of different boundary conditions on \(\partial \varOmega \). The three canonical patches that we consider are listed in Lemma 5.5, item [P2]. Affine images of \((0,1)^2\) are used away from corners of \(\partial \varOmega \) and when the internal angle of a corner is smaller than \(\pi \). Affine images of \((-1,1)\times (0,1)\) are used near corners with internal angle \(\pi \). PDE solutions may exhibit point singularities near such corners, e.g., if the two neighboring edges have different types of boundary conditions. Affine images of \( (-1,1)^2\setminus (-1, 0]^2\) are used near corners with internal angle larger than \(\pi \). In the proof of Theorem 5.6, we use on each patch Theorem 4.3 or a result from Sect. 5.3 below.

A triangulation \({\mathcal {T}}\) of \({\mathbb {R}}^2\) is defined as a partition of \({\mathbb {R}}^2\) into open triangles K such that \(\bigcup _{K\in {\mathcal {T}}} {\overline{K}} = {\mathbb {R}}^2\). A regular triangulation of \({\mathbb {R}}^2\) is a triangulation \({\mathcal {T}}\) of \({\mathbb {R}}^2\) such that, additionally, for any two neighboring elements \(K_1, K_2\in {\mathcal {T}}\), \({\overline{K}}_1\cap {\overline{K}}_2\) is either a corner of both \(K_1\) and \(K_2\) or the closure of an entire edge of both \(K_1\) and \(K_2\). For a regular triangulation \({\mathcal {T}}\) of \({\mathbb {R}}^2\), we denote by \(S_1({\mathbb {R}}^2, {\mathcal {T}})\) the space of continuous functions \(v\) on \({\mathbb {R}}^2\) such that, for every \(K \in {\mathcal {T}}\), \(v|_{K} \in {\mathbb {P}}_1\).

We postpone the proof of Lemma 5.5 to Appendix B.1.

Lemma 5.5

Let \(\varOmega \subset {\mathbb {R}}^2\) be a polygon with Lipschitz boundary, consisting of straight sides, and with a finite set \({\mathcal {C}}\) of corners. Then, there exists a regular triangulation \({\mathcal {T}}\) of \({\mathbb {R}}^2\), such that for all \(K\in {\mathcal {T}}\) either \(K\subset \varOmega \) or \(K\subset \varOmega ^c\) and such that only finitely many \(K\in {\mathcal {T}}\) satisfy \(K\subset \varOmega \). Moreover, there exist \(N_p\in {\mathbb {N}}\), an open cover \(\{\varOmega _i\}_{i=1}^{N_p}\) of \(\varOmega \), and a partition of unity \(\{\phi _i\}_{i=1}^{N_p}\in \left[ S_1({\mathbb {R}}^2,{\mathcal {T}})\right] ^{N_p}\) on \(\varOmega \) (i.e., \(\sum _{i=1}^{N_p} \phi _i(x) = 1\) for all \(x\in \varOmega \), but this need not hold for \(x\in \varOmega ^c\)) such that

  1. [P1]

    \({{\,\mathrm{supp}\,}}(\phi _i)\cap \varOmega \subset \varOmega _i\) for all \(i=1,\ldots ,N_p\),

  2. [P2]

    for each \(i\in \{1, \dots , N_p\}\), there exists an affine map \(\psi _i :{\mathbb {R}}^2 \rightarrow {\mathbb {R}}^2\) such that \(\psi _i^{-1}(\varOmega _i) = {\widehat{\varOmega }}_i\) for

    $$\begin{aligned} {\widehat{\varOmega }}_i \in \{(0,1)^2,\; (-1,1)\times (0,1), \; (-1,1)^2\setminus (-1, 0]^2 \}, \end{aligned}$$
  3. [P3]

    \({\mathcal {C}} \cap {\overline{\varOmega }}_i \subset \psi _i(\{(0,0)\})\) for all \(i\in \{1, \dots , N_p\}\).

The following statement, then, provides expression rates for the NN approximation of functions in weighted analytic classes in polygonal domains.

We recall that all NNs are realized with the ReLU activation function, see (3.1).

Theorem 5.6

Let \(\varOmega \subset {\mathbb {R}}^2\) be a polygon with Lipschitz boundary \(\varGamma \) consisting of straight sides and with a finite set \({\mathcal {C}}\) of corners. Let \({\underline{\gamma }}= \{\gamma _c: c\in {\mathcal {C}}\}\) such that \(\min {\underline{\gamma }}>1\). Then, for all \(u\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ; {\mathcal {C}}, \emptyset )\) and for every \(0< \epsilon < 1\), there exists a NN \(\varPhi _{\epsilon , u}\) such that

$$\begin{aligned} \Vert u - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , u}) \Vert _{H^1(\varOmega )}\le \epsilon . \end{aligned}$$
(5.10)

In addition, as \(\epsilon \rightarrow 0\),

$$\begin{aligned} {{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(\left| \log (\epsilon )\right| ^{5}), \qquad {{\,\mathrm{L}\,}}(\varPhi _{\epsilon ,u}) = {\mathcal {O}}(\left| \log (\epsilon )\right| \log (\left| \log (\epsilon )\right| )). \end{aligned}$$

Proof

We introduce, using Lemma 5.5, a regular triangulation \({\mathcal {T}}\) of \({\mathbb {R}}^2\), an open cover \(\{\varOmega _i\}_{i=1}^{N_p}\) of \(\varOmega \), and a partition of unity \(\{\phi _i\}_{i=1}^{N_p} \in \left[ S_1({\mathbb {R}}^2,{\mathcal {T}}) \right] ^{N_p}\) on \(\varOmega \) such that the properties [P1] – [P3] of Lemma 5.5 hold.

We define \({\hat{u}}_i {:}{=}u_{|_{\varOmega _i}}\circ \psi _i : {\widehat{\varOmega }}_i\rightarrow {\mathbb {R}}\). Since \(u\in {\mathcal {J}}^\varpi _{{\underline{\gamma }}}(\varOmega ; {\mathcal {C}}, \emptyset )\) with \(\min {\underline{\gamma }}>1\) and since the maps \(\psi _i\) are affine, we observe that for every \(i\in \{1, \dots , N_p\}\), there exists \({\underline{\gamma }}'\) with \(\min {\underline{\gamma }}'>1\) such that \({\hat{u}}_i\in {\mathcal {J}}^\varpi _{{\underline{\gamma }}'}({\widehat{\varOmega }}_i, \{(0,0)\}, \emptyset )\), because of [P2] and [P3]. Let

$$\begin{aligned} \epsilon _1 {:}{=}\frac{\epsilon }{2 N_p\max _{i\in \{1, \dots , N_p\}} \Vert \phi _i\Vert _{W^{1,\infty }(\varOmega )} \left( \Vert \det J_{\psi _i}\Vert _{L^\infty ({\widehat{\varOmega }}_i)} \left( 1 + \Vert \Vert J_{\psi ^{-1}_i}\Vert _2\Vert _{L^\infty (\varOmega _i)}^{2} \right) \right) ^{1/2} }. \end{aligned}$$

By Theorem 4.3 and by Lemma 5.19 and Theorem 5.14 in the forthcoming Sect. 5.3, there exist \(N_p\) NNs \(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\), \(i\in \{1,\dots , N_p\}\), such that

$$\begin{aligned} \Vert {\hat{u}}_i - {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}) \Vert _{H^1({\widehat{\varOmega }}_i)}\le \epsilon _1, \qquad \forall i\in \{1, \dots , N_p\}, \end{aligned}$$
(5.11)

and there exists \(C_\infty >0\) independent of \(\epsilon _1\) such that, for all \(i \in \{1, \dots , N_p\}\) and all \({\hat{\epsilon }} \in (0,1)\)

$$\begin{aligned} \Vert {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{{\hat{\epsilon }}}) \Vert _{L^\infty ({\widehat{\varOmega }}_i)} \le C_\infty (1+\left| \log {\hat{\epsilon }} \right| ^4). \end{aligned}$$

The NNs given by Theorem 4.3, Lemma 5.19 and Theorem 5.14, which we here denote by \({\widetilde{\varPhi }}^{{\hat{u}}_i}_{\epsilon _1}\) for \(i = 1,\ldots ,N_p\), may not have equal depth. Therefore, for all \(i=1,\ldots ,N_p\) and suitable \(L_i\in {\mathbb {N}}\) we define \(\varPhi ^{{\hat{u}}_i}_{\epsilon _1} {:}{=} \varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L_i} \odot {\widetilde{\varPhi }}^{{\hat{u}}_i}_{\epsilon _1}\), so that the depth is the same for all \(i=1,\ldots ,N_p\). To estimate the size of the enlarged NNs, we use the fact that the size of a NN is not smaller than the depth unless the associated realization is constant. In the latter case, we could replace the NN by a NN with one non-zero weight without changing the realization. By this argument, we obtain for all \(i=1,\ldots ,N_p\) that \({{\,\mathrm{M}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}) \le 2{{\,\mathrm{M}\,}}(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L_i}) + 2{{\,\mathrm{M}\,}}({\widetilde{\varPhi }}^{{\hat{u}}_i}_{\epsilon _1}) \le C \max _{j=1,\ldots ,N_p} {{\,\mathrm{L}\,}}({\widetilde{\varPhi }}^{{\hat{u}}_j}_{\epsilon _1}) + C {{\,\mathrm{M}\,}}({\widetilde{\varPhi }}^{{\hat{u}}_i}_{\epsilon _1}) \le C \max _{j=1,\ldots ,N_p} {{\,\mathrm{M}\,}}({\widetilde{\varPhi }}^{{\hat{u}}_j}_{\epsilon _1})\). Furthermore, as shown in [23], there exist NNs \(\varPhi ^{\phi _i}\), \(i\in \{1, \dots , N_p\}\), such that

$$\begin{aligned} {{\,\mathrm{R}\,}}(\varPhi ^{\phi _i})(x) = \phi _i(x), \qquad \forall x\in \varOmega ,\; \forall i\in \{1, \dots , N_p\}. \end{aligned}$$

Here, we use that \({\mathcal {T}}\) is a partition of \({\mathbb {R}}^2\), so that \(\phi _i\) is defined on all of \({\mathbb {R}}^2\) and [23, Theorem 5.2] applies, which itself is based on [56].

Possibly after concatenating with identity networks in the same way as just described for \(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\), we can assume that \(\varPhi ^{\phi _i}\) for \(i=1,\ldots ,N_p\) all have equal depth and that the size of \(\varPhi ^{\phi _i}\) is bounded independent of i.
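The depth synchronization by concatenation with identity networks rests on the exact ReLU identity \(x = \max (x,0) - \max (-x,0)\), which can be propagated through any number of layers; a minimal sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_identity(x, L):
    """ReLU identity network of depth L: the first layer splits x into
    (relu(x), relu(-x)); subsequent ReLU layers leave these nonnegative
    channels unchanged; the output layer recombines them as x. Used to
    pad shallower subnetworks to a common depth without changing their
    realization."""
    y_pos, y_neg = relu(x), relu(-x)
    for _ in range(L - 1):
        y_pos, y_neg = relu(y_pos), relu(y_neg)  # identity on nonneg values
    return y_pos - y_neg
```

Since the padding adds two channels per layer, its size grows only linearly in the depth, which is why \({{\,\mathrm{M}\,}}(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L})\) is bounded by a constant times \(L\).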

Since by [P2] the mappings \(\psi _i\) are affine and invertible, it follows that \(\psi _i^{-1}\) is affine for every \(i \in \{1, \dots , N_p\}\). Thus, there exist NNs \(\varPhi ^{\psi ^{-1}_i}\), \(i\in \{1, \dots , N_p\}\), of depth 1, such that

$$\begin{aligned} {{\,\mathrm{R}\,}}(\varPhi ^{\psi _i^{-1}})(x) = \psi _i^{-1}(x), \qquad \forall x\in \varOmega _i,\; \forall i\in \{1, \dots , N_p\}. \end{aligned}$$
(5.12)
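Since a NN of depth 1 is a single affine layer with no activation, (5.12) incurs no approximation error; a sketch with an illustrative invertible affine map:

```python
import numpy as np

# A depth-1 NN realizes x -> A x + b exactly (no ReLU is applied).
# The matrix A and shift b below are illustrative choices.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
b = np.array([0.5, -1.0])

def psi(xhat):
    """Affine patch map psi_i from the reference patch."""
    return A @ xhat + b

def psi_inv_nn(x):
    """Its inverse psi_i^{-1}, again affine, hence again a depth-1 NN."""
    return np.linalg.solve(A, x - b)

xhat = np.array([0.3, 0.7])
assert np.allclose(psi_inv_nn(psi(xhat)), xhat)  # exact round trip
```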

Next, we define

$$\begin{aligned} \epsilon _{\times }{:}{=}\frac{\epsilon }{2N_p(|\varOmega |^{1/2}+ 1+|u|_{H^1(\varOmega )} + \max _{i=1, \dots , N_p}\Vert \phi _i\Vert _{W^{1, \infty }(\varOmega )}|\varOmega |^{1/2})} \end{aligned}$$

and \(M_{\times }(\epsilon _1){:}{=}C_\infty (1+\left| \log \epsilon _1 \right| ^4)\). Finally, we set

$$\begin{aligned} \varPhi _{\epsilon , u} {:}{=}((\underbrace{1, \dots , 1}_{N_p\text { times}}), 0) \odot {{\,\mathrm{P}\,}}\left( \left\{ \varPi ^2_{\epsilon _{\times }, M_{\times }(\epsilon _1)} \odot {{\,\mathrm{P}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_i}, \varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L} \odot \varPhi ^{\phi _i})\right\} _{i=1}^{N_p} \right) ,\nonumber \\ \end{aligned}$$
(5.13)

where \(L\in {\mathbb {N}}\) is such that \({{\,\mathrm{L}\,}}(\varPhi ^{{\hat{u}}_1}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_1}) = {{\,\mathrm{L}\,}}(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L} \odot \varPhi ^{\phi _1})\), which yields \({{\,\mathrm{M}\,}}(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L}) \le C {{\,\mathrm{L}\,}}(\varPhi ^{{\hat{u}}_1}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_1})\).
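The product networks \(\varPi ^2_{\epsilon _\times , M_\times }\) appearing in (5.13) realize approximate multiplication of two bounded inputs. As an illustration, a Yarotsky-type squaring construction by composed sawtooth functions, from which products follow by polarization, can be sketched as follows (the depth parameter m and the input range \([0,1]\) are assumptions of the sketch):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sawtooth(x):
    # g(x) = 2x on [0, 1/2], 2 - 2x on [1/2, 1]: one ReLU layer.
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def square_nn(x, m):
    """Approximate x^2 on [0, 1] by x - sum_{k=1}^m g^{(k)}(x) / 4^k,
    where g^{(k)} is the k-fold composition of the sawtooth; the error
    is at most 2^{-2m-2}."""
    y, g = x, x
    for k in range(1, m + 1):
        g = sawtooth(g)
        y = y - g / 4.0**k
    return y

def prod_nn(a, b, m):
    # ab = 2 ((a+b)/2)^2 - a^2/2 - b^2/2, for inputs in [0, 1].
    s = (a + b) / 2.0
    return 2 * square_nn(s, m) - 0.5 * square_nn(a, m) - 0.5 * square_nn(b, m)
```

Each extra sawtooth composition is one more layer and halves the mesh of the piecewise-linear interpolant of \(x^2\), which is the source of the logarithmic depth and size factors in the bounds above.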

Approximation accuracy. By (5.13), we have for all \(x\in \varOmega \),

$$\begin{aligned} {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , u})(x) = \sum _{i=1}^{N_p} {{\,\mathrm{R}\,}}(\varPi ^2_{\epsilon _{\times }, M_{\times }(\epsilon _1)}) \left( {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_i})(x), {{\,\mathrm{R}\,}}(\varPhi ^{\phi _i})(x)\right) . \end{aligned}$$

Therefore,

$$\begin{aligned} \begin{aligned} \Vert u - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon ,u}) \Vert _{H^1(\varOmega )}&\le \left\| u - \sum _{i=1}^{N_p}\phi _i{{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_i}) \right\| _{H^1(\varOmega )} \\&\quad + \sum _{i=1}^{N_p} \left\| {{\,\mathrm{R}\,}}(\varPi ^2_{\epsilon _{\times }, M_{\times }(\epsilon _1)}) \left( {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_i}), \phi _i \right) \right. \\&\quad \left. - \phi _i{{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\odot \varPhi ^{\psi _i^{-1}}) \right\| _{H^1(\varOmega )} \\&= (I) + (II). \end{aligned} \end{aligned}$$
(5.14)

We start by considering term (I). For each \(i\in \{1, \dots , N_p\}\), thanks to (5.11) and denoting by \(\Vert J_{\psi ^{-1}_i} \Vert _2^2\) the square of the matrix 2-norm of the Jacobian of \(\psi _i^{-1}\), it holds that

$$\begin{aligned} \begin{aligned}&\Vert u - {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_i}) \Vert _{H^1(\varOmega _i)} \\&\quad = \Vert {\hat{u}}_i\circ \psi ^{-1}_i - {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1})\circ \psi ^{-1}_i \Vert _{H^1(\varOmega _i)} \\&\quad = \left( \int _{{\widehat{\varOmega }}_i} \left( \left| {\hat{u}}_i - {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}) \right| _{}^2 + \left\| \nabla \left( {\hat{u}}_i - {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}) \right) J_{\psi _i^{-1}} \right\| _{2}^2 \right) |\det J_{\psi _i}| dx \right) ^{1/2} \\&\quad \le \epsilon _1 \left( \Vert \det J_{\psi _i}\Vert _{L^\infty ({\widehat{\varOmega }}_i)} + \Vert \det J_{\psi _i}\Vert _{L^\infty ({\widehat{\varOmega }}_i)} \Vert \Vert J_{\psi ^{-1}_i} \Vert _2^2 \Vert _{L^\infty (\varOmega _i)} \right) ^{1/2} \\&\quad \le \epsilon _2{:}{=} \epsilon _1 \max _{i \in \{1, \dots , N_p\}} \left( \Vert \det J_{\psi _i}\Vert _{L^\infty ({\widehat{\varOmega }}_i)} + \Vert \det J_{\psi _i}\Vert _{L^\infty ({\widehat{\varOmega }}_i)} \Vert \Vert J_{\psi ^{-1}_i} \Vert _2^2 \Vert _{L^\infty (\varOmega _i)} \right) ^{1/2}. \end{aligned}\nonumber \\ \end{aligned}$$
(5.15)

By [40, Theorem 2.1],

$$\begin{aligned} (I) \le N_p \epsilon _2\max _{i\in \{1, \dots , N_p\}}\Vert \phi _i\Vert _{W^{1, \infty }(\varOmega )} \le \frac{\epsilon }{2}. \end{aligned}$$
(5.16)

We now consider term (II) in (5.14). By Theorem 4.3 and (5.12), it holds that

$$\begin{aligned} \Vert {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_i})\Vert _{L^\infty (\varOmega _i)} = \Vert {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1})\Vert _{L^\infty ({\widehat{\varOmega }}_i)} \le C_{\infty } (1 + \left| \log \epsilon _1 \right| ^4) \end{aligned}$$

for all \(i\in \{1, \dots , N_p\}\). Furthermore, by [P1], \(\phi _i(x) = 0\) for all \(x\in \varOmega \setminus \varOmega _i\) and, by Proposition 3.6,

$$\begin{aligned} {{\,\mathrm{R}\,}}(\varPi ^2_{\epsilon _{\times }, M_{\times }(\epsilon _1)}) \left( {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_i})(x), \phi _i (x)\right) = 0, \qquad \forall x\in \varOmega \setminus \varOmega _i. \end{aligned}$$

From (5.15), we also have

$$\begin{aligned}&| {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_i}) |_{H^1(\varOmega _i)}\\&\quad \le | u |_{H^1(\varOmega _i)} + \Vert u - {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_i}) \Vert _{H^1(\varOmega _i)} \le 1+ | u |_{H^1(\varOmega _i)}. \end{aligned}$$

Hence,

$$\begin{aligned} \begin{aligned} (II)&= \sum _{i=1}^{N_p} \Vert {{\,\mathrm{R}\,}}(\varPi ^2_{\epsilon _{\times }, M_{\times }(\epsilon _1)}) \left( {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_i}), \phi _i \right) - \phi _i{{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\odot \varPhi ^{\psi _i^{-1}}) \Vert _{H^1(\varOmega _i)} \\&\le \sum _{i=1}^{N_p}\bigg ( \Vert {{\,\mathrm{R}\,}}(\varPi ^{2}_{\epsilon _{\times }, M_{\times }(\epsilon _1)}) (a,b) - ab\Vert _{W^{1, \infty }([-M_{\times }(\epsilon _1), M_{\times }(\epsilon _1)]^2)} \\&\quad \cdot \left( |\varOmega |^{1/2}+ | {{\,\mathrm{R}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}\odot \varPhi ^{\psi ^{-1}_i}) |_{H^1(\varOmega _i)} +| \phi _i |_{H^1(\varOmega _i)} \right) \bigg ) \\&\le N_p \epsilon _{\times }\left( |\varOmega |^{1/2}+ 1 + | u |_{H^1(\varOmega )} + |\varOmega |^{1/2}\max _{i=1,\dots , N_p} \Vert \phi _i\Vert _{W^{1, \infty }(\varOmega )}\right) \\&\le \frac{\epsilon }{2}. \end{aligned} \end{aligned}$$
(5.17)

The asserted approximation accuracy follows by combining (5.14), (5.16), and (5.17).
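Product networks such as \(\varPi^2_{\epsilon_\times, M_\times}\) can be realized, for instance, by the well-known construction based on approximate squaring via composed hat functions and polarization. The sketch below is a plain numerical illustration of that idea on \([0,1]^2\), with names of our own; it is not the specific network of Proposition 3.6.

```python
import numpy as np

def approx_square(x, m):
    """Approximate x^2 on [0,1] by x - sum_{s=1}^m g_s(x)/4^s, where g_s is
    the s-fold composition of the hat function; error at most 4^-(m+1)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    g = x.copy()
    for s in range(1, m + 1):
        g = np.where(g < 0.5, 2.0 * g, 2.0 * (1.0 - g))  # one more hat composition
        out = out - g / 4.0**s
    return out

def approx_product(x, y, m):
    """xy = ((x+y)^2 - x^2 - y^2)/2 with each square approximated as above;
    valid for x, y in [0,1]."""
    return (4.0 * approx_square((x + y) / 2.0, m)
            - approx_square(x, m) - approx_square(y, m)) / 2.0
```

Each hat composition is realizable by one ReLU layer, so the accuracy improves exponentially in the depth, consistent with the logarithmic-in-\(\epsilon_\times\) size bounds used below.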

Size of the neural network. To bound the size of the NN, we remark that \(N_p\) and the sizes of \(\varPhi ^{\psi _i^{-1}}\) and of \(\varPhi ^{\phi _i}\) only depend on the domain \(\varOmega \). Furthermore, there exist constants \(C_{\varOmega , i}\), \(i=1,2,3\), that depend only on \(\varOmega \) and u such that

$$\begin{aligned}&\left| \log \epsilon _1 \right| \le C_{\varOmega , 1} (1+\left| \log \epsilon \right| ),\qquad \qquad \left| \log \epsilon _\times \right| \le C_{\varOmega , 2}(1+ \left| \log \epsilon \right| ), \nonumber \\&\left| \log M_\times (\epsilon _1) \right| \le C_{\varOmega , 3}(1+ \log (1+\left| \log \epsilon \right| )). \end{aligned}$$
(5.18)

From Theorem 4.3 and Proposition 3.6, in addition, there exist constants \(C^L_{{\hat{u}}}, C^M_{{\hat{u}}}, C_{\times }>0\) such that, for all \(0< \epsilon _1 , \epsilon _\times \le 1\),

$$\begin{aligned}&{{\,\mathrm{L}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}) \le C^L_{{\hat{u}}} (1+\left| \log \epsilon _1 \right| )(1 + \log (1+\left| \log \epsilon _1 \right| )),\quad {{\,\mathrm{M}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}) \le C^M_{{\hat{u}}} (1+\left| \log \epsilon _1 \right| ^{5}),\nonumber \\&\max ({{\,\mathrm{M}\,}}(\varPi ^2_{\epsilon _{\times }, M_{\times }(\epsilon _1)}), {{\,\mathrm{L}\,}}(\varPi ^2_{\epsilon _{\times }, M_{\times }(\epsilon _1)})) \le C_\times (1+\log (M_{\times }(\epsilon _1)^2/\epsilon _{\times })). \end{aligned}$$
(5.19)

Then, by (5.13), we have

$$\begin{aligned} {{\,\mathrm{L}\,}}(\varPhi _{\epsilon , u})= & {} 1 + {{\,\mathrm{L}\,}}(\varPi ^2_{\epsilon _{\times }, M_{\times }(\epsilon _1)}) +\max _{i=1, \dots , N_p} \left( {{\,\mathrm{L}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}) + {{\,\mathrm{L}\,}}(\varPhi ^{\psi _i^{-1}}) \right) , \nonumber \\ {{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u})\le & {} {C}\left( N_p + N_p {{\,\mathrm{M}\,}}(\varPi ^2_{\epsilon _{\times }, M_{\times }(\epsilon _1)})\right. \nonumber \\&\left. +\sum _{i=1}^{N_p} \left( {{\,\mathrm{M}\,}}(\varPhi ^{{\hat{u}}_i}_{\epsilon _1}) + {{\,\mathrm{M}\,}}(\varPhi ^{\psi _i^{-1}}) + {{\,\mathrm{M}\,}}(\varPhi ^{{{\,\mathrm{Id}\,}}}_{1,L}) + {{\,\mathrm{M}\,}}(\varPhi ^{\phi _i}) \right) \right) . \end{aligned}$$
(5.20)

The desired depth and size bounds follow from (5.18), (5.19), and (5.20). This concludes the proof. \(\square \)

The exponential expression rate in the \(H^1(\varOmega )\)-norm established in Theorem 5.6 for the class of weighted, analytic functions in \(\varOmega \) by realizations of NNs with ReLU activation implies an exponential expression rate bound on \(\partial \varOmega \), via the trace map and the fact that \(\partial \varOmega \) can be exactly parametrized by the realization of a shallow NN with ReLU activation. This is relevant for NN-based solution of boundary integral equations.

Corollary 5.7

(NN expression of Dirichlet traces) Let \(\varOmega \subset {\mathbb {R}}^2\) be a polygon with Lipschitz boundary \(\partial \varOmega \) and a finite set \({\mathcal {C}}\) of corners. Let \({\underline{\gamma }}= \{\gamma _c: c\in {\mathcal {C}}\}\) be such that \(\min {\underline{\gamma }}>1\). For any connected component \(\varGamma \) of \(\partial \varOmega \), let \(\ell _\varGamma >0\) be the length of \(\varGamma \), such that there exists a continuous, piecewise affine parametrization \(\theta :[0,\ell _\varGamma ]\rightarrow {\mathbb {R}}^2:t\mapsto \theta (t)\) of \(\varGamma \) with finitely many affine linear pieces and \(\left\| \tfrac{d}{dt}\theta \right\| _{2} = 1\) for almost all \(t\in [0,\ell _\varGamma ]\).

Then, for all \(u\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ; {\mathcal {C}}, \emptyset )\) and for all \(0< \epsilon < 1\), there exists a NN \(\varPhi _{\epsilon , u, \theta }\) approximating the trace \(Tu {:}{=} u_{|_{\varGamma }}\) such that

$$\begin{aligned} \Vert Tu- {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , u, \theta })\circ \theta ^{-1} \Vert _{H^{1/2}(\varGamma )}\le \epsilon . \end{aligned}$$
(5.21)

In addition, as \(\epsilon \rightarrow 0\),

$$\begin{aligned} {{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u, \theta }) = {\mathcal {O}}(\left| \log (\epsilon )\right| ^{5}), \qquad {{\,\mathrm{L}\,}}(\varPhi _{\epsilon , u, \theta }) = {\mathcal {O}}(\left| \log (\epsilon )\right| \log (\left| \log (\epsilon )\right| )). \end{aligned}$$

Proof

We note that both components of \(\theta \) are continuous, piecewise affine functions on \([0,\ell _\varGamma ]\); hence, they can be represented exactly as realizations of NNs of depth two with the ReLU activation function. Moreover, the number of weights of these NNs is of the order of the number of affine linear pieces of \(\theta \). We denote by \(\varPhi ^{\theta }\) the parallelization of the NNs emulating the two components of \(\theta \) exactly.
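The exact depth-two ReLU representation of a continuous, piecewise affine (CPWL) function rests on the classical identity \(f(t) = c_0 + s_0 t + \sum _k (s_{k+1}-s_k)\,\mathrm{ReLU}(t - t_k)\), where \(t_k\) are the interior breakpoints and \(s_k\) the slopes. A minimal numerical check of this identity (the helper below is our own, not from the paper):

```python
import numpy as np

def cpwl_as_relu(ts, ys):
    """Return the CPWL interpolant of (ts, ys) written in the one-hidden-layer
    ReLU form c0 + s0*t + sum_k dk * relu(t - ts[k])."""
    ts, ys = np.asarray(ts, float), np.asarray(ys, float)
    slopes = np.diff(ys) / np.diff(ts)
    s0, c0 = slopes[0], ys[0] - slopes[0] * ts[0]
    dk, kinks = np.diff(slopes), ts[1:-1]   # slope jumps at interior breakpoints
    def f(t):
        t = np.asarray(t, dtype=float)
        return c0 + s0 * t + np.maximum(t[..., None] - kinks, 0.0) @ dk
    return f
```

On \([t_0, t_{\mathrm{end}}]\) this coincides with linear interpolation of the data; since \(t = \mathrm{ReLU}(t) - \mathrm{ReLU}(-t)\), the whole expression is the realization of a ReLU NN of depth two.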

By continuity of the trace operator \(T: H^1(\varOmega ) \rightarrow H^{1/2}(\partial \varOmega )\) (e.g., [7, 15]), there exists a constant \(C_{\mathrm {\varGamma }}>0\) such that for all \(v\in H^{1}(\varOmega )\) it holds \( \left\| Tv \right\| _{H^{1/2}(\varGamma )} \le C_{\mathrm {\varGamma }}\left\| v \right\| _{H^{1}(\varOmega )}, \) and without loss of generality we may assume \(C_{\mathrm {\varGamma }}\ge 1\).

Next, for any \(\epsilon \in (0,1)\), let \(\varPhi _{\epsilon /C_{\mathrm {\varGamma }}, u}\) be as given by Theorem 5.6. Define \(\varPhi _{\epsilon , u, \theta } {:}{=} \varPhi _{\epsilon /C_{\mathrm {\varGamma }}, u} \odot \varPhi ^{\theta }\). It follows that

$$\begin{aligned} \left\| Tu - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , u, \theta })\circ \theta ^{-1} \right\| _{H^{1/2}(\varGamma )}&= \left\| T\left( u - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon /C_{\mathrm {\varGamma }}, u}) \right) \right\| _{H^{1/2}(\varGamma )} \\&\le C_{\mathrm {\varGamma }}\left\| u - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon /C_{\mathrm {\varGamma }}, u}) \right\| _{H^{1}(\varOmega )} \le \epsilon . \end{aligned}$$

The bounds on its depth and size follow directly from Proposition 3.2, Theorem 5.6, and the fact that the depth and size of \(\varPhi ^{\theta }\) are independent of \(\epsilon \). This finishes the proof. \(\square \)

Remark 5.8

The exponent 5 in the bound on the NN size \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u, \theta })\) in Corollary 5.7 is likely not optimal, as it is inherited from the NN expression rate bound in \(\varOmega \).

The proof of Theorem 5.6 established exponential expressivity of realizations of NNs with ReLU activation for the analytic class \({\mathcal {J}}^\varpi _{{\underline{\gamma }}}(\varOmega ; {\mathcal {C}}, \emptyset )\) in \(\varOmega \). This implies that realizations of NNs can approximate, with exponential expressivity, solution classes of elliptic PDEs in polygonal domains \(\varOmega \). We illustrate this by formulating concrete results for three problem classes: second-order, linear, elliptic source and eigenvalue problems in \(\varOmega \), and viscous, incompressible flow. To formulate the results, we specify the assumptions on \(\varOmega \).

Definition 5.9

(Linear, second-order, elliptic divergence-form differential operator with analytic coefficients) Let \(d\in \{2, 3\}\) and let \(\varOmega \subset {\mathbb {R}}^d\) be a bounded domain. Let the coefficient functions \(a_{ij},b_i, c:{\overline{\varOmega }}\rightarrow {\mathbb {R}}\) be real analytic in \({\overline{\varOmega }}\), and such that the matrix function \(A = (a_{ij})_{1\le i,j\le d}:\varOmega \rightarrow {\mathbb {R}}^{d \times d}\) is symmetric and uniformly positive definite in \(\varOmega \). With these functions, we define the linear, second-order, elliptic divergence-form differential operator \({{\mathcal {L}}}\) acting on \(w\in C^\infty _0(\varOmega )\) via (summation over repeated indices \(i,j\in \{1,\dots , d\}\))

$$\begin{aligned} ({{\mathcal {L}}}w)(x) {:}{=} -\partial _i(a_{ij}(x)\partial _j w(x)) + b_j(x)\partial _j w(x) + c(x)w(x) \;, \quad x\in \varOmega \;. \end{aligned}$$
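As a quick sanity check of the sign and index conventions in this definition, one can apply \({{\mathcal {L}}}\) symbolically with \(A = I\), \(b = 0\), \(c = 0\), in which case \({{\mathcal {L}}}\) reduces to \(-\varDelta \). The sketch below assumes sympy; all names are ours and purely illustrative.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
w = sp.sin(sp.pi * x1) * sp.sin(sp.pi * x2)      # a sample smooth function
a = sp.eye(2)                                     # a_ij = delta_ij, b_j = c = 0
grad_w = [sp.diff(w, x1), sp.diff(w, x2)]
flux = [sum(a[i, j] * grad_w[j] for j in range(2)) for i in range(2)]
Lw = -(sp.diff(flux[0], x1) + sp.diff(flux[1], x2))   # -d_i(a_ij d_j w)
assert sp.simplify(Lw - 2 * sp.pi**2 * w) == 0        # -Laplace(w) = 2 pi^2 w
```

The sample \(w\) is a Dirichlet eigenfunction of \(-\varDelta \) on the unit square, so \({{\mathcal {L}}}w = 2\pi ^2 w\), confirming the sign convention.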

Setting 1

We assume that \(\varOmega \subset {\mathbb {R}}^2\) is an open, bounded polygon with boundary \(\partial \varOmega \) that is Lipschitz and connected. In addition, \(\partial \varOmega \) is the closure of a finite number \(J \ge 3\) of straight, open sides \(\varGamma _j\), i.e., \(\varGamma _{i} \cap \varGamma _j =\emptyset \) for \(i\ne j\) and \(\partial \varOmega = \bigcup _{1\le j \le J} \overline{\varGamma _j}\). We assume the enumeration of the sides \(\varGamma _j\) to be \(J\)-cyclic, i.e., \(\varGamma _{J+1} = \varGamma _1\).

By \(n_j\), we denote the exterior unit normal vector to \(\varOmega \) on side \(\varGamma _j\) and by \({\varvec{c}}_j {:}{=} \overline{\varGamma _{j-1}} \cap \overline{\varGamma _j}\) the corner j of \(\varOmega \).

With \({{\mathcal {L}}}\) as in Definition 5.9, we associate on boundary segment \(\varGamma _j\) a boundary operator \({\mathcal {B}}_j \in \{\gamma ^j_0,\gamma ^j_1\}\), i.e., either the Dirichlet trace \(\gamma _0\) or the distributional (co-)normal derivative operator \(\gamma _1\), acting on \(w\in C^1({\overline{\varOmega }})\) via

$$\begin{aligned} \gamma ^j_0 w {:}{=} w|_{\varGamma _j} \, ,\qquad \gamma ^j_1 w {:}{=} (A\nabla w) \cdot n_j|_{\varGamma _j}, \quad j=1,...,J\;. \end{aligned}$$
(5.22)

We collect the boundary operators \({\mathcal {B}}_j\) in \({\mathcal {B}}{:}{=} \{ {\mathcal {B}}_j \}_{j=1}^J\).

The first corollary addresses exponential ReLU expressivity of solutions of the source problem corresponding to \(({{\mathcal {L}}},{\mathcal {B}})\).

Corollary 5.10

Let \(\varOmega \), \({{\mathcal {L}}}\), and \({\mathcal {B}}\) be as in Setting 1 with \(d=2\). For f analytic in \({\overline{\varOmega }}\), let u denote a solution to the boundary value problem

$$\begin{aligned} {{\mathcal {L}}}u = f \;\text { in }\;\varOmega , \qquad {\mathcal {B}}u = 0\;\text { on }\;\partial \varOmega \;. \end{aligned}$$
(5.23)

Then, for every \(0< \epsilon < 1\), there exists a NN \(\varPhi _{\epsilon , u}\) such that

$$\begin{aligned} \Vert u - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , u}) \Vert _{H^1(\varOmega )}\le \epsilon . \end{aligned}$$
(5.24)

In addition, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(\left| \log (\epsilon )\right| ^{5})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon ,u}) = {\mathcal {O}}(\left| \log (\epsilon )\right| \log (\left| \log (\epsilon )\right| ))\), as \(\epsilon \rightarrow 0\).

Proof

The proof is obtained by verifying weighted, analytic regularity of solutions. By [3, Theorem 3.1], there exists \({\underline{\gamma }}\) such that \(\min {\underline{\gamma }}>1\) and \(u\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ; {\mathcal {C}}, \emptyset )\). Then, the application of Theorem 5.6 concludes the proof. \(\square \)

Next, we address NN expression rates for eigenfunctions of \(({{\mathcal {L}}},{\mathcal {B}})\).

Corollary 5.11

Let \(\varOmega \), \({{\mathcal {L}}}\), \({\mathcal {B}}\) be as in Setting 1 with \(d=2\), and \(b_i = 0\) in Definition 5.9, and let \( 0 \ne w\in H^1(\varOmega )\) be an eigenfunction of the elliptic eigenvalue problem

$$\begin{aligned} {{\mathcal {L}}}w = \uplambda w \text { in }\varOmega , \qquad {\mathcal {B}}w = 0 \text { on }\partial \varOmega . \end{aligned}$$
(5.25)

Then, for every \(0< \epsilon < 1\), there exists a NN \(\varPhi _{\epsilon , w}\) such that

$$\begin{aligned} \Vert w - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , w}) \Vert _{H^1(\varOmega )}\le \epsilon . \end{aligned}$$
(5.26)

In addition, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , w}) = {\mathcal {O}}(\left| \log (\epsilon )\right| ^{5})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon ,w}) = {\mathcal {O}}(\left| \log (\epsilon )\right| \log (\left| \log (\epsilon )\right| ))\), as \(\epsilon \rightarrow 0\).

Proof

The statement follows from the regularity result [4, Theorem 3.1] and Theorem 5.6, as in Corollary 5.10. \(\square \)

The weighted, analytic regularity of solutions u used in the proof of Theorem 5.6 also holds for certain nonlinear, elliptic PDEs. We illustrate this for the velocity field of viscous, incompressible flow in \(\varOmega \).

Corollary 5.12

Let \(\varOmega \subset {\mathbb {R}}^2\) be as in Setting 1. Let \(\nu >0\) and let \({\varvec{u}}\in H^1_0(\varOmega )^2\) be the velocity field of the Leray solution of the viscous, incompressible Navier–Stokes equations in \(\varOmega \), with homogeneous Dirichlet (“no slip”) boundary conditions

$$\begin{aligned} -\nu \varDelta {\varvec{u}} + ({\varvec{u}}\cdot \nabla ) {\varvec{u}} + \nabla p = {\varvec{f}} \text { in }\varOmega ,\quad \nabla \cdot {\varvec{u}} = 0\text { in }\varOmega ,\quad {\varvec{u}} ={\varvec{0}}\text { on }\partial \varOmega , \quad \end{aligned}$$
(5.27)

where the components of \({\varvec{f}}\) are analytic in \({\overline{\varOmega }}\) and such that \(\Vert {\varvec{f}}\Vert _{H^{-1}(\varOmega )}/ \nu ^2\) is small enough so that \({\varvec{u}}\) is unique.

Then, for every \(0< \epsilon < 1\), there exists a NN \(\varPhi _{\epsilon , {\varvec{u}}}\) with two-dimensional output such that

$$\begin{aligned} \Vert {\varvec{u}} - {{\,\mathrm{R}\,}}(\varPhi _{\epsilon , {\varvec{u}}}) \Vert _{H^1(\varOmega )}\le \epsilon . \end{aligned}$$
(5.28)

In addition, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , {\varvec{u}}})= {\mathcal {O}}(\left| \log (\epsilon )\right| ^{5})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon ,{\varvec{u}}}) = {\mathcal {O}}(\left| \log (\epsilon )\right| \log (\left| \log (\epsilon )\right| ))\), as \(\epsilon \rightarrow 0\).

Proof

The velocity fields of Leray solutions of the Navier–Stokes equations in \(\varOmega \) satisfy the weighted, analytic regularity \({\varvec{u}} \in \big [{\mathcal {J}}^\varpi _{{\underline{\gamma }}}(\varOmega ; {\mathcal {C}},\emptyset )\big ]^2\) with \(\min {\underline{\gamma }}>1\), see [24, 38]. Then, the componentwise application of Theorem 5.6 concludes the proof. \(\square \)

5.3 Elliptic PDEs in Fichera-Type Polyhedral Domains

Fichera-type polyhedral domains \(\varOmega \subset {\mathbb {R}}^3\) are, loosely speaking, closures of finite, disjoint unions of (possibly affinely mapped) axiparallel hexahedra with \(\partial \varOmega \) Lipschitz. In Fichera-type domains, analytic regularity of solutions of linear, elliptic boundary value problems from acoustics and linear elasticity in displacement formulation has been established in [8]. As an example of a boundary value problem covered by [8] and our theory, consider \(\varOmega {:}{=}(-1, 1)^d \setminus (-1, 0]^d\) for \(d=2,3\), displayed for \(d=3\) in Fig. 2.

Fig. 2: Example of Fichera-type corner domain

We recall that all NNs are realized with the ReLU activation function, see (3.1).

We introduce the setting for elliptic problems with analytic coefficients in \(\varOmega \). Note that the boundary of \(\varOmega \) is composed of 6 edges when \(d=2\) and of 9 faces when \(d=3\).

Setting 2

We assume that \({{\mathcal {L}}}\) is an elliptic operator as in Definition 5.9. When \(d=3\), we assume furthermore that the diffusion coefficient \(A\in {\mathbb {R}}^{3\times 3}\) is a symmetric, positive definite matrix and \(b_i = c = 0\). On each edge (if \(d=2\)) or face (if \(d=3\)) \(\varGamma _j\subset \partial \varOmega \), \(j\in \{1, \dots , 3d\}\), we introduce the boundary operator \({\mathcal {B}}_j \in \{\gamma _0,\gamma _1\}\), where \(\gamma _0\) and \(\gamma _1\) are defined as in (5.22). We collect the boundary operators \({\mathcal {B}}_j\) in \({\mathcal {B}}{:}{=} \{ {\mathcal {B}}_j \}_{j=1}^{3d}\).

For a right hand side f, the elliptic boundary value problem we consider in this section is then

$$\begin{aligned} \begin{aligned} {{\mathcal {L}}}u = f \text { in }\varOmega , \qquad {\mathcal {B}}u = 0 \text { on }\partial \varOmega . \end{aligned} \end{aligned}$$
(5.29)

The following extension lemma will be useful for the approximation of the solution to (5.29) by NNs. We postpone its proof to Appendix B.2.

Lemma 5.13

Let \(d\in \{2,3\}\) and \(u\in W_{\mathrm {mix}}^{1,1}(\varOmega )\). Then, there exists a function \(v\in W_{\mathrm {mix}}^{1,1}((-1,1)^d)\) such that \(v|_{\varOmega } = u\). The extension is stable with respect to the \(W_{\mathrm {mix}}^{1,1}\)-norm.

We denote the set containing all corners of \(\varOmega \) (including the re-entrant one) as

$$\begin{aligned} {\mathcal {C}}= \{-1,0,1\}^d\setminus (-1, \dots , -1). \end{aligned}$$
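This geometry is easy to cross-check numerically, with \(\varOmega =(-1,1)^d\setminus (-1,0]^d\) as introduced above. The following fragment is purely illustrative, with names of our own: it enumerates \({\mathcal {C}}\) and tests membership in \(\varOmega \).

```python
import itertools
import numpy as np

def in_fichera(x):
    """Membership test for Omega = (-1,1)^d \\ (-1,0]^d, d = len(x)."""
    x = np.asarray(x, dtype=float)
    in_cube = bool(np.all((-1.0 < x) & (x < 1.0)))
    in_removed = bool(np.all((-1.0 < x) & (x <= 0.0)))
    return in_cube and not in_removed

d = 3
corners = set(itertools.product((-1, 0, 1), repeat=d)) - {(-1,) * d}
assert len(corners) == 3**d - 1              # 26 candidate singular points for d = 3
assert in_fichera((0.5, 0.5, 0.5))
assert not in_fichera((-0.5, -0.5, -0.5))    # inside the removed octant
assert in_fichera((-0.5, 0.5, -0.5))
```

Note that the excluded point \((-1,\dots ,-1)\) lies on the boundary of the removed octant only, not of \(\varOmega \), which is why it is dropped from \({\mathcal {C}}\).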

When \(d=3\), for all \(c\in {\mathcal {C}}\) we denote by \({\mathcal {E}}_c\) the set of edges abutting at \(c\), and we set \({\mathcal {E}}{:}{=}\bigcup _{c\in {\mathcal {C}}}{\mathcal {E}}_c\).

Theorem 5.14

Let \(u \in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ; {\mathcal {C}}, {\mathcal {E}})\) with

$$\begin{aligned} \begin{array}{lll} {\underline{\gamma }}= \{\gamma _c: c\in {\mathcal {C}}\}, &{} \text {with } \gamma _c>1,\; \text {for all } c\in {\mathcal {C}}&{} \text { if } d = 2,\\ {\underline{\gamma }}= \{\gamma _c, \gamma _e: c\in {\mathcal {C}}, e\in {\mathcal {E}}\}, &{}\text {with } \gamma _c>3/2\text { and } \gamma _e>1,\; \text {for all }c\in {\mathcal {C}}\text { and }e\in {\mathcal {E}}&{}\text { if } d = 3. \end{array} \end{aligned}$$

Then, for any \(0< \epsilon <1\) there exists a NN \(\varPhi _{\epsilon , u}\) so that

$$\begin{aligned} \left\| u - {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , u}\right) \right\| _{H^1(\varOmega )} \le \epsilon . \end{aligned}$$
(5.30)

In addition, \(\Vert {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , u}\right) \Vert _{L^\infty (\varOmega )} = {\mathcal {O}}(1+\left| \log \epsilon \right| ^{2d})\), as \(\epsilon \rightarrow 0\). Also, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(|\log (\epsilon )|^{2d+1})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(|\log (\epsilon )|\log (|\log (\epsilon )|))\), as \(\epsilon \rightarrow 0\).

Proof

By Lemma 5.13, we extend the function u to a function \({{\tilde{u}}}\) such that

$$\begin{aligned} {{\tilde{u}}}\in W_{\mathrm {mix}}^{1,1}((-1, 1)^d) \qquad \text {and}\qquad {{\tilde{u}}}|_{\varOmega } = u. \end{aligned}$$

Note that, by the stability of the extension, there exists a constant \(C_{\mathrm {ext}}>0\) independent of u such that

$$\begin{aligned} \Vert {{\tilde{u}}}\Vert _{W_{\mathrm {mix}}^{1,1}((-1,1)^d)}\le C_{\mathrm {ext}} \Vert u\Vert _{W_{\mathrm {mix}}^{1,1}(\varOmega )}. \end{aligned}$$
(5.31)

Since \(u \in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ; {\mathcal {C}}, {\mathcal {E}})\), it follows that \(u\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(S; {\mathcal {C}}_S, {\mathcal {E}}_S)\) for all

(5.32)

with \({\mathcal {C}}_S = {\overline{S}}\cap {\mathcal {C}}\) and \({\mathcal {E}}_{S} = \{e\in {\mathcal {E}}: e\subset {\overline{S}}\}\). Since \(S \subset \varOmega \) and \({{\tilde{u}}}|_{\varOmega } = u|_{\varOmega }\), we also have

$$\begin{aligned} {{\tilde{u}}}\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(S; {\mathcal {C}}_S, {\mathcal {E}}_S) \;\text{ for } \text{ all } S \text{ satisfying } (5.32). \end{aligned}$$

By Theorem A.25, there exist \(C_p>0\), \(C_{{\widetilde{N}}_{\mathrm {1d}}}>0\), \(C_{{\widetilde{N}}_{\mathrm {int}}}>0\), \(C_{{{\tilde{v}}}}>0\), \(C_{{\widetilde{c}}}>0\), and \(b_{{{\tilde{v}}}}>0\) such that, for all \(0<\epsilon \le 1\), there exist \(p\in {\mathbb {N}}\), a partition \({\mathcal {G}}_{\mathrm {1d}}\) of \((-1, 1)\) into \({\widetilde{N}}_{\mathrm {int}}\) open, disjoint, connected subintervals, a d-dimensional array \({\widetilde{c}}\in {\mathbb {R}}^{{\widetilde{N}}_{\mathrm {1d}}\times \dots \times {\widetilde{N}}_{\mathrm {1d}}}\), and piecewise polynomials \({{\tilde{v}}}_i \in {\mathbb {Q}}_p({\mathcal {G}}_{\mathrm {1d}})\cap H^1((-1,1))\), \(i=1, \dots , {\widetilde{N}}_{\mathrm {1d}}\), such that

$$\begin{aligned}&{\widetilde{N}}_{\mathrm {1d}}\le C_{{\widetilde{N}}_{\mathrm {1d}}}(1+\left| \log \epsilon \right| ^2),\quad {\widetilde{N}}_{\mathrm {int}}\le C_{{\widetilde{N}}_{\mathrm {int}}}(1+\left| \log \epsilon \right| ), \\&\quad \quad \Vert {\widetilde{c}}\Vert _{1} \le C_{{\widetilde{c}}}(1+\left| \log \epsilon \right| ^{2d}),\quad p \le C_p(1+\left| \log \epsilon \right| ) \end{aligned}$$

and

$$\begin{aligned} \Vert {{\tilde{v}}}_i\Vert _{H^1(I)}\le C_{{{\tilde{v}}}} \epsilon ^{-b_{{{\tilde{v}}}}}, \qquad \Vert {{{\tilde{v}}}}_i\Vert _{L^\infty (I)}\le 1,\qquad \forall i\in \{1, \dots , {\widetilde{N}}_{\mathrm {1d}}\}. \end{aligned}$$

Furthermore,

$$\begin{aligned} \Vert u - v_{{\mathsf {h}}{\mathsf {p}}} \Vert _{H^1(\varOmega )} = \Vert {{\tilde{u}}}- v_{{\mathsf {h}}{\mathsf {p}}} \Vert _{H^1(\varOmega )}\le \frac{\epsilon }{2}, \qquad v_{{\mathsf {h}}{\mathsf {p}}} = \sum _{{i_1,\dots , i_d}=1}^{{\widetilde{N}}_{\mathrm {1d}}} {\widetilde{c}}_{{i_1\dots i_d}} \bigotimes _{j=1}^d{{\tilde{v}}}_{i_j}. \end{aligned}$$
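The tensor-product form of \(v_{{\mathsf {h}}{\mathsf {p}}}\) can be evaluated by successive contractions of the coefficient array against per-axis basis evaluations. The following numpy sketch is our own utility, not part of the proof:

```python
import numpy as np

def eval_hp_tensor(coeffs, bases_vals):
    """Evaluate v(x) = sum_{i1..id} c[i1,...,id] * prod_j v_{ij}(x_j) on a
    tensor grid; bases_vals[j] has shape (N1d, n_j): basis index x grid point."""
    out = coeffs
    for vals in bases_vals:
        # contract the current leading basis index against axis j's evaluations
        out = np.tensordot(out, vals, axes=([0], [0]))
    return out
```

For \(d=2\) this reproduces the double sum \(\sum _{i_1,i_2} {\widetilde{c}}_{i_1 i_2}\, {{\tilde{v}}}_{i_1}(x_1)\,{{\tilde{v}}}_{i_2}(x_2)\) directly, at the cost of one matrix contraction per axis.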

From the stability (5.31) and from Lemmas A.21 and A.22, it follows that

$$\begin{aligned} \Vert {\widetilde{c}}\Vert _1 \le C {\widetilde{N}}_{\mathrm {int}}^{2d} \Vert u\Vert _{{\mathcal {J}}^d_{\underline{\gamma }}(\varOmega )}, \end{aligned}$$

i.e., the bound on the coefficients \({\widetilde{c}}\) is independent of the extension \({{\tilde{u}}}\) of u. By Theorem 4.2, there exists a NN \({\varPhi }_{\epsilon , u}\) with the stated approximation properties and asymptotic size bounds. The bound on the \(L^\infty (\varOmega )\)-norm of the realization of \(\varPhi _{\epsilon , u}\) follows as in the proof of Theorem 4.3. \(\square \)

Remark 5.15

Arguing as in Corollary 5.7, a NN with ReLU activation and two-dimensional input can be constructed so that its realization approximates the Dirichlet trace of solutions to (5.29) in \(H^{1/2}(\partial \varOmega )\) at an exponential rate in terms of the NN size \({{\,\mathrm{M}\,}}\).

The following statement now gives expression rate bounds for the approximation of solutions to the Fichera problem (5.29) by realizations of NNs with the ReLU activation function.

Corollary 5.16

Let f be an analytic function on \({\overline{\varOmega }}\), and let u be a solution to (5.29) with operators \({{\mathcal {L}}}\) and \({\mathcal {B}}\) as in Setting 2 and with source term f. Then, for any \(0< \epsilon <1\) there exists a NN \(\varPhi _{\epsilon , u}\) so that

$$\begin{aligned} \left\| u - {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , u}\right) \right\| _{H^1(\varOmega )} \le \epsilon . \end{aligned}$$
(5.33)

In addition, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(|\log (\epsilon )|^{2d+1})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(|\log (\epsilon )|\log (|\log (\epsilon )|))\), as \(\epsilon \rightarrow 0\).

Proof

By [8, Corollary 7.1, Theorems 7.3 and 7.4] if \(d=3\) and [3, Theorem 3.1] if \(d=2\), there exists \({\underline{\gamma }}\) with \(\gamma _c-d/2>0\) for all \(c\in {\mathcal {C}}\) and \(\gamma _e>1\) for all \(e\in {\mathcal {E}}\) such that \(u \in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega ; {\mathcal {C}}, {\mathcal {E}})\). An application of Theorem 5.14 concludes the proof. \(\square \)

Remark 5.17

By [8, Corollary 7.1 and Theorem 7.4], Corollary 5.16 holds verbatim also under the hypothesis that the right-hand side f is weighted analytic, with singularities at the corners and edges of the domain; specifically, (5.33) and the size bounds on the NN \(\varPhi _{\epsilon , u}\) hold under the assumption that there exists \({\underline{\gamma }}\) with \(\gamma _c-d/2>0\) for all \(c\in {\mathcal {C}}\) and \(\gamma _e>1\) for all \(e\in {\mathcal {E}}\) such that

$$\begin{aligned} f \in {\mathcal {J}}^\varpi _{{\underline{\gamma }}-2}(\varOmega ; {\mathcal {C}}, {\mathcal {E}}). \end{aligned}$$

Remark 5.18

The numerical approximation of solutions for (5.29) with a NN in two dimensions has been investigated, e.g., in [33] using the so-called PINNs methodology. There, the loss function was based on the minimization of the residual of the NN approximation in the strong form of the PDE. Evidently, a different (smoother) activation function than the ReLU activation considered here had to be used. Starting from the approximation of products by NNs with smoother activation functions introduced in [51, Section 3.3] and following the same line of reasoning as in the present paper, the results we obtain for ReLU-based realizations of NNs can be extended to large classes of NNs with smoother activations and similar architecture.

Furthermore, in [11, Sect. 3.1], a slightly different elliptic boundary value problem is numerically approximated by realizations of NNs. Its solutions exhibit the same weighted, analytic regularity as considered in this paper. The presently obtained approximation rates by NN realizations extend also to the approximation of solutions for the problem considered in [11].

In the proof of Theorem 5.6, we require in particular the approximation of weighted analytic functions on \((-1,1)\times (0,1)\) with a corner singularity at the origin. For convenient reference, we detail the argument in this case.

Lemma 5.19

Let \(d=2\) and \(\varOmega _{DN} {:}{=} (-1,1)\times (0,1)\). Denote \({\mathcal {C}}_{DN} = \{-1,0,1\} \times \{0,1\}\). Let \(u \in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega _{DN}; {\mathcal {C}}_{DN}, \emptyset )\) with \({\underline{\gamma }}= \{\gamma _c: c\in {\mathcal {C}}_{DN}\}\), with \(\gamma _c>1\) for all \(c\in {\mathcal {C}}_{DN}\).

Then, for any \(0< \epsilon <1\) there exists a NN \(\varPhi _{\epsilon , u}\) so that

$$\begin{aligned} \left\| u - {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , u}\right) \right\| _{H^1(\varOmega _{DN})} \le \epsilon . \end{aligned}$$
(5.34)

In addition, \(\Vert {{\,\mathrm{R}\,}}\left( \varPhi _{\epsilon , u}\right) \Vert _{L^\infty (\varOmega _{DN})} = {\mathcal {O}}(1+\left| \log \epsilon \right| ^{4})\), as \(\epsilon \rightarrow 0\). Also, \({{\,\mathrm{M}\,}}(\varPhi _{\epsilon , u}) = {\mathcal {O}}(|\log (\epsilon )|^{5})\) and \({{\,\mathrm{L}\,}}(\varPhi _{\epsilon ,u}) = {\mathcal {O}}(|\log (\epsilon )|\log (|\log (\epsilon )|))\), as \(\epsilon \rightarrow 0\).

Proof

Let \({{\tilde{u}}}\in W_{\mathrm {mix}}^{1,1}((-1, 1)^2)\) be defined by

$$\begin{aligned} {\left\{ \begin{array}{ll} {{\tilde{u}}}(x_1,x_2) = u(x_1,x_2) &{} \text { for all } (x_1,x_2)\in (-1,1)\times [0,1),\\ {{\tilde{u}}}(x_1,x_2) = u(x_1,0) &{} \text { for all } (x_1,x_2)\in (-1,1)\times (-1,0), \end{array}\right. } \end{aligned}$$

such that \({{\tilde{u}}}|_{\varOmega _{DN}} = u\). Here, we used that there exist continuous imbeddings \({\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega _{DN}; {\mathcal {C}}_{DN}, \emptyset ) \hookrightarrow W_{\mathrm {mix}}^{1,1}(\varOmega _{DN}) \hookrightarrow C^0(\overline{\varOmega _{DN}})\) (see Lemma A.22 for the first imbedding), i.e., u can be extended to a continuous function on \(\overline{\varOmega _{DN}}\).
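Equivalently, the extension just defined can be written as \({{\tilde{u}}}(x_1,x_2) = u(x_1, \max (x_2, 0))\). A one-line sketch with illustrative names:

```python
def extend_constant_x2(u):
    """Extend u from (-1,1) x (0,1) to (-1,1)^2 by u~(x1,x2) = u(x1, max(x2,0)),
    i.e., constant continuation in x2 below the line x2 = 0."""
    return lambda x1, x2: u(x1, max(x2, 0.0))

u = lambda x1, x2: x1 + x2**2        # a sample continuous function
ut = extend_constant_x2(u)
assert ut(0.3, 0.5) == u(0.3, 0.5)   # unchanged on Omega_DN
assert ut(0.3, -0.7) == u(0.3, 0.0)  # constant in x2 for x2 < 0
```

The continuity of \(u\) up to \(\{x_2=0\}\), guaranteed by the imbedding into \(C^0(\overline{\varOmega _{DN}})\), is what makes this continuation well defined.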

As in the proof of Lemma 5.13, this extension is stable, i.e., there exists a constant \(C_{\mathrm {ext}}>0\) independent of u such that

$$\begin{aligned} \Vert {{\tilde{u}}}\Vert _{W_{\mathrm {mix}}^{1,1}((-1,1)^2)}\le C_{\mathrm {ext}} \Vert u\Vert _{W_{\mathrm {mix}}^{1,1}(\varOmega _{DN})}. \end{aligned}$$
(5.35)
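The constant extension of \(u\) across the edge \(\{x_2=0\}\) can be illustrated numerically. The following sketch is not from the paper: the sample function `u` below is a hypothetical stand-in with a corner singularity at the origin (its exponent mimics \(\gamma _c>1\)), used only to show that \({\tilde{u}}\) agrees with \(u\) on \(\varOmega _{DN}\) and is continuous across \(x_2=0\).

```python
import numpy as np

def u(x1, x2):
    # Hypothetical example: analytic away from the origin, with an algebraic
    # corner singularity at (0, 0); the exponent 1.5 mimics gamma_c > 1.
    return np.hypot(x1, x2) ** 1.5

def u_tilde(x1, x2):
    # Constant extension across x2 = 0, as in the definition of u~:
    # u~(x1, x2) = u(x1, x2) for x2 >= 0 and u~(x1, x2) = u(x1, 0) for x2 < 0.
    return np.where(x2 >= 0, u(x1, np.maximum(x2, 0.0)), u(x1, 0.0))

# u~ restricted to Omega_DN coincides with u ...
x1 = np.linspace(-0.9, 0.9, 7)
assert np.allclose(u_tilde(x1, 0.5 * np.ones_like(x1)),
                   u(x1, 0.5 * np.ones_like(x1)))
# ... and the traces from above and below the edge x2 = 0 match.
assert np.allclose(u_tilde(x1, -0.5 * np.ones_like(x1)),
                   u(x1, np.zeros_like(x1)))
```

Since the extension is constant in \(x_2\) on \((-1,1)\times (-1,0)\), it adds no new \(x_2\)-variation, which is the mechanism behind the stability bound (5.35).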

Because \(u \in {\mathcal {J}}^\varpi _{\underline{\gamma }}(\varOmega _{DN}; {\mathcal {C}}_{DN}, \emptyset )\), it holds with \({\mathcal {C}}_S = {\overline{S}}\cap {\mathcal {C}}_{DN}\) that \(u\in {\mathcal {J}}^\varpi _{\underline{\gamma }}(S; {\mathcal {C}}_S, \emptyset )\) for all subdomains \(S\) considered in the argument below.

The remaining steps are the same as those in the proof of Theorem 5.14. \(\square \)

6 Conclusions and Extensions

We review the main findings of the present paper and outline extensions of these results, as well as perspectives for further research.

6.1 Principal Mathematical Results

We established exponential expressivity of realizations of NNs with the ReLU activation function in the Sobolev norm \(H^1\) for functions which belong to certain countably normed, weighted analytic function spaces in cubes \(Q=(0,1)^d\) of dimension \(d=2,3\). The admissible function classes comprise functions which are real analytic at points \(x\in Q\), and which admit analytic extensions to the open sides \(F\subset \partial Q\), but may have singularities at corners and (in space dimension \(d=3\)) edges of Q. We have also extended this result to cover exponential expressivity of realizations of NNs with ReLU activation for solution classes of linear, second-order elliptic PDEs in divergence form in plane, polygonal domains and of elliptic, nonlinear eigenvalue problems with singular potentials in three space dimensions. Being essentially an approximation result, the DNN expression rate bound in Theorem 5.6 will apply to any elliptic boundary value problem in polygonal domains where weighted, analytic regularity is available. Apart from the source and eigenvalue problems, such regularity is in space dimension \(d=2\) also available for linearized elastostatics, Stokes flow and general elliptic systems [8, 17, 20].

The established approximation rates of realizations of NNs with ReLU activation are fundamentally based on a novel exponential upper bound on the approximation of weighted analytic functions via tensorized hp approximations on multi-patch configurations in finite unions of axiparallel rectangles/hexahedra. The hp approximation result is presented in Theorem A.25 and is of independent interest in the numerical analysis of spectral elements.

The proofs of exponential expressivity of NN realizations are, in principle, constructive. They are based on explicit bounds on the coefficients of hp projections and on corresponding emulation rate bounds for the (re)approximation of modal hp bases.

6.2 Extensions and Future Work

The tensor structure of the hp approximation considered here limits the geometries of domains that are admissible for our results. Curvilinear, mapped domains with analytic domain maps will allow corresponding approximation rates, with the NN approximations obtained by composing the present constructions with NN emulations of the domain maps, using the fact that compositions of NNs are again NNs.
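The closure of ReLU NNs under composition invoked above can be made concrete. The sketch below is ours, not the paper's construction: it uses the exact ReLU identity \(x = \mathrm{ReLU}(x) - \mathrm{ReLU}(-x)\) to splice two networks (given as lists of affine layers, hypothetical random weights) into a single ReLU network whose realization equals the composition of the two realizations.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realize(layers, x):
    # layers: list of (W, b); ReLU is applied between layers, not after the last.
    for W, b in layers[:-1]:
        x = relu(W @ x + b)
    W, b = layers[-1]
    return W @ x + b

def compose(phi_a, phi_b):
    # Exact composition via x = relu(x) - relu(-x): stack the last layer of
    # phi_a with its negative, and split phi_b's first layer accordingly, so
    # realize(compose(phi_a, phi_b), x) == realize(phi_b, realize(phi_a, x)).
    W_a, b_a = phi_a[-1]
    W_b, b_b = phi_b[0]
    junction_in = (np.vstack([W_a, -W_a]), np.concatenate([b_a, -b_a]))
    junction_out = (np.hstack([W_b, -W_b]), b_b)
    return phi_a[:-1] + [junction_in, junction_out] + phi_b[1:]

# Two small ReLU networks with random (illustrative) weights.
rng = np.random.default_rng(0)
phi1 = [(rng.standard_normal((3, 2)), rng.standard_normal(3)),
        (rng.standard_normal((2, 3)), rng.standard_normal(2))]   # R^2 -> R^2
phi2 = [(rng.standard_normal((4, 2)), rng.standard_normal(4)),
        (rng.standard_normal((1, 4)), rng.standard_normal(1))]   # R^2 -> R^1

x = np.array([0.3, -0.7])
assert np.allclose(realize(compose(phi1, phi2), x),
                   realize(phi2, realize(phi1, x)))
```

The spliced network is again a plain ReLU network, at the cost of roughly doubling the width at the junction, which is why composition preserves the expression rate bounds up to constants.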

The only activation function considered in this work is the ReLU. Following the same proof strategy, exponential expression rate bounds can be obtained for functions with smoother, nonlinear activation functions. We refer to Remark 5.18 and to the discussion in [51, Sec. 3.3].

The principal results in Sect. 5.1 prove exponential expressivity of realizations of deep NNs with ReLU activation on solution sets of singular eigenvalue problems with multiple, isolated point singularities and analytic potentials, as arise in electron-structure models for static molecules with known loci of the nuclei. Inspection of our proofs reveals that the expression rate bounds are robust with respect to perturbations of the nuclei sites; only interatomic distances enter the constants in the expression rate bounds of Sect. 5.1.2. Given the closedness of NNs under composition, obtaining similar expression rates also for solutions of the vibrational Schrödinger equation appears in principle possible.

The presently proved deep ReLU NN expression rate bounds can, in connection with recently proposed, residual-based DNN training methodologies (see, e.g., [1, 22, 53] and the references there), imply exponential convergence rates of numerical NN approximations of PDE solutions based on machine learning approaches.