Distributed Learning via Filtered Hyperinterpolation on Manifolds

Montúfar, Guido; Wang, Yu Guang

doi:10.1007/s10208-021-09529-5

Distributed Learning via Filtered Hyperinterpolation on Manifolds

Open access
Published: 12 July 2021

Volume 22, pages 1219–1271, (2022)
Cite this article

Download PDF

You have full access to this open access article

Foundations of Computational Mathematics Aims and scope Submit manuscript

Distributed Learning via Filtered Hyperinterpolation on Manifolds

Download PDF

Guido Montúfar^1,2 &
Yu Guang Wang^2,3,4

2033 Accesses
1 Altmetric
Explore all metrics

Abstract

Learning mappings of data on manifolds is an important topic in contemporary machine learning, with applications in astrophysics, geophysics, statistical physics, medical diagnosis, biochemistry, and 3D object analysis. This paper studies the problem of learning real-valued functions on manifolds through filtered hyperinterpolation of input–output data pairs where the inputs may be sampled deterministically or at random and the outputs may be clean or noisy. Motivated by the problem of handling large data sets, it presents a parallel data processing approach which distributes the data-fitting task among multiple servers and synthesizes the fitted sub-models into a global estimator. We prove quantitative relations between the approximation quality of the learned function over the entire manifold, the type of target function, the number of servers, and the number and type of available samples. We obtain the approximation rates of convergence for distributed and non-distributed approaches. For the non-distributed case, the approximation order is optimal.

Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next

Article Open access 26 July 2022

Bolstering stochastic gradient descent with model building

Article Open access 15 April 2024

PolieDRO: a novel classification and regression framework with non-parametric data-driven regularization

Article 15 April 2024

1 Introduction

Learning functions over manifolds has become an increasingly important topic in machine learning. The performance of many machine learning algorithms depends strongly on the geometry of the data. In real-world applications, one often has huge data sets with noisy samples. In this paper, we propose distributed filtered hyperinterpolation on manifolds, which combines filtered hyperinterpolation and distributed learning [22, 23]. Filtered hyperinterpolation [31, 38] provides a constructive approach to modeling mappings between inputs and outputs in a way that can reduce the influence of noise. The distributed strategy assigns the learning task of the input–output mapping to multiple local servers, enabling parallel computing for massive data sets. Each server handles a small fraction of all data by filtered hyperinterpolation. It then synthesizes the local estimators as a global estimator. We show the precise quantitative relation between the approximation error of the distributed filtered hyperinterpolation, the number of the local servers, and the amount of data. The approximation error (over the entire manifold) converges to zero provided the available amount of data increases sufficiently fast with the number of servers.

Filtered hyperinterpolation was introduced by Sloan and Womersley [31] on the two-sphere $\mathbb {S}^2$, which is a form of filtered polynomial approximation method motivated by hyperinterpolation [29]. Hyperinterpolation uses a Fourier expansion where the integral for the Fourier coefficients is approximated by numerical integration with a quadrature rule. The filtered hyperinterpolation adopts a similar strategy as hyperinterpolation but uses a filter to modify the Fourier expansion. The filter is a restriction on the eigenvalues of the basis functions. Effectively this restricts the capacity of the approximation class and yields a reproducing property for polynomials of a certain degree specified by the filter. It has some similarities to kernel methods. Filtering improves the approximation accuracy of plain hyperinterpolation for noiseless data that is sampled deterministically [18]. With appropriate choice of filter, the filtered hyperinterpolation achieves the best approximation by polynomials of a given degree depending on the amount of data (see Sect. 3.1). As shown in the left part of Fig. 1, one aims at finding the closest approximation of $f^*$ within the polynomial space $\varPi _n$ on the manifold $\mathcal {M}$, which, nevertheless, is difficult to achieve. The filtered hyperinterpolation is an approximator $V_{D,n}$ constructed from data $D=(\mathbf {x}_i, y_i)_{i=1}^N$ which lies in a slightly larger polynomial space $\varPi _{2n}$ and whose distance to $f^*$ is very close to the distance between $f^*$ and $\varPi _n$.

Motivated by the problem of handling massive amounts of data, we propose a distributed computational strategy based on filtered hyperinterpolation. As shown in the right part of Fig. 1, we can split estimation task of filtered hyperinterpolation into multiple servers $j=1,\ldots ,m$, each of which computes a filtered hyperinterpolation $V_{D_j,n}$, for a small subset $D_j$ of all the training data. It consists of creating a filtered expansion in terms of eigenfunctions of the manifold to best-fit the corresponding fraction of the training data set. The “best-fit” means that the local servers can achieve best approximation for noisy data $y_i=f^*(\mathbf {x}_i)+\epsilon _i$, $i=1,\ldots , N$, for any continuous function $f^*:\mathcal {M}\rightarrow \mathbb {R}$ on the manifold and independent bounded noise $\epsilon _i$. The central processor then takes a weighted average of the filtered hyperinterpolations obtained in the local servers to synthesize as a global estimator $V_{D,n}^{(m)}$. We call the global estimator the distributed filtered hyperinterpolation.

The remaining of the paper is organized as follows. In Sect. 2, we introduce the main mathematical settings and notation. Then, we proceed with the study of non-distributed and distributed filtered hyperinterpolation on manifolds, for which we derive upper bounds on the error. Our bounds depend on (1) the dimension d of the manifold and the smoothness r of the Sobolev space that contains the target function, (2) the degree n of the approximating polynomials, which is tied to the number N of available data points, (3) the smoothness of the filter, (4) the presence of noise in the output data points. Here, we base the analysis on properties of the quadrature formulas, which we couple with the arrangement of the input data points (deterministic or random). For the deterministic case, we require the quadrature rule has polynomial exactness of degree $3n-1$; for the random case, the condition that the volume measure on the manifold controls the distribution of the sampling points.

In Sect. 3, we study non-distributed filtered interpolation on manifolds. We obtain an error bound $\mathcal {O}_{}\left( N^{-r/d}\right) $ for the noiseless setting on general manifolds (see Theorem 6). This result generalizes the same bound that was previously obtained on the sphere [35]. Since the bound on the sphere is optimal, the new bound is also optimal. We further study learning with noisy output data. The error bound for the noisy case is $\mathcal {O}_{}\left( N^{-2r/(2r +d)}\right) $. Due to the impact of the noise, it does not entirely reduce to the error bound of the noiseless case. To the best of our knowledge, this is the first error upper bound for noisy learning on general Riemannian manifolds. The optimality of this bound remains open at this point.

In Sect. 4, we study distributed learning. We obtain similar rates of convergence as in the non-distributed setting, provided the number of servers satisfies a certain upper bound in terms of the total amount of data. As it turns out, the distributed estimator has the same convergence rate as the non-distributed estimator for a class of functions with given smoothness. Compared with the clean data case, the distributed filtered hyperinterpolation with noisy data has slightly lower convergence order than the non-distributed. See Theorems 13 and 14.

Section 5 illustrates definitions, methods, and convergence results on a concrete numerical example. Section 6 summarizes and compares the convergence rates of the different methods and settings (see Table 1). It also presents a concise description of the implementation (see Algorithm 1). All the proofs are deferred to Appendix A. The proofs utilize the wavelet decomposition of filtered hyperinterpolation, Marcinkiewicz–Zygmund inequality, Nikolskiî-type inequality on a manifold, bounds of best approximation on a manifold, and concentration inequality, estimates of covering number and bounds of sampling operators from learning theory. We also show a table of the notations used throughout the article in Appendix B for readers’ reference.

2 Preliminaries on Approximation on Manifolds

In this section, we discuss $L_p$ and Sobolev spaces of functions on manifolds, assumptions on the manifolds and embedding theorems to the space of continuous functions.

We start with a brief description of $L_p$ spaces and norms. Let $\mathcal {M}$ be a compact and smooth Riemannian manifold of dimension $d\ge 1$ with smooth or empty boundary and Riemannian measure $\mu $ normalized to have the total volume $\mu (\mathcal {M})=1$. For $1\le p<\infty $, let $L_{p}(\mathcal {M})=L_{p}(\mathcal {M},\mu )$ be the complex-valued $\mathbb {L}_{p}$-function space with respect to the measure $\mu $ on $\mathcal {M}$, endowed with the $\mathbb {L}_{p}$ norm

$$\begin{aligned} \Vert f\Vert _{L_{p}(\mathcal {M})}:=\left\{ \int _{\mathcal {M}}|f(\mathbf {x})|^{p}\mathrm {d}{\mu (\mathbf {x})}\right\} ^{1/p},\quad f\in L_{p}(\mathcal {M}). \end{aligned}$$

For $p=\infty $, let $L_{\infty }(\mathcal {M}):=C(\mathcal {M})$ be the space of continuous functions on $\mathcal {M}$ with norm

$$\begin{aligned} \Vert f\Vert _{L_{\infty }(\mathcal {M})}:= \sup _{\mathbf {x}\in \mathcal {M}}|f(\mathbf {x})|,\quad f\in C(\mathcal {M}). \end{aligned}$$

We will write $\Vert f(\mathbf {x})\Vert _{L_{p}(\mathcal {M}),\mathbf {x}}=\Vert f\Vert _{L_{p}(\mathcal {M})}$ to indicate the variable for integration when necessary. For $p=2$, $L_{2}(\mathcal {M})$ is a Hilbert space with inner product $\left( f,g\right) _{L_{2}(\mathcal {M})}:=\int _{\mathcal {M}}f(\mathbf {x})\overline{g(\mathbf {x})}\mathrm {d}{\mu (\mathbf {x})}$, $f,g\in L_{2}(\mathcal {M})$, where $\overline{g}$ is the complex conjugate to g.

2.1 Diffusion Polynomial Space

Diffusion polynomials are a generalization of regular polynomials. We will use them to construct approximations of real-valued functions on manifolds. Let $\mathbb {N}:=\{1,2,\dots \}$ be the set of positive integers and let $\mathbb {N}_{0}=\mathbb {N}\cup \{0\}$. Let $\varDelta $ be the Laplace–Beltrami operator on $\mathcal {M}$, which has a sequence of eigenvalues $\{\lambda _{\ell }\}_{\ell \in \mathbb {N}}$ and a corresponding sequence of orthonormal eigenfunctions $\{\phi _{\ell }\in L_{2}(\mathcal {M})\, |\, \varDelta \phi _{\ell }= -\lambda _{\ell }^{2}\, \phi _{\ell },\; \ell \in \mathbb {N}\}$. We let $\lambda _{0}:=0$ and $\phi _{0}:=1$. For $n\in \mathbb {N}_{0}$, the span $\varPi _{n}:=\mathrm {span}\{\phi _{\ell }| \lambda _{\ell }\le n\}$ is called the diffusion polynomial space of degree n on $\mathcal {M}$, and an element of $\varPi _{n}$ is called a diffusion polynomial of degree n. In the following, we will refer to diffusion polynomials simply as polynomials.

Let $\rho (\mathbf {x},\mathbf {y})$ be the geodesic distance of points $\mathbf {x}$ and $\mathbf {y}$ induced by the Riemannian metric on $\mathcal {M}$. For $\mathbf {x}\in \mathcal {M}$ and $\beta>\alpha >0$, let $B\left( \mathbf {x},\alpha \right) :=\{\mathbf {y}\in \mathcal {M}|\rho (\mathbf {x},\mathbf {y})\le \alpha \}$ be the ball with center $\mathbf {x}$ and radius $\alpha $, and let $B\left( \mathbf {x},\alpha ,\beta \right) := B\left( \mathbf {x},\beta \right) -B\left( \mathbf {x},\alpha \right) $ and $B\left( \mathbf {x},0,\beta \right) :=B\left( \mathbf {x},\beta \right) $. We make the following assumptions for the measure $\mu $ and the eigenfunctions of $\varDelta $ on $\mathcal {M}$. The first is a standard assumption about the regularity of the measure on the manifold.

Assumption 1

(Volume of ball) There exist positive constants $c,c'$, $c'\le c$, depending only upon the measure $\mu $ and the dimension d such that for all $\beta>\alpha >0$ and $\mathbf {x}\in \mathcal {M}$,

$$\begin{aligned} \mu \left( B\left( \mathbf {x},\alpha \right) \right) \le c \alpha ^{d},\quad \mu \left( B\left( \mathbf {x},\alpha ,\beta \right) \right) \le c' (\beta ^{d}-\alpha ^{d}). \end{aligned}$$

Here $c,c'$ are the constants depending only on d.

The second is an assumption stating that the space of polynomials is closed under multiplication.

Assumption 2

(Product of eigenfunctions) For $\ell ,\ell '\in \mathbb {N}_{0}$, the product of eigenfunctions $\phi _{\ell }\in \varPi _{n}$, $\phi _{\ell '}\in \varPi _{n'}$ for the Laplace–Beltrami operator $\varDelta $ on $\mathcal {M}$ is a polynomial of degree $n+n'$, i.e., $\phi _{\ell }\, \phi _{\ell '}\in \varPi _{n+n'}$.

Assumption 2 implies that the product $PP'$ of two polynomials $P\in \varPi _{n}$ and $P'\in \varPi _{n'}$ of degrees $n\in \mathbb {N}_{0}$ and $n'\in \mathbb {N}_{0}$, respectively, is a polynomial of degree $n+n'$. Assumptions 1 and 2 are satisfied by typical manifolds, such as hypercubes $[0,1]^{d}$, unit spheres and balls in real or complex Euclidean coordinate spaces [10, 19], flat tori $\mathbb {T}^{d}$, $d\ge 1$, and Grassmannians [4, 5], simplices in $\mathbb {R}^{d}$ [41, 48], with Lebesgue measures induced by the corresponding Riemannian metric, and also graphs equipped with an atomic measure on the graph nodes with the graph Laplacian defined as the difference of the identity matrix and the adjacency matrix [42].

2.2 Generalized Sobolev Spaces

We give a brief introduction to the Sobolev spaces on a Riemannian manifold $\mathcal {M}$. The Fourier coefficients for f in $L_{1}(\mathcal {M})$ are

$$\begin{aligned} \widehat{f}_{\ell }:=\int _{\mathcal {M}}f(\mathbf {x})\overline{\phi _{\ell }(\mathbf {x})} \mathrm {d}{\mu (\mathbf {x})}, \quad \ell =0,1,\dots . \end{aligned}$$

For $r>0$, the generalized Sobolev space $\mathbb {W}_{p}^{r}(\mathcal {M})$ may be defined as the set of all functions $f\in L_{p}(\mathcal {M})$ satisfying $\sum _{\ell =0}^{\infty }(1+\lambda _{\ell })^{r/2}\widehat{f}_{\ell }\, \phi _{\ell }\, \in \, L_{p}(\mathcal {M})$. The Sobolev space $\mathbb {W}_{p}^{r}(\mathcal {M})$ forms a Banach space with norm

$$\begin{aligned} \Vert f\Vert _{\mathbb {W}_{p}^{r}(\mathcal {M})}:=\Big \Vert \sum _{\ell =0}^{\infty }(1+\lambda _{\ell })^{r/2} \widehat{f}_{\ell }\, \phi _{\ell }\Big \Vert _{L_{p}(\mathcal {M})}. \end{aligned}$$

We let $\mathbb {W}_{p}^{0}(\mathcal {M}):=L_{p}(\mathcal {M})$.

In the context of numerical analysis, we need to use the following Lemma 1, which is an embedding theorem of Sobolev space into the space of continuous functions on a manifold, see, e.g., [1, Section 2.7]. It guarantees that any function in the Sobolev space has a representation by a continuous function so that the numerical integration is valid and the quadrature rule can be applied.

Lemma 1

Let $d\ge 1$ and $\mathcal {M}$ be a compact Riemannian manifold of dimension d. The Sobolev space $\mathbb {W}_{p}^{r}(\mathcal {M})$ is continuously embedded into $C(\mathcal {M})$ if $r>d/p$.

2.3 Filtered Approximation on Manifolds

This section defines the filtered polynomial approximation on a compact Riemannian manifold $\mathcal {M}$ in terms of the eigenfunctions of the Laplace–Beltrami operator $\varDelta $ on $\mathcal {M}$. Given a target function $f^*\in L_{p}(\mathcal {M})$ with $1\le p\le \infty $, the filtered polynomial approximation converges to functions in $L_{p}(\mathcal {M})$ as the degree n tends to infinity.

Filter A real-valued continuous compactly supported function on $\mathbb {R}_{+}$ is called a filter. Without loss of generality, we will only consider filters with support a subset of [0, 2]. In this paper, we focus on the following function $H$ on $\mathbb {R}_{+}$ as the filter.

Definition 1

(Filter H) Let $H$ be a function on $\mathbb {R}_{+}$ satisfying $H(t)=1$, $0\le t\le 1$; $H(t)=0$, $t\ge 2$, and $H\in C^{\kappa }(\mathbb {R}_{+})$ for some $\kappa \in \mathbb {N}$. Then, $H$ is called a filter.

Definition 2

(Filtered kernel) A filtered kernel of degree n for $n\in \mathbb {N}$ on $\mathcal {M}$ with filter $H$ is defined by

$$\begin{aligned} K_{n}^{}(\mathbf {x},\mathbf {y}):=K_{n,H}^{}(\mathbf {x},\mathbf {y}):=\sum _{\ell =0}^{\infty }H\Bigl (\frac{\lambda _{\ell }}{n}\Bigr )\,\phi _{\ell }(\mathbf {x})\overline{\phi _{\ell }(\mathbf {y})}. \end{aligned}$$

(1)

Here $\lambda _\ell $ and $\phi _\ell $ are eigenvalues and eigenfunctions of the Laplace–Beltrami operator on $\mathcal {M}$.

For a kernel $G:\mathcal {M}\times \mathcal {M}\rightarrow \mathbb {R}$ and $f\in L_{1}(\mathcal {M})$, the convolution of f with G is defined as

$$\begin{aligned} (G*f)(\mathbf {x}):= \int _{\mathcal {M}}G(\mathbf {x},\mathbf {z})f(\mathbf {z})\mathrm {d}{\mu (\mathbf {z})},\quad x\in \mathcal {M}. \end{aligned}$$

(2)

Definition 3

(Filtered approximation) We can define a filtered approximation $V_{n}^{}$ on $L_{1}(\mathcal {M})$ as an integral operator with the filtered kernel $K_{n,H}^{}(\cdot ,\cdot )$: for $f\in L_{1}(\mathcal {M})$ and $\mathbf {x}\in \mathcal {M}$,

$$\begin{aligned} V_{n}^{}(f;\mathbf {x}) := V_{n,H}^{}(f;\mathbf {x}):= (K_{n,H}^{}*f)(\mathbf {x}) :=\int _{\mathcal {M}}K_{n,H}^{}(\mathbf {x},\mathbf {z})f(\mathbf {z})\mathrm {d}{\mu (\mathbf {z})}. \end{aligned}$$

(3)

Note that for $n=0$ this is just the integral of f. By (1) and (3),

$$\begin{aligned} V_{n}^{}(f) = \sum _{\ell =0}^{\infty }H\left( \frac{\lambda _{\ell }}{n}\right) \widehat{f}_{\ell }\,\phi _{\ell }, \quad f\in L_{1}(\mathcal {M}). \end{aligned}$$

The following lemma, as given by Maggioni and Mhaskar [24, Theorem 4.1], shows that a filtered kernel is highly localized when the filter is sufficiently smooth.

Lemma 2

Let $d\ge 1$. Let $\mathcal {M}$ be a compact Riemannian manifold of dimension d. Let $H$ be a filter in $C^{\kappa }(\mathbb {R}_{+})$ with $\kappa \ge d+1$. Then, for $n\ge 1$,

$$\begin{aligned} \bigl |K_{n}^{}(\mathbf {x},\mathbf {y})\bigr | \le \frac{c\, n^{d}}{(1+n\rho (\mathbf {x},\mathbf {y}))^{\kappa }},\quad \mathbf {x},\mathbf {y}\in \mathcal {M}, \end{aligned}$$

(4)

where the constant c depends only on $d,H$ and $\kappa $ and $\rho (\mathbf {x},\mathbf {y})$ is the geodesic distance between $\mathbf {x}$ and $\mathbf {y}$.

By (4), if $\mathbf {y}$ is not close to $\mathbf {x}$, and $|K_n(\mathbf {x},\mathbf {y})|$ decays to zero with rate $n^{\kappa -d}$. It means given $\mathbf {x}$, the kernel $|K_n(\mathbf {x},\cdot )|$ is concentrated on a small neighborhood of $\mathbf {x}$, although it is supported on the whole manifold. This localization is essential to the boundedness of the filtered approximation operator.

Lemma 2 by Maggioni and Mhaskar [24, Eq. 6.28] implies the following estimate for the $L_p$-norm of the filtered kernel.

Lemma 3

Let $d\ge 1$ and $1\le p\le \infty $. Let $\mathcal {M}$ be a compact Riemannian manifold of dimension d. Let $H$ be a filter in $C^{\kappa }(\mathbb {R}_{+})$ with $\kappa \ge d+1$. Then, for $n\ge 1$ and $\mathbf {x}\in \mathcal {M}$,

$$\begin{aligned} \big \Vert K_{n}^{}(\cdot ,\mathbf {x})\big \Vert _{L_{p}(\mathcal {M})} \le c\, n^{d(1-1/p)}, \end{aligned}$$

where the constant c depends only on $d, p,H$ and $\kappa $.

Remark 1

For the case that $\mathcal {M}$ is a sphere, the above lemma for $p=1$ was proved by Wang et al. [38] (see also [27] for $\kappa \ge d+1$); the case $p> 1$ can be obtained from the case $p=1$ with the fact that $K_n\in \varPi _{2n}^d$ and the Nikolskiî inequality for spherical polynomials [25].

Using the interpolation theorem with (3) gives

$$\begin{aligned} \Vert V_{n}^{}(f)\Vert _{L_{p}(\mathcal {M})} \le \max _{\mathbf {x}\in \mathcal {M}}\Vert K_{n}(\cdot ,\mathbf {x})\Vert _{L_{1}(\mathcal {M})}\,\Vert f\Vert _{L_{p}(\mathcal {M})}. \end{aligned}$$

This with Lemma 3 implies the following boundedness of the filtered approximation on $L_{p}(\mathcal {M})$.

Theorem 3

Let $d\ge 1$, $1\le p\le \infty $. Let $\mathcal {M}$ be a compact Riemannian manifold of dimension d. Let $H$ be a filter in $C^{\kappa }(\mathbb {R}_{+})$ with $\kappa \ge d+1$. Then, for $n\ge 1$, the operator norm of $V_{n}^{}$ on $L_{p}(\mathcal {M})$

$$\begin{aligned} \Vert V_{n}^{}\Vert _{{p}\rightarrow {p}} \le c, \end{aligned}$$

where the constant c depends only on $d,H$ and $\kappa $.

Polynomial space and best approximation Recall $\varPi _n := {\text {span}}\{\phi _{\ell }|\lambda _{\ell }\le n\}$ is the (diffusion) polynomial space of degree n on manifold $\mathcal {M}$. Given $1\le p\le \infty $ and $n\in \mathbb {N}$, let $E_{n}(f)_{p}:=E_{n}(L_{p}(\mathcal {M});f):=\inf \bigl \{\big \Vert f-P\big \Vert _{L_{p}(\mathcal {M})} | P\in \varPi _{n}\bigr \}$ be the best approximation of degree n for $f\in L_{p}(\mathcal {M})$. Since $\cup _{n=0}^{\infty }\varPi _{n}$ is dense in $L_{p}(\mathcal {M})$, $E_{n}(f)_{p}$ goes to zero as $n\rightarrow \infty $.

The following theorem proves the convergence error for the filtered approximation of $f\in L_{p}(\mathcal {M})$.

Theorem 4

Let $d\ge 1$, $1\le p\le \infty $ and $\mathcal {M}$ be a compact Riemannian manifold of dimension d. Let $V_{n}^{}$ be the filtered approximation with filter $H$ given by Definition 1 satisfying $\kappa \ge d+1$. Then, for $f\in L_{p}(\mathcal {M})$ and $n\in \mathbb {N}_{0}$,

$$\begin{aligned} \big \Vert f-V_{n}^{}(f)\big \Vert _{L_{p}(\mathcal {M})} \le c\, E_{n}(f)_{p}, \end{aligned}$$

where the constant c depends only on d, $H$ and $\kappa $.

Remark 2

For $L_p([0,1])$, the case of filtered approximation with an appropriate filter reduces to a classic result of de la Vallée-Poussin approximation [20]. Stein [32] proved in a general context the convergence of de La Vallée-Poussin approximation to the target function. The sphere case of Theorem 4 was proved by Rustamov [28], Le Gia and Mhaskar [21], and Sloan [30].

The following lemma gives the convergence error of the best approximation for $f\in \mathbb {W}_{p}^{s}(\mathcal {M})$, see [24].

Lemma 4

Let $d\ge 1$, $1\le p\le \infty $, $r>0$, and $\mathcal {M}$ be a compact Riemannian manifold of dimension d. For $f\in \mathbb {W}_{p}^{r}(\mathcal {M})$ and $n\in \mathbb {N}$,

$$\begin{aligned} E_{n}(f)_{p} \le c\, n^{-r}\,\Vert f\Vert _{\mathbb {W}_{p}^{r}(\mathcal {M})}, \end{aligned}$$

where the constant c depends only on d, p and r.

Theorem 4 and Lemma 4 imply the following convergence order for the filtered approximation of a smooth function on a compact Riemannian manifold.

Theorem 5

Let $d\ge 1$, $1\le p\le \infty $ and $\mathcal {M}$ be a compact Riemannian manifold of dimension d. Let $V_{n}^{}$ be the filtered approximation with filter $H$ given by Definition 1 satisfying $\kappa \ge d+1$. Then, for $f\in \mathbb {W}_{p}^{r}(\mathcal {M})$ and $n\in \mathbb {N}$,

$$\begin{aligned} \big \Vert f-V_{n}^{}(f)\big \Vert _{L_{p}(\mathcal {M})} \le c\, n^{-r}\Vert f\Vert _{\mathbb {W}_{p}^{r}(\mathcal {M})}, \end{aligned}$$

where the constant c depends only on d, p, r, $H$ and $\kappa $.

In Sects. 3 and 4, we will study the non-distributed and distributed filtered hyperinterpolation which use single and multiple servers to find a global estimator, respectively. For both non-distributed and distributed learning by filtered hyperinterpolation, we need to take account of the data type (noise or noiseless) and the quadrature point type (deterministic or random). There are in total 8 cases for which we have to treat separately.

3 Non-Distributed Filtered Hyperinterpolation on Manifolds

In this section, we study the non-distributed version of filtered hyperinterpolation (NDFH) on a manifold. We consider the cases when the data is either clean or noisy, and the input samples are either deterministic or random. It turns out that the NDFH for clean data achieves the optimal convergence order of the approximation error, while noise on the data would reduce the convergence order.

Filtered hyperinterpolation is a special type of regression, and the primary tool that we will use. Within this approach, as introduced in Definition 3, a target function $f^*$ is approximated by the filtered polynomial approximation

$$\begin{aligned} \sum _{\ell =0}^{\infty } H(\lambda _{\ell }/n) (\widehat{f^*})_{\ell } \phi _{\ell }(\mathbf {x}). \end{aligned}$$

(5)

Here, H is a filter for the eigenvalues $\lambda _{\ell }$ to eigenfunctions $\phi _{\ell }$, and $(\widehat{f^*})_{\ell }$ are the Fourier coefficients. The Fourier coefficients cannot be computed in practice, because they would require to integrate the unknown target function. Instead, they are estimated from samples. This estimation is conducted via a quadrature formula,

$$\begin{aligned} (\widehat{f^*})_{\ell } = \langle f^*, \phi _{\ell }\rangle = \int _{\mathcal {M}} f^*(\mathbf {y}) \overline{\phi _{\ell }(\mathbf {y})} \mathrm {d}{\mu (\mathbf {y})} \approx \sum _{i=1}^N w_i f^*(\mathbf {x}_i)\overline{\phi _{\ell }(\mathbf {x}_i)}. \end{aligned}$$

We rewrite (5) as

$$\begin{aligned} \sum _{\ell =0}^{\infty } H(\lambda _{\ell }/n) (\widehat{f^*})_{\ell } \phi _{\ell }(\mathbf {x}) \approx \sum _{\ell =0}^{\infty } H(\lambda _{\ell }/n) \phi _{\ell }(\mathbf {x})\sum _{i=1}^N w_i f^*(\mathbf {x}_i)\overline{\phi _{\ell }(\mathbf {x}_i)}. \end{aligned}$$

After rearranging, our approximation takes the form

$$\begin{aligned} \sum _{i=1}^N w_i f^*(\mathbf {x}_i) K_n(\mathbf {x}_i,\mathbf {x}), \end{aligned}$$

(6)

which is a weighted sum of kernels $K_n(\mathbf {x}_i,\mathbf {x})=\sum _{\ell =0}^{\infty }H(\lambda _\ell /n)\overline{\phi _{\ell }(\mathbf {x}_i)}\phi _\ell (\mathbf {x})$ centered at the data locations $\mathbf {x}_i$. In practice, the estimator of (6) is scaled by the observed values $y_i$ instead of $f^*(\mathbf {x}_i)$.

In the following, we define the (non-distributed) filtered hyperinterpolation (approximation) on a compact Riemannian manifold $\mathcal {M}$ for a data set D. Besides the traditional deterministic quadrature rule, we also consider the filtered hyperinterpolation with random quadrature rule where the quadrature points are distributed with some probability measure on the manifold. We first introduce some notion about data and quadrature rule.

Data Let $\mathcal {M}$ be a compact Riemannian manifold of dimension d for $d\ge 1$. A data set $D=\{(\mathbf {x}_i,y_i)\}_{i=1}^{N}$, $N=|D|$ on the manifold $\mathcal {M}$ is a set of pairs of points $\varLambda _{D}:=\{\mathbf {x}_i\}_{i=1}^{N}$ on the manifold and real numbers $y_i$. Elements of D are called data points. The points $\mathbf {x}_i$ of $\varLambda _{D}$ are called input samples. The $y_i$ are called data values. A continuous function $f^*$ on the manifold is called an (ideal) target function for data D if

$$\begin{aligned} y_i=f^*(\mathbf {x}_i)+\epsilon _i,\quad i=1,\dots ,|D| \end{aligned}$$

(7)

for noises $\epsilon _i$.

Deterministic and random sampling In this paper, we consider two types of input samples depending on whether they are randomly sampled: the deterministic sampling and random sampling. The data D has random sampling if $\mathbf {x}_i$ are randomly chosen with respect some probability measure on $\mathcal {M}$. In contrast, D has deterministic sampling if the $\mathbf {x}_i$ are fixed.

Noisy and noiseless data We also distinguish data types by its data values $y_i$. We say D is noiseless data or clean data if $y_i$ is equal to the function value of the associated (ideal) target function value $f^*(\mathbf {x}_i)$ (that is, the noises $\epsilon _i\equiv 0$). We say D is noisy data if the noises $\epsilon _i$ in (7) are nonzero.

Quadrature rule A set

$$\begin{aligned} \mathcal {Q}_{D}=\{(w_{i},\mathbf {x}_{i})|w_{i}\in \mathbb {R}, \mathbf {x}_{i}\in \mathcal {M}, i=1,\dots ,N\} \end{aligned}$$

is said to be a quadrature rule for numerical integration on $\mathcal {M}$. We say $\mathcal {Q}_{D}$ is a positive quadrature rule if all weights $w_{i}>0$, $i=1,\dots ,N$. In this paper, we only consider positive quadrature rules.

For general Riemannian manifolds, the construction of quadrature formulas depends on the existence of designs on manifolds, which has been proved in [15]. See also [3]. On a flat torus, the weights are equal for a regular grid in $[-\pi ,\pi ]^d$. On the real spheres, numerically constructive spherical designs with double precision (cubature with equal weights) can be found in [45]. However, in general, the weights do not have to be equal. For example, for a hypercube $[0,1]^d$, non-equal quadrature rule examples can be found in [6]; for spheres or spherical caps, quadrature rules with non-equal area weights examples can be found in [19]; for complex spheres [40]; Grassmannians [5], where the weights are not necessarily equal.

Definition 4

(Non-distributed filtered hyperinterpolation) Let $D=\{(\mathbf {x}_i,y_i)\}_{i=1}^{|D|}$ be a data set on compact Riemannian manifold $\mathcal {M}$, $\mathcal {Q}_{D}=\{(w_{i},\mathbf {x}_{i})\}_{i=1}^{|D|}$ a positive quadrature rule on $\mathcal {M}$ and $H$ be a filter in Definition 2 on $\mathbb {R}_{+}$. For $n\in \mathbb {N}$, the non-distributed filtered hyperinterpolation (NDFH) for data D and quadrature rule $\mathcal {Q}_{D}$ is

$$\begin{aligned} V_{D,n}^{}(\mathbf {x}) := V_{D,n,H,\mathcal {Q}_{D}}^{}(\mathbf {x}) :=\sum _{i=1}^{|D|} w_{i}y_iK_{n,H}^{}(\mathbf {x},\mathbf {x}_i). \end{aligned}$$

(8)

If we let $D^*:=D^*(f^*):=\{(\mathbf {x}_i,f^*(\mathbf {x}_i))\}_{i=1}^N$ be the noiseless data for the ideal target function $f^*$ and data D, then (8) becomes

$$\begin{aligned} V_{D^*,n}^{}(\mathbf {x}):=V_{D^*,n}^{}(f^*,\mathbf {x}) := \sum _{i=1}^{|D|} w_{i}f^*(\mathbf {x}_i)K_{n,H}^{}(\mathbf {x},\mathbf {x}_i). \end{aligned}$$

(9)

We call $V_{D^*,n}^{}(f^*)$ non-distributed filtered hyperinterpolation (NDFH) for clean data set or for quadrature rule $\mathcal {Q}_{D}$, for the function $f^*$.

Remark 3

Non-distributed filtered hyperinterpolation on the sphere was studied by Sloan and Womersley [31].

3.1 Non-Distributed Filtered Hyperinterpolation for Clean Data

We first assume that we have a quadrature rule that has polynomial exactness. That is, the weighted sum by the quadrature rule can recover the integral for polynomials on manifolds. The non-distributed filtered hyperinterpolation with polynomial-exact quadrature rule can reach the same optimal convergence order as the filtered approximation in Sect. 2.3, and the convergence rate is optimal.

Let $\ell \in \mathbb {N}_{0}$. A positive quadrature rule $\mathcal {Q}_{D}:=\mathcal {Q}(\ell ,N):=\{(w_{i},\mathbf {x}_{i})\}_{i=1}^{N}$ on $\mathcal {M}$ is said to be exact for degree $\ell $ if for all polynomials $P\in \varPi _{\ell }$,

$$\begin{aligned} \int _{\mathcal {M}}P(\mathbf {x})\mathrm {d}{\mu (\mathbf {x})} = \sum _{i=1}^{N} w_{i}P(\mathbf {x}_{i}). \end{aligned}$$

That the quadrature is exact for polynomials is a strong assumption, as the optimal-order number of points is $\mathcal {O}_{}\left( N^d\right) $ in typical examples of manifolds, see, e.g., [6, 19].

The following lemma shows that the filtered hyperinterpolation $V_{D,n}^{}$ with filter $H$ given by Definition 1 reproduces polynomials of degree up to n if the associated quadrature rule $\mathcal {Q}_{D}$ is exact for degree $3n-1$.

Lemma 5

Let $n\in \mathbb {N}_{0}$ and $\mathcal {M}$ be a d-dimensional compact Riemannian manifold. Let $\mathcal {Q}_{D}:= \{(w_{i},\mathbf {x}_{i})\}_{i=1}^{N}$ be a positive quadrature rule on $\mathcal {M}$ exact for polynomials of degree up to $3n-1$ and let $V_{D^*,n}^{}$ be a non-distributed filtered hyperinterpolation on $\mathcal {M}$ for quadrature rule $\mathcal {Q}_{D}$ with filter $H$ given by Definition 1. Then,

$$\begin{aligned} V_{D^*,n}^{}(P) = P,\quad P\in \varPi _{n}. \end{aligned}$$

Theorem 6

(NDFH with clean data and deterministic samples) Let $d\ge 1$, $1\le p\le \infty $ and $n\ge 1$. Let $\mathcal {M}$ be a compact Riemannian manifold of dimension d. Let $H$ be a filter given by Definition 1 with $\kappa \ge d+1$ and $\mathcal {Q}_{D}$ be a positive quadrature rule exact for polynomials of degree up to $3n-1$. Then, for $f^*\in \mathbb {W}_{p}^{r}(\mathcal {M})$ with $r>d/p$, the NDFH for the quadrature rule $\mathcal {Q}_{D}$ has the error upper bounded by

$$\begin{aligned} \big \Vert f^*-V_{D^*,n}^{}(f^*)\big \Vert _{L_{p}(\mathcal {M})} \le c\, n^{-r}\Vert f^*\Vert _{\mathbb {W}_{p}^{r}(\mathcal {M})}, \end{aligned}$$

(10)

where the constant c depends only on d, p, r, $H$ and $\kappa $.

From the perspective of information-based complexity it is interesting to observe that if the target function $f^*$ is in the Sobolev space $\mathbb {W}_{p}^{s}(\mathcal {M})$, $s>0$, the convergence rate is optimal in the sense of optimal recovery. This is due to the fact that on a real unit sphere when one uses optimal-order number of points $N =\mathcal {O}_{}\left( n^d\right) $, the order $n^{-s}=N^{-s/d}$ in (10) is optimal, as proved by Wang and Sloan [35] and Wang and Wang [36]. Theorem 6 can be viewed as the non-distributed filtered hyperinterpolation for clean data, where the estimator uses the whole data set in one machine.

We now introduce the (non-distributed) filtered hyperinterpolation for clean data with random sampling. We say a data set D has random sampling (with distribution $\nu $) if the sampling points $\mathbf {x}_i$ of D are independent and identically distributed (i.i.d.) random points with distribution $\nu $ on $\mathcal {M}$. To construct the filtered hyperinterpolation for random sampling, we need the following lemma, which shows that there exist N quadrature weights given N i.i.d. random points $\mathbf {x}_i$ such that the resulting quadrature rule is exact for polynomials for degree n with high probability. For $1\le p\le \infty $, let $L_{p,\nu }(\mathcal {M})$ be $L_p$ space on manifold $\mathcal {M}$ with respect to probability measure $\nu $.

Lemma 6

(Quadrature rule for random samples) For $N\ge 2$, let $X_N=\{\mathbf {x}_i\}_{i=1}^{N}$ be a set of N i.i.d. random points on $\mathcal {M}$ with distribution $\nu $, where $\nu $ satisfies

$$\begin{aligned} \Vert f\Vert _{L_1(\mathcal {M})} \le c \Vert f\Vert _{L_{1,\nu }(\mathcal {M})}\quad \forall f\in L_1(\mathcal {M})\cap L_{1,\nu }(\mathcal {M}), \end{aligned}$$

(11)

for a positive absolute constant c. Then, for integer n satisfying $N/n^{2d}>c$ for sufficiently large constant c, there exists a quadrature rule $\{(\mathbf {x}_i,w_{i,n}^*)\}_{i=1}^{N}$ such that

$$\begin{aligned} \int _{\mathcal {M}}P_{n}(\mathbf {x})\mathrm {d}\nu (\mathbf {x})= \sum _{i=1}^{N}w_{i,n}^*P_{n}(\mathbf {x}_i) \quad \forall P_n\in \varPi _n^d \end{aligned}$$

holds, and $\sum _{i=1}^{N}| w_{i,n}^*|^2\le 2/N$, with confidence at least $1-4\exp \left\{ -CN/n^d\right\} $, where C is a constant depending only on dimension d.

Definition 5

(Quadrature rule for random samples) We modify the weights of the set $\{(\mathbf {x}_i,w_{i,n}^*)\}_{i=1}^{N}$ in Lemma 6, as follows.

1.
If $\text{ the } \text{ event }~ \sum _{i=1}^{N}| w_{i,n}^*|^2\le 2/N ~\text{ happens }$, we let $w_{i,n} = w_{i,n}^*\,\, \forall i=1,\dots ,N$.
2.
If $\text{ the } \text{ event }~ \sum _{i=1}^{N}| w_{i,n}^*|^2>2/N ~\text{ happens }$, we let $w_{i,n} \equiv 0$.

We call the set $\{(\mathbf {x}_i,w_{i,n})\}_{i=1}^{N}$ the quadrature rule for random samples on the manifold $\mathcal {M}$ for measure $\nu $.

The following theorem gives the approximation error of the non-distributed filtered hyperinterpolation with clean data and random sampling for sufficiently smooth functions. Here, we want to obtain an estimated value of the expected error and take the expectation over the distribution of the data P(X)P(Y|X).

Theorem 7

(NDFH with clean data and random samples) Let $d\ge 2$ and $r>d/2$. Let the clean data set $D^*$ with i.i.d. random sampling points $\{\mathbf {x}_i\}_{i=1}^{|D^*|}$ on $\mathcal {M}$ and distribution $\nu $ satisfying (11). Given some $\tau $, $0<\tau \le d$, for $cn^{d+\tau }\le |D^*|\le c' n^{2d}$ with two positive constants $c,c'$, the filtered hyperinterpolation $V_{D^*,n}$ for clean data set $D^*$ with target function $f^*\in {\mathbb {W}}_2^r(\mathcal {M})$, as given by (9) with the quadrature rule for random samples $\{(\mathbf {x}_i,w_{i,n})\}_{i=1}^{|D^*|}$ in Definition 5, has the approximation error

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D^*,n}-f^*\Vert _{L_2(\mathcal {M})}^2\right\} \le C_1 |D^*|^{-\frac{2r}{d}}, \end{aligned}$$

where $C_1:=c_5^2\Vert f^*\Vert _{\mathbb {W}_{2}^{r}(\mathcal {M})}^2 + c'\Vert f^*\Vert _{L_{\infty }(\mathcal {M})}^2$, where $c_5$ and $c'$ depend only on $\mu (\mathcal {M}), c_1, d, r$, and the filter H and its smoothness $\kappa $, and C from Lemma 6.

Theorem 7 shows that the filtered hyperinterpolation with random sampling for clean data can achieve the same optimal convergence rate as the filtered hyperinterpolation with deterministic sampling. We give the proof of Theorem 7 in Sect. A.1.

3.2 Non-Distributed Filtered Hyperinterpolation for Noisy Data

In the following, we describe non-distributed filtered hyperinterpolation with deterministic or random sampling for noisy data. The data $y_i$ are the values of a function $f^*$ on $\mathcal {M}$ plus noise. Here, we assume that the noise has mean zero and is bounded. To be precise, we let

$$\begin{aligned} y_i=f^*(\mathbf {x}_i)+\epsilon _i, \quad \mathbf {E}[\epsilon _i]=0,\quad |\epsilon _i|\le M \quad \forall i=1,\dots ,|D|. \end{aligned}$$

(12)

The D satisfying (12) is then called noisy data set associated with $f^*$. For real data, the $f^*$ is an unknown mapping from input to output. We study the performance of the non-distributed filtered hyperinterpolation for a noisy data set D whose data are stored in a sufficiently big machine.

We first consider the case where the locations of sampling points are fixed, which we call filtered hyperinterpolation with deterministic sampling. The kernel $K_n$ provides a smoothing method for the function $f^*$ using data D. As we shall see below, the approximation error of this filtered hyperinterpolation has the convergence rate depending on the smoothness of function $f^*$. The following assumes that there exists a quadrature rule with N nodes and N “almost equal” weights which are exact for polynomials of degree approximately $N^{1/d}$.

Assumption 8

(Polynomial-exact quadrature) Let $\mathcal {M}$ be a d-dimensional compact Riemannian manifold. For a point set $X_N:=\{\mathbf {x}_1,\dots ,\mathbf {x}_N\}\subset \mathcal {M}$, there exist N positive weights $\{w_j\}_{j=1}^N$ and constants $c_2$ and $c_3$ such that $0<w_j<c_2 N^{-1}$ and

$$\begin{aligned} \int _{\mathcal {M}}f(\mathbf {x})\mathrm {d}{\mu (\mathbf {x})} = \sum _{j=1}^{N}w_j f(\mathbf {x}_j)\quad \forall f\in \varPi _{c_3 N^{1/d}}. \end{aligned}$$

(13)

Remark 4

For the sphere of any dimension, Assumption 8 always holds [26]. In order to construct the quadrature rule for general Riemannian manifolds, one needs to find weights that make the worst case error vanish. This corresponds to solving a particular equation

$$\begin{aligned} \sum _{i,j=1}^N \omega _i\omega _j \mathcal {K}(\mathbf {x}_i,\mathbf {x}_j)=0\quad \text {subject to } \sum _{i=1}^N \omega _i=1, \end{aligned}$$

where $\mathcal {K}(\mathbf {x}_i,\mathbf {x}_j)$ is the reproducing kernel removing the constant 1 of $\varPi _{c_3 N^{1/d}}$, given by $\mathcal {K}(\mathbf {x}_i,\mathbf {x}_j):=\sum _{\lambda _{\ell }\le c_3 N^{1/d}}\phi _{\ell }(\mathbf {x}_i)\phi _{\ell }(\mathbf {x}_j)$, where $\mathbf {x}_i,\mathbf {x}_j\in \mathcal {M}$.

The following theorem shows that the filtered hyperinterpolation $V_{D,n}$ can approximate $f^*$ well, provided that the support of the filtered kernel is appropriately tuned and Assumption 8 holds.

Theorem 9

(NDFH for noisy data and deterministic samples) Let $d\ge 2$ and $r>d/2$. The sampling point set of the data set D satisfies Assumption 8. Then, for $\frac{c_3}{6} |D|^{\frac{1}{2r+d}} \le n\le \frac{c_3}{2} |D|^{\frac{1}{2r+d}}$ with constant $c_3$ in (13), the filtered hyperinterpolation $V_{D,n}$ for noisy data set D with target function $f^*\in {\mathbb {W}}_2^r(\mathcal {M})$ satisfies

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D,n}-f^*\Vert _{L_2(\mathcal {M})}^2\right\} \le C_2 |D|^{-\frac{2r}{2r+d}}, \end{aligned}$$

(14)

where $C_2:=4^r c_5^2 c_3^{-2r} \Vert f^*\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})} + c_1 c_2^2 c_3^d M^2$ is a constant only depending upon r, d, M and the Sobolev norm of the target function $f^*$, where M is the upper bound for the noise $\epsilon $ as given by (12), and $c_1, c_2, c_3, c_5$ are constants depending only on d, r, the filter H and its smoothness r, where $c_1$ is the constant c in Lemma 3, and $c_2, c_3$ are from Assumption 8.

Here, in contrast to Theorem 6, y contains noise. The expectation in (14) is with respect to the noise on y. The variance of the noise enters in $C_1$.

Remark 5

Here the condition $r>d/2$ is the embedding condition such that any function in ${\mathbb {W}}_2^r(\mathcal {M})$ has a representation of a continuous function on $\mathcal {M}$, which makes quadrature rule of filtered hyperinterpolation feasible for numerical computation.

Theorem 9 illustrates that if the scattered data $\varLambda _{D}$ has polynomial exactness, and the support of the filter $\eta $ is appropriately chosen, then the filtered hyperinterpolation for noisy data set D can approximate sufficiently smooth target function $f^*$ on the manifold in high precision in probabilistic sense. By Györfi et al. [17], the rate $|D|^{-2r/(2r+d)}$ in (14) cannot be essentially improved in the scenario of (12). Theorem 9 thus provides a feasibility analysis of the filtered hyperinterpolation for manifold-structured data with random noise.

Now, we introduce the (non-distributed) filtered hyperinterpolation for noisy data with random sampling. Let $D=\{(\mathbf {x}_i,y_i)\}_{i=1}^{|D|}$ where the $\mathbf {x}_i$ are i.i.d. random points with distribution $\nu $ on $\mathcal {M}$. The following theorem gives the approximation error of the non-distributed filtered hyperinterpolation for sufficiently smooth functions. Here, we want to get an estimated value of the expected error and take the expectation over the distribution P(X)P(Y|X) of the data. For two sequences $\{a_n\}_{n=1}^{\infty }, \{b_n\}_{n=1}^{\infty }$, $a_n\asymp b_n$ means that there exist constants $c'$, c such that $c' b_n\le a_n\le c b_n$.

Theorem 10

(NDFH for noisy data and random samples) Let $d\ge 2$ and $r>d/2$. Let the noisy data set D take i.i.d. random sampling points $\{\mathbf {x}_i\}_{i=1}^{|D|}$ on $\mathcal {M}$ with distribution $\nu $ satisfying (11). For $n\asymp |D|^{1/(2r+d)}$, the filtered hyperinterpolation $V_{D,n}$ given by (8) with the quadrature rule for random samples $\{(\mathbf {x}_i,w_{i,n})\}_{i=1}^{|D|}$ in Definition 5, with target function $f^*\in {\mathbb {W}}_2^r(\mathcal {M})$, has the approximation error

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D,n}-f^*\Vert _{L_2(\mathcal {M})}^2\right\} \le C_3|D|^{-\frac{2r}{2r+d}}, \end{aligned}$$

where $C_3:=c''M^2+c_5^2\Vert f^*\Vert _{\mathbb {W}_{2}^{r}(\mathcal {M})}^2+c_5'\Vert f^*\Vert _{L_{\infty }(\mathcal {M})}^2$, where M is the upper bound for the noise $\epsilon $ as given by (12), and $c'', c_5, c'_5$ are constants depending only on $\mu (\mathcal {M}), d, r$, and filter H and its smoothness $\kappa $, and C from Lemma 6.

Theorems 9 and 10 show that the filtered hyperinterpolation approximations with deterministic sampling and random sampling can achieve the same optimal convergence rate. We give the proofs of Theorems 9 and 10 in Sect. A.2.

4 Distributed Filtered Hyperinterpolation on Manifolds

In this section, we describe the distributed learning by filtered hyperinterpolation for clean data with deterministic and random sampling’s.

Distributed data sets We say a large data set D is distributively stored in m local servers if for $j=1,\dots ,m$, $m\ge 2$, the jth server contains a subset $D_j$ of D, and there is no common data between any pair of servers, that is, $D_j\cap D_{j'}=\emptyset $ for $j\ne j'$, and $D=\cup _{j=1}^m D_j$. The data sets $D_1,\dots ,D_m$ are called distributed data sets of D. In this case, the filtered hyperinterpolation $V_{D,n}$ which needs access to the entire data set D is infeasible. Instead, in this section, we construct a distributed filtered hyperinterpolation for the distributed data sets $\{D_j\}_{j=1}^{m}$ of D by the divide and conquer strategy [22].

Definition 6

(Distributed filtered hyperinterpolation) Let $D:={(\mathbf {x}_i,y_i)}_{i=1}^N$ be a data set on manifold $\mathcal {M}$. The distributed filtered hyperinterpolation (DFH) for distributed data sets $\{D_j\}_{j=1}^{m}$ of D is a synthesized estimator of local estimators $V_{D_j,n}$, $j=1,2,\dots ,m$, each of which is the filtered hyperinterpolation for noisy data $D_j$:

$$\begin{aligned} V_{D,n}^{(m)}(\mathbf {x}):=V_{D,n}(\{D_j\}_{j=1}^{m};\mathbf {x}) := \sum _{j=1}^m \frac{|D_j|}{|D|} V_{D_j,n}(\mathbf {x}),\quad \mathbf {x}\in \mathcal {M}, \end{aligned}$$

(15)

where for $j=1,\dots ,m$, the local estimator is a filtered hyperinterpolation on $D_j$:

$$\begin{aligned} V_{D_j,n}(\mathbf {x})= \sum _{\mathbf {x}_i\in D_j} w_{i}y_i K_n(\mathbf {x},\mathbf {x}_i). \end{aligned}$$

For noiseless data sets $D^*=\{(\mathbf {x}_i,f^*(\mathbf {x}_i))\}_{i=1}^N$ and $D^*_j$ associated with the target function $f^*$, denote the distributed filtered hyperinterpolation by

$$\begin{aligned} V_{D^*,n}^{(m)}(\mathbf {x}) = \sum _{j=1}^m \frac{|D^*_j|}{|D^*|} V_{D^*_j,n}(\mathbf {x}),\quad \mathbf {x}\in \mathcal {M}. \end{aligned}$$

(16)

The synthesis here is a process when the local estimators communicate to a central processor to produce the global estimator $V_{D,n}^{(m)}$. The weight in the sum of (15) for each local server is proportional to the amount of data used in the server.

4.1 Distributed Filtered hyperinterpolation for Clean Data

The synthesis here is a process when the local estimators communicate to a central processor to produce the global estimator $V_{D,n}^{(m)}$. Like the non-distributed case, we start with the case of deterministic sampling. The following theorem shows that the distributed filtered hyperinterpolation $V_{D,n}^{(m)}$ has a similar approximation performance as the non-distributed $V_{D,n}$ when the number of local servers is not too large as compared with the amount of data.

Theorem 11

(DFH for clean and deterministic data) Let $d\ge 2$ and $1\le p\le \infty $, $r>d/p$, $n\in \mathbb {N}$, $m\ge 2$ and $D^*$ a clean data set satisfying (12). Let $\{D^*_j\}_{j=1}^{m}$ be m distributed data sets of $D^*$ satisfying $\min _{j=1,\dots ,m}|D_j^*|\ge |D^*|^{\frac{d}{2r+d}}$. Let $H$ be a filter given by Definition 1 with $\kappa \ge d+1$. For $j=1,\dots ,m$, the data set $D^*_j$ on the jth server satisfies that $\mathcal {Q}_{D_j^*}$ is a positive quadrature rule exact for polynomials of degree up to $3n-1$. Then, for $f^*\in \mathbb {W}_{p}^{s}(\mathcal {M})$ with $s>d/p$, the distributed filtered hyperinterpolation $V_{D_j^*,n}^{(m)}$ with $\frac{c_3}{6}|D^*|^{\frac{1}{2r+d}}\le n\le \frac{c_3}{3}|D^*|^{\frac{1}{2r+d}}$ for the constant $c_3$ in (13) has the error

$$\begin{aligned} \big \Vert V_{D^*,n}^{(m)}-f^*\big \Vert _{L_{2}(\mathcal {M})} \le C_4 |D^*|^{-\frac{r}{2r+d}}. \end{aligned}$$

where $C_4:=6^rc_3^{-r} c_5 \Vert f^*\Vert _{\mathbb {W}_{2}^{r}(\mathcal {M})}$ with constant $c_5$ only dependent on d, r and the filter H and its smoothness $\kappa $.

Theorem 11 illustrates that the distributed filtered hyperinterpolation for the deterministic samples has a slightly slower approximation rate as the non-distributed case of Theorem 6, where the latter processes all the distributed data sets in a single server.

Remark 6

The condition $V_{D_j^*,n}^{(m)}$ with $\frac{c_3}{6}|D^*|^{\frac{1}{2r+d}}\le n\le \frac{c_3}{3}|D^*|^{\frac{1}{2r+d}}$ and $\min _{j=1,\dots ,m}|D_j^*| \ge |D^*|^{\frac{d}{2r+d}}$ imply the existence of the quadrature rule exact for polynomials of degree n for the points set of each local server $D_j^*$.

The distributed filtered hyperinterpolation with random sampling is a weighted average of individual non-distributed filtered hyperinterpolations on local servers, where each weight is in proportion to the amount of the data used by the corresponding local server. To be precise:

Definition 7

(DFH for random samples) Let $\{D_j\}_{j=1}^{m}$ be a m-partition for a data set $D:=\{(\mathbf {x}_i,y_i)\}_{i=1}^N$ on manifold $\mathcal {M}$ for $m\le 2$, and their points are random samples. Denote each subset by $D_j=\{(\mathbf {x}_i^{(j)},y_i^{(j)})\}_{i=1}^{|D_j|}$, and we assign the weights to $D_j$ to make a quadrature rule for random samples $\{(\mathbf {x}_i^{(j)},w_i^{(j)})\}_{i=1}^{|D_j|}$ that satisfy Definition 5 and Lemma 6. Then, the distributed filtered hyperinterpolation $V_{D,n}^{(m)}$ for D with random samples is defined by Definition 6, where each $V_{D_j,n}$ uses the quadrature rule $\{(\mathbf {x}_i^{(j)},w_i^{(j)})\}_{i=1}^{|D_j|}$.

The $V_{D,n}^{(m)}$ is well-defined as the points $\{x_{i}^{(j)}\}_{i=1}^{|D_j|}$ for each j are i.i.d. for a common distribution $\mu $ in Lemma 6.

For random samples, we first consider the clean data case. Let $V_{D^*_j,n}$ be the non-distributed filtered hyperinterpolation for clean data $D^*_j$ with random sampling points. We define the global estimator $V_{D^*,n}^{(m)}$ as (15). In the following theorem, we show that the approximation error for $V_{D^*,n}^{(m)}$ on d-manifold converges at rate $|D^*|^{-\frac{2r}{2r+d}}$ where r is the smoothness of the target function. It has the same convergence rate as the distributed case for deterministic samples in Theorem 11.

Theorem 12

(DFH for clean data and random samples) Let $d\ge 2$, $r>d/2$, $m\ge 2$ and clean data set $D^*$ with and its m partition sets $D^*_j$, $j=1,\dots ,m$. The sampling points $\{\mathbf {x}_i\}_{i=1}^{|D|}$ are i.i.d. random points on $\mathcal {M}$ with distribution $\mu $ in (11). If $n\asymp |D|^{1/(2r+d)}$ and $\min _{j=1,\dots ,m}|D_j|\ge |D|^{\frac{d+\tau }{2r+d}}$ for some $\tau $ in (0, 2r), then for the target function $f^*\in {\mathbb {W}}_2^r(\mathcal {M})$, the distributed filtered hyperinterpolation $V_{D^*,n}^{(m)}$ in Definition 7 has the approximation error

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D^*,n}^{(m)}-f^*\Vert _{L_2(\mathcal {M})}^2\right\} \le C_5 |D^*|^{-\frac{2r}{2r+d}}, \end{aligned}$$

where $C_5:=c'\Vert f^*\Vert _{\mathbb {W}_{2}^{r}(\mathcal {M})}^2+c''\Vert f^*\Vert _{L_{\infty }(\mathcal {M})}^2$, where $c', c''$ depend only on $\mu (\mathcal {M}), d, r$, and the filter H and its smoothness r, and constant C from Lemma 6.

The proofs of Theorems 11 and 12 are deferred to Sect. A.3.

4.2 Distributed Filtered Hyperinterpolation for Noisy Data

In this subsection, we describe the distributed learning by filtered hyperinterpolation for noisy data with deterministic and random sampling. As shown in the following theorem, we prove that the distributed filtered hyperinterpolation $V_{D,n}^{(m)}$ has similar approximation performance as the non-distributed $V_{D,n}$ if the number of local servers is not large or each server has a sufficient amount of data.

Theorem 13

(DFH for noisy data and deterministic samples) Let $d\ge 2$, $r>d/2$, $m\ge 2$ and D a noisy data set satisfying (12). Let $\{D_j\}_{j=1}^{m}$ be m distributed data sets of D. For $j=1,\dots ,m$, the sampling point set $\varLambda _{D_j}$ of D satisfies Assumption 8. For the distributed filtered hyperinterpolation $V_{D,n}^{(m)}$ given by Definition 7, if $\{D_j\}_{j=1}^{m}$ satisfies that the target function $f^*$ is in ${\mathbb {W}}_2^r(\mathcal {M})$, $\frac{c_3}{6}|D|^{\frac{1}{2r+d}}\le n\le \frac{c_3}{3}|D|^{\frac{1}{2r+d}}$ for the constant $c_3$ in (13), and $\min _{j=1,\dots ,m}|D_j|\ge |D|^{\frac{d}{2r+d}}$, then $V_{D,n}^{(m)}$ has the approximation error

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D,n}^{(m)}-f^*\Vert _{L_2(\mathcal {M})}^2\right\} \le C_6 |D|^{-\frac{2r}{2r+d}}, \end{aligned}$$

where $C_6 = 2^{2r+1}\cdot 3^{2r} c_5^2 c_3^{-2r}\Vert f^*\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})} + 3^{-d} c_1 c_2^2 c_3^{d} M^2$, where M is the upper bound for the noise $\epsilon $ as given by (12), and $c_1, c_2, c_3, c_5$ are constants depending only on d, r, the filter H and its smoothness r, where $c_1$ is the constant c in Lemma 3, and $c_2, c_3$ are from Assumption 8.

The distributed filtered hyperinterpolation for deterministic sampling has the same order $|D|^{-\frac{2r}{2r+d}}$ of the approximation error as compared to the non-distributed case in Theorem 9. Thus, appropriately distributing data to local servers, the divide-and-conquer strategy does not reduce the approximation capability of filtered hyperinterpolation. We will see that it is also true when the sampling is random.

Remark 7

Suppose each server takes the same number of data. With less than $|D|^{\frac{2r}{2r+d}}$ servers, the $L_2$ error for the product space $\varOmega \times L_2(\mathcal {M})$ converges at rate $|D|^{\frac{1}{1+d/(2r)}}$. The condition $\min _{j=1,\dots ,m}|D_j|\ge |D|^{\frac{d}{2r+d}}$ has a close connection to the number m of local servers. In particular, if $|D_1|=\dots =|D_m|$, the condition $\min _{j=1,\dots ,m}|D_j|\ge |D|^{\frac{d}{2r+d}}$ is equivalent with $m\le |D|^\frac{r}{r+d/2}$.

When the data D is noisy with random sampling points, the distributed $V_{D,n}^{(m)}$ in (15) has the same approximation rate as the non-distributed case in Theorem 10.

Theorem 14

(DFH for noisy data and random samples) Let $d\ge 2$, $r>d/2$, $m\ge 2$ and D a noisy data set satisfying (12). The sampling points are i.i.d. random points on $\mathcal {M}$ with distribution $\mu $ in (11). If the target function $f^*\in {\mathbb {W}}_2^r(\mathcal {M})$, $n\asymp |D|^{1/(2r+d)}$ and $\min _{j=1,\dots ,m}|D_j|\ge |D|^{\frac{d+\tau }{2r+d}}$ for some $\tau $ in (0, 2r), then the distributed filtered hyperinterpolation $V_{D,n}^{(m)}$ for noisy D, as given by Definition 7, has the approximation error

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D,n}^{(m)}-f^*\Vert _{L_2(\mathcal {M})}^2\right\} \le C_7|D|^{-\frac{2r}{2r+d}}, \end{aligned}$$

(17)

where $C_7:=c'M^2+c''\Vert f^*\Vert _{\mathbb {W}_{2}^{r}(\mathcal {M})}^2+c'''\Vert f^*\Vert _{L_{\infty }(\mathcal {M})}^2$, where $c', c''$ and $c'''$ depend only on $\mu (\mathcal {M}), d, r$, and the filter H and its smoothness r, and constant C from Lemma 6.

From Theorems 11 to 14, the distributed filtered hyperinterpolations for all cases (in terms of data type and sampling type) always have a lower rate than the non-distributed cases of Theorems 6 and 7. But the noise added to the data also deteriorates the rate of the non-distributed cases of Theorems 9 and 10, which leads to the same rate as the distributed cases.

Remark 8

Note that for the distributed filtered hyperinterpolation for noisy data, the approximation rate $|D|^{-2r/(2r+d)}$ in (17) is the same as Theorem 13 when the sampling points are deterministic. It means with appropriate random distribution of the sampling points, the randomness of sampling does not reduce the approximation performance of distributed filtered hyperinterpolation. If $|D_1|=\cdots =|D_m|$, the condition $\min _{j=1,\dots ,m}|D_j|\ge |D|^{\frac{d+\tau }{2r+d}}$ is equivalent with $m\le |D|^\frac{r-\tau /2}{r+d/2}$.

We postpone the proofs of Theorems 13 and 14 to Sect. A.4.

5 Examples and Numerical Evaluation

We illustrate the notions and filtered hyperinterpolation for single and multiple servers on the 2-d mathematical torus $\mathbb {T}^{2}$. The torus $\mathbb {T}^{2}$ can be parameterized by the product of unit circles $\mathbb {S}^{1}\times \mathbb {S}^{1}$ and is equivalent to $[-\pi ,\pi ]^2$. Denote $L_2(\mathbb {T}^{2})$ the $L_2$ space on $\mathbb {T}^{2}$ with the Lebesgue measure. On the manifold $\mathbb {T}^{2}$, the Laplacian

$$\begin{aligned} \varDelta :=\frac{\partial ^{2}}{\partial x_{1}^{2}} + \frac{\partial ^{2}}{\partial x_{2}^{2}} \end{aligned}$$

is the Laplace–Beltrami operator with eigenfunctions $\{\frac{1}{2\pi }\exp ({\mathrm {i}}\, \mathbf {k}\cdot \mathbf {x})\}_{\mathbf {k}\in \mathbb {Z}^{2}}$ of $\mathbf {x}\in \mathbb {T}^{2}$ and eigenvalues $\{|\mathbf {k}|^{2}\}_{\mathbf {k}\in \mathbb {Z}^{2}}$, where ${\mathrm {i}}:=\sqrt{-1}$ is the imaginary unit, $\mathbf {k}\cdot \mathbf {x}=k_{1}x_{1}+k_2 x_2$ and $|\mathbf {k}|:=\sqrt{k_1^2+k_2^2}$. Here $\mathbf {k}=(k_1,k_2)$ and $\mathbf {x}=(x_1,x_2)$. The space of polynomials of degree n is $\varPi _n:=\mathrm{span}\{\frac{1}{2\pi } e^{{\mathrm {i}}\, \mathbf {k}\cdot \mathbf {x}}: |\mathbf {k}|\le n\}$. For $1\le p\le \infty $, let $\mathbb {L}_{p}(\mathbb {T}^{d})$ be the $\mathbb {L}_{p}$ space with respect to the normalized Lebesgue measure $\,\mathrm {d}{\mathbf {x}}$ on $\mathbb {T}^{2}$.

For our illustration, we define the filter $H$ by the piece-wise polynomial function with $H(t)=1$ for $0\le t\le 1$;

$$\begin{aligned} H(t)&= 1 + (t-1)^6\bigl [-462 + 1980(t-1) - 3465(t-1)^2 + 3080(t-1)^3 \nonumber \\&\quad -\, 1386(t-1)^4 + 252(t-1)^5\bigr ] \end{aligned}$$

(18)

for $t\in (1,2)$; and $H(t)=0$ for $t\ge 2$. Then, $H$ is in $C^5(\mathbb {R}_{+})$ and satisfies Definition 1. Figure 2 shows the plot of this filter. This particular filter has been used in previous works for the sphere, see [31, 37,38,39]. We observe that the filter, which is constant 1 over [0, 1], enables the filtered approximation and filtered hyperinterpolation of degree n (as given below) to reproduce polynomials with degree up to n on $\mathbb {T}^{2}$. The finite support [0, 2] of H makes the filtered hyperinterpolation a polynomial of degree up to $2n-1$. The middle polynomial over [1, 2] which is sufficiently smooth at the two ends modifies the Fourier coefficients from degree $n+1$ to $2n-1$ and makes the resulting filtered hyperinterpolation a near best approximator, as shown by Theorem 6.

With the filter (18), the filtered kernel on $\mathbb {T}^{2}$ with filter H is, for $n\in \mathbb {N}$ and $\mathbf {x},\mathbf {y}\in \mathbb {T}^{2}$,

$$\begin{aligned} K_{n,H}^{}(\mathbf {x},\mathbf {y}):= \frac{1}{(2\pi )^2}\sum _{\mathbf {k}\in \mathbb {Z}^{2}}H\left( \frac{|\mathbf {k}|}{n}\right) e^{{\mathrm {i}}\mathbf {k}\cdot (\mathbf {x}-\mathbf {y})}. \end{aligned}$$

As the support of filter H is [0, 2], the summation over $\mathbf {k}$ is constrained to $|\mathbf {k}|\le 2n-1$. The filtered approximation for $f\in L_2(\mathbb {T}^{2})$ is then

$$\begin{aligned} V_{n,H}(f;\mathbf {x}):= \int _{\mathbb {T}^{2}} K_{n,H}(\mathbf {x},\mathbf {y})f(\mathbf {y})\,\mathrm {d}{\mathbf {y}}. \end{aligned}$$

As corresponds to Definition 4, this is the ideal approximation of degree n, which is hard to compute as it would require integrating the unknown target function.

To construct a non-distributed filtered hyperinterpolation on $\mathbb {T}^{2}$, we consider $N=9n_0^2$ points $\mathbf {x}_{j,l}=(2j\pi /(3n_0),2l\pi /(3n_0))$. For these we can use the quadrature rule $\mathcal {Q}_{D}=\{(w_{j,l},\mathbf {x}_{j,l}): j,l=0,1,\dots ,3n_0-1\}$ with $N=9n_0^2$ equal weights $w_{j,l}\equiv (2\pi )^2/N$. The quadrature rule $\mathcal {Q}_{D}$ is exact for polynomials of degree n. To satisfy the conditions of Theorems 9 and 13, we can let hyperparameter $n_0=n$. In general, the quadrature weights need to be constructed depending on the location of the input points $\mathbf {x}_{j,l}$. We have been able to show the existence of such weights for randomly sampled inputs (see Lemma 6); however, the explicit construction in such cases is yet to be explored.

Consider a noisy data set $D=\{(\mathbf {x}_{j,l},y_{j,l}): j,l=0,1,\dots ,n_0-1\}$ with $y_{j,l} = f^*(\mathbf {x}_{j,l}) + \epsilon _{j,l}$, $j,l=0,1,\dots ,n_0 -1$, and $f^*\in C(\mathcal {M})$. Here $f^*$ is the ideal (noiseless) target function. The non-distributed filtered hyperinterpolation of degree n with filter H and quadrature rule $\mathcal {Q}_{D}$ for data D is given by

$$\begin{aligned} V_{D,n}(\mathbf {x}) = \frac{1}{N}\sum _{j,l=0,1,\dots ,3n_0-1} y_{j,l} \sum _{|\mathbf {k}|\le 2n-1}H\left( \frac{|\mathbf {k}|}{n}\right) e^{{\mathrm {i}}(\mathbf {x}-\mathbf {x}_{j,l})\cdot \mathbf {k}}, \quad \mathbf {x}\in \mathbb {T}^{2}. \end{aligned}$$

(19)

This is our construction to obtain an approximation. It corresponds to Definition 4. The summation index $\mathbf {k}$ runs over a ball of radius $2n-1$ due to the compact support of the filter H. Thus, $V_{D,n}(\mathbf {x})$ is fully discrete and computable. By Theorem 9, the approximation error of $V_{D,n}$ for $f^*\in \mathbb {W}^r_2(\mathbb {T}^{2})$, $r>1$, has convergence rate at least of order $|D|^{-r/(r+1)}$ as $n_0$ (controlling the number of data points) and n (controlling the degree of the approximation) increase. In particular, if $f^*$ is a basis element in $\{\frac{1}{2\pi }e^{{\mathrm {i}}\mathbf {x}\cdot \mathbf {k}}\}$, then $r=\infty $, since a polynomial is infinitely smooth.

In practice, we use the real part as the approximation and discard the complex part. Note that the amount of data, $9n^2$, determines the degree of the polynomials. In this example, the diffusion polynomials of degree n are given by sums of eigenfunctions with momentum vector $\mathbf {k}=(k_1,k_2)$ in the same grid defining the data.

The corresponding distributed filtered hyperinterpolation with m servers is

$$\begin{aligned} V_{D,n}^{(m)}(\mathbf {x}) = \sum _{j=1}^{m} \frac{|D_j|}{|D|} V_{D_j,n}(\mathbf {x}),\quad \mathbf {x}\in \mathbb {T}^{2}, \end{aligned}$$

where $D_j$ is the data set on the jth local server, for $j=1,\dots ,m$. This corresponds to Definition 6. By Theorem 13, the approximation error of the distributed strategy $V_{D,n}^{(m)}$ with m servers for $f^*\in \mathbb {W}_2^r(\mathbb {T}^{2})$, $r>1$, is at least of order $|D|^{-r/(r+1)}$ provided the number of servers used satisfies $m\le |D|^{\frac{r}{r+d/2}}$.

When the points of the data set D are randomly distributed and satisfy the condition of Lemma 6, the non-distributed filtered hyperinterpolation remains the same as (19) with the points $\mathbf {x}_{j,l}$ replaced by the set of random points $\varLambda _{D}$. But the local estimator $V_{D_j,n}$ in the distributed filtered hyperinterpolation uses the modified weights

$$\begin{aligned} w^*_{j,l}:=\left\{ \begin{array}{ll} (2\pi )^2/N,&{} \quad \text{ if }\ \sum _{0\le k,l\le n-1}|w_{j,l}|^2\le 2/m,\\ 0,&{}\quad \text{ otherwise }, \end{array}\right. \end{aligned}$$

in the place of (19).

For our illustration, we use the Wendland function on the torus as the target function:

$$\begin{aligned} f(\mathbf {x}) = \phi (\mathbf {x}-\mathbf {x}_c),\quad \mathbf {x}\in \mathbb {T}^{2}, \end{aligned}$$

(20)

where $\phi (u)$ is the one-dimensional Wendland function

$$\begin{aligned} \phi (u) := (1-u)_{+}^{8}(32u^3 + 25u^2 + 8u + 1), \end{aligned}$$

and $\mathbf {x}_c=(0,0)$ is the center, see [44, 47]. We show in the right part of Figure 2 the Wendland function in (20) which is in $C^{6}(\mathbb {T}^{2})$. We generate noisy data set by adding Gaussian white noise at a particular noise level to the values from the Wendland function.

Figure 3 shows the $L_2$ squared errors of both training and generalization for the approximation by non-distributed filtered hyperinterpolation on noisy data from the Wendland function (20) on $\mathbb {T}^{2}$, with six levels of noise from 0 to 0.1. The degree n for the approximation is up to 40, and the sample size is $N=(3n)^2$. The right part shows a few examples of the noisy training data, all at a noise level of 0.01, and the corresponding learned functions. For noiseless case, the training and generalization errors both converge to zero rapidly at a rate of approximately $\Vert V_{D,n}-f^*\Vert _{L_2(\mathcal {M})}\sim N^{-4}$. This is consistent with the theoretical upper bound $N^{-3}$ given in Theorem 6 where $s\ge 6$ and $d=2$. The slightly higher rate $N^{-4}$ is due to that the $\phi (\mathbf {x})$ may have a higher smoothness. For noisy data, the convergence of the error stops at a particular degree. The convergence rate is higher when the noise level is smaller. The mean squared error on the training data converges to a value close to the square of the noise level, which indicates that the trained function is filtering out the noise. For both noisy and noiseless cases, the generalization error is slightly lower than the training error. Also, the result has consistent stability over repetitions in all cases.

Figure 4 shows the $L_2$ squared errors for distributed filtered hyperinterpolation. We also generate the data from Wendland function. For this experiment, we partition the data set equally into $m=4$ servers. The ith server computes a filtered hyperinterpolation on the data $D_i$ which are defined on interleaved grids of the form $\mathrm{mod}(\mathbf {x}_{j,l}+ \mathbf {s}_i,2\pi )$, where $\mathbf {s}_i$ is a shift number between $(0,2\pi )$ and $\mathbf {s}_i$ are distinct for different subsets $D_i$. The quadrature rule $\mathcal {Q}_{D_i}$ utilizes equal weights as the non-distributed case. The distributed filtered hyperinterpolation combines the results from all servers, which has similar approximation behavior as the non-distributed case. If using noisy data in training, the approximation error has saturation after a particular degree; while with noiseless data, the error decays to zero all through the degree. We observe here that the generalization error has a more significant gap with training error as compared to the non-distributed case, which may be partly due to the distributed strategy (on multiple servers) are adopted. These experiments show consistent results as the theory in previous sections.

6 Discussion

Rates of convergence In Table 1, we compare the theoretical convergence rates of the non-distributed and distributed filtered hyperinterpolation in noiseless and noisy cases, as obtained in the previous sections. It shows that the filtered hyperinterpolation for clean data can achieve an optimal convergence rate $N^{-r/d}$ in both non-distributed and distributed cases and both deterministic and random sampling cases. For noisy data, the non-distributed filtered hyperinterpolation has a slightly lower approximation rate at $N^{-r/(r+d/2)}$, $r>d/2$, which in the limiting case $r\rightarrow d/2$ becomes the optimal rate $N^{-r/d}$. The distributed strategy preserves the convergence rate $N^{-r/(r+d/2)}$ of the non-distributed filtered hyperinterpolation for noisy data, provided that the number of data N increases sufficiently fast with the number of servers, but the condition of the number of servers in the deterministic sampling case is weaker than the random sampling case.

Table 1 Behavior of the error upper bound in the four settings that we considered in the paper, depending on the number of data points $N=|D|$, the smoothness r of the target function, the manifold dimension d, and the number of servers m

Full size table

Implementation and complexity We already illustrated the computation of the method in Sect. 5. A summary of the implementation is shown in Algorithm 1.^{Footnote 1} In deterministic sampling, we start with some given input data and a suitable quadrature rule for those input values. In the random sampling case, we begin with the data, which in theory is only assumed sampled from some distribution, and then construct a suitable quadrature rule. There are various details to consider for the implementation. First, we need to choose a filter H, which should be sufficiently smooth depending on the dimension of the manifold $\mathcal {M}$. The support of the filter will constrain the degree of the polynomials in the approximation. Second, we need a quadrature rule. Once a quadrature rule has been determined for the input data on the manifold, it can be applied to any output data. For important families of manifolds and configurations of points, quadrature rules are available from the literature. For instance, on the torus, cubes [12, 34]; sphere: Gauss-Legendre, spherical design [2, 11, 19, 45]; Graph: its nodes. The practical computation of quadrature rules for general types of data (or random input data) is an interesting problem in its own right, which has yet to be developed in more detail. Once a quadrature rule is available, the time complexity for Algorithm 1 is $\mathcal {O}_{}\left( \max _{j=1,\dots ,m} |D|^{\frac{d}{2r+d}} |D_j|\right) $. If $|D_j|$ are all equal, the time complexity becomes $\mathcal {O}_{}\left( |D|^{\frac{d}{2r+d}+1}/m\right) $.

Final remarks We have provided the first complete theoretical foundation for distributed learning on manifolds by filtered hyperinterpolation. One appealing aspect of filtered hyperinterpolation is that it comes with strong theoretical guarantees on the error, which apply to the population error or generalization error. Obtaining accurate bounds of this kind with neural networks is an active topic of research (which needs to incorporate not only the theoretical capacity of the neural network but also implicit regularization effects from the parameter initialization and optimization procedures). In filtered hyperinterpolation, once the data and the corresponding approximation degree are given, the approximating function is computed in closed form, meaning that we do not require parameter optimization. Also, filtered hyperinterpolation is a method that allows us to tune the model complexity directly in terms of the amount of available data in a principled way. As we observe in numerical experiments, the population error often is better than the training error. An interpretation is that this method imposes priors in terms of the polynomial degree and thus it is able to filter out noise. The method incorporates the geometry of the input space through the basis functions which are utilized to construct the approximations. Here, the basis functions are eigenfunctions of the Laplace–Beltrami operator on the manifold. It also contributes to the interpretability of the approximations, which live in polynomial spaces for which we have a good intuition. On the downside, to obtain the approximating function, the method relies on numerical integration techniques, in particular, quadrature rules, which is non-trivial in general to obtain. For general Riemannian manifolds, we can use the eigenvalues and eigenvectors of the discrete version of the Laplacian to approximate the Laplace–Beltrami operator, where the sampling points can estimate the discrete Laplacian, see, e.g., [7, 13, 33].

Notes

The condition $m\le \sqrt{N}$ in Algorithm 1 is a consequence of Remarks 7 and 8.

References

Aubin, T.: Some nonlinear problems in Riemannian geometry. Springer Monogr. Math. Springer-Verlag, Berlin (1998). https://doi.org/10.1007/978-3-662-13006-3.
Bondarenko, A., Radchenko, D., Viazovska, M.: Optimal asymptotic bounds for spherical designs. Ann. of Math. (2) 178(2), 443–452 (2013). https://doi.org/10.4007/annals.2013.178.2.2.
Article MathSciNet MATH Google Scholar
Brandolini, L., Choirat, C., Colzani, L., Gigante, G., Seri, R., Travaglini, G.: Quadrature rules and distribution of points on manifolds. Annali della Scuola normale superiore di Pisa - Classe di scienze. Serie V 13(4), 889–923 (2014). https://doi.org/10.2422/2036-2145.201103_007
Article MathSciNet MATH Google Scholar
Breger, A., Ehler, M., Gräf, M.: Quasi monte carlo integration and kernel-based function approximation on grassmannians. In: Frames and Other Bases in Abstract and Function Spaces, pp. 333–353. Springer (2017)
Breger, A., Ehler, M., Gräf, M., Peter, T.: Cubatures on Grassmannians: Moments, Dimension Reduction, and Related Topics, pp. 235–259. Springer International Publishing, Cham (2017). https://doi.org/10.1007/978-3-319-69802-1_8.
Cools, R.: An encyclopaedia of cubature formulas. J. Complexity 19(3), 445–453 (2003). https://doi.org/10.1016/S0885-064X(03)00011-6. Numerical integration and its complexity (Oberwolfach, 2001)
Crane, K., De Goes, F., Desbrun, M., Schröder, P.: Digital geometry processing with discrete exterior calculus. In: ACM SIGGRAPH 2013 Courses, pp. 1–126 (2013)
Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Amer. Math. Soc. (N.S.) 39(1), 1–49 (2002). https://doi.org/10.1090/S0273-0979-01-00923-5.
Article MathSciNet MATH Google Scholar
Dai, F.: On generalized hyperinterpolation on the sphere. Proc. Amer. Math. Soc. 134(10), 2931–2941 (2006). https://doi.org/10.1090/S0002-9939-06-08421-8. URL http://dx.doi.org.ezproxy.cityu.edu.hk/10.1090/S0002-9939-06-08421-8
Dai, F., Xu, Y.: Approximation theory and harmonic analysis on spheres and balls, vol. 23. Springer (2013)
Delsarte, P., Goethals, J.M., Seidel, J.J.: Spherical codes and designs. Geometriae Dedicata 6(3), 363–388 (1977). https://doi.org/10.1007/bf03187604.
Article MathSciNet MATH Google Scholar
Driscoll, T.A., Hale, N., Trefethen, L.N.: Chebfun Guide. Pafnuty Publications, Oxford (2014)
Google Scholar
Dunson, D.B., Wu, H.T., Wu, N.: Diffusion based gaussian process regression via heat kernel reconstruction. arXiv preprint arXiv:1912.05680 (2019)
Filbir, F., Mhaskar, H.N.: Marcinkiewicz-Zygmund measures on manifolds. J. Complexity 27(6), 568–596 (2011). https://doi.org/10.1016/j.jco.2011.03.002. URL http://dx.doi.org.ezproxy.cityu.edu.hk/10.1016/j.jco.2011.03.002
Gariboldi, B., Gigante, G.: Optimal asymptotic bounds for designs on manifolds. arXiv: Analysis of PDEs (2018)
Guo, Z.C., Lin, S.B., Zhou, D.X.: Learning theory of distributed spectral algorithms. Inverse Probl. 33(7), 074009 (2017). URL http://stacks.iop.org/0266-5611/33/i=7/a=074009
Györfi, L., Kohler, M., Krzyżak, A., Walk, H.: A distribution-free theory of nonparametric regression. Springer Series in Statistics. Springer-Verlag, New York (2002). https://doi.org/10.1007/b97848
Hesse, K., Sloan, I.H.: Cubature over the sphere $S^2$ in Sobolev spaces of arbitrary order. J. Approx. Theory 141(2), 118–133 (2006). https://doi.org/10.1016/j.jat.2006.01.004.
Article MathSciNet MATH Google Scholar
Hesse, K., Sloan, I.H., Womersley, R.S.: Numerical integration on the sphere. Handbook of Geomathematics pp. 1185–1219 (2010)
de La Vallée Poussin, C.: Leçons sur l’approximation des Fonctions d’une Variable Réelle. Gauthiers-Villars, Paris (1919). 2nd edn. Chelsea Publ. Co., New York 1970
Le Gia, Q.T., Mhaskar, H.N.: Localized linear polynomial operators and quadrature formulas on the sphere. SIAM J. Numer. Anal. 47(1), 440–466 (2008). https://doi.org/10.1137/060678555
Article MathSciNet MATH Google Scholar
Lin, S.B., Guo, X., Zhou, D.X.: Distributed learning with regularized least squares. J. Mach. Learn. Res. 18, Paper No. 92, 31 (2017)
Lin, S.B., Zhou, D.X.: Distributed kernel-based gradient descent algorithms. Constr. Approx. 47(2), 249–276 (2018). https://doi.org/10.1007/s00365-017-9379-1.
Article MathSciNet MATH Google Scholar
Maggioni, M., Mhaskar, H.N.: Diffusion polynomial frames on metric measure spaces. Appl. Comput. Harmon. Anal. 24(3), 329–353 (2008). https://doi.org/10.1016/j.acha.2007.07.001.
Article MathSciNet MATH Google Scholar
Mhaskar, H.N., Narcowich, F.J., Ward, J.D.: Approximation properties of zonal function networks using scattered data on the sphere. Adv. Comput. Math. 11(2-3), 121–137 (1999). https://doi.org/10.1023/A:1018967708053
Article MathSciNet MATH Google Scholar
Mhaskar, H.N., Narcowich, F.J., Ward, J.D.: Spherical Marcinkiewicz-Zygmund inequalities and positive quadrature. Math. Comp. 70(235), 1113–1130 (2001). https://doi.org/10.1090/S0025-5718-00-01240-0
Article MathSciNet MATH Google Scholar
Narcowich, F.J., Petrushev, P., Ward, J.D.: Localized tight frames on spheres. SIAM J. Math. Anal. 38(2), 574–594 (2006). https://doi.org/10.1137/040614359
Article MathSciNet MATH Google Scholar
Rustamov, K.P.: On the approximation of functions on a sphere. Izv. Ross. Akad. Nauk Ser. Mat. 57(5), 127–148 (1993). https://doi.org/10.1070/IM1994v043n02ABEH001566.
Article Google Scholar
Sloan, I.H.: Polynomial interpolation and hyperinterpolation over general regions. J. Approx. Theory 83(2), 238–254 (1995). https://doi.org/10.1006/jath.1995.1119
Article MathSciNet MATH Google Scholar
Sloan, I.H.: Polynomial approximation on spheres-generalizing de La Vallée-Poussin. Comput. Methods Appl. Math. 11(4), 540–552 (2011)
Article MathSciNet Google Scholar
Sloan, I.H., Womersley, R.S.: Filtered hyperinterpolation: a constructive polynomial approximation on the sphere. GEM Int. J. Geomath. 3(1), 95–117 (2012). https://doi.org/10.1007/s13137-011-0029-7
Article MathSciNet MATH Google Scholar
Stein, E.M.: Interpolation in polynomial classes and Markoff’s inequality. Duke Math. J. 24(3), 467–476 (1957). https://doi.org/10.1215/S0012-7094-57-02453-5.
Article MathSciNet MATH Google Scholar
Sunada, T.: Discrete geometric analysis. Proceedings of Symposia in Pure Mathematics 77, 51–86 (2008). https://doi.org/10.1090/pspum/077/2459864
Article MathSciNet MATH Google Scholar
Trefethen, L.N.: Approximation theory and approximation practice, vol. 128. SIAM (2013)
Wang, H., Sloan, I.H.: On filtered polynomial approximation on the sphere. J. Fourier Anal. Appl. 23(4), 863–876 (2017). https://doi.org/10.1007/s00041-016-9493-7.
Article MathSciNet MATH Google Scholar
Wang, H., Wang, K.: Optimal recovery of Besov classes of generalized smoothness and Sobolev classes on the sphere. J. Complexity 32(1), 40–52 (2016). https://doi.org/10.1016/j.jco.2015.07.003.
Article MathSciNet MATH Google Scholar
Wang, Y.: Filtered polynomial approximation on the sphere. Bull. Aust. Math. Soc. 93(1), 162–163 (2016)
Article MathSciNet Google Scholar
Wang, Y.G., Le Gia, Q.T., Sloan, I.H., Womersley, R.S.: Fully discrete needlet approximation on the sphere. Appl. Comput. Harmon. Anal. 43(2), 292–316 (2017). https://doi.org/10.1016/j.acha.2016.01.003
Article MathSciNet MATH Google Scholar
Wang, Y.G., Sloan, I.H., Womersley, R.S.: Riemann localisation on the sphere. J. Fourier Anal. Appl. 24(1), 141–183 (2018)
Article MathSciNet Google Scholar
Wang, Y.G., Womersley, R.S., Wu, H.T., Yu, W.H.: Numerical computation of triangular complex spherical designs with small mesh ratio. arXiv:1907.13493 (2019)
Wang, Y.G., Zhu, H.: Analysis of framelet transforms on a simplex. In: Contemporary Computational Mathematics – A Celebration of the 80th Birthday of Ian Sloan, pp. 1175–1189. Springer (2018)
Wang, Y.G., Zhuang, X.: Tight framelets on graphs for multiscale data analysis. In: Wavelets and Sparsity XVIII, vol. 11138, p. 111380B. International Society for Optics and Photonics (2019)
Wang, Y.G., Zhuang, X.: Tight framelets and fast framelet filter bank transforms on manifolds. Appl. Comput. Harmon. Anal. 48(1), 64–95 (2020)
Article MathSciNet Google Scholar
Wendland, H.: Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree. Adv. Comput. Math. 4(4), 389–396 (1995). https://doi.org/10.1007/BF02123482.
Article MathSciNet MATH Google Scholar
Womersley, R.S.: Efficient spherical designs with good geometric properties. In: Contemporary Computational Mathematics – A Celebration of the 80th Birthday of Ian Sloan. Vol. 1, 2, pp. 1243–1285. Springer, Cham (2018)
Wu, Q., Zhou, D.X.: SVM soft margin classifiers: linear programming versus quadratic programming. Neural Comput. 17(5), 1160–1187 (2005). https://doi.org/10.1162/0899766053491896
Article MathSciNet MATH Google Scholar
Wu, Z.M.: Compactly supported positive definite radial functions. Adv. Comput. Math. 4(3), 283–292 (1995). https://doi.org/10.1007/BF03177517.
Article MathSciNet MATH Google Scholar
Xu, Y.: Fourier series and approximation on hexagonal and triangular domains. Constr. Approx. 31(1), 115 (2010)
Article MathSciNet Google Scholar
Zhou, D.X.: The covering number in learning theory. J. Complexity 18(3), 739–767 (2002). https://doi.org/10.1006/jcom.2002.0635.
Article MathSciNet MATH Google Scholar
Zhou, D.X., Jetter, K.: Approximation with polynomial kernels and SVM classifiers. Adv. Comput. Math. 25(1-3), 323–344 (2006). https://doi.org/10.1007/s10444-004-7206-2
Article MathSciNet MATH Google Scholar

Download references

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Department of Mathematics and Department of Statistics, University of California, Los Angeles, CA, USA
Guido Montúfar
Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
Guido Montúfar & Yu Guang Wang
Institute of Natural Sciences, School of Mathematical Sciences, and MOE-LSC, Shanghai Jiao Tong University, Shanghai, China
Yu Guang Wang
School of Mathematics and Statistics, University of New South Wales, Sydney, NSW, Australia
Yu Guang Wang

Authors

Guido Montúfar
View author publications
You can also search for this author in PubMed Google Scholar
Yu Guang Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yu Guang Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by Frances Kuo.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Guido Montúfar and Yu Guang Wang acknowledge the support of funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 757983). Yu Guang Wang also acknowledges support from the Australian Research Council under Discovery Project DP180100506. This material is based upon work supported by the National Science Foundation under Grant No. DMS-1439786 while the authors were in residence at the Institute for Computational and Experimental Research in Mathematics in Providence, RI, during the Collaborate@ICERM on “Geometry of Data and Networks.” Part of this research was performed while the authors were at the Institute for Pure and Applied Mathematics (IPAM), which is supported by the National Science Foundation (Grant No. DMS-1440415)

Appendices

Proofs

The appendices contain the proofs of the theorems in Sects. 3.1, 3.2, 4.1 and 4.2 in turn.

1.1 Proofs for Section 3.1

Proof

(Lemma 5) Let $P\in \varPi _{n}$ and $\mathbf {x}\in \mathcal {M}$. By $\mathrm {supp}\,H\subset [0,2]$ and Assumption 2, $K_{n,H}^{}(\mathbf {x},\cdot )P(\cdot )$, for each $i=1,\dots , N$, is a polynomial of degree $3n-1$. Since $H(t)=1$ for $t\in [0,1]$, and since P and $\phi _{\ell }$, $\lambda _{\ell }\ge n+1$, are orthogonal, then for $\mathbf {x}\in \mathcal {M}$,

$$\begin{aligned} V_{n,H}^{}(P;\mathbf {x})&= \int _{\mathcal {M}} \sum _{\lambda _{\ell }\le 2n}H\Bigl (\frac{\lambda _{\ell }}{n}\Bigr )\,\phi _{\ell }(\mathbf {x})\overline{\phi _{\ell }}(\mathbf {z}) P(\mathbf {z}) \mathrm {d}{\mu (\mathbf {z})}\nonumber \\&= \int _{\mathcal {M}} \sum _{\lambda _{\ell }\le n}\phi _{\ell }(\mathbf {x})\overline{\phi _{\ell }}(\mathbf {z}) P(\mathbf {z}) \mathrm {d}{\mu (\mathbf {z})} = P(\mathbf {x}). \end{aligned}$$

(21)

The exactness of $\mathcal {Q}_{D}$ for degree $3n-1$ with (21) then gives

$$\begin{aligned} V_{D,n}^{}(P;\mathbf {x})&= \sum _{i=1}^{N} w_{i}\, K_{n,H}^{}(\mathbf {x},\mathbf {x}_{i}) P(\mathbf {x}_{i})\\&= \int _{\mathcal {M}}K_{n,H}^{}(\mathbf {x},\mathbf {y})P(\mathbf {y}) \mathrm {d}{\mu (\mathbf {y})} = V_{n,H}^{}(P;\mathbf {x}) = P(\mathbf {x}), \end{aligned}$$

thus completing the proof. $\square $

Proof

(Theorem 4) Let $P\in \varPi _{n}$. By the linearity of $V_{n,H}^{}$ and Lemma 5,

$$\begin{aligned} \big \Vert f-V_{n,H}^{}(f)\big \Vert _{L_{p}(\mathcal {M})}&\le \Vert f-P\Vert _{L_{p}(\mathcal {M})} + \big \Vert V_{n,H}^{}(f-P)\big \Vert _{L_{p}(\mathcal {M})}\\&\le \left( 1+\Vert V_{n,H}^{}\Vert _{{p}\rightarrow {p}}\right) \Vert f-P\Vert _{L_{p}(\mathcal {M})}, \end{aligned}$$

which, as P is an arbitrary polynomial in $\varPi _{n}$, together with Theorem 3 gives

$$\begin{aligned} \big \Vert f-V_{n,H}^{}(f)\big \Vert _{L_{p}(\mathcal {M})} \le c_{d,H,\kappa }\, E_{n}(f)_{p}, \end{aligned}$$

thus completing the proof. $\square $

We go to prove Theorem 6, for which we need some lemmas as given below. The following theorem shows a Marcinkiewicz–Zygmund inequality for a quadrature rule on $\mathcal {M}$.

Lemma 7

Let $\mathcal {Q}_{D}=\{(w_{i},\mathbf {x}_{i})\}_{i=1}^{N}$ be a positive quadrature rule on $\mathcal {M}$ satisfying for some $1\le p_{0}<\infty $, $c_{0}>0$ and $n\ge 0$,

$$\begin{aligned} \sum _{i=1}^{N} w_{i}| P(\mathbf {x}_{i})|^{p_{0}} \le c_{0} \int _{\mathcal {M}}|P(\mathbf {y})|^{p_{0}}\mathrm {d}{\mu (\mathbf {y})}, \quad P\in \varPi _{n}. \end{aligned}$$

(22)

Then, for all $1\le p_{1}<\infty $ and $\ell >n$,

$$\begin{aligned} \sum _{i=1}^{N} w_{i}| P(\mathbf {x}_{i})|^{p_{1}} \le c_{1} \left( \frac{\ell }{n}\right) ^{d}\int _{\mathcal {M}}|P(\mathbf {y})|^{p_{1}}\mathrm {d}{\mu (\mathbf {y})},\quad P\in \varPi _{\ell }, \end{aligned}$$

(23)

where $c_{1}$ depends only on d, $p_{0}$ and $c_{0}$.

Remark 9

Dai [9] proved Lemma 7 when $\mathcal {M}$ is the unit sphere $\mathbb {S}^{d}$, $d\ge 1$.

The proof of Lemma 7 relies on the following lemma of Filbir and Mhaskar [14], which shows that the sum of the weights, the corresponding nodes of which lie in the region $B\left( \mathbf {x}_{0},\beta ,\beta +\alpha \right) $, is bounded by a constant multiple of the measure of this region.

Lemma 8

Let $d\ge 1$ and let $\mathcal {M}$ be a d-dimensional compact Riemannian manifold. Let $\mathcal {Q}_{D}:=\{(w_{i},\mathbf {x}_{i})\}_{i=1}^{N}$ be a positive quadrature rule on $\mathcal {M}$ satisfying (22) for some $1\le p_{0}<\infty $, $c_{0}>0$ and $n\in \mathbb {N}_{0}$. Then, for $\beta \ge 0$, $\alpha \ge 1/n$ and $\mathbf {x}_{0}\in \mathcal {M}$,

$$\begin{aligned} \sum _{\mathbf {x}_{i}\in B\left( \mathbf {x}_{0},\beta ,\beta +\alpha \right) }\quad w_{i}\le c\, \mu (B\left( \mathbf {x}_{0},\beta ,\beta +\alpha \right) ), \end{aligned}$$

where the constant c depends only on d.

Let

$$\begin{aligned} A_{\ell }(\theta ):= \frac{\ell ^{d}}{(1+ \ell \theta )^{d+1}},\quad \ell \in \mathbb {N},\;\theta \in [0,\pi ]. \end{aligned}$$

(24)

Lemma 8 implies the following estimate for a quadrature rule.

Lemma 9

Let $d\ge 1$ and let $\mathcal {M}$ be a d-dimensional compact Riemannian manifold. Let $\mathcal {Q}_{D}:=\{(w_{i},\mathbf {x}_{i})\}_{i=1}^{N}$ be a quadrature rule on $\mathcal {M}$ satisfying (22) for some $1\le p_{0}<\infty $, $c_{0}>0$ and $n\in \mathbb {N}_{0}$. Let $A_{n}(\theta )$ be given by (24). Then, for $\ell \ge n$,

$$\begin{aligned} \max _{\mathbf {x}\in \mathcal {M}} \sum _{i=1}^{N}w_{i}\, A_{\ell }(\rho (\mathbf {x},\mathbf {x}_{i})) \le c\, \left( \frac{\ell }{n}\right) ^{d}, \end{aligned}$$

where the constant c depends only on d.

Proof

Let $\mathbf {x}\in \mathcal {M}$. Since $\mathcal {M}$ is compact, $\mathcal {M}$ is bounded, i.e., there exists $0<r<\infty $ such that $\mathcal {M}\subseteq B\left( \mathbf {x},r\right) $. Using Lemma 8,

$$\begin{aligned}&\sum _{i=1}^{N} w_{i}\, A_{\ell }(\rho (\mathbf {x},\mathbf {x}_{i}))\\&\quad \le \ell ^{d}\sum _{\mathbf {x}_{i}\in B\left( \mathbf {x},1/n\right) }w_{i}+ \sum _{k=1}^{ \left\lfloor r n \right\rfloor -1}\sum _{\mathbf {x}_{i}\in B\left( \mathbf {x},k/n,(k+1)/n\right) }w_{i}\, \ell ^{d}\left( \frac{\ell k}{n}\right) ^{-(d+1)} \\&\qquad +\, \ell ^{-1} \sum _{\mathbf {x}_{i}\in B\left( \mathbf {x},\left\lfloor r n \right\rfloor /n,r\right) }w_{i}\\&\quad \le c\, \ell ^{d}\mu \bigl (B\left( \mathbf {x},1/n\right) \bigr ) + c\,\ell ^{-1} \sum _{k=1}^{\left\lfloor rn \right\rfloor -1} \left( \frac{n}{k}\right) ^{d+1} \mu \bigl (B\left( \mathbf {x},k/n,(k+1)/n\right) \bigr ) \\&\qquad +\, c\,\ell ^{-1}\mu \bigl (B\left( \mathbf {x},\left\lfloor rn \right\rfloor /n,r\right) \bigr )\\&\quad \le c_{d}\, \left( \frac{\ell }{n}\right) ^{d}, \end{aligned}$$

where the last inequality uses Assumption 1 and $\mu \bigl (B\left( \mathbf {x},k/n,(k+1)/n\right) \bigr ) \le c_{d} \,k^{d-1}/n^d$. $\square $

Proof

(Lemma 7) For $1\le p_{1}<\infty $, using (21) and Hölder’s inequality gives, for $P\in \varPi _{n}$ and $\mathbf {x}\in \mathcal {M}$,

$$\begin{aligned} |P(\mathbf {x})|^{p_{1}} \le \left( \int _{\mathcal {M}}|K_{n,H}^{}(\mathbf {x},\mathbf {z})||P(\mathbf {z})|^{p_{1}}\mathrm {d}{\mu (\mathbf {z})}\right) \left( \int _{\mathcal {M}}|K_{n,H}^{}(\mathbf {x},\mathbf {z})|\mathrm {d}{\mu (\mathbf {z})}\right) ^{p_{1}-1}. \end{aligned}$$

(25)

Lemma 3 shows that the second integral of the filtered kernel on the right-hand side is bounded. This with (25) gives

$$\begin{aligned} |P(\mathbf {x})|^{p_{1}} \le c\,\left( \int _{\mathcal {M}}|K_{n,H}^{}(\mathbf {x},\mathbf {z})||P(\mathbf {z})|^{p_{1}}\mathrm {d}{\mu (\mathbf {z})}\right) , \end{aligned}$$

where the constant c depends only on $d,p_{1},H$ and $\kappa $. Summing over quadrature nodes, we then obtain by Lemmas 9 and 2 that

$$\begin{aligned} \sum _{i=1}^{N}w_{i}|P(\mathbf {x}_{i})|^{p_{1}}&\le c \int _{\mathcal {M}}|P(\mathbf {z})|^{p_{1}}\sum _{i=1}^{N}w_{i}|K_{n,H}^{}(\mathbf {x}_{i},\mathbf {z})|\mathrm {d}{\mu (\mathbf {z})}\\&\le c \left( \max _{\mathbf {z}\in \mathcal {M}}\sum _{i=1}^{N}w_{i}\,A_{\ell }\bigl (\rho (\mathbf {x}_{i},\mathbf {z})\bigr )\right) \Vert P\Vert _{L_{p_{1}}(\mathcal {M})}^{p_{1}}\\&\le c \left( \frac{\ell }{n}\right) ^{d}\Vert P\Vert _{L_{p_{1}}(\mathcal {M})}^{p_{1}}, \end{aligned}$$

where the constant c depends only on $d,p_{1},H$ and $\kappa $. $\square $

The proof of optimal-order error for filtered hyperinterpolation utilizes its decomposition by framelets on manifolds [35, 38, 43]. Given $H\in C^{\kappa }(\mathbb {R}_{+})$, $\kappa \ge 1$, we define recursively the contributions of levels $j\in \mathbb {N}_{0}$ for $f\in L_{p}(\mathcal {M})$ by

$$\begin{aligned} \mathcal {U}_{0}(f) := V_{2^{-1}}^{}(f) := 1,\quad \mathcal {U}_{j}(f) := V_{2^{j-1}}^{}(f) - V_{2^{j-2}}^{}(f), \;\; j\in \mathbb {N}, \end{aligned}$$

(26)

The following lemma shows that $\mathcal {U}_{j}(f)$ forms a decomposition of $f\in L_{p}(\mathcal {M})$, and it gives an upper bound of the $L_p$-norm of $\mathcal {U}_{j}(f)$ for $f\in \mathbb {W}_{p}^{s}(\mathcal {M})$.

Lemma 10

Let $1\le p\le \infty $, $d\ge 2$, $s>0$. Then,

$$\begin{aligned}&\lim _{J\rightarrow \infty }\Big \Vert \sum _{j=0}^{J}\mathcal {U}_{j}(f)-f\Big \Vert _{L_{p}(\mathcal {M})} = 0,\quad f\in L_{p}(\mathcal {M}), \end{aligned}$$

(27)

$$\begin{aligned}&\big \Vert \mathcal {U}_{j}(f)\big \Vert _{L_{p}(\mathcal {M})} \le c\, 2^{-js}\, \Vert f\Vert _{\mathbb {W}_{p}^{s}(\mathcal {M})},\quad j\in \mathbb {N}, \; f\in \mathbb {W}_{p}^{s}(\mathcal {M}), \end{aligned}$$

(28)

where the constant c depends only on d, p, s, $H$ and $\kappa $.

Proof

For $f\in L_{p}(\mathcal {M})$, Theorem 4 with (26) gives

$$\begin{aligned} \Big \Vert \sum _{j=0}^{J}\mathcal {U}_{j}(f)-f\Big \Vert _{L_{p}(\mathcal {M})} = \big \Vert V_{2^{J-1}}^{}(f)-f\big \Vert _{L_{p}(\mathcal {M})}\le c_{d,H,\kappa }\, E_{2^{J-1}}(f)_{p}. \end{aligned}$$

This with $\lim _{J\rightarrow \infty }E_{2^{J-1}}(f)_{p}=0$ gives (27). For $f\in \mathbb {W}_{p}^{s}(\mathcal {M})$ and $j\in \mathbb {N}$, Theorem 5 with (26) gives

$$\begin{aligned} \big \Vert \mathcal {U}_{j}(f)\big \Vert _{L_{p}(\mathcal {M})}&\le \big \Vert V_{2^{j-1}}^{}(f)-f\big \Vert _{L_{p}(\mathcal {M})} + \big \Vert V_{2^{j-2}}^{}(f)-f\big \Vert _{L_{p}(\mathcal {M})}\\&\le c\, 2^{-js}\,\Vert f\Vert _{\mathbb {W}_{p}^{s}(\mathcal {M})}, \end{aligned}$$

where the constant c depends only on d, p, s, $H$ and $\kappa $. $\square $

Proof

(Theorem 6) For $p=\infty $, Lemma 5 with the linearity of $V_{D^*,n}^{}$ shows that for $q\in \varPi _{n}$,

$$\begin{aligned} \big \Vert f-V_{D^*,n}^{}(f)\big \Vert _{L_{\infty }(\mathcal {M})}&= \big \Vert (f-q)-V_{D^*,n}^{}(f-q)\big \Vert _{L_{\infty }(\mathcal {M})}\nonumber \\&\le \left( 1+\big \Vert V_{D^*,n}^{}\big \Vert _{{\infty }\rightarrow {\infty }}\right) \,\big \Vert f-q\big \Vert _{L_{\infty }(\mathcal {M})}, \end{aligned}$$

(29)

where using standard arguments,

$$\begin{aligned} \big \Vert V_{D^*,n}^{}\big \Vert _{{\infty }\rightarrow {\infty }} := \sup _{\mathbf {x}\in \mathcal {M}} \sum _{i=1}^{N} w_{i}\, |K_{n}^{}(\mathbf {x},\mathbf {x}_{i})|. \end{aligned}$$

(30)

Taking the minimum over $q\in \varPi _{n}$ of the right-hand side of (29) with Lemma 4 gives

$$\begin{aligned} \big \Vert f-V_{D^*,n}^{}(f)\big \Vert _{L_{\infty }(\mathcal {M})}&= \big \Vert (f-q)-V_{D^*,n}^{}(f-q)\big \Vert _{L_{\infty }(\mathcal {M})}\\&\le \left( 1+\big \Vert V_{D^*,n}^{}\big \Vert _{{\infty }\rightarrow {\infty }}\right) \,E_{L}(f)_{\infty }\\&\le c_{d,s}\,\left( 1+\big \Vert V_{D^*,n}^{}\big \Vert _{{\infty }\rightarrow {\infty }}\right) \, n^{-s}\,\Vert f\Vert _{\mathbb {W}_{\infty }^{s}(\mathcal {M})}. \end{aligned}$$

Since the quadrature rule $\mathcal {Q}_{D}$ is exact for degree $3n-1$, the condition of Lemma 7 is satisfied for $p_{0}=2$, then (23) holds for $p_{1}=1$. This with (30) and Lemma 3 gives

$$\begin{aligned} \big \Vert V_{D^*,n}^{}\big \Vert _{{\infty }\rightarrow {\infty }} \le c_{d} \sup _{\mathbf {x}\in \mathcal {M}}\int _{\mathcal {M}} |K_{n}^{}(\mathbf {x},\mathbf {y})|\mathrm {d}{\mu (\mathbf {y})}\le c_{d,H,\kappa }. \end{aligned}$$

Thus,

$$\begin{aligned} \big \Vert f-V_{D^*,n}^{}(f)\big \Vert _{L_{\infty }(\mathcal {M})} \le c_{d,H,\kappa ,s}\, n^{-s}\,\Vert f\Vert _{\mathbb {W}_{\infty }^{s}(\mathcal {M})}. \end{aligned}$$

We next consider for $p\in [1,\infty )$. Given $n\ge 0$, let m be the integer satisfying $2^{m}\le L<2^{m+1}$. Since $\mathcal {U}_{j}(f)\in \varPi _{2^{j+1}}$, $V_{D^*,n}^{}$ reproduces $\mathcal {U}_{j}(f)$ for $j\le m-1$, that is, $V_{D^*,n}^{}(\mathcal {U}_{j}(f))=\mathcal {U}_{j}(f)$, $j\le m-1$. Lemma 10 then gives

$$\begin{aligned} \big \Vert f-V_{D^*,n}^{}(f)\big \Vert _{L_{p}(\mathcal {M})}&= \lim _{J\rightarrow \infty } \Big \Vert \sum _{j=0}^{J}\mathcal {U}_{j}\left( f-V_{D^*,n}^{}(f)\right) \Big \Vert _{L_{p}(\mathcal {M})}\nonumber \\&= \lim _{J\rightarrow \infty } \Big \Vert \sum _{j=m}^{J}\left( \mathcal {U}_{j}(f)-V_{D^*,n}^{}(\mathcal {U}_{j}(f))\right) \Big \Vert _{L_{p}(\mathcal {M})}\nonumber \\&\le \sum _{j=m}^{\infty }\left( \Vert \mathcal {U}_{j}(f)\Vert _{L_{p}(\mathcal {M})}+ \big \Vert V_{D^*,n}^{}(\mathcal {U}_{j}(f))\big \Vert _{L_{p}(\mathcal {M})}\right) . \end{aligned}$$

(31)

To bound the right-hand side of the last inequality in (31), we need the following estimate.

$$\begin{aligned} \big \Vert V_{D^*,n}^{}(\mathcal {U}_{j}(f))\big \Vert _{L_{p}(\mathcal {M})} \le c\,\left( \frac{2^{j+1}}{n}\right) ^{d/p} \big \Vert \mathcal {U}_{j}(f)\big \Vert _{L_{p}(\mathcal {M})}, \end{aligned}$$

(32)

where the constant c depends only on d, p, $H$ and $\kappa $. For $p=1$ and $j\ge m$,

$$\begin{aligned} \big \Vert V_{D^*,n}^{}(\mathcal {U}_{j}(f))\big \Vert _{L_{1}(\mathcal {M})}&= \Big \Vert \sum _{i=1}^{N}w_{i}\, K_{n}^{}(\mathbf {x}_{i},\cdot )\,\mathcal {U}_{j}(f;\mathbf {x}_{i})\Big \Vert _{L_{1}(\mathcal {M})}\\&\le \sum _{i=1}^{N}w_{i}\, |\mathcal {U}_{j}(f;\mathbf {x}_{i})|\,\Vert K_{n}^{}(\mathbf {x}_{i},\cdot )\Vert _{L_{1}(\mathcal {M})}\\&\le c_{d,H,\kappa }\, \sum _{i=1}^{N}w_{i}\, |\mathcal {U}_{j}(f;\mathbf {x}_{i})|\\&\le c_{d,H,\kappa }\, \left( \frac{2^{j+1}}{n}\right) ^{d}\big \Vert \mathcal {U}_{j}(f)\big \Vert _{L_{1}(\mathcal {M})}, \end{aligned}$$

where the penultimate inequality uses Lemma 3 and the last uses Lemma 7 with $p_{1}=1$. For $1<p<\infty $ and $j\ge m$, by Hölder’s inequality,

$$\begin{aligned}&\big \Vert V_{D^*,n}^{}(\mathcal {U}_{j}(f))\big \Vert _{L_{p}(\mathcal {M})}^{p}\\&\quad = \int _{\mathcal {M}}\left| \sum _{i=1}^{N}w_{i}\, K_{n}^{}(\mathbf {x},\mathbf {x}_{i})\,\mathcal {U}_{j}(f;\mathbf {x}_{i})\right| ^{p}\mathrm {d}{\mu (\mathbf {x})}\\&\quad \le \int _{\mathcal {M}}\left( \sum _{i=1}^{N}w_{i}\, |K_{n}^{}(\mathbf {x},\mathbf {x}_{i})|\,|\mathcal {U}_{j}(f;\mathbf {x}_{i})|\right) ^{p}\mathrm {d}{\mu (\mathbf {x})}\\&\quad \le \int _{\mathcal {M}}\left( \sum _{i=1}^{N}\left( w_{i}\,|K_{n}^{}(\mathbf {x},\mathbf {x}_{i})|\right) ^{\frac{p-1}{p}} \left( w_{i}\, |K_{n}^{}(\mathbf {x},\mathbf {x}_{i})|\,|\mathcal {U}_{j}(f;\mathbf {x}_{i})|^{p}\right) ^{\frac{1}{p}}\right) ^{p}\mathrm {d}{\mu (\mathbf {x})}\\&\quad \le \int _{\mathcal {M}}\left( \sum _{i=1}^{N}w_{i}\,|K_{n}^{}(\mathbf {x},\mathbf {x}_{i})|\right) ^{p-1} \left( \sum _{i=1}^{N}w_{i}\, |K_{n}^{}(\mathbf {x},\mathbf {x}_{i})|\,|\mathcal {U}_{j}(f;\mathbf {x}_{i})|^{p}\right) \mathrm {d}{\mu (\mathbf {x})}. \end{aligned}$$

Using Lemma 7 with $p_{1}=1$ and $p_{1}=p$ and Lemma 3 then gives

$$\begin{aligned}&\big \Vert V_{D^*,n}^{}(\mathcal {U}_{j}(f))\big \Vert _{L_{p}(\mathcal {M})}^{p}\\&\quad \le \left( c_{d}\,\max _{\mathbf {x}\in \mathcal {M}}\big \Vert K_{n}^{}(\mathbf {x},\cdot )\big \Vert _{L_{1}(\mathcal {M})}\right) ^{p-1} \sum _{i=1}^{N}w_{i}\, |\mathcal {U}_{j}(f;\mathbf {x}_{i})|^{p}\,\int _{\mathcal {M}}|K_{n}^{}(\mathbf {x},\mathbf {x}_{i})|\mathrm {d}{\mu (\mathbf {x})}\\&\quad \le c_{d,p,H,\kappa }\, \sum _{i=1}^{N}w_{i}\, |\mathcal {U}_{j}(f;\mathbf {x}_{i})|^{p}\\&\quad \le c_{d,p,H,\kappa }\, \left( \frac{2^{j+1}}{n}\right) ^{d} \big \Vert \mathcal {U}_{j}(f)\big \Vert _{L_{p}(\mathcal {M})}^{p}, \end{aligned}$$

which proves (32) for $p\in (1,\infty )$. It follows from (31) and (32) that for $f\in \mathbb {W}_{p}^{s}(\mathcal {M})$, $1\le p<\infty $, $s>d/p$,

$$\begin{aligned} \big \Vert f-V_{D^*,n}^{}(f)\big \Vert _{L_{p}(\mathcal {M})}&\le c_{d,p,H,\kappa }\sum _{j=m}^{\infty }\left( 1+\left( \frac{2^{j+1}}{n}\right) ^{d/p}\right) \Vert \mathcal {U}_{j}(f)\Vert _{L_{p}(\mathcal {M})}\\&\le c_{d,p,H,\kappa }\sum _{j=m}^{\infty }\left( 1+\left( \frac{2^{j+1}}{n}\right) ^{d/p}\right) 2^{-js} \Vert f\Vert _{\mathbb {W}_{p}^{s}(\mathcal {M})}, \end{aligned}$$

where the second inequality uses (28), and where since $n\asymp 2^{m}$ and $s>d/p$,

$$\begin{aligned} \sum _{j=m}^{\infty }\left( 1+\left( \frac{2^{j+1}}{n}\right) ^{d/p}\right) 2^{-js}&\le c_{d,p,s}\,\sum _{j=m}^{\infty }\left( 1+\left( \frac{2^{j+1}}{2^{m}}\right) ^{d/p}\right) 2^{-js}\\&\le c_{d,p,s}\,2^{-md/p}\,\sum _{j=m}^{\infty }\left( 2^{md/p}+ 2^{(j+1)d/p}\right) 2^{-js}\\&\le c_{d,p,s}\,2^{-md/p}\,\sum _{j=m}^{\infty }2^{-j(s-d/p)}\\&\le c_{d,p,s}\,2^{-ms}\\&\le c_{d,p,s}\,n^{-s}, \end{aligned}$$

thus proving (10). $\square $

Proof

(Theorem 7) For the quadrature rule for random samples $\{(\mathbf {x}_i,w_i)\}_{i=1}^{|D^*|}$, we define four events, as follows. Let $ \varOmega _{D^*}$ be the event such that $\sum _{i=1}^{|D^*|}|w_{i}|^2\le \frac{2}{|D^*|}$ and $\varOmega _{D^*}^c$ be the complement of $\varOmega _{D^*}$, i.e., $\varOmega _{D^*}^c$ be the event $\sum _{i=1}^{|D^*|}|w_{i}|^2>\frac{2}{|D^*|}$. Let $\varXi _{D^*}$ the event that $\{(w_{i},\mathbf {x}_i)\}_{i=1}^{|D^*|}$ is a quadrature rule exact for polynomials in $\varPi _n^d$ associated with the measure $\nu $ and $\varXi _{D^*}^c$ the complement event of $\varXi _{D^*}$. Then, by Lemma 6,

$$\begin{aligned} {\mathbf {P}}\{\varOmega _{D^*}^c\}\le {\mathbf {P}}\{\varXi _{D^*}^c\}\le 4\exp \left\{ -C|D^*|/n^d\right\} . \end{aligned}$$

(33)

We write

$$\begin{aligned}&\mathbf {E}\left\{ \Vert V_{D^*,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\} \nonumber \\&\quad =\mathbf {E}\left\{ \Vert V_{D^*,n}-f^*\Vert ^2_{L_2(\mathcal {M})}|\varOmega _{D^*}\right\} {\mathbf {P}}\{\varOmega _{D^*}\}\nonumber \\&\qquad +\mathbf {E}\left\{ \Vert V_{D^*,n}-f^*\Vert ^2_{L_2(\mathcal {M})}|\varOmega _{D^*}^c\right\} {\mathbf {P}}\{\varOmega _{D^*}^c\}. \end{aligned}$$

(34)

Under the event $\varOmega _{D^*}^c$, by Definition 7, we obtain that $V_{D^*,n}=0$. Then, by (33),

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D^*,n}-f^*\Vert ^2_{L_2(\mathcal {M})}|\varOmega _{D^*}^c\right\} {\mathbf {P}}\{\varOmega _{D^*}^c\} \le 4\Vert f^*\Vert ^2_{L_\infty (\mathcal {M})}\exp \bigl \{-C|D^*|/n^d\bigr \}. \end{aligned}$$

(35)

This together with (33) gives

$$\begin{aligned}&\mathbf {E}\left\{ \Vert V_{D^*,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\big |\varOmega _{D^*}\right\} \\&\quad = \mathbf {E}\left\{ \int _{\mathcal {M}} \mathbf {E}\left\{ (f^*(\mathbf {x})-V_{D^*,n}(\mathbf {x}))^2\big |\varLambda _{D^*}\right\} \mathrm {d}\omega (\mathbf {x}) \big |\varXi _{D^*},\varOmega _{D^*}\right\} {\mathbf {P}}\{\varXi _{D^*}\}\\&\qquad +\, \mathbf {E}\left\{ \int _{\mathcal {M}} \mathbf {E}\left\{ (f^*(\mathbf {x})-V_{D^*,n}(\mathbf {x}))^2\big |\varLambda _{D^*}\right\} \mathrm {d}\omega (\mathbf {x}) \big |\varXi _{D^*}^c,\varOmega _{D^*}\right\} {\mathbf {P}}\{\varXi _{D^*}^c\} \\&=: {\mathcal {A}}_{D^*,n,1} + {\mathcal {A}}_{D^*,n,2}. \end{aligned}$$

To bound ${\mathcal {A}}_{D^*,n,1}$, we observe that when the event $\varOmega _{D^*}\cap \varXi _{D^*}$ takes place, $\{w_{i}\}_{i=1}^{|D^*|}$ is a set of positive weights for quadrature rule ${\mathcal {Q}}_{|D^*|,n}$. We then obtain from Theorem 6 and $f^*\in {\mathbb {W}}_2^r(\mathcal {M})$ with $r>d/2$ that

$$\begin{aligned} {\mathcal {A}}_{D^*,n,1}\le c_5^2 n^{-2r}\Vert f\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})}. \end{aligned}$$

On the other hand, under the event $\varOmega _{D^*}\cap \varXi _{D^*}^c$, by Cauchy–Schwarz inequality,

$$\begin{aligned} \bigl (f^*(\mathbf {x})-V_{D^*,n}(\mathbf {x})\bigr )^2&\le 2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2+2\left| \sum _{i=1}^{|D^*|}w_{i}f^*(\mathbf {x}_i)K_n(\mathbf {x}_i, \mathbf {x})\right| ^2\\&\le 2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2+2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2 \sum _{i=1}^{|D^*|}w^2_{i}\sum _{i=1}^{|D^*|}|K_n(\mathbf {x}_i, \mathbf {x})|^2. \end{aligned}$$

This with Lemma 3 and (33) gives

$$\begin{aligned} {\mathcal {A}}_{D^*,n,2} \le 2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2(\mu (\mathcal {M})+ 2c_1^2 n^{d})\exp \left\{ -C|D^*|/n^d\right\} . \end{aligned}$$

Then, with (35), (34) and $cn^{d+\tau }\le |D^*|\le c' n^{2d}$, $\tau \in (0,d]$,

$$\begin{aligned}&\mathbf {E}\left\{ \Vert V_{D^*,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\} \nonumber \\&\quad \le c_5^2 n^{-2r}\Vert f\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})}+2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2(2+\mu (\mathcal {M})+ 2c_1^2 n^{d})\exp \left\{ -C|D^*|/n^d\right\} \nonumber \\&\quad \le C_1|D^*|^{-2r/d}, \end{aligned}$$

(36)

with $C_1=c_5^2\Vert f^*\Vert _{\mathbb {W}_{2}^{r}(\mathcal {M})}^2 + c'\Vert f^*\Vert _{L_{\infty }(\mathcal {M})}^2$, where $c_5$ and $c'$ depend only on $\mu (\mathcal {M}), c_1, d, r$, and the filter H and its smoothness $\kappa $, and C from Lemma 6. $\square $

1.2 Proofs for Section 3.2

Proof

(Theorem 9) As $\mathbf {E}\{\epsilon _i\}=0$ for any $i=1,\dots ,|D|$,

$$\begin{aligned} \mathbf {E}\left\{ V_{D,n}(\mathbf {x})\right\}&= \mathbf {E}\left\{ \sum _{i=1}^mw_{i}y_iK_n(\mathbf {x}_i,\mathbf {x})\right\} = \mathbf {E}\left\{ \sum _{i=1}^mw_{i}(f^*(\mathbf {x}_i)+\epsilon _i)K_n(\mathbf {x}_i,\mathbf {x})\right\} \\&= \sum _{i=1}^mw_{i}f^*(\mathbf {x}_i)K_n(\mathbf {x}_i,\mathbf {x}) +\sum _{i=1}^mw_{i}\mathbf {E}\{\epsilon _i\}K_n(\mathbf {x}_i,\mathbf {x}) = V^*_{D,n}(x), \end{aligned}$$

then,

$$\begin{aligned} \mathbf {E}\left\{ V^*_{D,n}(\mathbf {x})-V_{D,n}(\mathbf {x}) \right\} =0. \end{aligned}$$

(37)

This implies

$$\begin{aligned}&\mathbf {E}\left\{ \Vert V_{D,n}-f^*\Vert ^2_{L_2(\mathcal {M})} \right\} \nonumber \\&\quad = \int _{\mathcal {M}} \mathbf {E}\{(f^*(x)-V_{D,n}(x))^2 \}\mathrm {d}{\mu (\mathbf {x})}\nonumber \\&\quad = \int _{\mathcal {M}} \mathbf {E}\{(f^*(x)-V^*_{D,n}(x) +V^*_{D,n}(x)-V_{D,n}(x))^2 \}\mathrm {d}{\mu (\mathbf {x})} \nonumber \\&\quad = \int _{\mathcal {M}} (V^*_{D,n}(x)-f^*(x))^2 \mathrm {d}{\mu (\mathbf {x})} + \int _{\mathcal {M}} \mathbf {E}\{(V^*_{D,n}(x)-V_{D,n}(x))^2 \}\mathrm {d}{\mu (\mathbf {x})} \nonumber \\&\quad := \mathcal {A}^\diamond _{D,n} + \mathcal {S}^\diamond _{D,n}. \end{aligned}$$

(38)

For $\mathcal {A}^\diamond _{D,n}$ in (38), Theorem 6 gives

$$\begin{aligned} {\mathcal {A}}^\diamond _{D,n}\le c_5^2 \,n^{-2r}\Vert f^*\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})}. \end{aligned}$$

(39)

To bound ${\mathcal {S}}^\diamond _{D,n}$, we observe from (12) that

$$\begin{aligned} \mathbf {E}\left\{ (V_{D^*,n}(\mathbf {x})-V_{D,n}(\mathbf {x}))^2 \right\}&= \mathbf {E}\left\{ \left( \sum _{i=1}^{|D|}(y_i-f^*(\mathbf {x}_i))w_{i} K_n(\mathbf {x}_i,\mathbf {x})\right) ^2 \right\} \\&= \mathbf {E}\left\{ \left( \sum _{i=1}^{|D|}\epsilon _iw_{i} K_n(\mathbf {x}_i,\mathbf {x})\right) ^2 \right\} \\&\le M^2 \sum _{i=1}^{|D|}w_{i}^2|K_n(\mathbf {x}_i,\mathbf {x})|^2, \end{aligned}$$

where the last inequality uses the independence of $\epsilon _1,\dots ,\epsilon _{|D|}$. This together with Lemma 3 and Assumption 8 shows

$$\begin{aligned} {\mathcal {S}}^\diamond _{D,n}&\le M^2 \int _{\mathcal {M}} \sum _{i=1}^{|D|} w_{i}^2| K_n(\mathbf {x}_i,\mathbf {x})|^2 \mathrm {d}{\mu (\mathbf {x})} \nonumber \\&= M^2 \sum _{i=1}^{|D|} w_{i}^2\int _{\mathcal {M}}|K_n(\mathbf {x}_i,\mathbf {x})|^2 \mathrm {d}{\mu (\mathbf {x})} \le c_1M^2n^d \sum _{i=1}^{|D|} w_{i}^2 \le \frac{c_1c_2^2M^2n^d}{|D|}. \end{aligned}$$

(40)

Putting (40) and (39) to (38), we obtain

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\} \le c_5^2n^{-2r}\Vert f^*\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})}+ \frac{c_1c_2^2M^2n^d}{|D|}, \end{aligned}$$

(41)

with $\frac{c_3}{2} |D|^{\frac{1}{2r+d}}\le n\le c_3 |D|^{\frac{1}{2r+d}}$, then,

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\} \le C_2|D|^{-\frac{2r}{2r+d}}, \end{aligned}$$

where $C_2:=4^r c_5^2 c_3^{-2r} \Vert f^*\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})} + c_1 c_2^2 c_3^d M^2$, thus completing the proof. $\square $

We need the following Nikolskiî-type inequality for manifolds, as proved by Filbir and Mhaskar [14, Proposition 4.1].

Lemma 11

For $n\in \mathbb {N}_{0}$ and $0<p<q\le \infty $,

$$\begin{aligned} \Vert P_n\Vert _{L_q(\mathcal {M})} \le c\, n^{\frac{d}{p}-\frac{d}{q}}\Vert P_n\Vert _{L_p(\mathcal {M})}, \end{aligned}$$

where the constant c depends only on d, p, q.

We need the following concentration inequality, Lemma 12, established by [46]. Let $\mathcal {F}$ be a subset of a metric space. For $\varepsilon >0$, the covering number $\mathcal {N}(\mathcal {F},\varepsilon )$ for $\mathcal {F}$ is the minimal natural integer $\ell $ such that $\mathcal {F}$ can be covered by $\ell $ balls of radius $\varepsilon $, see [8, 49].

Lemma 12

Let $\mathcal {G}$ be a set of functions on a product space $X\times Y$ with Borel probability measure $\rho $. For every $g\in \mathcal {G}$, if $|g-\mathbf {E}g|\le B$ almost everywhere and $\mathbf {E}(g^2)\le {\tilde{c}}(\mathbf {E} g)^\alpha $ for some $B\ge 0$, $0\le \alpha \le 1$ and ${\tilde{c}}\ge 0$. Then, for any $\varepsilon >0$,

$$\begin{aligned} {\mathbf {P}}\left\{ \sup _{g\in \mathcal {G}}\frac{\left| \mathbf {E}g-\frac{1}{m}\sum _{i=1}^mg(z_i)\right| }{\sqrt{(\mathbf {E}g)^\alpha +\varepsilon ^\alpha }}>\varepsilon ^{1-\frac{\alpha }{2}}\right\} \le 2{\mathcal {N}}(\mathcal {G},\varepsilon )\exp \left\{ -\frac{m\varepsilon ^{2-\alpha }}{2(\tilde{c}+\frac{1}{3}B\varepsilon ^{1-\alpha })}\right\} , \end{aligned}$$

where the expectation $\mathbf {E}g$ is taken on the product space $X\times Y$ with respect to $\rho $.

The third one is a covering number estimate for Banach space, as given by [50].

Lemma 13

Let ${\mathbb {B}}$ be a finite-dimensional Banach space. Let $B_R$ be the closed ball of radius R centered at origin given by $B_R:=\{f\in {\mathbb {B}}:\Vert f\Vert _{{\mathbb {B}}}\le R\}$. Then,

$$\begin{aligned} \log {\mathcal {N}}( B_R,\varepsilon )\le \dim ({\mathbb {B}}) \log \left( \frac{4R}{\varepsilon }\right) . \end{aligned}$$

Let ${\mathcal {X}}$ be a finite dimensional vector space endowed with norm $\Vert \cdot \Vert _{{\mathcal {X}}}$, and ${\mathcal {Z}}\subset {\mathcal {X}}^*$ be a finite set. We say that ${\mathcal {Z}}$ is a norm generating set for ${\mathcal {X}}$ if the mapping $T_{{\mathcal {Z}}}: {\mathcal {X}}\rightarrow {\mathbb {R}}^{|{\mathcal {Z}}|}$ defined by $T_{{\mathcal {Z}}}(x)=(z(x))_{z\in {\mathcal {Z}}}$ is injective. We call $T_{{\mathcal {Z}}}$ sampling operator. Let $W:=T_{{\mathcal {Z}}}({\mathcal {X}})$ be the range of $T_{{\mathcal {Z}}}$, then the injectivity of $T_{{\mathcal {Z}}}$ implies that $T_{{\mathcal {Z}}}^{-1}:W\rightarrow {\mathcal {X}}$ exists. Denote by $\Vert \cdot \Vert _{{\mathbb {R}}^{|{\mathcal {Z}}|}}$ the norm of ${\mathbb {R}}^{|{\mathcal {Z}}|}$, and $\Vert \cdot \Vert _{{\mathbb {R}}^{|{\mathcal {Z}}|^*}}$ the dual norm on ${\mathbf {R}}^{|{\mathcal {Z}}|^*}$ for $\Vert \cdot \Vert _{{\mathbb {R}}^{|{\mathcal {Z}}|}}$. We equip the space W with the induced norm, and let $\Vert T_{{\mathcal {Z}}}^{-1}\Vert :=\Vert T_{{\mathcal {Z}}}^{-1}\Vert _{W\rightarrow {\mathcal {X}}}$ be the operator norm. In addition, let $\mathcal {K}_+$ be the positive cone of ${\mathbb {R}}^{|{\mathcal {Z}}|}$ which is the set of all $(r_z)_{z\in \mathcal {Z}}\in {\mathbb {R}}^{|{\mathcal {Z}}|}$ such that $r_z\ge 0$. Then, the following lemma [26] holds.

Lemma 14

Let ${\mathcal {Z}}$ be a norm generating set for ${\mathcal {X}}$, with $T_{{\mathcal {Z}}}$ the corresponding sampling operator. If $\mathcal {L}\in {\mathcal {X}}^*$ with $\Vert \mathcal {L}\Vert _{{\mathcal {X}}^*}\le {\mathcal {A}}$, then there exist positive numbers $\{a_z\}_{z\in {\mathcal {Z}}}$, depending only on $\mathcal {L}$ such that for every $x\in {\mathcal {X}}$,

$$\begin{aligned} \mathcal {L}(x)=\sum _{z\in {\mathcal {Z}}}a_zz(x), \end{aligned}$$

and

$$\begin{aligned} \Vert (a_z)\Vert _{{\mathbb {R}}^{|{\mathcal {Z}}|^*}}\le {\mathcal {A}}\Vert T_{{\mathcal {Z}}}^{-1}\Vert . \end{aligned}$$

If the space $W=T_{{\mathcal {Z}}}(X)$ contains an interior point $v_0\in \mathcal {K}_+$, and if $\mathcal {L}(T_{{\mathcal {Z}}}^{-1}v)\ge 0$ when $v\in W\cap \mathcal {K}_+$, then we can choose $a_z\ge 0$.

Proof

(Lemma 6) For $p=1,2$, without loss of generality, we prove Lemma 6 for $P_n\in \varPi _n^d$ satisfying $\Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}= A$ for some constant $A>0$. For arbitrary $P_n\in \varPi _{n}^{d}$ with $\Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}= A$, it follows from (11) and Lemma 11 that

$$\begin{aligned} \Vert P_n\Vert _{L_\infty (\mathcal {M})}\le \tilde{C}_1n^{\frac{d}{p}}\Vert P_n\Vert _{L_p(\mathcal {M})} \le c^{1/p}_4\tilde{C}_1n^{\frac{d}{p}}\Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}, \end{aligned}$$

and

$$\begin{aligned} \mathbf {E}\left\{ |P_n|^{2p}\right\} =\int _{\mathcal {M}}|P_n(\mathbf {x})|^{2p}\mathrm{d}\nu (\mathbf {x})&\le \Vert P_n\Vert ^p_{L_\infty (\mathcal {M})}\int _{\mathcal {M}}|P_n(\mathbf {x})|^p\mathrm{d}\nu (\mathbf {x})\\&\le c_4(\tilde{C}_1)^pn^d\Vert P_n\Vert ^p_{L_{p,\nu }(\mathcal {M})} \mathbf {E}\left[ |P_n|^p\right] . \end{aligned}$$

Let $g(z_i)=|P_n(\mathbf {x}_i)|^p$, $B=2c_4(\tilde{C}_1)^pn^d\Vert P_n\Vert ^p_{L_{p,\nu }(\mathcal {M})}$, $\tilde{c}=c_4(\tilde{C}_1)^pn^d\Vert P_n\Vert ^p_{L_{p,\nu }(\mathcal {M})}$, $m=N$, $\alpha =1$ and $ \mathcal {G}_p=\{|P_n|^p:P_n\in \varPi _n^d,\Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}=A\}$ in Lemma 12. Then, for any $\varepsilon >0$,

$$\begin{aligned}&{\mathbf {P}}\left\{ \sup _{P_n\in \varPi _n^{d},\Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}=A} \frac{\left| \Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}^p-\frac{1}{N}\sum _{i=1}^{N} |P_n(\mathbf {x}_i)|^p\right| }{ \sqrt{\Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}^p+\varepsilon }}>\sqrt{\varepsilon }\right\} \\&\quad \le 2{\mathcal {N}}(\mathcal {G}_p,\varepsilon ) \exp \left\{ -\frac{{N}\varepsilon }{\tilde{C}_2n^d A^p}\right\} , \end{aligned}$$

where $\tilde{C}_2=10c_4(\tilde{C}_1)^p/3$.

We need to estimate the above covering number ${\mathcal {N}}(\mathcal {G}_p,\varepsilon )$ for $p=1,2$. To this end, we let $\mathcal {G}_1':=\{P_n\in \varPi _n^d:\Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}=A\}$ and $\mathcal {G}_2':=\{P_n\in \varPi _{2n}^d:\Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}=A\}$. By definition, ${\mathcal {N}}(\mathcal {G}_1,\varepsilon )\le {\mathcal {N}}(\mathcal {G}_1',\varepsilon )$ and ${\mathcal {N}}(\mathcal {G}_2,\varepsilon )={\mathcal {N}}(\mathcal {G}_2',\varepsilon )$, where for $p=2$, we have used $|P_n|^2\in \varPi _{2n}^d$. Then, by Lemma 13, for $p=1,2$, we obtain the upper bound

$$\begin{aligned}&{\mathbf {P}}\left\{ \sup _{P_n\in \varPi _n^{d},\Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}=A} \frac{\left| \Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}^p-\frac{1}{N}\sum _{i=1}^{N} |P_n(\mathbf {x}_i)|^p\right| }{ \sqrt{\Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}^p+\varepsilon }}>\sqrt{\varepsilon }\right\} \\&\quad \le 2\exp \left\{ (2n)^d\log \frac{4A^p}{\varepsilon } -\frac{{N}\varepsilon }{\tilde{C}_2n^dA^p}\right\} , \end{aligned}$$

where we have used the estimate $\dim \mathcal {G}_p\le (pn)^d$. Let $\varepsilon =A^p/4$. As $N/n^{2d}>c$ for sufficiently large constant c, with confidence $1-2\exp \left\{ -C N/n^d\right\} $, there holds

$$\begin{aligned} \left| \Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}^p-\frac{1}{N}\sum _{i=1}^{N}|P_n(\mathbf {x}_i)|^p\right| \le \frac{\sqrt{5}}{4}\Vert P_n\Vert ^p_{L_{p,\nu }(\mathcal {M})}. \end{aligned}$$

From this, we then obtain

$$\begin{aligned} \frac{1}{3}\Vert P_n\Vert _{L_{p,\nu }(\mathcal {M})}^p\le \frac{1}{N}\sum _{i=1}^{N} |P_n(\mathbf {x}_i)|^p\le \frac{5}{3}\Vert P\Vert _{L_{p,\nu }(\mathcal {M})}^p\quad \forall P_n \in \varPi _n^{d},\ p=1,2 \end{aligned}$$

(42)

holds with probability at least $1-2\exp \left\{ -C N/n^d \right\} $.

We now apply (42) with $p=2$ and Lemma 14 to prove Lemma 6. In Lemma 14, we take ${\mathcal {X}}=\varPi _n^{d}$, $\Vert P_n\Vert _{{\mathcal {X}}}=\Vert P_n\Vert _{L_{2,\nu }(\mathcal {M})}$, and ${\mathcal {Z}}$ the set of point evaluation functionals $\{\delta _{\mathbf {x}_i}\}_{i=1}^{N}$. The operator $T_{{\mathcal {Z}}}$ is then the restriction map $P_n\mapsto P_n|_{X_N} $ and $\Vert f\Vert _{\varLambda _{D},2}:=\left( \frac{1}{N}\sum _{i=1}^N|f(\mathbf {x}_i)|^2\right) ^\frac{1}{2}$. It follows from (42) that with confidence at least $1-2\exp \left\{ -\tilde{C}_3 N/n^d \right\} $, there holds $\Vert T_{{\mathcal {Z}}}^{-1}\Vert \le \sqrt{\frac{5}{3}}$. We let $\mathcal {L}$ be the functional

$$\begin{aligned} \mathcal {L}: P_n\mapsto \int _{\mathcal {M}}P_n(x)\mathrm {d}\nu (x). \end{aligned}$$

By Hölder’s inequality, $\Vert y\Vert _{{\mathcal {X}}^*}\le 1$. Lemma 14 then shows that there exists a set of real numbers $\{w_{i,n}\}_{i=1}^N$ such that

$$\begin{aligned} \int _{\mathcal {M}}P_n(x)\mathrm {d}\nu (x)=\sum _{i=1}^{N}w_{i,n}P_n(\mathbf {x}_i) \end{aligned}$$

holds with confidence at least $1-2\exp \left\{ -\tilde{C}_3 N/n^d \right\} $, subject to

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^{N}\left( \frac{w_{i,n} }{1/{N}}\right) ^2\le 2. \end{aligned}$$

Finally, we use the second assertion of Lemma 14 and (42) with $p=1$ to prove the positivity of $w_{i,n}$. Since $1\in \varPi _n^d$, we have $v_0:=1|_{X_N}$ is an interior point of $\mathcal {K}_+$. For $P_n\in \varPi _n^d$, $T_{\mathcal {Z}}P_n=P_n|_{X_N}$ is in $W\cap \mathcal {K}_+$ if and only if $P_n(\mathbf {x}_i)\ge 0$ for all $\mathbf {x}_i\in X_N$. For arbitrary $P_n$ satisfying $P_n(\mathbf {x}_i)\ge 0$, $\mathbf {x}_i\in X_N$, define $\xi _i(P_n)=P_n(\mathbf {x}_i)$. From Lemma 11 and (11), we obtain for $i=1,\dots ,N$,

$$\begin{aligned}&|\xi _i| \le \Vert P_n\Vert _{L_\infty (\mathcal {M})} \le \tilde{C}_1n^d\Vert P_n\Vert _{L_1(\mathcal {M})} \le \tilde{C}_1c_4n^d\Vert P_n\Vert _{L_{1,\nu }(\mathcal {M})},\\&|\xi _i-\mathbf {E}\xi _i| \le 2\Vert P_n\Vert _{L_\infty (\mathcal {M})} \le 2\tilde{C}_1c_4n^d\Vert P_n\Vert _{L_{1,\nu }(\mathcal {M})},\\&\mathbf {E}\xi _i^2 \le \Vert P_n\Vert _{L_\infty (\mathcal {M})} \Vert P_n\Vert _{L_{1,\nu }(\mathcal {M})}\le \tilde{C}_1 c_4 n^d \Vert P_n\Vert ^2_{L_{1,\nu }(\mathcal {M})}. \end{aligned}$$

Applying Lemma 12 with $B=2 \tilde{C}_1 c_4 n^d\Vert P_n\Vert _{L_{1,\nu }(\mathcal {M})}$, $\tilde{c}=\tilde{C}_1 c_4 n^d\Vert P_n\Vert ^2_{L_{1,\nu }(\mathcal {M})}$ and $\alpha =0$ to the set $\{P_n:P_n\in \varPi _n^d, \Vert P_n\Vert _{L_{1,\nu }(\mathcal {M})}=A\}$, using Lemma 13, we obtain for any $\varepsilon >0$,

$$\begin{aligned}&{\mathbf {P}}\left\{ \sup _{\begin{array}{c} P_n\in \varPi _n^{d}, P_n=|P_n| \\ \Vert P_n\Vert _{L_{1,\nu }(\mathcal {M})}=A \end{array}} \left| y(P_n)-\frac{1}{N} \sum _{i=1}^{N} P_n(\mathbf {x}_i) \right| >\varepsilon \right\} \\&\quad \le 2\exp \left\{ n^d\log \frac{4A }{\varepsilon } -\frac{{N}\varepsilon ^2}{2\tilde{C}_1c_4n^d A (A +2\varepsilon /3)}\right\} . \end{aligned}$$

Let $\varepsilon =A/4=\frac{1}{4}\Vert P_n\Vert _{L_{1,\nu }(\mathcal {M})}$. We then obtain that with confidence $1-2\exp \left\{ -C N/n^d\right\} $,

$$\begin{aligned} \left| y(P_n)-\frac{1}{N} \sum _{i=1}^{N} P_n(\mathbf {x}_i)\right| \le \frac{1}{4}\Vert P_n\Vert _{L_{1,\nu }(\mathcal {M})}, \end{aligned}$$

This and (42) imply that for $P_n$ which satisfies that $P_n(\mathbf {x}_i)\ge 0$ for all $\mathbf {x}_i\in X_N$, the inequality

$$\begin{aligned} \left| y(P_n)-\frac{1}{N} \sum _{i=1}^{N} P_n(\mathbf {x}_i)\right| \le \frac{3}{4}\frac{1}{N}\sum _{i=1}^{N} P_n(\mathbf {x}_i) \end{aligned}$$

holds with confidence $1-4\exp \left\{ -C N/n^d\right\} $ with the constant C depending only on $\tilde{C}_3$ and $c_4$, then,

$$\begin{aligned} y(P_n)\ge \frac{1}{4} \frac{1}{N}\sum _{i=1}^{N} P_n(\mathbf {x}_i)\ge 0 \end{aligned}$$

for arbitrary $P_n\in \varPi _n^d$ satisfying $P_n(\mathbf {x}_i)\ge 0$, $\mathbf {x}_i\in X_N$. Lemma 14 then implies $w_{i,n}\ge 0$ with confidence $1-4\exp \left\{ -C N/n^d\right\} $, thus completing the proof of Theorem 6. $\square $

Proof

(Theorem 10) For the quadrature rule for random samples $\{(\mathbf {x}_i,w_i)\}_{i=1}^{|D|}$ in Definition 7, we define four events, as follows. Let $ \varOmega _{D}$ be the event such that $\sum _{i=1}^{|D|}|w_{i}|^2\le \frac{2}{|D|}$ and $\varOmega _{D}^c$ be the complement of $\varOmega _D$, i.e., $\varOmega _{D}^c$ be the event $\sum _{i=1}^{|D|}|w_{i}|^2>\frac{2}{|D|}$. Let $\varXi _D$ the event that $\{(w_{i},\mathbf {x}_i)\}_{i=1}^{|D|}$ is a quadrature rule exact for polynomials in $\varPi _n^d$ associated with the measure $\nu $ and $\varXi _D^c$ the complement event of $\varXi _D$. Then, by Lemma 6,

$$\begin{aligned} {\mathbf {P}}\{\varOmega _{D}^c\}\le {\mathbf {P}}\{\varXi _D^c\}\le 4\exp \left\{ -C|D|/n^d\right\} . \end{aligned}$$

(43)

To estimate the approximation error, we write

$$\begin{aligned}&\mathbf {E}\left\{ \Vert V_{D,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\} \nonumber \\&\quad =\mathbf {E}\left\{ \Vert V_{D,n}-f^*\Vert ^2_{L_2(\mathcal {M})}|\varOmega _{D}\right\} {\mathbf {P}}\{\varOmega _{D}\} +\mathbf {E}\left\{ \Vert V_{D,n}-f^*\Vert ^2_{L_2(\mathcal {M})}|\varOmega _{D}^c\right\} {\mathbf {P}}\{\varOmega _{D}^c\}. \end{aligned}$$

(44)

Under the event $\varOmega _{D}^c$, by Definition 7, $V_{D,n}=0$. Then, by (43),

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D,n}-f^*\Vert ^2_{L_2(\mathcal {M})}|\varOmega _D^c\right\} {\mathbf {P}}\{\varOmega _{D}^c\} \le 4\Vert f^*\Vert ^2_{L_\infty (\mathcal {M})}\exp \bigl \{-C|D|/n^d\bigr \}. \end{aligned}$$

(45)

Next, we estimate the first term of RHS of (44) when the event $\varOmega _{D}$ takes place. Let $\varLambda _D:=\{\mathbf {x}_i\}_{i=1}^{|D|}$ be the set of points of data D. By the independence between $\{\epsilon _i\}_{i=1}^{|D|}$ and $\varLambda _{D}$ and $\mathbf {E}\{\epsilon _i\}=0$ for $i=1,\dots , |D|$,

$$\begin{aligned} \mathbf {E}\left\{ V_{D,n}(\mathbf {x})\big |\varLambda _{D}\right\}&= \mathbf {E}\left\{ \sum _{i=1}^mw_{i}y_i K_n(\mathbf {x}_i, \mathbf {x})\big |\varLambda _{D}\right\} \\&= \mathbf {E}\left\{ \sum _{i=1}^mw_{i}(f^*(\mathbf {x}_i)+\epsilon _i)K_n(\mathbf {x}_i, \mathbf {x})\big |\varLambda _{D}\right\} \\&= \sum _{i=1}^mw_{i}f^*(\mathbf {x}_i)K_n(\mathbf {x}_i, \mathbf {x})+\sum _{i=1}^mw_{i}\mathbf {E}\{\epsilon _i\}K_n(\mathbf {x}_i, \mathbf {x})\\&= V_{D^*,n}(\mathbf {x}). \end{aligned}$$

Hence,

$$\begin{aligned} \mathbf {E}\left\{ \left( V_{D^*,n}(\mathbf {x})-V_{D,n}(\mathbf {x})\right) \big |\varLambda _{D}\right\} =0. \end{aligned}$$

(46)

This allows us to write

$$\begin{aligned}&\mathbf {E}\left\{ \Vert V_{D,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\big |\varOmega _{D}\right\} = \mathbf {E}\left\{ \int _{\mathcal {M}} \mathbf {E}\{(f^*(\mathbf {x})-V_{D,n}(\mathbf {x}))^2 \big |\varLambda _{D}\}\mathrm {d}\omega (\mathbf {x})\big |\varOmega _{D}\right\} \nonumber \\&\quad = \mathbf {E}\left\{ \int _{\mathcal {M}} \mathbf {E}\{(f^*(\mathbf {x})-V_{D^*,n}(\mathbf {x}) +V_{D^*,n}(\mathbf {x})-V_{D,n}(\mathbf {x}))^2 \big |\varLambda _{D}\}\mathrm {d}\omega (\mathbf {x})\big |\varOmega _{D}\right\} \nonumber \\&\quad = \mathbf {E}\left\{ \int _{\mathcal {M}} \mathbf {E}\{(V_{D^*,n}(\mathbf {x})-V_{D,n}(\mathbf {x}))^2\big |\varLambda _{D}\}\mathrm {d}\omega (\mathbf {x}) \big |\varOmega _{D}\right\} \nonumber \\&\qquad +\, \mathbf {E}\left\{ \int _{\mathcal {M}} \mathbf {E}\{(V_{D^*,n}(\mathbf {x})-f^*(\mathbf {x}))^2 \big |\varLambda _{D}\}\mathrm {d}\omega (\mathbf {x})\big |\varOmega _{D}\right\} \nonumber \\&\quad := {\mathcal {S}}_{D,n}+{\mathcal {A}}_{D,n}. \end{aligned}$$

(47)

Given $\varLambda _{D}$, if the event $\varOmega _{D}$ occurs, by $|\epsilon _i|\le M$,

$$\begin{aligned} \mathbf {E}\left\{ (V_{D^*,n}(\mathbf {x})-V_{D,n}(\mathbf {x}))^2\big |\varLambda _{D}\right\}&= \mathbf {E}\left\{ \left( \sum _{i=1}^{|D|} \epsilon _i w_{i}K_n(\mathbf {x}_i, \mathbf {x})\right) ^2 \bigg |\varLambda _{D}\right\} \\&\le M^2 \sum _{i=1}^{|D|}w_{i}^2 |K_n(\mathbf {x}_i, \mathbf {x})|^2, \end{aligned}$$

where the second line uses the independence of $\epsilon _1, \dots ,\epsilon _{|D|}$. This with Lemma 3 shows

$$\begin{aligned} {\mathcal {S}}_{D,n}&\le M^2\mathbf {E}\left\{ \int _{\mathcal {M}} \sum _{i=1}^{|D|} w_{i}^2|K_n(\mathbf {x}_i, \mathbf {x})|^2 \mathrm {d}\omega (\mathbf {x}) \big |\varOmega _D\right\} \nonumber \\&= M^2\mathbf {E}\left\{ \sum _{i=1}^{|D|} w_{i}^2\int _{\mathcal {M}} |K_n(\mathbf {x}_i, \mathbf {x})|^2 \mathrm {d}\omega (\mathbf {x})\big |\varOmega _D\right\} \nonumber \\&\le c_1^2 M^2n^d\mathbf {E}\left\{ \sum _{i=1}^{|D|} w_{i}^2\right\} \le \frac{2c_1^2M^2n^d}{|D|}. \end{aligned}$$

(48)

On the other hand, similar as the derivation of (36), we obtain

$$\begin{aligned} {\mathcal {A}}_{D,n}\le & {} c_5^2n^{-2r}\Vert f^*\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})}\nonumber \\&+ 2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2(\mu (\mathcal {M})+ 2c_1^2 n^{d})\exp \{-C|D|/n^d\}. \end{aligned}$$

(49)

This and (48) and (47) give

$$\begin{aligned}&\mathbf {E}\left\{ \Vert V_{D,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\big |\varOmega _{D}\right\} \\&\quad \le c_5^2n^{-2r}\Vert f\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})}+2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2(\mu (\mathcal {M})+ 2c_1^2 n^{d})\exp \{-C|D|/n^d\}\\&\qquad +\frac{2c_1^2M^2n^d}{|D|}. \end{aligned}$$

Putting the above estimate and (45) into (44), we obtain

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\}&\le c_5^2n^{-2r}\Vert f\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})}+\frac{2c_1^2M^2n^d}{|D|}\nonumber \\&\quad + \,2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2(\mu (\mathcal {M})+ 2c_1^2 n^{d}+2)\exp \bigl \{-C|D|/n^d\bigr \}. \end{aligned}$$

(50)

Taking account of $n\asymp |D|^{\frac{1}{2r+d}}$ and $r>d/2$, we then have

$$\begin{aligned} n^{d}\exp \left\{ -C|D|/n^d\right\} \le C'_5 |D|^{\frac{d}{2r+d}}\exp \left\{ -C|D|^{\frac{2r}{2r+d}}\right\} \le \tilde{C}_5|D|^{-\frac{2r}{2r+d}}. \end{aligned}$$

Thus,

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\} \le C_3|D|^{-\frac{2r}{2r+d}} \end{aligned}$$

with $C_3$ a constant independent of |D|, thus completing the proof. $\square $

1.3 Proofs for Section 4.1

Proof

(Theorem 11) By Definition 6, for $f^* \in \mathbb {W}_{2}^{r}(\mathcal {M})$,

$$\begin{aligned} \big \Vert V_{D^*,n}^{(m)}-f^*\big \Vert _{L_{2}(\mathcal {M})} \le \sum _{j=1}^m\frac{|D^*_j|}{|D^*|}\big \Vert V_{D^*_j,n}-f^*\big \Vert _{L_{2}(\mathcal {M})}. \end{aligned}$$

(51)

By $\min _{j=1,\dots ,m}|D_{j}^*|\ge |D^*|^{\frac{2d}{2r+d}}$ and Assumption 8, there exists a quadrature rule for each local server $D_j^*$ that is exact for polynomials of degree $3n-1$ for n satisfying $\frac{c_3}{6}|D^*|^{\frac{1}{2r+d}}\le n \le \frac{c_3}{3}|D^*|^{\frac{1}{2r+d}}$. By Theorem 6, for each $D_j^*$, $j=1,\dots ,m$,

$$\begin{aligned} \big \Vert V_{D^*_j,n}-f^*\big \Vert _{L_{2}(\mathcal {M})} \le c_5 n^{-r} \Vert f^*\Vert _{\mathbb {W}_{2}^{r}(\mathcal {M})} \le 6^r c_3^{-r} c_5 \Vert f^*\Vert _{\mathbb {W}_{2}^{r}(\mathcal {M})} |D^*|^{-\frac{r}{2r+d}}, \end{aligned}$$

then, with $\sum _{j=1}^m\frac{|D_j|}{|D|}=1$,

$$\begin{aligned} \big \Vert V_{D^*,n}^{(m)}-f^*\big \Vert _{L_{2}(\mathcal {M})} \le 6^rc_3^{-r} c_5 \Vert f^*\Vert _{\mathbb {W}_{2}^{r}(\mathcal {M})} |D^*|^{-\frac{r}{2r+d}}. \end{aligned}$$

$\square $

To prove Theorem 13, we need the following modified version of [16, Proposition 4].

Lemma 15

For $V_{D,n}^{(m)}$ in Definition 6 with quadrature rule given by (13), there holds

$$\begin{aligned}&\mathbf {E}\left\{ \bigl \Vert V_{D,n}^{(m)}-f^*\bigr \Vert _{L_2(\mathcal {M})}^2\right\} \nonumber \\&\quad \le \sum _{j=1}^m\frac{|D_j|^2}{|D|^2}\mathbf {E}\left\{ \Vert V_{D_j,n} -f^*\Vert _{L_2(\mathcal {M})}^2\right\} +\sum _{j=1}^m\frac{|D_j|}{|D|} \bigl \Vert V_{D^*_j,n}-f^*\bigr \Vert _{L_2(\mathcal {M})}^2, \end{aligned}$$

(52)

where $V_{D^*_j,n}$ is given by (9).

Proof

Due to (15) and $\sum _{j=1}^m\frac{|D_j|}{|D|}=1$, we have

$$\begin{aligned}&\bigl \Vert V_{D,n}^{(m)}-f^*\bigr \Vert _{L_2(\mathcal {M})}^2 = \left\| \sum _{j=1}^m\frac{|D_j|}{|D|}(V_{D_j,n}-f^*)\right\| _{L_2(\mathcal {M})}^2\\&\quad = \sum _{j=1}^m\frac{|D_j|^2}{|D|^2}\Vert V_{D_j,n}-f^*\Vert _{L_2(\mathcal {M})}^2 \\&\qquad + \sum _{j=1}^m\frac{|D_j|}{|D|}\left\langle V_{D_j,n}-f^*,\sum _{k\ne j}\frac{|D_k|}{|D|}(V_{D_k,n}-f^*)\right\rangle _{L_2(\mathcal {M})}. \end{aligned}$$

Taking expectations gives

$$\begin{aligned}&\mathbf {E}\left\{ \bigl \Vert V_{D,n}^{(m)}-f^*\bigr \Vert _{L_2(\mathcal {M})}^2\right\} = \sum _{j=1}^m\frac{|D_j|^2}{|D|^2}\mathbf {E}\left\{ \Vert V_{D_j,n} -f^*\Vert _{L_2(\mathcal {M})}^2\right\} \\&\quad +\, \sum _{j=1}^m\frac{|D_j|}{|D|}\left\langle \mathbf {E}_{D_j}\bigl \{V_{D_j,n}\bigr \}-f^*,\mathbf {E}\left\{ V_{D,n}^{(m)}\right\} -f^*- \frac{|D_j|}{|D|} \left( \mathbf {E}_{D_j}\{V_{D_j,n}\}-f^*\right) \right\rangle _{L_2(\mathcal {M})}. \end{aligned}$$

Here,

$$\begin{aligned} \sum _{j=1}^m\frac{|D_j|}{|D|}\left\langle \mathbf {E}_{D_j}\{V_{D_j,n}\}-f^*,\mathbf {E}\left\{ V_{D,n}^{(m)}\right\} -f^*\right\rangle _{L_2(\mathcal {M})} =\left\| \mathbf {E}\left\{ V_{D,n}^{(m)}\right\} -f^*\right\| _{L_2(\mathcal {M})}^2. \end{aligned}$$

Then,

$$\begin{aligned}&\mathbf {E}\left\{ \bigl \Vert V_{D,n}^{(m)}-f^*\bigr \Vert _{L_2(\mathcal {M})}^2\right\} \\&\quad = \sum _{j=1}^m\frac{|D_j|^2}{|D|^2}\mathbf {E}\left\{ \Vert V_{D_j,n}-f^*\Vert _{L_2(\mathcal {M})}^2\right\} \\&\qquad -\,\sum _{j=1}^m\frac{|D_j|^2}{|D|^2}\left\| \mathbf {E}\{V_{D_j,n}\}-f^*\right\| _{L_2(\mathcal {M})}^2 + \left\| \mathbf {E}\left\{ V_{D,n}^{(m)}\right\} -f^*\right\| _{L_2(\mathcal {M})}^2. \end{aligned}$$

By (37),

$$\begin{aligned} \mathbf {E}\left\{ V_{D,n}^{(m)}\right\} =\sum _{j=1}^m\frac{|D_j|}{|D|}V_{D^*_j,n}. \end{aligned}$$

This plus $\sum _{j=1}^m\frac{|D_j|}{|D|}=1$ gives

$$\begin{aligned} \left\| \mathbf {E}\left\{ V_{D,n}^{(m)}\right\} -f^*\right\| ^2_{L_2(\mathcal {M})}&= \left\| \sum _{j=1}^m\frac{|D_j|}{|D|}\left( V_{D^*_j,n}-f^*\right) \right\| _{L_2(\mathcal {M})}^2\\&\le \sum _{j=1}^m\frac{|D_j|}{|D|}\bigl \Vert V_{D^*_j,n}-f^*\bigr \Vert _{L_2(\mathcal {M})}^2, \end{aligned}$$

thus proving the bound in (52). $\square $

We will use the following lemma to prove Theorems 12 and 14, which can be obtained similarly as Lemma 15.

Lemma 16

For the distributed filtered hyperinterpolation $V_{D,n}^{(m)}$ with random sampling points,

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D,n}^{(m)}-f^*\Vert _{L_2(\mathcal {M})}^2\right\}&\le \sum _{j=1}^m\frac{|D_j|^2}{|D|^2} \mathbf {E}\left\{ \Vert V_{D_j,n}-f^*\Vert _{L_2(\mathcal {M})}^2\right\} \\&\quad +\,\sum _{j=1}^m\frac{|D_j|}{|D|} \left\| \mathbf {E}\{V_{D_j,n}\}-f^*\right\| _{L_2(\mathcal {M})}^2. \end{aligned}$$

Proof

(Theorem 12) For each $j=1,\dots ,m$,

$$\begin{aligned} \left\| \mathbf {E}\bigl \{V_{D_j^*,n}\bigr \}-f^*\right\| _{L_{2}(\mathcal {M})}^2 = \left\| \mathbf {E}\bigl \{V_{D_j^*,n}-f^*\bigr \}\right\| _{L_{2}(\mathcal {M})}^2 \le \mathbf {E}\left\{ \bigl \Vert V_{D_j^*,n}-f^*\bigr \Vert _{L_{2}(\mathcal {M})}^2\right\} .\nonumber \\ \end{aligned}$$

(53)

Then, by Lemma 16, we need only to estimate $\mathbf {E}\left\{ \bigl \Vert V_{D_j^*,n}-f^*\bigr \Vert _{L_{2}(\mathcal {M})}^2\right\} $. To this end, we can use the similar proof of the non-distributed case of Theorem 10 to prove its distributed counterpart $V_{D^*,n}^{(m)}$. For each $j=1,\dots ,m$, we let $\varOmega _{D^*_j}$ be the event such that the sum of the quadrature weights $\sum _{i=1}w_{i,n,D^*_j}^2\le 2/|D^*_{j}|$, and $\varOmega _{D^*_j}^c$ the complement of $\varOmega _{D^*_j}$. We write

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\}&=\mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}|\varOmega _{D^*_j}\right\} {\mathbf {P}}\{\varOmega _{D^*_j}\}\\&\quad +\,\mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}|\varOmega _{D^*_j}^c\right\} {\mathbf {P}}\{\varOmega _{D^*_j}^c\}, \end{aligned}$$

where

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}|\varOmega _{D^*_j}^c\right\} {\mathbf {P}}\{\varOmega _{D^*_j}^c\} \le 4\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2\exp \bigl \{-C|D^*_j|/n^d\bigr \}. \end{aligned}$$

By (49) with $D=D_j$,

$$\begin{aligned}&\mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})} |\varOmega _{D^*_j}\right\} {\mathbf {P}}\{\varOmega _{D^*_j}\} \\&\quad \le c_5^2n^{-2r}\Vert f\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})}+ 2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2(\mu (\mathcal {M})+ 2c_1^2 n^{d})\exp \bigl \{-C|D^*_j|/n^d\bigr \}. \end{aligned}$$

These give

$$\begin{aligned}&\mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\} \\&\quad \le c_5^2n^{-2r}\Vert f\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})} + 2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2(\mu (\mathcal {M}) +2c_1^2n^{d}+2)\exp \bigl \{-C|D^*_j|/n^d\bigr \}. \end{aligned}$$

By $\min _{1\le j\le m}|D^*_j|\ge |D^*|^\frac{d+\tau }{2r+d}$, $n\asymp |D^*|^{\frac{1}{2r+d}}$ and $2r>d$, $0<\tau <2r$,

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\} \le \tilde{C}_5|D^*|^{-\frac{2r}{2r+d}}. \end{aligned}$$

(54)

Then, by Lemma 16 and (53),

$$\begin{aligned} \mathbf {E}\left\{ \bigl \Vert V_{D^*,n}^{(m)}-f^*\bigr \Vert _{L_2(\mathcal {M})}^2\right\}&\le \sum _{j=1}^m\left( \frac{|D^*_j|}{|D^*|}+\frac{|D^*_j|^2}{|D^*|^2}\right) \mathbf {E}\left\{ \bigl \Vert V_{D^*_j,n}-f^*\bigr \Vert _{L_2(\mathcal {M})}^2\right\} \\&\le C_5 |D^*|^{-\frac{2r}{2r+d}} \end{aligned}$$

with $C_5=c'\Vert f^*\Vert _{\mathbb {W}_{2}^{r}(\mathcal {M})}^2+c''\Vert f^*\Vert _{L_{\infty }(\mathcal {M})}^2$ for constants $c', c''$ dependent only on $\mu (\mathcal {M}), d, r$, and the filter H and its smoothness r, and constant C from Lemma 6. $\square $

1.4 Proofs for Section 4.2

Proof

(Theorem 13) By Lemma 15, we only need to estimate the bounds of $\mathbf {E}\left\{ \Vert V_{D_j,n}-f^*\Vert _{L_2(\mathcal {M})}^2\right\} $ and $\bigl \Vert V_{D^*_j,n}-f^*\bigr \Vert _{L_2(\mathcal {M})}^2$. By $\min _{j=1,\dots ,m}|D_j|\ge |D|^{2d/(2r+d)}$ and $D_j$ and Assumption 8, there exists a quadrature rule for each local server which is exact for polynomials of degree $3n-1$ for n satisfying $\frac{c_3}{6}|D|^{1/(2r+d)}\le n\le \frac{c_3}{3}|D|^{1/(2r+d)}$.

By (41), for $j=1,\dots ,m$,

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\} \le c_5^2n^{-2r}\Vert f^*\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})}+ \frac{c_1c_2^2M^2n^d}{|D_j|}. \end{aligned}$$

This gives

$$\begin{aligned}&\sum _{j=1}^m\frac{|D_j|^2}{|D|^2}\mathbf {E}\left\{ \Vert V_{D_j,n}-f^*\Vert _{L_2(\mathcal {M})}^2\right\} \nonumber \\&\quad \le 36^r c_5^2c_3^{-2r}\Vert f^*\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})}|D|^{-\frac{2r}{2r+d}} + 3^{-d} c_1 c_2^2 c_3^{d}M^2 \sum _{j=1}^m\frac{|D_j|^2}{|D|^2}\frac{|D|^{\frac{d}{2r+d}}}{|D_j|}\nonumber \\&\quad = C'_6|D|^{-\frac{2r}{2r+d}}, \end{aligned}$$

(55)

where $C'_6:=36^r c_5^2c_3^{-2r}\Vert f^*\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})} + 3^{-d}c_1 c_2^2 c_3^d M^2$.

For each $j=1,\dots ,m$, Assumption 8 implies that there exists a quadrature rule with nodes of $D_j$ and $|D_j|$ positive weights such that $V_{D^*_j,n}$ is a filtered hyperinterpolation for the noise-free data set $\{\mathbf {x}_i,f^*(\mathbf {x}_i)\}_{\mathbf {x}_i\in D_j}$. Theorem 6 then gives

$$\begin{aligned} \left\| V_{D^*_j,n} -f^*\right\| _{L_2(\mathcal {M})}^2 \le c_5^2n^{-2r}\Vert f^*\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})} \quad \forall j=1,2,\dots ,m. \end{aligned}$$

This together with conditions $\sum _{j=1}^m\frac{|D_j|}{|D|}=1$ and $n\ge \frac{c_3}{6}|D|^{1/(2r+d)}$ gives

$$\begin{aligned} \sum _{j=1}^m\frac{|D_j|}{|D|}\left\| V_{D^*_j,n} -f^*\right\| _{L_2(\mathcal {M})}^2 \le 36^r c_5^2c_3^{-2r}\Vert f^*\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})}|D|^{-\frac{2r}{2r+d}}. \end{aligned}$$

(56)

Using (55) and (56), and Lemma 15,

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D,n}^{(m)}-f^*\Vert _{L_2(\mathcal {M})}^2\right\} \le C_6|D|^{-\frac{2r}{2r+d}}. \end{aligned}$$

Here, $C_6 = 2^{2r+1}\cdot 3^{2r} c_5^2 c_3^{-2r}\Vert f^*\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})} + 3^{-d} c_1 c_2^2 c_3^{d} M^2$. $\square $

Proof

(Theorem 14) By Lemma 16, we only need to estimate the bounds of $\mathbf {E}\bigl \{\Vert V_{D_j,n}-f^*\Vert _{L_2(\mathcal {M})}^2\bigr \}$ and $\left\| \mathbf {E}\{V_{D_j,n}\}-f^*\right\| _{L_2(\mathcal {M})}^2$. To estimate the first, we obtain from (50) with $D=D_j$ that for $j=1,\dots ,m$,

$$\begin{aligned}&\mathbf {E}\left\{ \Vert V_{D_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\} \\&\quad \le c_5^2n^{-2r}\Vert f\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})} + \frac{2c_1^2M^2n^d}{|D_j|}\\&\qquad + \,2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2\left( \mu (\mathcal {M})+ 2c_1^2 n^{d}+2\right) \exp \bigl \{-C|D_j|/n^d\bigr \}. \end{aligned}$$

Since $\min _{1\le j\le m}|D_j|\ge |D|^\frac{d+\tau }{2r+d}$, $n\asymp |D|^{\frac{1}{2r+d}}$, $2r>d$ and $0<\tau <2r$,

$$\begin{aligned} 2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2\left( \mu (\mathcal {M})+ 2c_1^2 n^{d}+2\right) \exp \bigl \{-C|D_j|/n^d\bigr \}\le \tilde{C}_7|D|^{-\frac{2r}{2r+d}}, \end{aligned}$$

where $\tilde{C}_7$ is a constant depending only on $r,c_1,C,d$ and $f^*$. Thus, there exists a constant $\tilde{C}_8$ independent of $m,n,|D_1|,\dots ,|D_m|$ and |D| such that

$$\begin{aligned}&\sum _{j=1}^m\frac{|D_j|^2}{|D|^2} \mathbf {E}\left\{ \Vert V_{D_j,n}-f^*\Vert _{L_2(\mathcal {M})}^2\right\} \nonumber \\&\quad \le \tilde{C}_8 \left( |D|^{-\frac{2r}{2r+d}}+ \sum _{j=1}^m\frac{|D_j|^2}{|D|^2}\frac{|D|^{\frac{d}{2r+d}}}{|D_j|}\right) = \bigl (\tilde{C}_8+1\bigr )|D|^{-\frac{2r}{2r+d}}. \end{aligned}$$

(57)

To bound $\left\| \mathbf {E}\{V_{D_j,n}\}-f^*\right\| _{L_2(\mathcal {M})}^2$, we use (46) and Jensen’s inequality to obtain

$$\begin{aligned} \left\| \mathbf {E}\{V_{D_j,n}\}-f^*\right\| _{L_2(\mathcal {M})}^2&= \left\| \mathbf {E}\{\mathbf {E}\{V_{D_j,n}|\varLambda _{ D_j }\}-f^*\}\right\| _{L_2(\mathcal {M})}^2\nonumber \\&= \left\| \mathbf {E}\{V_{D^*_j,n}-f^*\}\right\| _{L_2(\mathcal {M})}^2 \le \mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^* \Vert _{L_2(\mathcal {M})}^2\right\} . \end{aligned}$$

(58)

We now use the similar proof as Theorem 10 to prove the error bound of distributed filtered hyperinterpolation $V_{D,n}^{(m)}$. For each $j=1,\dots ,m$, we let $\varOmega _{D_j}$ be the event such that the sum of the quadrature weights $\sum _{i=1}w_{i,n,D_j}^2\le 2/|D_{j}|$, and $\varOmega _{D_j}^c$ the complement of $\varOmega _{D_j}$. We write

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\}&=\mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}|\varOmega _{D_j}\right\} {\mathbf {P}}\{\varOmega _{D_j}\}\\&\quad +\,\mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}|\varOmega _{D_j}^c\right\} {\mathbf {P}}\{\varOmega _{D_j}^c\}, \end{aligned}$$

where

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}|\varOmega _{D_j}^c\right\} {\mathbf {P}}\{\varOmega _{D_j}^c\} \le 4\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2\exp \bigl \{-C|D_j|/n^d\bigr \}. \end{aligned}$$

By (49) with $D=D_j$,

$$\begin{aligned}&\mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})} |\varOmega _{D_j}\right\} {\mathbf {P}}\{\varOmega _{D_j}\} \\&\quad \le c_5^2n^{-2r}\Vert f\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})}+ 2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2(\mu (\mathcal {M})+ 2c_1^2 n^{d})\exp \bigl \{-C|D_j|/n^d\bigr \}. \end{aligned}$$

These give

$$\begin{aligned}&\mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\} \\&\quad \le c_5^2n^{-2r}\Vert f\Vert ^2_{{\mathbb {W}}_2^r(\mathcal {M})} + 2\Vert f^*\Vert _{L_\infty (\mathcal {M})}^2(\mu (\mathcal {M}) +2c_1^2n^{d}+2)\exp \bigl \{-C|D_j|/n^d\bigr \}. \end{aligned}$$

By $\min _{1\le j\le m}|D_j|\ge |D|^\frac{d+\tau }{2r+d}$, $n\sim |D|^{\frac{1}{2r+d}}$ and $2r>d$, $0<\tau <2r$,

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D^*_j,n}-f^*\Vert ^2_{L_2(\mathcal {M})}\right\} \le \tilde{C}_9|D|^{-\frac{2r}{2r+d}}, \end{aligned}$$

which with (58) and $\sum _{j=1}^m\frac{|D_j|}{|D|}=1$ gives

$$\begin{aligned} \sum _{j=1}^m\frac{|D_j|}{|D|}\left\| \mathbf {E}\{V_{D_j,n}\}-f^*\right\| _{L_2(\mathcal {M})}^2 \le \tilde{C}_{9}|D|^{-\frac{2r}{2r+d}}. \end{aligned}$$

(59)

Using (57) and (59), and Lemma 16, we then obtain

$$\begin{aligned} \mathbf {E}\left\{ \Vert V_{D,n}^{(m)}-f^*\Vert _{L_2(\mathcal {M})}^2\right\} \le C_7|D|^{-\frac{2r}{2r+d}}, \end{aligned}$$

with $C_7=c'M^2+c''\Vert f^*\Vert _{\mathbb {W}_{2}^{r}(\mathcal {M})}^2+c'''\Vert f^*\Vert _{L_{\infty }(\mathcal {M})}^2$ for $c', c''$ and $c'''$ dependent only on $\mu (\mathcal {M}), d, r$, and the filter H and its smoothness r, and constant C from Lemma 6. $\square $

Table of notations

Symbol	Meaning
$\mathbb {N}$	Set of natural numbers $\{1,2,\dots \}$
$\mathbb {N}_{0}$	$\mathbb {N}\cup \{0\}$
$\mathbb {R}^{d}$	d-dimensional real coordinate space
$\mathbb {R}_{+}$	Set of non-negative real numbers
$\mathcal {M}$	Compact and smooth Riemannian manifold, where we call $\mathcal {M}$ d-manifold
d	Dimension of manifold $\mathcal {M}$
$B(\mathbf {x},\alpha )$	Ball with center $\mathbf {x}$ and radius $\alpha $ in manifold
$\rho (\mathbf {x},\mathbf {y})$	Distance between points $\mathbf {x},\mathbf {y}\in \mathcal {M}$ induced by Riemannian metric
$\mu $	Lebesgue measure on $\mathcal {M}$
$C(\mathcal {M})$	Continuous function space on $\mathcal {M}$
$L_p(\mathcal {M})$	Real-valued $L_p$ space on $\mathcal {M}$
r	Smoothness index of Sobolev space containing the target function
$W^r_p(\mathcal {M})$	Sobolev space on $\mathcal {M}$ with smoothness r and p-th norm
n	Degree of polynomial or polynomial space on $\mathcal {M}$
$\varPi _n$	Polynomial space of degree n on $\mathcal {M}$
$P_n$	Diffusion polynomial of degree n on $\mathcal {M}$
$E_n(f)_p$	Best approximation for f in $L_p(\mathcal {M})$
$\varDelta $	Laplace–Beltrami operator on $\mathcal {M}$
$\phi _{\ell }$	The $\ell $th eigenfunction of Laplace–Beltrami operator on $\mathcal {M}$
$\lambda _{\ell }$	The $\ell $th eigenvalue of Laplace–Beltrami operator on $\mathcal {M}$
D	Data set of \|D\| pairs of sampling points $\mathbf {x}_i$ and real data $y_i$
$D^*$	Clean data set of pairs of sampling points $\mathbf {x}_i$ and real data $f^(\mathbf {x}_i)$ for ideal function $f^$
\|D\| or N	Number of elements of a data set D
m	Number of servers in distributed filtered hyperinterpolation
$\{D_j\}_{j=1}^m$	Set of m distributed data sets for a data set
$\varLambda _D$	Set of sampling points $\mathbf {x}_i$ of a data set D
$\mathcal {Q}_{D}$	Quadrature rule, a set of N pairs of real weights and points on the manifold
$f^*$	Ideal target function $\mathcal {M}\rightarrow \mathbb {R}$ (noiseless outputs)
f	Noisy function ($f^*$ plus noise)
$\epsilon _i$	Noise for ith sample
H	Filter in Definition 1
$V_n(f)$ or $V_{n,H}(f)$	Filtered approximation of degree n for function f with filter H
$V_{D,n}$	Non-distributed filtered hyperinterpolation with degree n for data D in Definition 4
$V_{D,n}^{(m)}$	Distributed filtered hyperinterpolation with degree n and m servers for data D in Definition 6
$V_{D^*,n}$	Non-distributed filtered hyperinterpolation with degree n for noiseless data $D^*$ in (9)
$V_{D^*,n}^{(m)}$	Distributed filtered hyperinterpolation with degree n and m servers for noiseless data $D^*$ in (16)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Montúfar, G., Wang, Y.G. Distributed Learning via Filtered Hyperinterpolation on Manifolds. Found Comput Math 22, 1219–1271 (2022). https://doi.org/10.1007/s10208-021-09529-5

Download citation

Received: 16 July 2020
Revised: 02 April 2021
Accepted: 31 May 2021
Published: 12 July 2021
Issue Date: August 2022
DOI: https://doi.org/10.1007/s10208-021-09529-5

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Symbol	Meaning
\(\mathbb {N}\)	Set of natural numbers \(\{1,2,\dots \}\)
\(\mathbb {N}_{0}\)	\(\mathbb {N}\cup \{0\}\)
\(\mathbb {R}^{d}\)	d-dimensional real coordinate space
\(\mathbb {R}_{+}\)	Set of non-negative real numbers
\(\mathcal {M}\)	Compact and smooth Riemannian manifold, where we call \(\mathcal {M}\) d-manifold
d	Dimension of manifold \(\mathcal {M}\)
\(B(\mathbf {x},\alpha )\)	Ball with center \(\mathbf {x}\) and radius \(\alpha \) in manifold
\(\rho (\mathbf {x},\mathbf {y})\)	Distance between points \(\mathbf {x},\mathbf {y}\in \mathcal {M}\) induced by Riemannian metric
\(\mu \)	Lebesgue measure on \(\mathcal {M}\)
\(C(\mathcal {M})\)	Continuous function space on \(\mathcal {M}\)
\(L_p(\mathcal {M})\)	Real-valued \(L_p\) space on \(\mathcal {M}\)
r	Smoothness index of Sobolev space containing the target function
\(W^r_p(\mathcal {M})\)	Sobolev space on \(\mathcal {M}\) with smoothness r and p-th norm
n	Degree of polynomial or polynomial space on \(\mathcal {M}\)
\(\varPi _n\)	Polynomial space of degree n on \(\mathcal {M}\)
\(P_n\)	Diffusion polynomial of degree n on \(\mathcal {M}\)
\(E_n(f)_p\)	Best approximation for f in \(L_p(\mathcal {M})\)
\(\varDelta \)	Laplace–Beltrami operator on \(\mathcal {M}\)
\(\phi _{\ell }\)	The \(\ell \)th eigenfunction of Laplace–Beltrami operator on \(\mathcal {M}\)
\(\lambda _{\ell }\)	The \(\ell \)th eigenvalue of Laplace–Beltrami operator on \(\mathcal {M}\)
D	Data set of \|D\| pairs of sampling points \(\mathbf {x}_i\) and real data \(y_i\)
\(D^*\)	Clean data set of pairs of sampling points \(\mathbf {x}_i\) and real data \(f^(\mathbf {x}_i)\) for ideal function \(f^\)
\|D\| or N	Number of elements of a data set D
m	Number of servers in distributed filtered hyperinterpolation
\(\{D_j\}_{j=1}^m\)	Set of m distributed data sets for a data set
\(\varLambda _D\)	Set of sampling points \(\mathbf {x}_i\) of a data set D
\(\mathcal {Q}_{D}\)	Quadrature rule, a set of N pairs of real weights and points on the manifold
\(f^*\)	Ideal target function \(\mathcal {M}\rightarrow \mathbb {R}\) (noiseless outputs)
f	Noisy function (\(f^*\) plus noise)
\(\epsilon _i\)	Noise for ith sample
H	Filter in Definition 1
\(V_n(f)\) or \(V_{n,H}(f)\)	Filtered approximation of degree n for function f with filter H
\(V_{D,n}\)	Non-distributed filtered hyperinterpolation with degree n for data D in Definition 4
\(V_{D,n}^{(m)}\)	Distributed filtered hyperinterpolation with degree n and m servers for data D in Definition 6
\(V_{D^*,n}\)	Non-distributed filtered hyperinterpolation with degree n for noiseless data \(D^*\) in (9)
\(V_{D^*,n}^{(m)}\)	Distributed filtered hyperinterpolation with degree n and m servers for noiseless data \(D^*\) in (16)

Distributed Learning via Filtered Hyperinterpolation on Manifolds

Abstract

Similar content being viewed by others

Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next

Bolstering stochastic gradient descent with model building

PolieDRO: a novel classification and regression framework with non-parametric data-driven regularization

1 Introduction

2 Preliminaries on Approximation on Manifolds

2.1 Diffusion Polynomial Space

Assumption 1

Assumption 2

2.2 Generalized Sobolev Spaces

Lemma 1

2.3 Filtered Approximation on Manifolds

Definition 1

Definition 2

Definition 3

Lemma 2

Lemma 3

Remark 1

Theorem 3

Theorem 4

Remark 2

Lemma 4

Theorem 5

3 Non-Distributed Filtered Hyperinterpolation on Manifolds

Definition 4

Remark 3

3.1 Non-Distributed Filtered Hyperinterpolation for Clean Data

Lemma 5

Theorem 6

Lemma 6

Definition 5

Theorem 7

3.2 Non-Distributed Filtered Hyperinterpolation for Noisy Data

Assumption 8

Remark 4

Theorem 9

Remark 5

Theorem 10

4 Distributed Filtered Hyperinterpolation on Manifolds

Definition 6

4.1 Distributed Filtered hyperinterpolation for Clean Data

Theorem 11

Remark 6

Definition 7

Theorem 12

4.2 Distributed Filtered Hyperinterpolation for Noisy Data

Theorem 13

Remark 7

Theorem 14

Remark 8

5 Examples and Numerical Evaluation

6 Discussion

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Appendices

Proofs

1.1 Proofs for Section 3.1

Proof

Proof

Lemma 7

Remark 9

Lemma 8

Lemma 9

Proof

Proof

Lemma 10

Proof

Proof

Proof

1.2 Proofs for Section 3.2