
Journal of Complexity

Volume 62, February 2021, 101503

Instances of computational optimal recovery: Refined approximability models

https://doi.org/10.1016/j.jco.2020.101503

Abstract

Models based on approximation capabilities have recently been studied in the context of Optimal Recovery. These models, however, are not compatible with overparametrization, since model- and data-consistent functions could then be unbounded. This drawback motivates the introduction of refined approximability models featuring an added boundedness condition. Thus, two new models are proposed in this article: one where the boundedness applies to the target functions (first type) and one where the boundedness applies to the approximants (second type). For both types of models, optimal maps for the recovery of linear functionals are first described on an abstract level before their efficient constructions are addressed. By exploiting techniques from semidefinite programming, these constructions are explicitly carried out on a common example involving polynomial subspaces of $\mathcal{C}[-1,1]$.

Introduction

The objective of this article is to uncover practical methods for the optimal recovery of functions available through observational data when the underlying models based on approximability allow for overparametrization. To clarify this objective and its various challenges, we start with some background on traditional Optimal Recovery. Typically, an unknown function $f$ defined on a domain $\mathcal{D}$ is observed through point evaluations $y_1 = f(x_1), \ldots, y_m = f(x_m)$ at distinct points $x_1, \ldots, x_m \in \mathcal{D}$. More generally, an unknown object $f$, simply considered as an element of a normed space $X$, is observed through
$$y_i = \ell_i(f), \qquad i \in [1:m],$$
where $\ell_1, \ldots, \ell_m$ are linear functionals defined on $X$. We assume here that these data are perfectly accurate; we refer to the companion article [5] for the incorporation of observation error. The data are summarized as $y = L(f)$, where the linear map $L: X \to \mathbb{R}^m$ is called the observation operator. Based on the knowledge of $y \in \mathbb{R}^m$, the task is then to recover a quantity of interest $Q(f)$, where throughout this article $Q: X \to \mathbb{R}$ is assumed to be a linear functional. The recovery procedure can be viewed as a map $R$ from $\mathbb{R}^m$ to $\mathbb{R}$, with no concern for its practicability at this point.

Besides the observational data (which is also called a posteriori information), there is some a priori information coming from an educated belief about the properties of realistic $f$'s. It translates into the assumption that $f$ belongs to a model set $\mathcal{K} \subseteq X$. The choice of this model set is of course critical. When the $f$'s indeed represent functions, it is traditionally taken as the unit ball with respect to some norm that characterizes smoothness. More recently, motivated by parametric partial differential equations, a model based on approximation capabilities has been proposed in [2]. Namely, given a linear subspace $V$ of $X$ and a threshold $\varepsilon > 0$, it is defined as
$$\mathcal{K} = \mathcal{K}_{V,\varepsilon} := \{ f \in X : \operatorname{dist}_X(f, V) \le \varepsilon \}. \tag{2}$$
This model set is also implicit in many numerical procedures and in machine learning.

Whatever the selected model set, the performance of the recovery procedure $R: \mathbb{R}^m \to \mathbb{R}$ is measured in a worst-case setting via the (global) error of $R$ over $\mathcal{K}$, i.e.,
$$e_{\mathcal{K},Q}(L,R) := \sup_{f \in \mathcal{K}} |Q(f) - R(L(f))|.$$
Obviously, one is interested in optimal recovery maps $R^{\mathrm{opt}}: \mathbb{R}^m \to \mathbb{R}$ minimizing this worst-case error, i.e., such that
$$e_{\mathcal{K},Q}(L,R^{\mathrm{opt}}) = \inf_{R: \mathbb{R}^m \to \mathbb{R}} e_{\mathcal{K},Q}(L,R).$$
This infimum is called the intrinsic error of the observation map $L$ (for $Q$ over $\mathcal{K}$). It is known, at least since Smolyak's doctoral dissertation [13], that there is a linear functional among the optimal recovery maps as soon as the set $\mathcal{K}$ is symmetric and convex, see e.g. [10, Theorem 4.7] for a proof. The practicality of such a linear optimal recovery map is not automatic, though. For the approximability set (2), Theorem 3.1 of [4] revealed that such a linear optimal recovery map takes the form $R^{\mathrm{opt}}: y \in \mathbb{R}^m \mapsto \sum_{i=1}^m a_i^{\mathrm{opt}} y_i$, where $a^{\mathrm{opt}} \in \mathbb{R}^m$ is a solution to
$$\underset{a \in \mathbb{R}^m}{\mathrm{minimize}} \ \Big\| Q - \sum_{i=1}^m a_i \ell_i \Big\|_{X^*} \quad \text{subject to} \quad \sum_{i=1}^m a_i \ell_i(v) = Q(v) \ \text{ for all } v \in V,$$
an optimization problem that can be solved for $X = \mathcal{C}(\mathcal{D})$ in exact form when the observation functionals are point evaluations (see [4]) and in approximate form when they are arbitrary linear functionals (see [5] or Section 3.2).
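To make this minimization concrete, here is a minimal sketch (not the author's MATLAB file) of how it could be set up numerically. It assumes a crude discretization in which $X = \mathcal{C}([-1,1])$ is replaced by $\mathbb{R}^N$ equipped with the sup norm over a fine grid, so that dual norms of functionals become $\ell_1$ norms and the problem reduces to a linear program; the grid size, evaluation points, and quadrature weights are illustrative choices, handled with cvxpy.

```python
import numpy as np
import cvxpy as cp

# Crude discretization: the sup norm on a fine grid of [-1,1] stands in for
# the norm of C([-1,1]), so dual norms of functionals become l1 norms.
N, m, n = 200, 8, 5                         # grid size, number of data, dim(V)
x = np.linspace(-1.0, 1.0, N)
Vbasis = np.vander(x, n, increasing=True)   # columns span V = polynomials of degree < n
idx = np.linspace(10, N - 11, m).astype(int)
U = np.zeros((N, m))
U[idx, np.arange(m)] = 1.0                  # point-evaluation functionals ell_i
q = np.full(N, 2.0 / N)                     # Q(f) ~ integral of f over [-1,1] (crude quadrature)

# Minimize the dual norm of Q - sum_i a_i ell_i subject to the constraint
# that sum_i a_i ell_i agrees with Q on the subspace V.
a = cp.Variable(m)
residual = q - U @ a
problem = cp.Problem(cp.Minimize(cp.norm1(residual)),
                     [Vbasis.T @ residual == 0])
problem.solve()
a_opt = a.value                             # weights of a (discretized) linear optimal recovery map
```

The recovered quantity is then $\sum_{i=1}^m a_i^{\mathrm{opt}} y_i$; in the continuous setting treated in the article, the same problem is handled exactly or approximately as indicated above.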

The approximability set (2), however, presents some important restrictions. Suppose indeed that there is some nonzero $v \in \ker(L) \cap V$. Then, for a given $f_0 \in \mathcal{K}$ observed through $y = L(f_0) \in \mathbb{R}^m$, any $f_t := f_0 + t v$, $t \in \mathbb{R}$, is both model-consistent (i.e., $f_t \in \mathcal{K}$) and data-consistent (i.e., $L(f_t) = y$), so that the local error at $y$ of any recovery map $R: \mathbb{R}^m \to \mathbb{R}$ satisfies
$$e^{\mathrm{loc}}_{\mathcal{K},Q}(L, R(y)) := \sup_{\substack{f \in \mathcal{K} \\ L(f) = y}} |Q(f) - R(y)| \ge \sup_{t \in \mathbb{R}} |Q(f_t) - R(y)| = \sup_{t \in \mathbb{R}} |(Q(f_0) - R(y)) + t\, Q(v)|,$$
which is infinite whenever $Q(v) \ne 0$. Thus, for the optimal recovery problem to make sense under the approximability model (2), independently of the quantity of interest $Q$, one must assume that $\ker(L) \cap V = \{0\}$. By a dimension argument, this imposes
$$n := \dim(V) \le m.$$
In other words, we must place ourselves in an underparametrized regime for which the number $n$ of parameters describing the model does not exceed the number $m$ of data. This contrasts with many current studies, especially in the field of Deep Learning, which emphasize the advantages of overparametrization. In order to incorporate overparametrization in the optimal recovery problem under consideration, we must then restrict the magnitude of model- and data-consistent elements. A natural strategy consists in altering the approximability set (2). We do so in two different ways, namely by considering a bounded approximability set of the first type, i.e.,
$$\mathcal{K} = \mathcal{K}^{\mathrm{I}}_{V,\varepsilon,\kappa} := \{ f \in X : \operatorname{dist}_X(f, V) \le \varepsilon \text{ and } \|f\|_X \le \kappa \},$$
and a bounded approximability set of the second type, i.e.,
$$\mathcal{K} = \mathcal{K}^{\mathrm{II}}_{V,\varepsilon,\kappa} := \{ f \in X : \exists\, v \in V \text{ with } \|f - v\|_X \le \varepsilon \text{ and } \|v\|_X \le \kappa \}.$$

We will start by analyzing the second type of bounded approximability sets in Section 2, formally describing the optimal recovery maps before revealing on a familiar example how the associated minimization problem is tackled in practice. The main ingredient belongs in essence to the sum-of-squares techniques from semidefinite programming. Next, we will analyze the first type of bounded approximability sets in Section 3. We will even formally describe optimal recovery maps over more general model sets consisting of intersections of approximability sets. On the prior example, we will again reveal how the associated minimization problem is tackled in practice. This time, the main ingredient belongs in essence to the moment techniques from semidefinite programming. In view of this article's emphasis on computability issues, all of the theoretical constructions are illustrated in a reproducible MATLAB file downloadable from the author's webpage.
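The unboundedness phenomenon described above is easy to witness numerically. The following sketch (same illustrative discretized setting as before, with hypothetical sizes) takes $\dim(V) > m$, extracts a nonzero $v \in \ker(L) \cap V$, and shows that the perturbations $f_t = f_0 + t v$, which leave both the data and the distance to $V$ unchanged, make $Q(f_t)$ arbitrarily large whenever $Q(v) \ne 0$.

```python
import numpy as np
from scipy.linalg import null_space

# Overparametrized regime: n = dim(V) exceeds the number m of data, so
# ker(L) and V intersect nontrivially (illustrative discretized setting).
N, m, n = 200, 4, 6
x = np.linspace(-1.0, 1.0, N)
Vbasis = np.vander(x, n, increasing=True)             # V = polynomials of degree < n
idx = np.linspace(10, N - 11, m).astype(int)
L = np.zeros((m, N))
L[np.arange(m), idx] = 1.0                            # m point evaluations
q = np.full(N, 2.0 / N)                               # Q(f) ~ integral of f

# nonzero v in ker(L) ∩ V: coefficients c with L @ (Vbasis @ c) = 0
c = null_space(L @ Vbasis)[:, 0]                      # exists since n > m
v = Vbasis @ c

f0 = np.sin(np.pi * x)                                # some f0, observed through y = L @ f0
for t in [0.0, 1e2, 1e4]:
    ft = f0 + t * v                                   # L @ ft = L @ f0 and dist(ft, V) = dist(f0, V)
    print(t, q @ ft)                                  # Q(ft) blows up when Q(v) != 0
```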

Before delving into technicalities, we stress that this article considers a scenario where $L$ is fixed, meaning that the user is not free to select favorable observation functionals $\ell_1, \ldots, \ell_m$. As such, we are not concerned with the minimal number $m$ of observations needed to achieve a prescribed accuracy. In other words, we do not investigate the complexity of the problem. But replacing $\mathcal{K}_{V,\varepsilon}$ by $\mathcal{K}^{\mathrm{I}}_{V,\varepsilon,\kappa}$ or $\mathcal{K}^{\mathrm{II}}_{V,\varepsilon,\kappa}$ is actually akin to a strategy that is popular in complexity studies: substituting a refined model set for a model set that was initially too rich. This strategy can transform an intractable multivariate problem into a tractable one, see [12] for a typical example. For our refined approximability sets, complexity inquiries would be interesting to pursue, especially in order to uncover more qualitative statements on the benefits (or lack thereof) of overparametrization. This would require, for a start, precisely estimating the worst-case error (4) minimized over $L$, and in particular its dependence on the number of variables of multivariate functions $f \in X$. However, this is beyond the scope of this article, whose focus is placed on the computability of the optimal recovery maps.


Bounded approximability set of the second type

We concentrate in this section on the bounded approximability set of the second type, i.e., on
$$\mathcal{K} = \{ f \in X : \exists\, v \in V \text{ with } \|f - v\|_X \le \varepsilon \text{ and } \|v\|_X \le \kappa \}.$$
We shall first describe optimal recovery maps before showing how they can be computed in practice.
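The computation of optimal recovery maps over this set, via sum-of-squares techniques, is carried out in the full text. As a much smaller illustration of the definition itself, the sketch below (the discretized setting and parameter values are hypothetical) tests membership in this model set by searching over $v \in V$ with cvxpy.

```python
import numpy as np
import cvxpy as cp

# Membership test for the second-type set: is there v in V with
# ||f - v|| <= eps and ||v|| <= kappa?  The sup norm over a grid of
# [-1,1] stands in for the norm of C([-1,1]).
N, n = 200, 5
x = np.linspace(-1.0, 1.0, N)
Vbasis = np.vander(x, n, increasing=True)   # V = polynomials of degree < n
eps, kappa = 0.2, 2.0
f = np.sin(np.pi * x)                       # candidate element of X

c = cp.Variable(n)                          # coefficients of v = Vbasis @ c
v = Vbasis @ c
best = cp.Problem(cp.Minimize(cp.norm_inf(f - v)),
                  [cp.norm_inf(v) <= kappa]).solve()
print(best <= eps)                          # f belongs to the set iff an admissible v is eps-close
```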

Bounded approximability set of the first type

We concentrate in this section on the bounded approximability set of the first type, i.e., on
$$\mathcal{K} = \{ f \in X : \operatorname{dist}_X(f, V) \le \varepsilon \text{ and } \|f\|_X \le \kappa \}.$$
Once again, we shall first describe optimal recovery maps before showing how they can be computed in practice.
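Again, the actual computation of optimal recovery maps, via moment techniques, is in the full text; as an illustration of the definition only, membership in this first-type set splits into two independent checks in the discretized setting: a best-approximation problem for $\operatorname{dist}_X(f,V)$ and a direct norm bound on $f$ itself (parameter values below are hypothetical).

```python
import numpy as np
import cvxpy as cp

# Membership test for the first-type set: dist(f, V) <= eps and ||f|| <= kappa,
# the bound now applying to f itself rather than to its approximant from V.
N, n = 200, 5
x = np.linspace(-1.0, 1.0, N)
Vbasis = np.vander(x, n, increasing=True)
eps, kappa = 0.2, 2.0
f = np.sin(np.pi * x)

c = cp.Variable(n)
dist_fV = cp.Problem(cp.Minimize(cp.norm_inf(f - Vbasis @ c))).solve()
print(dist_fV <= eps and np.max(np.abs(f)) <= kappa)
```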

References (13)

  • I.H. Sloan et al., When are quasi-Monte Carlo algorithms efficient for high dimensional integrals?, J. Complexity (1998)
  • A. Ben-Tal et al., Robust Optimization (2009)
  • P. Binev et al., Data assimilation in reduced modeling, SIAM/ASA J. Uncertain. Quantif. (2017)
  • S. Boyd et al., Convex Optimization (2004)
  • R. DeVore et al., Computing a quantity of interest from observational data, Constr. Approx. (2019)
  • M. Ettehad, S. Foucart, Instances of computational optimal recovery: dealing with observation errors ...


Communicated by E. Novak.


S.F. is partially supported by the NSF (United States of America) grants DMS-1622134 and DMS-1664803, and also acknowledges the NSF grant CCF-1934904.
