
Approximability models and optimal system identification

  • Original Article
  • Published in: Mathematics of Control, Signals, and Systems

Abstract

This article considers the problem of optimally recovering stable linear time-invariant systems observed via linear measurements made on their transfer functions. A common modeling assumption is replaced here by the related assumption that the transfer functions belong to a model set described by approximation capabilities. Capitalizing on recent optimal recovery results relative to such approximability models, we construct some optimal algorithms and characterize the optimal performance for the identification and evaluation of transfer functions in the framework of the Hardy Hilbert space and of the disk algebra. In particular, we determine explicitly the optimal recovery performance for frequency measurements taken at equispaced points on an inner circle or on the torus.


Notes

  1. In the case \({\mathcal {K}}= {\mathcal {H}}_2({\mathbb {D}})\), the optimal algorithm over an ellipsoidal model set such as the one described by (9) is linear, too. It is a variant of the minimal-norm interpolation presented in Section 5.3 of [13].

  2. To compute the \({\mathcal {H}}_2\)-error between functions \(F = \sum _{j=1}^\infty b_j V_j\) and \(\widetilde{F} = \sum _{j=1}^n c_j V_j + \sum _{k=1}^m d_k L_k\), we used the fact that \(\Vert F -\widetilde{F} \Vert _{{\mathcal {H}}_2}^2 = \Vert F\Vert _{{\mathcal {H}}_2}^2 + \Vert \widetilde{F}\Vert _{{\mathcal {H}}_2}^2 - 2 {\text {Re}}\langle F, \widetilde{F} \rangle \), together with \(\Vert \widetilde{F}\Vert _{{\mathcal {H}}_2}^2 = \Vert c\Vert _2^2 + \langle d, H d \rangle + 2 {\text {Re}}\langle c, G d \rangle \) and \(\langle F, \widetilde{F} \rangle = \langle b_{1:n}, c \rangle + \sum _{k=1}^m {\overline{d_k}} \ell _k(F)\).

References

  1. Akcay H, Gu G, Khargonekar PP (1994) Identification in \({\cal{H}}_{\infty }\) with nonuniformly spaced frequency response measurements. Int J Robust Nonlinear Control 4(4):613–629


  2. Binev P, Cohen A, Dahmen W, DeVore R, Petrova G, Wojtaszczyk P (2017) Data assimilation in reduced modeling. SIAM/ASA J Uncertain Quantif 5(1):1–29


  3. Bokor J, Schipp F, Gianone L (1995) Approximate \({\cal{H}}_{\infty }\) identification using partial sum operators in disc algebra basis. In: Proceedings of American control conference, pp 1981–1985

  4. Campi MC, Weyer E (2002) Finite sample properties of system identification methods. IEEE Trans Autom Control 47(8):1329–1334


  5. Chen J, Gu G (2000) Control-oriented system identification: an \({\cal{H}}_\infty \) approach, vol 19. Wiley, Hoboken


  6. Chen J, Nett CN (1993) The Caratheodory–Fejer problem and \({\cal{H}}_{\infty }\) identification: a time domain approach. In: Proceedings of the 32nd IEEE conference on decision and control, pp 68–73

  7. Chen J, Nett CN, Fan MKH (1992) Worst-case system identification in \({\cal{H}}_{\infty }\): validation of a priori information, essentially optimal algorithms, and error bounds. In: Proceedings of American control conference, pp 251–257

  8. Chen J, Nett CN, Fan MKH (1992) Optimal non-parametric system identification from arbitrary corrupt finite time series: a control-oriented approach. In: Proceedings of American control conference, pp 279–285

  9. DeVore R, Foucart S, Petrova G, Wojtaszczyk P (2019) Computing a quantity of interest from observational data. Constr Approx 49(3):461–508


  10. DeVore R, Petrova G, Wojtaszczyk P (2017) Data assimilation and sampling in Banach spaces. Calcolo 54(3):963–1007


  11. Helmicki AJ, Jacobson CA, Nett CN (1991) Control oriented system identification: a worst-case/deterministic approach in \({{\cal{H}}}_\infty \). IEEE Trans Autom Control 36(10):1163–1176


  12. Micchelli C, Rivlin T (1977) A survey of optimal recovery. Optimal estimation in approximation theory. Springer, Boston, pp 1–54


  13. Partington JR (1997) Interpolation, identification, and sampling. Oxford University Press, Oxford


  14. Rakhmanov E, Shekhtman B (2006) On discrete norms of polynomials. J Approx Theory 139(1–2):2–7


  15. Shah P, Bhaskar BN, Tang G, Recht B (2012) Linear system identification via atomic norm regularization. arXiv preprint arXiv:1204.0590

  16. Tu S, Boczar R, Packard A, Recht B (2017) Non-asymptotic analysis of robust control from coarse-grained identification. arXiv preprint arXiv:1707.04791

  17. Vidyasagar M, Karandikar RL (2008) A learning theory approach to system identification and stochastic adaptive control. J Process Control 18(3):421–430


  18. Zhou K, Doyle JC (1998) Essentials of robust control. Prentice Hall, Upper Saddle River



Author information

Corresponding author

Correspondence to Simon Foucart.


Simon Foucart is partially supported by NSF Grants DMS-1622134 and DMS-1664803, and also acknowledges the NSF Grant CCF-1934904.

Appendix: Proofs of essential results

In this section, we fully justify some statements appearing in the text but not yet established, namely the relation between (9)–(10) and (11), as well as the validity in the complex setting of results about optimal identification in Hilbert spaces [2] and optimal estimation in Banach spaces [9]. We start with how (11) connects to the descriptions (9)–(10) of the models put forward in [11].

Proposition 6

With \({\mathcal {X}}\) denoting either \({\mathcal {H}}_2({\mathbb {D}})\) or \({\mathcal {A}}({\mathbb {D}})\), the following properties are equivalent:

$$\begin{aligned}&\text{ there } \text{ exist } \rho>1 \text{ and } M>0 \text{ such } \text{ that } \Vert F(\rho \, \cdot )\Vert _{{\mathcal {X}}} \le M; \end{aligned}$$
(74)
$$\begin{aligned}&\text{ there } \text{ exist } \rho>1 \text{ and } M>0 \text{ such } \text{ that } {\mathrm{dist}}_{{\mathcal {X}}}(F,{\mathcal {P}}_n) \le M \rho ^{-n} \text{ for } \text{ all } n \ge 0. \end{aligned}$$
(75)

Proof

We write \(F(z) = \sum _{n=0}^\infty f_n z^n\) throughout the proof. We first establish the equivalence in the case \({\mathcal {X}}= {\mathcal {H}}_2({\mathbb {D}})\). Let us assume that (74) holds, i.e., that \(\sum _{n=0}^\infty |f_n|^2 \rho ^{2n} \le M^2\) for some \(\rho >1\) and \(M>0\). In particular, we have \(|f_n|^2 \le M^2 \rho ^{-2n}\) for all \(n \ge 0\). It follows that, for all \(n \ge 0\),

$$\begin{aligned} {\mathrm{dist}}_{{\mathcal {H}}_2}(F,{\mathcal {P}}_n)^2 = \sum _{k=n}^\infty |f_k|^2 \le M^2 \sum _{k=n}^\infty \rho ^{-2k} = \frac{M^2}{1-\rho ^{-2}} \rho ^{-2n}; \end{aligned}$$
(76)

hence, (75) holds with a change in the constant M. Conversely, let us assume that (75) holds, i.e., that there are \(\rho > 1\) and \(M>0\) such that \(\sum _{k=n}^\infty |f_k|^2 \le M^2 \rho ^{-2n}\) for all \(n \ge 0\). In particular, we have \(|f_n|^2 \le M^2 \rho ^{-2n}\) for all \(n \ge 0\). Then, picking \(\widetilde{\rho } \in (1,\rho )\), we derive that

$$\begin{aligned} \sum _{n=0}^\infty |f_n|^2 {\widetilde{\rho }\,}^{2n} \le M^2 \sum _{n=0}^\infty (\widetilde{\rho }/\rho )^{2n} = \frac{M^2}{1-(\widetilde{\rho }/\rho )^2}; \end{aligned}$$
(77)

hence, (74) holds with a change in both \(\rho \) and M.

We now establish the equivalence in the case \({\mathcal {X}}= {\mathcal {A}}({\mathbb {D}})\). Let us assume that (74) holds, i.e., that \(\sup _{|z| = \rho } |F(z)| \le M\) for some \(\rho > 1\) and \(M>0\). This implies that the Taylor coefficients of F satisfy, for any \(k \ge 0\),

$$\begin{aligned} |f_k| = \bigg | \frac{1}{2\pi i} \int _{|z|=\rho } \frac{F(z)}{z^{k+1}} dz \bigg | \le \frac{1}{2 \pi }\times \frac{M}{\rho ^{k+1}} \times 2 \pi \rho = M \rho ^{-k}. \end{aligned}$$
(78)

Considering \(P \in {\mathcal {P}}_n\) defined by \(P(z):= \sum _{k=0}^{n-1} f_k z^k\), we obtain

$$\begin{aligned} {\mathrm{dist}}_{{\mathcal {X}}}(F,{\mathcal {P}}_n)\le & {} \Vert F-P\Vert _{{\mathcal {H}}_\infty } = \sup _{|z|=1} \bigg | \sum _{k=n}^\infty f_k z^k \bigg | \le \sum _{k=n}^\infty |f_k| \nonumber \\\le & {} M \sum _{k=n}^\infty \rho ^{-k} = \frac{M}{1-\rho ^{-1}} \rho ^{-n}; \end{aligned}$$
(79)

hence, (75) holds with a change in the constant M. Conversely, let us assume that (75) holds, i.e., that there are \(\rho > 1\) and \(M>0\) such that there exists, for each \(n\ge 0\), a polynomial \(P^{[n]} \in {\mathcal {P}}_n\) with \(\Vert F-P^{[n]}\Vert _{{\mathcal {H}}_\infty } \le M \rho ^{-n}\). For all \(n \ge 0\), since the coefficient of \(z^n\) in F is the same as that of \(F-P^{[n]}\), we have

$$\begin{aligned} |f_n|= & {} \bigg | \frac{1}{2 \pi i} \int _{|z|=1} \frac{(F-P^{[n]})(z)}{z^{n+1}} dz \bigg | \nonumber \\\le & {} \frac{1}{2 \pi } \times \Vert F-P^{[n]}\Vert _{{\mathcal {H}}_\infty } \times 2 \pi \le M \rho ^{-n}. \end{aligned}$$
(80)

Then, picking \(\widetilde{\rho } \in (1,\rho )\), we derive that

$$\begin{aligned} \sup _{|z|=\widetilde{\rho }} |F(z)| \le \sum _{n=0}^\infty |f_n| {\widetilde{\rho }\,}^n \le M \sum _{n=0}^\infty (\widetilde{\rho }/\rho )^n = \frac{M}{1-\widetilde{\rho }/\rho }; \end{aligned}$$
(81)

hence, (74) holds with a change in both \(\rho \) and M. \(\square \)
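As a concrete illustration of the \({\mathcal {H}}_2\) case just treated, the short Python sketch below takes \(F(z) = 1/(1-z/2)\), whose Taylor coefficients are \(f_n = 2^{-n}\), and checks the geometric tail bound (76) numerically; the example and its constants are ours, chosen for the sketch.

```python
import numpy as np

# F(z) = 1/(1 - z/2) has Taylor coefficients f_n = 2^{-n}, so F is bounded
# on |z| = rho' for every rho' < 2, and (75)-(76) hold with rho = 2.
rho = 2.0
f = 0.5 ** np.arange(60)       # truncated coefficient sequence of F
M = (1 - rho ** -2) ** -0.5    # constant from (76) for the bound |f_k| <= rho**-k

for n in range(20):
    # dist_{H_2}(F, P_n) is the l2-norm of the coefficient tail (f_k)_{k>=n}.
    tail = np.sqrt(np.sum(f[n:] ** 2))
    assert tail <= M * rho ** -n + 1e-12
```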

We now turn to the justification for the complex setting of the results from [2] about optimal identification in Hilbert spaces. As in [2, Theorem 2.8], these results are easy consequences of the following statement.

Theorem 7

Let \({\mathcal {V}}\) be a subspace of a Hilbert space \({\mathcal {X}}\), and let \(\ell _1,\ldots ,\ell _m\) be linear functionals defined on \({\mathcal {X}}\). With a model set given by

$$\begin{aligned} {\mathcal {K}}= \{ f \in {\mathcal {X}}: {\mathrm{dist}}_{\mathcal {X}}(f,{\mathcal {V}}) \le \varepsilon \}, \end{aligned}$$
(82)

the performance of optimal identification from some \(y \in {\mathbb {C}}^m\) satisfies

$$\begin{aligned} \inf _{A: {\mathbb {C}}^m \rightarrow {\mathcal {X}}} \sup _{f \in {\mathcal {K}}\cap {\mathcal {L}}^{-1}(y)} \Vert f - A(y)\Vert _{\mathcal {X}}= \mu \left( \varepsilon ^2 - \min _{f \in {\mathcal {L}}^{-1}(y)} \Vert f - P_{\mathcal {V}}f\Vert _{\mathcal {X}}^2 \right) ^{1/2}, \end{aligned}$$
(83)

where the constant \(\mu \) is defined by

$$\begin{aligned} \mu = \sup _{u \in \ker ({\mathcal {L}})} \frac{\Vert u\Vert _{\mathcal {X}}}{{\mathrm{dist}}_{\mathcal {X}}(u,{\mathcal {V}})}. \end{aligned}$$
(84)

Proof

Let \(f^\star \in {\mathcal {X}}\) be constructed from \(y \in {\mathbb {C}}^m\) via \(f^\star := \underset{f \in {\mathcal {X}}}{{\mathrm{argmin}}\,} \Vert f - P_{\mathcal {V}}f\Vert _{\mathcal {X}}\) subject to \({\mathcal {L}}(f) = y\). We shall prove on the one hand that

$$\begin{aligned} \sup _{f \in {\mathcal {K}}\cap {\mathcal {L}}^{-1}(y)} \Vert f - f^\star \Vert _{\mathcal {X}}\le \mu \left( \varepsilon ^2 - \Vert f^\star - P_{\mathcal {V}}f^\star \Vert _{\mathcal {X}}^2 \right) ^{1/2} \end{aligned}$$
(85)

and on the other hand that, for any \(g \in {\mathcal {X}}\),

$$\begin{aligned} \sup _{f \in {\mathcal {K}}\cap {\mathcal {L}}^{-1}(y)} \Vert f - g\Vert _{\mathcal {X}}\ge \mu \left( \varepsilon ^2 - \Vert f^\star - P_{\mathcal {V}}f^\star \Vert _{\mathcal {X}}^2 \right) ^{1/2}. \end{aligned}$$
(86)

Justification of (85): Let us point out that \(f^\star - P_{\mathcal {V}}f^\star \) is orthogonal to both \({\mathcal {V}}\) and \(\ker ({\mathcal {L}})\). To see this, given \(v \in {\mathcal {V}}\), \(u \in \ker ({\mathcal {L}})\), and \(\theta \in [-\pi ,\pi ]\), we notice that, as functions of \(t \in {\mathbb {R}}\), the expressions

$$\begin{aligned} \Vert f^\star - P_{\mathcal {V}}f^\star + t e^{i \theta } v\Vert _{\mathcal {X}}^2&= \Vert f^\star - P_{\mathcal {V}}f^\star \Vert _{\mathcal {X}}^2 \nonumber \\&\quad + 2 t {\text {Re}}( e^{- i \theta } \langle f^\star - P_{\mathcal {V}}f^\star , v \rangle ) + {\mathcal {O}}(t^2),\end{aligned}$$
(87)
$$\begin{aligned} \Vert f^\star + t e^{i \theta } u - P_{\mathcal {V}}( f^\star + t e^{ i \theta } u)\Vert _{\mathcal {X}}^2&= \Vert f^\star - P_{\mathcal {V}}f^\star \Vert _{\mathcal {X}}^2 \nonumber \\&\quad + 2 t {\text {Re}}( e^{-i \theta } \langle f^\star - P_{\mathcal {V}}f^\star , u - P_{\mathcal {V}}u \rangle ) + {\mathcal {O}}(t^2), \end{aligned}$$
(88)

are minimized at \(t=0\). Therefore, \({\text {Re}}( e^{- i \theta } \langle f^\star - P_{\mathcal {V}}f^\star , v \rangle ) =0\) and \({\text {Re}}( e^{- i \theta } \langle f^\star - P_{\mathcal {V}}f^\star , u - P_{\mathcal {V}}u \rangle ) = 0\) for all \(\theta \in [-\pi ,\pi ]\). This implies that \(\langle f^\star - P_{\mathcal {V}}f^\star , v \rangle =0\) and \(\langle f^\star - P_{\mathcal {V}}f^\star , u - P_{\mathcal {V}}u \rangle =0\) for all \(v \in {\mathcal {V}}\) and \(u \in \ker ({\mathcal {L}})\), hence our claim. Now, consider \(f \in {\mathcal {K}}\cap {\mathcal {L}}^{-1}(y)\). Since \({\mathcal {L}}(f) = y = {\mathcal {L}}(f^\star )\), we can write \(f = f^\star + u \) for some \(u \in \ker ({\mathcal {L}})\). The fact that \(f \in {\mathcal {K}}\) then yields

$$\begin{aligned} \varepsilon ^2\ge & {} \Vert f-P_{\mathcal {V}}f\Vert _{\mathcal {X}}^2 = \Vert f^\star -P_{\mathcal {V}}f^\star + u - P_{\mathcal {V}}u\Vert _{\mathcal {X}}^2 \nonumber \\= & {} \Vert f^\star -P_{\mathcal {V}}f^\star \Vert _{\mathcal {X}}^2 + \Vert u - P_{\mathcal {V}}u\Vert _{\mathcal {X}}^2, \end{aligned}$$
(89)

so that

$$\begin{aligned} {\mathrm{dist}}_{\mathcal {X}}(u,{\mathcal {V}}) = \Vert u - P_{\mathcal {V}}u\Vert _{\mathcal {X}}\le \left( \varepsilon ^2 - \Vert f^\star - P_{\mathcal {V}}f^\star \Vert _{\mathcal {X}}^2 \right) ^{1/2}. \end{aligned}$$
(90)

It remains to take the definition of \(\mu \) into account to obtain

$$\begin{aligned} \Vert f - f^\star \Vert _{\mathcal {X}}= \Vert u\Vert _{\mathcal {X}}\le \mu \, {\mathrm{dist}}_{\mathcal {X}}(u,{\mathcal {V}}) \le \mu \, \left( \varepsilon ^2 - \Vert f^\star - P_{\mathcal {V}}f^\star \Vert _{\mathcal {X}}^2 \right) ^{1/2}. \end{aligned}$$
(91)

Justification of (86): Let us select \(u \in \ker ({\mathcal {L}})\) such that

$$\begin{aligned} \Vert u\Vert _{\mathcal {X}}= \mu \, {\mathrm{dist}}_{\mathcal {X}}(u,{\mathcal {V}}) \qquad \text{ and } \qquad \Vert f^\star - P_{\mathcal {V}}f^\star \Vert _{\mathcal {X}}^2 + \Vert u - P_{\mathcal {V}}u\Vert _{\mathcal {X}}^2 = \varepsilon ^2. \end{aligned}$$
(92)

We now consider \(f^\pm := f^\star \pm u\). It is clear that \(f^\pm \in {\mathcal {L}}^{-1}(y)\), and we also have \(f^\pm \in {\mathcal {K}}\), since

$$\begin{aligned} \Vert f^\pm - P_{\mathcal {V}}f^\pm \Vert _{\mathcal {X}}^2= & {} \Vert (f^\star - P_{\mathcal {V}}f^\star ) \pm (u - P_{\mathcal {V}}u ) \Vert _{\mathcal {X}}^2 \nonumber \\= & {} \Vert f^\star - P_{\mathcal {V}}f^\star \Vert _{\mathcal {X}}^2 + \Vert u - P_{\mathcal {V}}u \Vert _{\mathcal {X}}^2 = \varepsilon ^2. \end{aligned}$$
(93)

Then, for any \(g \in {\mathcal {X}}\),

$$\begin{aligned} \sup _{f \in {\mathcal {K}}\cap {\mathcal {L}}^{-1}(y)} \Vert f - g\Vert _{\mathcal {X}}&\ge \max _{\pm } \Vert f^\pm - g \Vert _{\mathcal {X}}\ge \frac{1}{2} \left( \Vert f^+ - g\Vert _{\mathcal {X}}+ \Vert f^- - g\Vert _{\mathcal {X}}\right) \nonumber \\&\ge \frac{1}{2} \Vert f^+ - f^- \Vert _{\mathcal {X}}\nonumber \\&= \Vert u\Vert _{\mathcal {X}}= \mu \, {\mathrm{dist}}_{\mathcal {X}}(u,{\mathcal {V}}) = \mu \, \left( \varepsilon ^2 - \Vert f^\star - P_{\mathcal {V}}f^\star \Vert _{\mathcal {X}}^2 \right) ^{1/2}. \end{aligned}$$
(94)

This completes the proof of the theorem. \(\square \)
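In finite dimensions, the quantities appearing in Theorem 7 reduce to standard linear algebra. The Python sketch below, on synthetic data (all names and sizes are ours), computes \(\mu \) through a singular value decomposition, forms \(f^\star \) as the constrained least-squares minimizer, and verifies the orthogonality relations established at the start of the proof.

```python
import numpy as np

rng = np.random.default_rng(1)

# Finite-dimensional sketch of Theorem 7: X = C^N, V = range(Vb), and the
# measurement map L is a matrix; all data below are synthetic placeholders.
N, dimV, m = 10, 3, 4
cr = lambda *s: rng.standard_normal(s) + 1j * rng.standard_normal(s)
Vb = np.linalg.qr(cr(N, dimV))[0]            # orthonormal basis of V
Lmat = cr(m, N)                              # the measurement map L
y = cr(m)

P = Vb @ Vb.conj().T                         # orthogonal projector onto V
K = np.linalg.svd(Lmat)[2].conj().T[:, m:]   # orthonormal basis of ker(L)

# mu = sup_{u in ker(L)} ||u|| / dist(u, V) = 1 / sigma_min((I - P) K).
A = (np.eye(N) - P) @ K
mu = 1.0 / np.linalg.svd(A, compute_uv=False)[-1]

# f* minimizes ||f - P f|| subject to L f = y: write f = f0 + K w.
f0 = np.linalg.lstsq(Lmat, y, rcond=None)[0]
w = np.linalg.lstsq(A, -(np.eye(N) - P) @ f0, rcond=None)[0]
fstar = f0 + K @ w

# The proof's key claim: f* - P f* is orthogonal to both V and ker(L).
r = fstar - P @ fstar
assert np.allclose(Vb.conj().T @ r, 0)
assert np.allclose(K.conj().T @ r, 0)
assert np.allclose(Lmat @ fstar, y)
```

Note that \(\mu \ge 1\) always holds here, since \({\mathrm{dist}}(u,{\mathcal {V}}) \le \Vert u\Vert \) for every u.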

Finally, we justify below that the result from [9] about optimal estimation in Banach spaces holds in the complex setting, too.

Theorem 8

Let \({\mathcal {V}}\) be a subspace of a Banach space \({\mathcal {X}}\), let \(\ell _1,\ldots ,\ell _m\) be linear functionals defined on \({\mathcal {X}}\), and let Q be another linear functional defined on \({\mathcal {X}}\). With a model set given by

$$\begin{aligned} {\mathcal {K}}= \{ f \in {\mathcal {X}}: {\mathrm{dist}}_{\mathcal {X}}(f,{\mathcal {V}}) \le \varepsilon \}, \end{aligned}$$
(95)

the performance of optimal estimation of Q satisfies

$$\begin{aligned} \inf _{A: {\mathbb {C}}^m \rightarrow {\mathbb {C}}} \sup _{f \in {\mathcal {K}}} \left| Q(f) - A({\mathcal {L}}(f)) \right| = \mu \, \varepsilon , \end{aligned}$$
(96)

where the constant \(\mu \) equals the minimum of the optimization problem

$$\begin{aligned}&\underset{a \in {\mathbb {C}}^m}{\mathrm{minimize}}\, \bigg \Vert Q - \sum _{k=1}^m a_k \ell _k \bigg \Vert _{{\mathcal {X}}^*} \qquad \text{ subject } \text{ to } \qquad \sum _{k=1}^m a_k \ell _k(v) = Q(v) \nonumber \\&\qquad \text{ for } \text{ all } v \in {\mathcal {V}}. \end{aligned}$$
(97)

Proof

Let \(a^\star \in {\mathbb {C}}^m\) be a minimizer of the optimization program (97), and let \(\nu \) denote the value of the minimum. Let us also consider

$$\begin{aligned} \mu = \sup _{u \in \ker ({\mathcal {L}})} \frac{|Q(u)|}{{\mathrm{dist}}_{\mathcal {X}}(u,{\mathcal {V}})}. \end{aligned}$$
(98)

We shall prove on the one hand that

$$\begin{aligned} \sup _{f \in {\mathcal {K}}} \bigg | Q(f) - \sum _{k=1}^m a^\star _k \ell _k(f) \bigg | \le \nu \, \varepsilon , \end{aligned}$$
(99)

and on the other hand that, for any \(A: {\mathbb {C}}^m \rightarrow {\mathbb {C}}\),

$$\begin{aligned} \sup _{f \in {\mathcal {K}}} \left| Q(f) - A({\mathcal {L}}(f)) \right| \ge \mu \, \varepsilon , \end{aligned}$$
(100)

and we shall show as a last step that

$$\begin{aligned} \nu \le \mu . \end{aligned}$$
(101)

Justification of (99): Given \(f \in {\mathcal {K}}\), we select \(v \in {\mathcal {V}}\) such that \(\Vert f-v\Vert _{\mathcal {X}}= {\mathrm{dist}}_{\mathcal {X}}(f,{\mathcal {V}})\). The required inequality follows by noticing that

$$\begin{aligned} \bigg | Q(f) - \sum _{k=1}^m a^\star _k \ell _k(f) \bigg |&= \bigg | Q(f-v) - \sum _{k=1}^m a^\star _k \ell _k(f-v) \bigg | \nonumber \\&\le \bigg \Vert Q- \sum _{k=1}^m a^\star _k \ell _k \bigg \Vert _{{\mathcal {X}}^*} \Vert f-v\Vert _{\mathcal {X}}\nonumber \\&= \nu \, {\mathrm{dist}}_{\mathcal {X}}(f,{\mathcal {V}}) \le \nu \, \varepsilon . \end{aligned}$$
(102)

Justification of (100): Let us select \(u \in \ker ({\mathcal {L}})\) such that

$$\begin{aligned} |Q(u)| = \mu \, {\mathrm{dist}}_{\mathcal {X}}(u,{\mathcal {V}}) \qquad \text{ and } \qquad \, {\mathrm{dist}}_{\mathcal {X}}(u,{\mathcal {V}}) = \varepsilon . \end{aligned}$$
(103)

Then, for any \(A: {\mathbb {C}}^m \rightarrow {\mathbb {C}}\), we have

$$\begin{aligned} \sup _{f \in {\mathcal {K}}} |Q(f) - A({\mathcal {L}}(f))|&\ge \max _\pm |Q(\pm u) - A(0)| \nonumber \\&\ge \frac{1}{2} \left( |Q(u)-A(0)| + |Q(-u)-A(0)| \right) \nonumber \\&\ge \frac{1}{2} |Q(u) - Q(-u)| = |Q(u)| = \mu \, \varepsilon . \end{aligned}$$
(104)

Justification of (101): We assume that \(\ker ({\mathcal {L}}) \cap {\mathcal {V}}= \{ 0\}\), otherwise \(\mu = \infty \) and there is nothing to prove. We consider a linear functional \(\lambda \) defined on \(\ker ({\mathcal {L}}) \oplus {\mathcal {V}}\) by

$$\begin{aligned} \lambda (u) = Q(u) \quad \text{ for } \text{ all } u \in \ker ({\mathcal {L}}) \qquad \text{ and } \qquad \lambda (v) = 0 \quad \text{ for } \text{ all } v \in {\mathcal {V}}. \end{aligned}$$
(105)

We then consider a Hahn–Banach extension \(\widetilde{\lambda }\) of \(\lambda \) defined on \({\mathcal {X}}\). Because \(Q-\widetilde{\lambda }\) vanishes on \(\ker ({\mathcal {L}})\), we can write \(Q-\widetilde{\lambda } = \sum _{k=1}^m \widetilde{a}_k \ell _k\) for some \(\widetilde{a} \in {\mathbb {C}}^m\), and because \(\widetilde{\lambda }\) vanishes on \({\mathcal {V}}\), we have \(\sum _{k=1}^m \widetilde{a}_k \ell _k(v) = Q(v)\) for all \(v \in {\mathcal {V}}\). We therefore derive that

$$\begin{aligned} \nu&\le \bigg \Vert Q - \sum _{k=1}^m \widetilde{a}_k \ell _k \bigg \Vert _{{\mathcal {X}}^*} = \big \Vert \widetilde{\lambda } \big \Vert _{{\mathcal {X}}^*} = \Vert \lambda \Vert _{(\ker ({\mathcal {L}}) \oplus {\mathcal {V}})^*} = \sup _{\begin{array}{c} u \in \ker ({\mathcal {L}})\\ v \in {\mathcal {V}} \end{array}} \frac{| \lambda (u-v) | }{\Vert u-v\Vert _{\mathcal {X}}} \nonumber \\&= \sup _{\begin{array}{c} u \in \ker ({\mathcal {L}})\\ v \in {\mathcal {V}} \end{array}} \frac{| Q(u) |}{\Vert u-v\Vert _{\mathcal {X}}} = \sup _{u \in \ker ({\mathcal {L}})} \frac{|Q(u)|}{{\mathrm{dist}}_{\mathcal {X}}(u,{\mathcal {V}})} = \mu . \end{aligned}$$
(106)

This concludes the proof of the theorem. \(\square \)
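Theorem 8 can likewise be checked in a finite-dimensional Hilbert setting, where the dual norm in (97) is again the Euclidean norm, so that (97) becomes an equality-constrained least-squares problem. Since the sups in the proof are attained in finite dimensions, the chain of inequalities \(\mu \varepsilon \le \nu \varepsilon \) combined with (101) forces \(\nu = \mu \), which the Python sketch below (synthetic data, illustrative names) confirms numerically.

```python
import numpy as np

rng = np.random.default_rng(2)

# Finite-dimensional sketch of Theorem 8 with X = C^N under the Euclidean
# norm; all data (Lmat, q, Vb, ...) are synthetic placeholders.
N, dimV, m = 9, 2, 4
cr = lambda *s: rng.standard_normal(s) + 1j * rng.standard_normal(s)
Vb = np.linalg.qr(cr(N, dimV))[0]       # orthonormal basis of V
Lmat = cr(m, N)                          # rows represent ell_1, ..., ell_m
q = cr(N)                                # row representing Q

# nu: minimize ||q - a^T Lmat||_2 subject to (q - a^T Lmat) v = 0 on V,
# solved by the null-space method for equality-constrained least squares.
B, C, rhs = Lmat.T, Vb.T @ Lmat.T, Vb.T @ q
a_p = np.linalg.lstsq(C, rhs, rcond=None)[0]      # particular solution
Z = np.linalg.svd(C)[2].conj().T[:, dimV:]        # null space of C
t = np.linalg.lstsq(B @ Z, q - B @ a_p, rcond=None)[0]
nu = np.linalg.norm(q - B @ (a_p + Z @ t))

# mu = sup over ker(L) of |Q(u)| / dist(u, V), cf. (98) and (106).
P = Vb @ Vb.conj().T
K = np.linalg.svd(Lmat)[2].conj().T[:, m:]        # basis of ker(L)
A = (np.eye(N) - P) @ K
mu = np.linalg.norm(q @ K @ np.linalg.pinv(A))

assert np.isclose(nu, mu)    # the proof gives nu = mu in this setting
```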


Cite this article

Ettehad, M., Foucart, S. Approximability models and optimal system identification. Math. Control Signals Syst. 32, 19–41 (2020). https://doi.org/10.1007/s00498-020-00253-z

