1 Introduction

In this work we consider two classical extremum problems for polynomials. The first is very easy to state. Indeed, let us denote the complex polynomials of degree at most n in d complex variables by \(\mathbb {C}_n[z],\) \(z\in \mathbb {C}^d.\) Then for \(K\subset \mathbb {C}^d\) compact and \(z_0\in \mathbb {C}^d\backslash K\) an external point, we say that \(P_n(z)\in \mathbb {C}_n[z]\) has extremal growth relative to K at \(z_0\) if

$$\begin{aligned} P_n={{\,\mathrm{arg\,max}\,}}_{p\in \mathbb {C}_n[z]}\frac{|p(z_0)|}{\Vert p\Vert _K} \end{aligned}$$
(1)

where \(\Vert p\Vert _K\) denotes the sup-norm of p on K. Alternatively, we may normalize p to be 1 at the external point and use

$$\begin{aligned} P_n={{\,\mathrm{arg\,max}\,}}_{p\in \mathbb {C}_n[z],\,\,p(z_0)=1}\frac{1}{\Vert p\Vert _K}. \end{aligned}$$
(2)

We note that for this to be well-defined we require that K be polynomial determining, i.e., if \(p\in \mathbb {C}[z]\) is such that \(p(x)=0\) for all \(x\in K,\) then \(p=0.\) We refer the interested reader to the survey [2] for more on what is known about this problem.

The second problem is from the field of optimal design for polynomial regression. To describe it we reduce to the real case \(K\subset \mathbb {R}^d,\) and note that we may write any \(p\in \mathbb {R}_n[z]\) in the form

$$\begin{aligned} p=\sum _{k=1}^N \theta _k p_k \end{aligned}$$

where \(\mathcal{B}_s:=\{p_1,p_2,\ldots ,p_N\}\) is a basis for \(\mathbb {R}_n[z]\) and \(N:=\binom{n+d}{d}\) its dimension.

Suppose now that we observe the values of a particular \(p\in \mathbb {R}_n[z]\) at a set of \(m\ge N\) points \(X:=\{x_j\,:\, 1\le j\le m\}\subset K\) with some random errors; i.e., we observe

$$\begin{aligned} y_j =p(x_j)+\epsilon _j,\quad 1\le j\le m \end{aligned}$$

where we assume that the errors \(\epsilon _j\sim N(0,\sigma )\) are independent. In matrix form this becomes

$$\begin{aligned} y= V_n\theta +\epsilon \end{aligned}$$

where \(\theta \in \mathbb {R}^N,\) \(y,\epsilon \in \mathbb {R}^m\) and

$$\begin{aligned} V_n:=\left[ \begin{array}{cccc} p_1(x_1)&p_2(x_1)&\cdots &p_N(x_1) \\ p_1(x_2)&p_2(x_2)&\cdots &p_N(x_2) \\ \vdots &\vdots &\ddots &\vdots \\ p_1(x_m)&p_2(x_m)&\cdots &p_N(x_m) \end{array}\right] \in \mathbb {R}^{m\times N} \end{aligned}$$

is the associated Vandermonde matrix.

Our assumption on the error vector \(\epsilon \) means that

$$\begin{aligned} \mathrm{cov}(\epsilon )=\sigma ^2I_m\in \mathbb {R}^{m\times m}. \end{aligned}$$

Now, assuming that \(V_n\) is of full rank, the least squares estimate of \(\theta \) is

$$\begin{aligned} \widehat{\theta }:=(V_n^tV_n)^{-1}V_n^ty. \end{aligned}$$

Note that the entries of \(\displaystyle { {1\over m}V_n^tV_n}\) are the discrete inner products of the \(p_i\) with respect to the measure

$$\begin{aligned} \mu _X ={1\over m}\sum _{k=1}^m \delta _{x_k}. \end{aligned}$$
(3)

More specifically,

$$\begin{aligned} {1\over m}V_n^tV_n=G_n(\mu _X) \end{aligned}$$

where

$$\begin{aligned} G_n(\mu ):=\left[ \int _K p_i(x)p_j(x)\mathrm{d}\mu \right] _{1\le i,j\le N}\in \mathbb {R}^{N\times N} \end{aligned}$$
(4)

is the moment, or Gram, matrix of the polynomials \(p_i\) with respect to the measure \(\mu .\)

In general we may consider arbitrary probability measures on K,  setting

$$\begin{aligned} \mathcal{M}(K):=\{\mu \,:\, \mu \,\,\hbox {is a probability measure on}\,\, K\}. \end{aligned}$$

Now if

$$\begin{aligned} {\mathbf{p}}(z)=\left[ \begin{array}{c}p_1(z)\\ p_2(z)\\ \vdots \\ p_N(z)\end{array}\right] \in \mathbb {R}^N \end{aligned}$$
(5)

then the least squares estimate of the observed polynomial is

$$\begin{aligned} {\mathbf{p}}^t(z)\widehat{\theta }. \end{aligned}$$

We may compute its variance at any point \(z\in \mathbb {R}^d\) to be

$$\begin{aligned} \mathrm{var}({\mathbf{p}}^t(z)\widehat{\theta })= & {} \sigma ^2{\mathbf{p}}^t(z)(V_n^tV_n)^{-1}{\mathbf{p}}(z) \nonumber \\= & {} {1\over m}\sigma ^2 {\mathbf{p}}^t(z)(G_n(\mu _X))^{-1}{\mathbf{p}}(z) \end{aligned}$$
(6)

where \(\mu _X\) is again given by (3). Now, it is easy to verify that for any \(\mu \in \mathcal{M}(K)\) with non-singular Gram matrix,

$$\begin{aligned} {\mathbf{p}}^t(z)(G_n(\mu ))^{-1}{\mathbf{p}}(z)=K_n^\mu (z,z) \end{aligned}$$

where, for \(\{q_1,\ldots ,q_N\}\subset \mathbb {R}_n[z],\) a \(\mu \)-orthonormal basis for \(\mathbb {R}_n[z],\)

$$\begin{aligned} K_n^\mu (w,z):=\sum _{k=1}^N {{q_k(w)}}q_k(z) \end{aligned}$$

is the Bergman kernel for \(\mathbb {R}_n[z]\). The function \(K_n^\mu (z,z)\) is also known as the reciprocal of the Christoffel function for \(\mathbb {R}_n[z].\) In particular, we see that the variance (6) is proportional to \(K_n^{\mu _X}(z,z).\)
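As a quick illustration (not part of the argument), the following Python sketch checks the identity \({\mathbf{p}}^t(z)(G_n(\mu ))^{-1}{\mathbf{p}}(z)=K_n^\mu (z,z)\) for a discrete measure on \([-1,1]\) in the univariate monomial basis; the degree, design points and evaluation point are arbitrary choices.

```python
import numpy as np

n, m = 4, 20                               # degree and number of design points (arbitrary)
x = np.linspace(-1.0, 1.0, m)              # design points X in K = [-1, 1]
w = np.full(m, 1.0 / m)                    # mu_X puts mass 1/m at each point

V = np.vander(x, n + 1, increasing=True)   # Vandermonde matrix in the monomial basis
G = V.T @ (w[:, None] * V)                 # Gram matrix G_n(mu_X)

# mu_X-orthonormalize the monomial basis: with G = L L^T, the columns of V @ L^{-T}
# represent polynomials q_k that are orthonormal in L^2(mu_X)
C = np.linalg.inv(np.linalg.cholesky(G)).T

z = 1.7                                    # an arbitrary evaluation point
p = z ** np.arange(n + 1)                  # monomial basis vector p(z)
print(p @ np.linalg.solve(G, p))           # p(z)^T G^{-1} p(z)
print(np.sum((p @ C) ** 2))                # sum_k q_k(z)^2 = K_n^mu(z, z); same value
```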

We may generalize easily to the complex case, \(K\subset \mathbb {C}^d,\) where now the \(p_j\) form a basis for \(\mathbb {C}_n[z]\) and

$$\begin{aligned} G_n(\mu ):=\left[ \int _K { \overline{p_i(z)}{p_j(z)}}\mathrm{d}\mu \right] _{1\le i,j\le N}\in \mathbb {C}^{N\times N}. \end{aligned}$$
(7)

If \(G_n(\mu )\) is non-singular, then the kernel is

$$\begin{aligned} K_n^\mu (w,z):=\sum _{k=1}^N {\overline{q_k(w)}}q_k(z) \end{aligned}$$

for \(\{q_1,\ldots ,q_N\}\subset \mathbb {C}_n[z],\) a \(\mu \)-orthonormal basis for \(\mathbb {C}_n[z].\) Then, for an external point \(z_0\in \mathbb {C}^d\backslash K,\) a measure \(\mu _0\in \mathcal{M}(K)\) is said to be an optimal prediction (or extrapolation) measure for \(z_0\) relative to K (of order n) if it minimizes the complex analogue of the variance (6) of the polynomial predictor at \(z_0\); i.e., if

$$\begin{aligned} K_n^{\mu _0}(z_0,z_0)= \inf _{\mu \in \mathcal{M}(K)} K_n^{\mu }(z_0,z_0). \end{aligned}$$
(8)

However, as it turns out (see Example 1.1 below), such optimal prediction measures need not be definite (i.e., the associated Gram matrix need not be non-singular). Hence we need to re-formulate so that indefinite measures are allowed. Indeed, as is well known there is a variational form for \(K_n^\mu (z_0,z_0):\)

$$\begin{aligned} K_n^\mu (z_0,z_0)= & {} \sup _{p\in \mathbb {C}_n[z]}\frac{|p(z_0)|^2}{\int _K |p(z)|^2\mathrm{d}\mu }\nonumber \\= & {} \sup _{p\in \mathbb {C}_n[z],\,\,\, p(z_0)=1}\frac{1}{\int _K |p(z)|^2\mathrm{d}\mu }. \end{aligned}$$
(9)

Note that in the case of an indefinite measure this value may be \(+\infty \). Any polynomial \(P_n^{\mu ,z_0}\in \mathbb {C}_n[z]\) such that

$$\begin{aligned} P_n^{\mu ,z_0}={{\,\mathrm{arg\,max}\,}}_{p\in \mathbb {C}_n[z],\, p(z_0)=1}\frac{1}{\int _K |p(z)|^2\mathrm{d}\mu } \end{aligned}$$
(10)

is said to be a prediction polynomial for \(\mu \) and if \(\mu _0\) is an optimal prediction measure we call \(P_n^{\mu _0,z_0}\) an optimal prediction polynomial. In the case that (9) is \(+\infty \) we interpret (10) to mean that \(P_n^{\mu ,z_0}\) is such that \(\int _K |P_n^{\mu ,z_0}(z)|^2\mathrm{d}\mu =0.\)

Hence, in general, we say that \(\mu _0\in \mathcal{M}(K)\) is an optimal prediction measure for \(z_0\) relative to K if \(\mu _0\) satisfies (8) with \(K_n^\mu \) defined by (9).

We note that if \(\mu \) is definite then

$$\begin{aligned} P_n^{\mu ,z_0}(z)=\frac{K_n^\mu (z_0,z)}{K_n^\mu (z_0,z_0)} \end{aligned}$$

is unique and \(\int _K |P_n^{\mu ,z_0}(z)|^2\mathrm{d}\mu (z)= 1/K_n^\mu (z_0,z_0)\). If \(\mu \) is indefinite, then \(P_n^{\mu ,z_0}\) need not be unique.
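For a definite discrete measure this can also be checked numerically; the following sketch (the degree, support points, weights and external point are arbitrary choices) computes the prediction polynomial from the kernel and confirms that its \(\mu \)-norm equals \(1/K_n^\mu (z_0,z_0)\) and is not beaten by another admissible polynomial.

```python
import numpy as np

n, z0 = 3, 2.0                              # degree and external point (arbitrary choices)
x = np.cos(np.pi * np.arange(n + 1) / n)    # n+1 distinct support points in [-1, 1]
w = np.full(n + 1, 1.0 / (n + 1))           # equal weights: a definite probability measure

V = np.vander(x, n + 1, increasing=True)    # monomial Vandermonde at the support points
G = V.T @ (w[:, None] * V)                  # Gram matrix G_n(mu)
p0 = z0 ** np.arange(n + 1)                 # monomial basis vector at z0

theta = np.linalg.solve(G, p0)              # coefficients of K_n^mu(z0, .)
K00 = p0 @ theta                            # K_n^mu(z0, z0)
P = (V @ theta) / K00                       # values of P_n^{mu,z0} on the support

print(np.sum(w * P ** 2), 1.0 / K00)        # int |P|^2 dmu equals 1 / K_n^mu(z0, z0)

q = (x / z0) ** n                           # another admissible polynomial, (z/z0)^n
print(np.sum(w * q ** 2) >= 1.0 / K00)      # True: it cannot have smaller mu-norm
```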

Example 1.1

Consider \(K=[-1,1]^2,\) regarded as a subset of \(\mathbb {C}^2\), \(z_0=(2,0),\) \(\mu =\frac{1}{4}\delta _{(-1,0)}+\frac{3}{4}\delta _{(1,0)}\), and degree \(n=1.\) Then it is easy to check that any polynomial of the form \(P_1^{\mu ,z_0}(x,y)=x/2+cy,\) \(c\in \mathbb {C},\) is a prediction polynomial for \(\mu \). We will see in the next section that \(\mu \) is an optimal prediction measure, which also shows that optimal prediction measures may be indefinite.
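A small numerical check of this example (a sketch, scanning the coefficient \(\beta \) of x on a grid; the minimizing \(\beta \) is real here, and the coefficient of y is irrelevant since \(y=0\) on the support):

```python
import numpy as np

# p(x, y) = alpha + beta*x + gamma*y with p(2, 0) = alpha + 2*beta = 1;
# minimize int |p|^2 dmu for mu = (1/4) delta_{(-1,0)} + (3/4) delta_{(1,0)}
beta = np.linspace(-2.0, 2.0, 40001)
alpha = 1.0 - 2.0 * beta                                         # enforce p(2, 0) = 1
val = 0.25 * (alpha - beta) ** 2 + 0.75 * (alpha + beta) ** 2    # int |p|^2 dmu

k = np.argmin(val)
print(beta[k], alpha[k], val[k])          # approximately 0.5, 0.0, 0.25: minimizer x/2
```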

In the univariate case, however, optimal prediction measures are always definite.

Lemma 1.2

Suppose that \(K\subset \mathbb {C}\) is \(\mathbb {C}_n[z]\) determining and that \(z_0\in \mathbb {C}\backslash K.\) Then any optimal prediction measure \(\mu \) is definite; i.e., the Gram matrix \(G_n(\mu )\) is non-singular.

Proof

If the support of a measure \(\mu \) has n or fewer distinct points there exists a polynomial \(p\in \mathbb {C}_n[z]\) such that \(p\equiv 0\) on the support while \(p(z_0)=1.\) Hence

$$\begin{aligned} \frac{1}{\int _K|p(z)|^2\mathrm{d}\mu }=\infty \end{aligned}$$

and \(\mu \) cannot be an optimal prediction measure. Indeed, taking any \(n+1\) distinct points \(a_0,\ldots ,a_n\) in K and positive numbers \(w_0,\ldots ,w_n\) with \(\sum _{j=0}^nw_j=1\), the measure \(\nu :=\sum _{j=0}^n w_j \delta _{a_j}\) is definite, and thus \(K_n^\nu (z_0,z)\) is a nontrivial polynomial of degree n with

$$\begin{aligned} K_n^\nu (z_0,z_0)=\sup _{p\in \mathbb {C}_n[z],\,\,\, p(z_0)=1}\frac{1}{\int _K |p(z)|^2\mathrm{d}\nu }<\infty . \end{aligned}$$

\(\square \)

Hoel and Levine [5] show that in the univariate case, for \(K=[-1,1]\) and any real external point \(z_0\in \mathbb {R}\backslash K,\) the optimal prediction measure is unique and is a discrete measure supported at the \(n+1\) extremal points \(x_k=\cos (k\pi /n),\) \(0\le k\le n,\) of \(T_n(x),\) the classical Chebyshev polynomial of the first kind (see Lemma 3.1 below). In this case it turns out that

$$\begin{aligned} K_n^{\mu _0}(z_0,z_0)=T_n^2(z_0). \end{aligned}$$
(11)
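The identity (11) is easy to confirm numerically; the following sketch (the degree and the external point are arbitrary choices) builds the measure supported at the extreme points of \(T_n\) with the weights of Lemma 3.1 below and compares \(K_n^{\mu _0}(z_0,z_0)\) with \(T_n^2(z_0).\)

```python
import numpy as np

n, z0 = 5, 1.8                                   # degree and a real external point
x = np.cos(np.pi * np.arange(n + 1) / n)         # extreme points of T_n
V = np.vander(x, n + 1, increasing=True)
p0 = z0 ** np.arange(n + 1)

ell = np.linalg.solve(V.T, p0)                   # ell[i] = ell_i(z0), Lagrange values at z0
w = np.abs(ell) / np.abs(ell).sum()              # Hoel-Levine weights (cf. Lemma 3.1)

G = V.T @ (w[:, None] * V)                       # Gram matrix of mu_0
K00 = p0 @ np.linalg.solve(G, p0)                # K_n^{mu_0}(z0, z0)

Tn = np.cosh(n * np.arccosh(z0))                 # T_n(z0) for real z0 > 1
print(K00, Tn ** 2)                              # the two values agree, as in (11)
```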

Notably, as is well known, \(T_n(x)\) is the polynomial of extremal growth for any point \(z_0\in \mathbb {R}\backslash [-1,1]\) relative to \(K=[-1,1].\) Also, Erdős [3] has shown that the Chebyshev polynomial is also extremal relative to \([-1,1]\) for real polynomials at points \(z_0\in \mathbb {C}\) with \(|z_0|\ge 1\); i.e.,

$$\begin{aligned} \max _{p\in \mathbb {R}_n[x],\,\,\Vert p\Vert _{[-1,1]}\le 1}|p(z_0)|=|T_n(z_0)|. \end{aligned}$$

The problem for real polynomials at points \(z_0\) with \(|z_0|< 1,\) and for complex polynomials \(p\in \mathbb {C}[z],\) has remained unsolved up to now.

We show in Sect. 2 that (11) is not an accident, and that there is a general equivalence of our two extremum problems. In Sect. 3 we give a complete and unique characterization of optimal prediction measures and polynomials of extremal growth for the case of the unit interval \(K=[-1,1]\subset \mathbb {C}.\) Finally, in Sect. 4 we will use this to compute the polynomials of extremal growth and the optimal prediction measures for a purely imaginary complex point \(z_0\in \mathbb {C}\backslash [-1,1].\)

2 A Kiefer–Wolfowitz Type Equivalence Theorem

Kiefer and Wolfowitz [6] have given a remarkable equivalence between what are called D-optimal and G-optimal designs; i.e., probability measures that maximize the determinant of the design matrix \(G_n(\mu )\) and those that minimize the maximum over \(x\in K\) of the prediction variance; i.e., minimize \(\max _{x\in K} K_n^\mu (x,x).\) Here we give an analogous equivalence, for a single exterior point \(z_0\in \mathbb {C}^d\backslash K,\) with the problem of extremal polynomial growth.

Combining the definition of an optimal prediction measure (8) and the variational form for the kernel (9), the problem of minimal variance is to find

$$\begin{aligned} \min _{\mu \in \mathcal{M}(K)} \ \max _{p\in \mathbb {C}_n[z],\,p(z_0)=1}\frac{1}{\int _K|p(z)|^2\mathrm{d}\mu }. \end{aligned}$$

It turns out that this can be easily analyzed using the classical Minimax theorem (see e.g. Gamelin [4, Thm. 7.1, Ch. II]).

Proposition 2.1

The minimal variance is the square of the maximal polynomial growth, i.e.,

$$\begin{aligned} \min _{\mu \in \mathcal{M}(K)} \ \max _{p\in \mathbb {C}_n[z],\,\,p(z_0)=1}\frac{1}{\int _K|p(z)|^2\mathrm{d}\mu }=\max _{p\in \mathbb {C}_n[z],\,p(z_0)=1}\frac{1}{\Vert p\Vert _K^2}. \end{aligned}$$

Proof

First note that we may simplify to

$$\begin{aligned} \min _{\mu \in \mathcal{M}(K)} \max _{p\in \mathbb {C}_n[z],\,p(z_0)=1}\frac{1}{\int _K |p(z)|^2\mathrm{d}\mu } =1/\left\{ \max _{\mu \in \mathcal{M}(K)} \min _{p\in \mathbb {C}_n[z],\,p(z_0)=1}\int _K |p(z)|^2\mathrm{d}\mu \right\} . \end{aligned}$$

Now, for \(\mu \in \mathcal{M}(K)\) and \(p\in \mathbb {C}_n[z]\) such that \(p(z_0)=1,\) let

$$\begin{aligned} f(\mu ,p):=\int _K |p(z)|^2\mathrm{d}\mu . \end{aligned}$$

It is easy to confirm that f is quasiconcave in \(\mu \) and quasiconvex in p and hence by the Minimax Theorem

$$\begin{aligned} \max _{\mu \in \mathcal{M}(K)} \min _{p\in \mathbb {C}_n[z],\,p(z_0)=1}\int _K |p(z)|^2\mathrm{d}\mu = \min _{p\in \mathbb {C}_n[z],\,p(z_0)=1} \max _{\mu \in \mathcal{M}(K)} \int _K |p(z)|^2\mathrm{d}\mu . \end{aligned}$$

However, as \(\mu =\delta _x\in \mathcal{M}(K)\) for every \(x\in K,\) it follows that

$$\begin{aligned} \max _{\mu \in \mathcal{M}(K)} \int _K |p(z)|^2\mathrm{d}\mu =\Vert p\Vert _K^2. \end{aligned}$$

Consequently, the minimum variance is given by

$$\begin{aligned} \min _{\mu \in \mathcal{M}(K)} \max _{p\in \mathbb {C}_n[z],\,p(z_0)=1}\frac{1}{\int _K |p(z)|^2\mathrm{d}\mu }=\max _{p\in \mathbb {C}_n[z],\,p(z_0)=1} \frac{1}{\Vert p\Vert _K^2}, \end{aligned}$$

as claimed. \(\square \)

We remark that the Minimax theorem has been used before in a similar context by Berndtsson [1, p. 206] to obtain pointwise estimates of solutions to the \(\bar{\partial }\)-equation.

It is also possible to give a more precise relation between the extremal polynomials for the two problems (of minimum variance and extremal growth).

Theorem 2.2

A measure \(\mu _0\in \mathcal{M}(K)\) is an optimal prediction measure for \(z_0\notin K\) relative to K if and only if there is an associated (optimal) prediction polynomial \(P_n^{\mu _0,z_0}(z)\in \mathbb {C}_n[z],\) (10), such that \(\Vert P_n^{\mu _0,z_0}\Vert _K=\Vert P_n^{\mu _0,z_0}\Vert _{L^2(\mu _0)},\) i.e.,

$$\begin{aligned} \max _{z\in K} |P_n^{\mu _0,z_0}(z)|^2=\int _K|P_n^{\mu _0,z_0}(z)|^2\mathrm{d}\mu _0, \end{aligned}$$

or, equivalently, if and only if there is an associated prediction polynomial that is also a polynomial of extremal growth at \(z_0\) relative to K.

Proof

First suppose that \(P_n^{\mu _0,z_0}(z)\in \mathbb {C}_n[z]\) is an optimal prediction polynomial associated with \(\mu _0\) such that \(\Vert P_n^{\mu _0,z_0}\Vert _K=\Vert P_n^{\mu _0,z_0}\Vert _{L^2(\mu _0)}.\) Then for any \(\mu \in \mathcal{M}(K),\)

$$\begin{aligned} K_n^\mu (z_0,z_0)&= \max _{p\in \mathbb {C}_n[z],\,p(z_0)=1}\frac{1}{\int _K |p(z)|^2\mathrm{d}\mu }\\&\ge \frac{1}{\int _K |P_n^{\mu _0,z_0}(z)|^2\mathrm{d}\mu }\\&\ge \frac{1}{\int _K\Vert P_n^{\mu _0,z_0}(z)\Vert _K^2\mathrm{d}\mu }\\&=\frac{1}{\Vert P_n^{\mu _0,z_0}(z)\Vert _K^2}\\&=\frac{1}{\int _K |P_n^{\mu _0,z_0}(z)|^2\mathrm{d}\mu _0}\\&=K_n^{\mu _0}(z_0,z_0) \end{aligned}$$

and hence \(\mu _0\) is optimal.

To see that \(P_n^{\mu _0,z_0}\) is also a polynomial of extremal growth, let \(p\in \mathbb {C}_n[z]\) be any other polynomial for which \(p(z_0)=1.\) Then

$$\begin{aligned} \Vert P_n^{\mu _0,z_0}\Vert _K^2&=\Vert P_n^{\mu _0,z_0}\Vert _{L^2(\mu _0)}^2\\&=\int _K |P_n^{\mu _0,z_0}(z)|^2\mathrm{d}\mu _0\\&\le \int _K |p(z)|^2\mathrm{d}\mu _0 \quad \hbox {(as }P_n^{\mu _0,z_0}\hbox { is a prediction polynomial)}\\&\le \Vert p\Vert _K^2. \end{aligned}$$

Hence

$$\begin{aligned} \Vert P_n^{\mu _0,z_0}\Vert _K=\min _{p\in \mathbb {C}_n[z],\,p(z_0)=1}\Vert p\Vert _K \end{aligned}$$

and \(P_n^{\mu _0,z_0}\) is indeed a polynomial of extremal growth.

Conversely, suppose that \(\mu _0\) is optimal and let \(P_n^{\mu _0,z_0}(z)\in \mathbb {C}_n[z]\) be a polynomial of extremal growth for \(z_0\) relative to K,  i.e., \(P_n^{\mu _0,z_0}(z_0)=1\) and for any other \(p\in \mathbb {C}_n[z]\) such that \(p(z_0)=1,\)

$$\begin{aligned} \Vert P_n^{\mu _0,z_0}\Vert _K\le \Vert p\Vert _K. \end{aligned}$$

We claim that \(P_n^{\mu _0,z_0}\) is an optimal prediction polynomial and that \(\Vert P_n^{\mu _0,z_0}\Vert _K=\Vert P_n^{\mu _0,z_0}\Vert _{L^2(\mu _0)}.\)

To see this note that by Proposition 2.1

$$\begin{aligned} \max _{p\in \mathbb {C}_n[z],\,p(z_0)=1}\frac{1}{\int _K |p(z)|^2\mathrm{d}\mu _0}=\frac{1}{\Vert P_n^{\mu _0,z_0}\Vert _K^2}; \end{aligned}$$

i.e.,

$$\begin{aligned} \min _{p\in \mathbb {C}_n[z],\,p(z_0)=1}\int _K |p(z)|^2\mathrm{d}\mu _0=\Vert P_n^{\mu _0,z_0}\Vert _K^2. \end{aligned}$$

Hence

$$\begin{aligned} \Vert P_n^{\mu _0,z_0}\Vert _K^2&\le \int _K |P_n^{\mu _0,z_0}(z)|^2\mathrm{d}\mu _0\\&\le \int _K \Vert P_n^{\mu _0,z_0}\Vert _K^2 \mathrm{d}\mu _0\\&=\Vert P_n^{\mu _0,z_0}\Vert _K^2; \end{aligned}$$

i.e., \(\int _K |P_n^{\mu _0,z_0}(z)|^2\mathrm{d}\mu _0=\Vert P_n^{\mu _0,z_0}\Vert _K^2.\)

Moreover,

$$\begin{aligned} \max _{p\in \mathbb {C}_n[z],\,p(z_0)=1}\frac{1}{\int _K |p(z)|^2\mathrm{d}\mu _0}=\frac{1}{\Vert P_n^{\mu _0,z_0}\Vert _K^2} =\frac{1}{\int _K |P_n^{\mu _0,z_0}(z)|^2\mathrm{d}\mu _0} \end{aligned}$$

and so \(P_n^{\mu _0,z_0}\) is also an optimal prediction polynomial associated with \(\mu _0.\) \(\square \)

In particular, if \(\mu _0\) is definite then

$$\begin{aligned} \int _K |P_n^{\mu _0,z_0}(z)|^2\mathrm{d}\mu _0=\Vert P_n^{\mu _0,z_0}\Vert _K^2= 1/K_n^{\mu _0}(z_0,z_0). \end{aligned}$$

Remark 2.3

It is easily confirmed that \(|P_n^{\mu _0,z_0}(z)|\equiv \Vert P_n^{\mu _0,z_0}\Vert _K\) on the support of \(\mu _0.\) Consequently optimal prediction measures are always supported on a real algebraic subset of K of degree 2n.

Example 2.4

Recall the situation of Example 1.1: \(K=[-1,1]^2\subset \mathbb {C}^2\) with the measure \(\mu _0:=\frac{1}{4}\delta _{(-1,0)}+\frac{3}{4}\delta _{(1,0)}.\) We show that this is an optimal prediction measure for the external point \(z_0=(2,0)\) and polynomials of degree at most 1. As mentioned in Example 1.1, the prediction polynomials for this measure and point are \(p(x,y)=x/2+cy,\) \(c\in \mathbb {C}.\) For the particular polynomial \(P_1^{\mu _0,z_0}(x,y):=x/2,\) we have \(\Vert P_1^{\mu _0,z_0}\Vert _K^2=1/4\) and \(\int _K |P_1^{\mu _0,z_0}(x,y)|^2\mathrm{d}\mu _0=1/16+3/16=1/4\). Hence by Theorem 2.2, \(\mu _0\) is an optimal prediction measure.

We now give an example showing that optimal prediction measures need not be unique, even in the univariate situation. Let

$$\begin{aligned} K=\mathbf {D}=\{z\in \mathbb {C}: |z|\le 1\} \end{aligned}$$

and fix \(z_0\) with \(|z_0|>1\). Write \(z_0=|z_0|e^{i\phi }\) for a fixed angle \(\phi \).

Proposition 2.5

Consider the measure

$$\begin{aligned} \mathrm{d}\mu (\theta ):=\left[ \sum _{k=-\infty }^{\infty } |z_0|^{-|k|} e^{ik(\theta +\phi )}\right] \frac{1}{2\pi }\mathrm{d}\theta , \end{aligned}$$

i.e., \(\mathrm{d}\mu \) is the Poisson kernel for \(1/z_0\) times \(\mathrm{d}\theta /(2\pi ),\) supported on the unit circle. Then \(\mu \) is an optimal prediction measure for \(K=\mathbf {D}\) and \(z_0\notin \mathbf {D},\) for any degree n.

Proof

For \(j=0,\pm 1,\pm 2,\ldots \), let

$$\begin{aligned} m_j(\mu ):=\int _K z^j\mathrm{d}\mu =\int _0^{2\pi } e^{ij\theta }\mathrm{d}\mu (\theta )= \frac{1}{2\pi }\int _0^{2\pi } \left[ \sum _{k=-\infty }^{\infty } |z_0|^{-|k|} e^{ik(\theta +\phi )}\right] e^{ij\theta }\mathrm{d}\theta . \end{aligned}$$

It follows easily that for any j,

$$\begin{aligned} m_j(\mu )= |z_0|^{-|j|} e^{-ij\phi },\quad \hbox {so that } m_j(\mu )=z_0^{-j} \hbox { for } j\ge 0 \hbox { and } m_{-j}(\mu )=\overline{m_j(\mu )}. \end{aligned}$$
(12)

Thus the Gram matrix for \(\mu \) with respect to the basis \(\{1,z,\ldots ,z^n\}\) for \(\mathbb {C}_n[z]\) is

$$\begin{aligned} G_n(\mu )=G(z_0^{-1}):= \left[ \begin{array}{ccccc} 1 &z_0^{-1} &z_0^{-2} &\cdots &z_0^{-n}\\ \bar{z}_0^{-1}&1 &z_0^{-1} &\cdots &z_0^{-(n-1)}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \bar{z}_0^{-n}&\bar{z}_0^{-(n-1)} &\bar{z}_0^{-(n-2)} &\cdots &1 \end{array} \right] . \end{aligned}$$

More generally, we define, for \(|z|\not = 1\),

$$\begin{aligned} G(z):= \left[ \begin{array}{ccccc} 1 &z &z^2 &\cdots &z^{n}\\ \bar{z}&1 &z &\cdots &z^{n-1}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \bar{z}^{n}&\bar{z}^{n-1} &\bar{z}^{n-2} &\cdots &1 \end{array} \right] . \end{aligned}$$

One easily verifies that

$$\begin{aligned} G(z)^{-1}= \frac{1}{|z|^2-1}\left[ \begin{array}{cccccc} -1 &z &0&\cdots & 0&0\\ \bar{z}&-(1+|z|^2) &z& \cdots & 0&0\\ \vdots & \vdots & & \ddots & & \vdots \\ 0&0 &\cdots & \bar{z} &-(1+|z|^2)&z \\ 0&0 &0 &\cdots & \bar{z} &-1 \end{array} \right] . \end{aligned}$$

Next, letting, for \(z\not = 0\),

$$\begin{aligned} {\mathbf {P}}(z):=\left[ \begin{array}{c}1\\ z^{-1}\\ \vdots \\ z^{-n}\end{array}\right] \in \mathbb {C}^{ n+1}, \end{aligned}$$
(13)

we easily verify that

$$\begin{aligned} {\mathbf {P}}^*(z)G(z)^{-1}{\mathbf {P}}(z)=|z|^{-2n}. \end{aligned}$$

Thus we have

$$\begin{aligned} K_n^{\mu }(z_0,z_0)={\mathbf {P}}^*(z_0^{-1})G(z_0^{-1})^{-1}{\mathbf {P}}(z_0^{-1})=|z_0^{-1}|^{-2n}=|z_0|^{2n}. \end{aligned}$$

But it is well-known that \(p_n(z)=z^n\) is a polynomial of degree n of extremal growth at \(z_0\) relative to K (see [2]); thus we know from Proposition 2.1 that the optimal value \(\inf _{\nu \in \mathcal M(K)}K_n^{\nu }(z_0,z_0)\) is

$$\begin{aligned} |p_n(z_0)|^2=|z_0|^{2n} \end{aligned}$$

and the proof is complete. \(\square \)
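The two identities used above, the tridiagonal formula for \(G(z)^{-1}\) and \({\mathbf {P}}^*(z)G(z)^{-1}{\mathbf {P}}(z)=|z|^{-2n},\) are easy to confirm numerically; here is a short sketch (the degree n and the point \(z_0\) are arbitrary choices).

```python
import numpy as np

n = 6
z0 = 1.7 * np.exp(0.9j)                        # an external point with |z0| > 1
z = 1.0 / z0                                   # G_n(mu) = G(1/z0)

k = np.arange(n + 1)
G = np.where(k[None, :] >= k[:, None],         # Toeplitz matrix G(z) defined above
             z ** (k[None, :] - k[:, None]),
             np.conj(z) ** (k[:, None] - k[None, :]))

# the claimed tridiagonal inverse, with -1 at the two corners
Ginv = np.zeros((n + 1, n + 1), dtype=complex)
Ginv += np.diag([-1.0] + [-(1 + abs(z) ** 2)] * (n - 1) + [-1.0])
Ginv += np.diag([z] * n, 1) + np.diag([np.conj(z)] * n, -1)
Ginv /= abs(z) ** 2 - 1

print(np.allclose(G @ Ginv, np.eye(n + 1)))    # True

P = z ** (-k)                                  # P(z) = (1, z^{-1}, ..., z^{-n})^t
print((np.conj(P) @ Ginv @ P).real, abs(z) ** (-2 * n))   # both equal |z|^{-2n} = |z0|^{2n}
```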

For each degree n we can produce additional optimal prediction measures by taking any discrete measure \(\nu \) that reproduces the moments of \(\mu \) present in the Gram matrix \(G_n(\mu )\). Such discrete measures \(\nu \) can be constructed, e.g., by Szegő quadrature (see Section 7 of [7]).

However, as we will see in the next section, for a real interval and a point exterior to this interval, optimal prediction measures are unique.

3 A Complex Point External to \([-1,1]\)

We now consider \(K=[-1,1]\subset \mathbb {C}\) and \(z_0\in \mathbb {C}\backslash K.\) We normalize so that the extremal polynomials are \(P_n^{\mu _0,z_0}(z)=\frac{K_n^{\mu _0}(z_0,z)}{\sqrt{K_n^{\mu _0}(z_0,z_0)}}\); these have supremum norm 1 on \([-1,1]\). As mentioned in Remark 2.3 above, the support of an optimal prediction measure in this case is a subset of \([-1,1]\) where \(|P_n^{\mu _0,z_0}(z)|= 1,\) its maximum value. It is not possible that \(|P_n^{\mu _0,z_0}(z)|\equiv 1\) on all of \([-1,1]\) and hence the support of \(\mu _0\) consists of at most 2n points in \([-1,1],\) counting multiplicities. Any interior point, being a local maximum of \(|P_n^{\mu _0,z_0}|,\) must be of even multiplicity and hence there can be at most n interior points. However, exactly n interior (double) points would mean that \(z=\pm 1\) are not maximum points of \(|P_n^{\mu _0,z_0}(z)|\); i.e., \(|P_n^{\mu _0,z_0}(\pm 1)|<1.\) But then the fact that \(\lim _{z\rightarrow \pm \infty }|P_n^{\mu _0,z_0}(z)|= \infty \) would imply that there are two other points outside \([-1,1]\) where \(|P_n^{\mu _0,z_0}(z)|\) attains the value 1, giving \(2n+2>2n\) real points where the value 1 is attained, an impossibility. Hence there are at most \(n-1\) interior points in the support of \(\mu _0.\) The fact that \(G_n(\mu _0)\) is non-singular requires that there are at least \(n+1\) support points, and these must therefore consist of \(n-1\) interior points together with the two endpoints \(\pm 1\); i.e., \(x_0:=-1,\) \(x_n:=+1\) and \(n-1\) internal (double) points \(-1<x_1<\cdots<x_{n-1}<1.\) Consequently

$$\begin{aligned} \mu _0=\sum _{i=0}^n w_i \delta _{x_i} \end{aligned}$$

with weights \(w_i>0,\) \(\sum _{i=0}^n w_i=1.\)

Given the support points \(x_i\) there is a simple recipe for the optimal weights, given already in [5].

Lemma 3.1

(Hoel–Levine) Suppose that \(-1=x_0<x_1<\cdots <x_n=+1\) are given. Then among all discrete probability measures supported at these points, the measure with

$$\begin{aligned} w_i:=\frac{|\ell _i(z_0)|}{\sum _{i=0}^n |\ell _i(z_0)|},\quad 0\le i\le n \end{aligned}$$
(14)

where \(\ell _i(z)\) is the ith fundamental Lagrange interpolating polynomial for these points, minimizes \(K_n^\mu (z_0,z_0).\)

Proof

We first note that for such a discrete measure, \(\{\ell _i(z)/\sqrt{w_i}\}_{0\le i \le n}\) form an orthonormal basis. Hence

$$\begin{aligned} K_n^\mu (z_0,z_0)=\sum _{i=0}^n \frac{|\ell _i(z_0)|^2}{w_i}. \end{aligned}$$
(15)

In the case of the weights chosen according to (14) we obtain

$$\begin{aligned} K_n^{\mu _0}(z_0,z_0)=\left( \sum _{i=0}^n |\ell _i(z_0)|\right) ^2. \end{aligned}$$
(16)

We claim that for any choice of weights, \(K_n^{\mu }(z_0,z_0)\) given by (15) is at least as large as the value given by (16). To see this, just note that by the Cauchy–Schwarz inequality,

$$\begin{aligned} \left( \sum _{i=0}^n |\ell _i(z_0)|\right) ^2= & {} \left( \sum _{i=0}^n \frac{|\ell _i(z_0)|}{\sqrt{w_i}}\cdot \sqrt{w_i}\right) ^2\\\le & {} \left( \sum _{i=0}^n \frac{|\ell _i(z_0)|^2}{w_i}\right) \cdot \left( \sum _{i=0}^n w_i\right) \\= & {} \sum _{i=0}^n \frac{|\ell _i(z_0)|^2}{w_i}. \end{aligned}$$

\(\square \)

Remark 3.2

We note that the optimal \(K_n^{\mu _0}(z_0,z_0)\) given by (16) is the square of the Lebesgue function evaluated at \(z_0.\) Hence the problem of finding the support of the optimal prediction measure amounts to finding the \(n+1\) interpolation points \(-1=x_0<x_1<\cdots <x_n=+1\) for which the Lebesgue function evaluated at the external point \(z_0,\)

$$\begin{aligned} \Lambda (z_0):=\sum _{i=0}^n |\ell _i(z_0)|, \end{aligned}$$

is as small as possible.
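For degree \(n=2\) and a purely imaginary external point this minimization can be carried out by a simple scan over the single interior node; the following sketch (with the arbitrary choice \(z_0=0.75i\)) recovers the interior node 0, in agreement with Sect. 4.2 below.

```python
import numpy as np

z0 = 0.75j                                     # arbitrary purely imaginary external point

def lebesgue(t, z0):
    """Lambda(z0) for the degree-2 node set {-1, t, 1}."""
    nodes = np.array([-1.0, t, 1.0])
    total = 0.0
    for i, xi in enumerate(nodes):
        others = np.delete(nodes, i)
        total += abs(np.prod((z0 - others) / (xi - others)))   # |ell_i(z0)|
    return total

ts = np.linspace(-0.99, 0.99, 1999)
vals = np.array([lebesgue(t, z0) for t in ts])
print(ts[np.argmin(vals)])                     # approximately 0.0
```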

Recall that the optimal prediction polynomials \(P_n^{\mu _0,z_0}(z)=\frac{K_n^{\mu _0}(z_0,z)}{\sqrt{K_n^{\mu _0}(z_0,z_0)}}\) have supremum norm 1 on \([-1,1]\).

Lemma 3.3

Suppose that the measure \(\mu _0\) is supported at the points \(-1=x_0<x_1<\cdots <x_n=+1\) with optimal weights given by (14). Then

$$\begin{aligned} P_n^{\mu _0,z_0}(z)=\sum _{i=0}^n \mathrm{sgn}(\ell _i(z_0))\ell _i(z) \end{aligned}$$

where \(\mathrm{sgn}(z):=\overline{z}/|z|\) is the complex sign of \(z\in \mathbb {C}.\)

Proof

Using again the fact that \(\{\ell _i(z)/\sqrt{w_i}\}_{0\le i\le n}\) form a set of orthonormal polynomials, we have

$$\begin{aligned} P_n^{\mu _0,z_0}(z)= & {} \frac{1}{\Lambda (z_0)}\sum _{i=0}^n \frac{\overline{\ell _i(z_0)}}{\sqrt{w_i}}\frac{\ell _i(z)}{\sqrt{w_i}}\\= & {} \frac{1}{\Lambda (z_0)}\sum _{i=0}^n \left( \Lambda (z_0) \frac{\overline{\ell _i(z_0)}}{|\ell _i(z_0)|} \right) \ell _i(z)\\= & {} \sum _{i=0}^n \frac{\overline{\ell _i(z_0)}}{|\ell _i(z_0)|} \cdot \ell _i(z). \end{aligned}$$

\(\square \)

Remark 3.4

By the equivalence Theorem 2.2 the support of the optimal prediction measure and the polynomial of extremal growth will be given by those points \(-1=x_0<x_1<\cdots <x_n=+1\) for which

$$\begin{aligned} \max _{-1\le x\le 1} \left| \sum _{i=0}^n \frac{\overline{\ell _i(z_0)}}{|\ell _i(z_0)|} \cdot \ell _i(x)\right| =1. \end{aligned}$$
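The following sketch (again with the arbitrary choices \(z_0=0.75i\) and degree \(n=2\)) illustrates this criterion: for the nodes \(\{-1,0,1\}\) the maximum modulus is 1, while for a perturbed interior node it exceeds 1.

```python
import numpy as np

z0 = 0.75j
xs = np.linspace(-1.0, 1.0, 20001)

def max_modulus(nodes, z0, xs):
    """max over xs of |sum_i sgn(ell_i(z0)) ell_i(x)|, with sgn(z) = conj(z)/|z|."""
    total = np.zeros_like(xs, dtype=complex)
    for i, xi in enumerate(nodes):
        others = np.delete(nodes, i)
        li_z0 = np.prod((z0 - others) / (xi - others))                  # ell_i(z0)
        li_x = np.prod((xs[:, None] - others) / (xi - others), axis=1)  # ell_i(x)
        total += (np.conj(li_z0) / abs(li_z0)) * li_x
    return np.max(np.abs(total))

print(max_modulus(np.array([-1.0, 0.0, 1.0]), z0, xs))   # approximately 1
print(max_modulus(np.array([-1.0, 0.4, 1.0]), z0, xs))   # strictly larger than 1
```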

4 A Purely Imaginary Point External to \([-1,1]\)

In the case of \(z_0=ai,\) \(0\ne a\in \mathbb {R},\) a purely imaginary point, it turns out that there are remarkable formulas for the polynomial of extremal growth as well as for the support of the optimal prediction measure. Both of these will depend on the point \(z_0\) (as opposed to the real case \(z_0\in \mathbb {R}\backslash [-1,1]\) where Hoel and Levine [5] showed that the support is always the set of extreme points of the Chebyshev polynomial \(T_n(x)\)).

To begin, we first analyze the cases of degree \(n=1\) and \(n=2.\)

4.1 Degree \(n=1\)

Here the support of the extremal measure is necessarily \(x_0=-1\) and \(x_1=+1.\) We will compute \(P_1^{\mu _0,z_0}(z)\) using the formula given in Lemma 3.3. Indeed in this case, \(\ell _0(z)=(1-z)/2\) and \(\ell _1(z)=(1+z)/2\) so that

$$\begin{aligned} \mathrm{sgn}(\ell _0(ia))=\mathrm{sgn}\left( \frac{1-ia}{2}\right) =\frac{1+ia}{\sqrt{a^2+1}} \end{aligned}$$

and

$$\begin{aligned} \mathrm{sgn}(\ell _1(ia))=\mathrm{sgn}\left( \frac{1+ia}{2}\right) =\frac{1-ia}{\sqrt{a^2+1}}. \end{aligned}$$

Hence,

$$\begin{aligned} P_1^{\mu _0,z_0}(z)= & {} \frac{1+ia}{\sqrt{a^2+1}}\frac{1-z}{2} +\frac{1-ia}{\sqrt{a^2+1}}\frac{1+z}{2}\\= & {} \frac{1}{\sqrt{a^2+1}}\{1-iaz\}. \end{aligned}$$

Since the points \(\pm 1\) necessarily form the support of the optimal prediction measure, it is immediate that \(\Vert P_1^{\mu _0,z_0}\Vert _{[-1,1]}=1,\) as is also easily verified by a simple direct calculation.

4.2 Degree \(n=2\)

We claim that the support of the optimal prediction measure is \(x_0=-1,\) \(x_1=0\) and \(x_2=+1.\) However, this is not automatic and we will have to verify that the norm of \(P_2^{\mu _0,z_0}\) is indeed 1. Now, it is easy to see, for this support, that

$$\begin{aligned} \ell _0(z)=\frac{z(z-1)}{2},\quad \ell _1(z)=1-z^2,\quad \ell _2(z)=\frac{z(z+1)}{2} \end{aligned}$$

for which

$$\begin{aligned} \mathrm{sgn}(\ell _0(ia))= & {} \mathrm{sgn}\left( \frac{ia(ia-1)}{2}\right) \\= & {} \frac{-ia}{|a|}\cdot \frac{-ia-1}{\sqrt{a^2+1}}\\= & {} i\,\mathrm{sgn}(a)\frac{1+ia}{\sqrt{a^2+1}}, \end{aligned}$$
$$\begin{aligned} \mathrm{sgn}(\ell _1(ia))=\mathrm{sgn}(1+a^2)=+1, \end{aligned}$$

and, after a simple calculation,

$$\begin{aligned} \mathrm{sgn}(\ell _2(ia))= i\,\mathrm{sgn}(a)\frac{ia-1}{\sqrt{a^2+1}}. \end{aligned}$$

From this we may easily conclude that

$$\begin{aligned} P_2^{\mu _0,z_0}(z)= & {} \sum _{i=0}^2 \mathrm{sgn}(\ell _i(z_0))\ell _i(z)\\= & {} \frac{\mathrm{sgn}(a)}{\sqrt{a^2+1}}\left( -(a+\mathrm{sgn}(a)\sqrt{a^2+1})z^2-iz+\mathrm{sgn}(a)\sqrt{a^2+1}\right) . \end{aligned}$$

The fact that \(\Vert P_2^{\mu _0,z_0}\Vert _{[-1,1]}=1\) is an immediate consequence of the following lemma.

Lemma 4.1

For \(x\in \mathbb {R}\) we have

$$\begin{aligned} |P_2^{\mu _0,z_0}(x)|^2= & {} 1+\frac{(|a|+\sqrt{a^2+1})^2}{a^2+1}x^2(x^2-1)\\= & {} 1+(x^2-1)R_1^2(x),\quad R_1(x):=\frac{|a|+\sqrt{a^2+1}}{\sqrt{a^2+1}}\,x. \end{aligned}$$

Proof

This follows from elementary calculations starting with the formula for \(P_2^{\mu _0,z_0}(x)\) given above. \(\square \)
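The identity of Lemma 4.1 is also easy to confirm numerically; a short sketch (with the arbitrary choice \(a=0.75\)) follows.

```python
import numpy as np

a = 0.75                                           # arbitrary choice of a > 0
c = np.sqrt(a * a + 1.0)
x = np.linspace(-2.0, 2.0, 1001)                   # real test points

P2 = (-(a + c) * x ** 2 - 1j * x + c) / c          # P_2^{mu_0, z_0}(x) for a > 0
R1 = (a + c) / c * x
# maximal deviation from the identity |P_2(x)|^2 = 1 + (x^2 - 1) R_1(x)^2
print(np.max(np.abs(np.abs(P2) ** 2 - (1.0 + (x ** 2 - 1.0) * R1 ** 2))))  # ~ rounding error
```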

We now define a sequence of polynomials, \(Q_n(z),\) based on the above degree \(n=1\) and \(n=2\) cases, for which we will show that \(Q_n(z)=c_nP_n^{\mu _0,z_0}(z)\) for certain \(c_n\in \mathbb {C}\) with modulus \(|c_n|=1.\) We will also define a sequence of polynomials \(R_n(x)\) which will play the role of \(R_1(x)\) in Lemma 4.1 for general degree n.

Now, as the formula for \(P_2^{\mu _0,z_0}\) depends on the sign of a,  in order to simplify the formulas we will assume that \(a>0.\) For \(a<0,\) one may use the relation \(P_2^{\mu _0,ia}(z)=P_2^{\mu _0,-ia}(-z).\)

Definition 4.2

For \(a>0\) we define the sequences of polynomials \(Q_n(z)\) and \(R_n(z)\) by

$$\begin{aligned} Q_1(z)= & {} -\frac{az+i}{\sqrt{a^2+1}}, \quad (=(-i)P_1^{\mu _0,z_0}(z))\\ Q_2(z)= & {} \frac{1}{\sqrt{a^2+1}}\left( -(a+\sqrt{a^2+1})z^2-iz+\sqrt{a^2+1}\right) , \quad (=P_2^{\mu _0,z_0}(z))\\ Q_{n+1}(z)= & {} 2zQ_n(z)-Q_{n-1}(z),\quad n=2,3,\ldots . \end{aligned}$$

and

$$\begin{aligned} R_0(z)= & {} \frac{a}{\sqrt{a^2+1}},\\ R_1(z)= & {} \frac{a+\sqrt{a^2+1}}{\sqrt{a^2+1}} \,z,\\ R_{n+1}(z)= & {} 2zR_n(z)-R_{n-1}(z),\quad n=1,2,\ldots . \end{aligned}$$

Since the recursions are both those of the classical Chebyshev polynomials it is not surprising that there are formulas for \(Q_n(z)\) and \(R_n(z)\) in terms of these.

Lemma 4.3

We have

$$\begin{aligned} Q_n(z)=\frac{1}{\sqrt{a^2+1}}\left( -(az+i)T_{n-1}(z)+\sqrt{a^2+1}(1-z^2)U_{n-2}(z)\right) \end{aligned}$$

where \(T_n(z)\) is the Chebyshev polynomial of the first kind and \(U_n(z):=\frac{1}{n+1}T_{n+1}'(z)\) is that of the second kind.

Proof

Let \(q_n(z)\) denote the right side of the proposed identity. We proceed by induction. For \(n=1\) we have

$$\begin{aligned} q_1(z)= & {} \frac{1}{\sqrt{a^2+1}}\left( -(az+i)T_{1-1}(z)+\sqrt{a^2+1}(1-z^2)U_{1-2}(z)\right) \\= & {} \frac{1}{\sqrt{a^2+1}}\left( -(az+i)\times 1 + 0\right) \\= & {} Q_1(z). \end{aligned}$$

Similarly, for \(n=2\) we have

$$\begin{aligned} q_2(z)= & {} \frac{1}{\sqrt{a^2+1}}\left( -(az+i)T_{2-1}(z)+\sqrt{a^2+1}(1-z^2)U_{2-2}(z)\right) \\= & {} \frac{1}{\sqrt{a^2+1}}\left( -(az+i)z+\sqrt{a^2+1}(1-z^2)\right) \\= & {} \frac{1}{\sqrt{a^2+1}}\left( -(a+\sqrt{a^2+1})z^2-iz+\sqrt{a^2+1} \right) \\= & {} Q_2(z). \end{aligned}$$

The result now follows easily from the fact that both kinds of Chebyshev polynomials satisfy the same recursion as used in the definition of \(Q_n(z).\) \(\square \)

Lemma 4.4

We have

$$\begin{aligned} R_n(z)=\frac{1}{\sqrt{a^2+1}} \left( \sqrt{a^2+1}zU_{n-1}(z)+aT_{n}(z)\right) . \end{aligned}$$

Proof

Let \(r_n(z)\) denote the right side of the proposed identity. We again proceed by induction. For \(n=0\) we have

$$\begin{aligned} r_0(z)= & {} \frac{1}{\sqrt{a^2+1}} \left( \sqrt{a^2+1}zU_{-1}(z)+aT_{0}(z)\right) \\= & {} \frac{a}{\sqrt{a^2+1}}\\= & {} R_0(z). \end{aligned}$$

Similarly, for \(n=1\) we have

$$\begin{aligned} r_1(z)= & {} \frac{1}{\sqrt{a^2+1}} \left( \sqrt{a^2+1}zU_{0}(z)+aT_{1}(z)\right) \\= & {} \frac{1}{\sqrt{a^2+1}} \left( \sqrt{a^2+1}z\times 1+a\times z\right) \\= & {} \frac{a+\sqrt{a^2+1}}{\sqrt{a^2+1}}z\\= & {} R_1(z). \end{aligned}$$

The result now follows easily from the fact that both kinds of Chebyshev polynomials satisfy the same recursion as used in the definition of \(R_n(z).\) \(\square \)
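The closed forms of Lemmas 4.3 and 4.4 can also be checked against the recursions of Definition 4.2 numerically; here is a sketch (the value of a and the test points are arbitrary choices).

```python
import numpy as np

a = 0.75                                   # arbitrary choice of a > 0
c = np.sqrt(a * a + 1.0)
x = np.linspace(-1.5, 1.5, 11) + 0j        # arbitrary test points, complex dtype

def cheb_pair(k, x):
    """Return (T_k(x), U_{k-1}(x)) via the standard three-term recursions."""
    T0, T1 = np.ones_like(x), x                   # T_0, T_1
    U0, U1 = np.zeros_like(x), np.ones_like(x)    # U_{-1} = 0, U_0 = 1
    for _ in range(k):
        T0, T1 = T1, 2 * x * T1 - T0
        U0, U1 = U1, 2 * x * U1 - U0
    return T0, U0

# seed the recursions of Definition 4.2 with Q_1, Q_2 and R_0, R_1
Qprev, Q = -(a * x + 1j) / c, (-(a + c) * x ** 2 - 1j * x + c) / c
Rprev, R = a / c * np.ones_like(x), (a + c) / c * x

for n in range(2, 8):                      # at this point Q = Q_n and R = R_{n-1}
    Tn1, Un2 = cheb_pair(n - 1, x)         # T_{n-1}(x), U_{n-2}(x)
    Q_closed = (-(a * x + 1j) * Tn1 + c * (1 - x ** 2) * Un2) / c    # Lemma 4.3
    R_closed = (c * x * Un2 + a * Tn1) / c                           # Lemma 4.4 for R_{n-1}
    print(n, np.allclose(Q, Q_closed), np.allclose(R, R_closed))     # True, True
    Qprev, Q = Q, 2 * x * Q - Qprev        # advance both recursions
    Rprev, R = R, 2 * x * R - Rprev
```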

Now, for the Chebyshev polynomials \(T_n(z)\) and \(U_{n-1}(z)\) there is the classical Pell identity

$$\begin{aligned} T_n^2(z)-(z^2-1)U_{n-1}^2(z)\equiv 1. \end{aligned}$$
(17)

We will show that for real \(z\in \mathbb {R},\) the polynomials \(Q_n(z)\) and \(R_{n-1}(z)\) satisfy a similar Pell identity.

Proposition 4.5

For \(z=x\in \mathbb {R},\) we have

$$\begin{aligned} |Q_n(x)|^2-(x^2-1)R_{n-1}^2(x)\equiv 1. \end{aligned}$$

Proof

By Lemma 4.3, for \(z=x\in \mathbb {R},\) we may write

$$\begin{aligned} Q_n(x)= & {} \frac{1}{\sqrt{a^2+1}}\left( -(ax+i)T_{n-1}(x)+\sqrt{a^2+1}(1-x^2)U_{n-2}(x)\right) \\= & {} \frac{1}{\sqrt{a^2+1}}\left( -iT_{n-1}(x)+\left\{ -axT_{n-1}(x)+\sqrt{a^2+1}(1-x^2)U_{n-2}(x) \right\} \right) \end{aligned}$$

so that

$$\begin{aligned} |Q_n(x)|^2=\frac{1}{a^2+1}\left( T_{n-1}^2(x)+\left( -axT_{n-1}(x)+\sqrt{a^2+1}(1-x^2)U_{n-2}(x) \right) ^2\right) . \end{aligned}$$

Hence, using the Chebyshev Pell identity (17),

$$\begin{aligned}&(a^2+1)(1-|Q_n(x)|^2)\\&\quad =(a^2+1)-T_{n-1}^2(x)-a^2x^2T_{n-1}^2(x)\\&\qquad -(a^2+1)(1-x^2)^2U_{n-2}^2(x)+2a\sqrt{a^2+1}x(1-x^2)U_{n-2}(x)T_{n-1}(x)\\&\quad =(a^2+1)(1-T_{n-1}^2(x))+a^2(1-x^2)T_{n-1}^2(x)-(a^2+1)(1-x^2)^2U_{n-2}^2(x)\\&\qquad +2a\sqrt{a^2+1}x(1-x^2)U_{n-2}(x)T_{n-1}(x)\\&\quad =(a^2+1)(1-x^2)U_{n-2}^2(x)+a^2(1-x^2)T_{n-1}^2(x)-(a^2+1)(1-x^2)^2U_{n-2}^2(x)\\&\qquad +2a\sqrt{a^2+1}x(1-x^2)U_{n-2}(x)T_{n-1}(x)\\&\quad =(1-x^2)\Big \{(a^2+1)U_{n-2}^2(x) +a^2T_{n-1}^2(x)-(a^2+1)(1-x^2)U_{n-2}^2(x)\\&\qquad +2a\sqrt{a^2+1}xU_{n-2}(x)T_{n-1}(x)\Big \}\\&\quad =(1-x^2)\Big \{(a^2+1)[1-(1-x^2)]U_{n-2}^2(x) +a^2T_{n-1}^2(x)\\&\qquad +2a\sqrt{a^2+1}xU_{n-2}(x)T_{n-1}(x)\Big \}\\&\quad =(1-x^2)\Big \{(a^2+1)x^2U_{n-2}^2(x) +a^2T_{n-1}^2(x) +2a\sqrt{a^2+1}xU_{n-2}(x)T_{n-1}(x)\Big \}\\&\quad =(1-x^2)\Big \{\sqrt{a^2+1}xU_{n-2}(x)+aT_{n-1}(x)\Big \}^2\\&\quad =(1-x^2)(a^2+1)R_{n-1}^2(x). \end{aligned}$$

The last equality follows from Lemma 4.4. \(\square \)
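A quick numerical confirmation of this identity, using the closed forms of Lemmas 4.3 and 4.4 on interior points of \([-1,1]\) (the values of a and n are arbitrary choices):

```python
import numpy as np

a, n = 0.6, 7                                   # arbitrary choices, a > 0
c = np.sqrt(a * a + 1.0)
x = np.linspace(-0.999, 0.999, 2001)            # interior points of [-1, 1]
th = np.arccos(x)

Tn1 = np.cos((n - 1) * th)                      # T_{n-1}(x)
Un2 = np.sin((n - 1) * th) / np.sin(th)         # U_{n-2}(x), valid for |x| < 1

Q = (-(a * x + 1j) * Tn1 + c * (1 - x ** 2) * Un2) / c     # Lemma 4.3
R = (c * x * Un2 + a * Tn1) / c                            # Lemma 4.4: R_{n-1}(x)
# maximal deviation from |Q_n(x)|^2 - (x^2 - 1) R_{n-1}^2(x) = 1
print(np.max(np.abs(np.abs(Q) ** 2 - (x ** 2 - 1) * R ** 2 - 1.0)))  # ~ rounding error
```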

From the Pell identity of Proposition 4.5, we immediately have

Corollary 4.6

For \(x\in [-1,1],\)

$$\begin{aligned} |Q_n(x)|\le 1 \end{aligned}$$

and its maximum of 1 is attained at the endpoints \(x=\pm 1\) and the zeros of \(R_{n-1}(x).\)

Indeed, we claim that the endpoints together with the zeros of \(R_{n-1}(x)\) form the support of the optimal prediction measure. To this end we first prove that \(R_{n-1}(x)\) has \(n-1\) zeros in \((-1,1).\)

Lemma 4.7

The polynomials \(R_n(x)\) have n distinct zeros in \((-1,1)\) which interlace the extreme points of \(T_n(x),\) \(\cos (k\pi /n),\) \(0\le k\le n.\)

Proof

Using the fact that \(T_n'(x)=nU_{n-1}(x),\) we have that at an interior extremal point of \(T_n(x),\) \(\cos (k\pi /n),\) \(1\le k\le (n-1),\)

$$\begin{aligned} R_n(\cos (k\pi /n))= & {} \frac{1}{\sqrt{a^2+1}} \left( \sqrt{a^2+1}\cos (k\pi /n)\,U_{n-1}(\cos (k\pi /n))+aT_{n}(\cos (k\pi /n))\right) \\= & {} \frac{1}{\sqrt{a^2+1}} \left( 0+a(-1)^k\right) \\= & {} \frac{a}{\sqrt{a^2+1}}(-1)^k \end{aligned}$$

so that

$$\begin{aligned} \mathrm{sgn}(R_n(\cos (k\pi /n)))=(-1)^k,\quad 1\le k\le (n-1). \end{aligned}$$

Further, for \(k=0,\) \(\cos (k\pi /n)=1,\)

$$\begin{aligned} R_n(1)= & {} \frac{1}{\sqrt{a^2+1}} \left( \sqrt{a^2+1}U_{n-1}(1)+aT_{n}(1)\right) \\= & {} \frac{1}{\sqrt{a^2+1}}\left( n\sqrt{a^2+1}+a\right) \end{aligned}$$

so that

$$\begin{aligned} \mathrm{sgn}(R_n(\cos (0\pi /n)))=+1=(-1)^0. \end{aligned}$$

Similarly, for \(k=n,\) \(\cos (k\pi /n)=-1,\)

$$\begin{aligned} R_n(-1)= & {} \frac{1}{\sqrt{a^2+1}} \left( \sqrt{a^2+1}(-1)U_{n-1}(-1)+aT_{n}(-1)\right) \\= & {} \frac{1}{\sqrt{a^2+1}}(n\sqrt{a^2+1}+a)(-1)^n \end{aligned}$$

so that also

$$\begin{aligned} \mathrm{sgn}(R_n(\cos (n\pi /n)))=(-1)^n. \end{aligned}$$

The result follows: since \(R_n\) thus alternates in sign at the \(n+1\) points \(\cos (k\pi /n),\) \(0\le k\le n,\) and has degree n, it has exactly one zero strictly between each consecutive pair of these points. \(\square \)
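The zeros of \(R_n\) and the interlacing property are easy to compute numerically; the following sketch (with arbitrary a and n) runs the recursion of Definition 4.2 and checks the interlacing with the extreme points of \(T_n.\)

```python
import numpy as np
from numpy.polynomial import Polynomial as Poly

a, n = 0.75, 6                                     # arbitrary choices, a > 0
c = np.sqrt(a * a + 1.0)
X = Poly([0.0, 1.0])                               # the monomial "x"

R0, R1 = Poly([a / c]), (a + c) / c * X            # R_0 and R_1
for _ in range(n - 1):                             # run the recursion up to R_n
    R0, R1 = R1, 2 * X * R1 - R0

zeros = np.sort(R1.roots().real)                   # the n zeros of R_n
ext = np.sort(np.cos(np.pi * np.arange(n + 1) / n))    # extreme points of T_n
print(zeros)
print(np.all((ext[:-1] < zeros) & (zeros < ext[1:])))  # True: interlacing
```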

Suppose now that \(\mu _0\) is the discrete measure supported on \(\pm 1\) together with the \(n-1\) zeros of \(R_{n-1}(x),\) with optimal weights given by Lemma 3.1.

Proposition 4.8

The polynomials \(Q_n(z)\) are of extremal growth at \(z_0=ai\) relative to \(K=[-1,1].\) Specifically, \(Q_n(z)=-(i)^nP_n^{\mu _0,z_0}(z).\)

Proof

Let \(-1=x_0<x_1<\cdots <x_n=+1\) be the support points with corresponding Lagrange polynomials \(\ell _k(z).\) We will show that

$$\begin{aligned} Q_n(x_k)=-(i)^n\mathrm{sgn}(\ell _k(ai)),\quad 0\le k\le n \end{aligned}$$

using the formula

$$\begin{aligned} \ell _k(z)=\frac{\omega _n(z)}{(z-x_k)\omega _n'(x_k)},\quad \omega _n(z):=(z^2-1)R_{n-1}(z). \end{aligned}$$

Our calculations will make use of the elementary facts that

$$\begin{aligned} T_n(ai)= & {} \frac{(i)^n}{2}\left\{ (a+\sqrt{a^2+1})^n+(a-\sqrt{a^2+1})^n\right\} ,\\ U_n(ai)= & {} \frac{(i)^n}{2\sqrt{a^2+1}}\left\{ (a+\sqrt{a^2+1})^{n+1}-(a-\sqrt{a^2+1})^{n+1}\right\} \end{aligned}$$

so that

$$\begin{aligned} R_{n-1}(ai)= & {} \frac{1}{\sqrt{a^2+1}} \left( \sqrt{a^2+1}(ai)U_{n-2}(ai)+aT_{n-1}(ai)\right) \\= & {} (i)^{n-1}\frac{a}{\sqrt{a^2+1}}(a+\sqrt{a^2+1})^{n-1}. \end{aligned}$$

The endpoints are the easiest case and so we will begin with those. Specifically, for \(k=0,\,x_0=-1,\)

$$\begin{aligned} \ell _0(ai)= & {} \frac{((ai)^2-1)R_{n-1}(ai)}{(ai-(-1))\omega _n'(-1)}\\= & {} \frac{-(a^2+1)R_{n-1}(ai)}{(ai+1)(-2R_{n-1}(-1))}. \end{aligned}$$

Hence

$$\begin{aligned} \mathrm{sgn}(\ell _0(ai))= & {} \mathrm{sgn}(R_{n-1}(ai))\,\mathrm{sgn}(R_{n-1}(-1))\,\mathrm{sgn}\left( \frac{1}{ai+1}\right) \\= & {} (-i)^{n-1}(-1)^{n-1}\frac{ai+1}{\sqrt{a^2+1}}. \end{aligned}$$

On the other hand

$$\begin{aligned} Q_n(-1)= & {} \frac{1}{\sqrt{a^2+1}}\left( -(a(-1)+i)T_{n-1}(-1)+\sqrt{a^2+1}(1-(-1)^2)U_{n-2}(-1)\right) \\= & {} \frac{1}{\sqrt{a^2+1}}(a-i)(-1)^{n-1}\\= & {} -(i)^n\mathrm{sgn}(\ell _0(ai)), \end{aligned}$$

as is easily verified.

The other endpoint \(x_n=+1\) is very similar and so we suppress the details.

Consider now, \(x_k,\) \(1\le k\le (n-1),\) a zero of \(R_{n-1}(x).\) Then

$$\begin{aligned} \ell _k(ai)= & {} \frac{((ai)^2-1)R_{n-1}(ai)}{(ai-x_k)(x_k^2-1)R_{n-1}'(x_k)}\\= & {} \frac{-(a^2+1)R_{n-1}(ai)}{(ai-x_k)(x_k^2-1)(R_{n-1}'(x_k))}. \end{aligned}$$

Hence

$$\begin{aligned} \mathrm{sgn}(\ell _k(ai))= & {} \mathrm{sgn}(R_{n-1}(ai))\,\mathrm{sgn}(R_{n-1}'(x_k))\,\mathrm{sgn}\left( \frac{1}{ai-x_k}\right) \\= & {} (i)^{n-1}(-1)^{k}\frac{ai-x_k}{\sqrt{a^2+x_k^2}} \end{aligned}$$

since \(\mathrm{sgn}(R_{n-1}'(x_k))=(-1)^{n-1-k},\) as is easy to see.

On the other hand, from the formula for \(R_{n-1}(x)\) given in Lemma 4.4, we see that \(R_{n-1}(x_k)=0\) implies that

$$\begin{aligned} T_{n-1}(x_k)=-\frac{\sqrt{a^2+1}}{a}x_kU_{n-2}(x_k). \end{aligned}$$

Substituting this into the formula for \(Q_n\) given in Lemma 4.3 we obtain

$$\begin{aligned} Q_n(x_k)= & {} \left\{ \frac{(ax_k+i)x_k}{a}+(1-x_k^2)\right\} U_{n-2}(x_k)\\= & {} \left( \frac{a+ix_k}{a}\right) U_{n-2}(x_k). \end{aligned}$$

But by the Pell identity of Proposition 4.5, \(|Q_n(x_k)|=1\) and so we must have

$$\begin{aligned} Q_n(x_k)=\frac{a+ix_k}{\sqrt{a^2+x_k^2}}\,\mathrm{sgn}\left( U_{n-2}(x_k)\right) . \end{aligned}$$

But, as the zeros of \(R_{n-1}\) interlace the interior extreme points of \(T_{n-1},\) i.e., the zeros of \(U_{n-2},\) it is easy to check that \(\mathrm{sgn}(U_{n-2}(x_k))=(-1)^{n-1-k}.\) In other words,

$$\begin{aligned} Q_n(x_k)=(-1)^{n-1-k}\frac{a+ix_k}{\sqrt{a^2+x_k^2}} \end{aligned}$$

which is easily verified to equal \(-(i)^n\mathrm{sgn}(\ell _k(ai)),\) as claimed. \(\square \)

From the recursion formula for \(Q_n(z)\) it is easy to see that

$$\begin{aligned} Q_n(ai)=-(i)^n\sqrt{a^2+1}(a+\sqrt{a^2+1})^{n-1}. \end{aligned}$$

Hence we have

Proposition 4.9

For \(n=1,2,\ldots \)

$$\begin{aligned} \max _{p\in \mathbb {C}_n[z],\,\Vert p\Vert _{[-1,1]}\le 1}|p(ai)|=\sqrt{a^2+1}(|a|+\sqrt{a^2+1})^{n-1} \end{aligned}$$

and this maximum value is attained by \(Q_n(z)\) (for \(a>0\)).

It is worth noting that the extremal polynomial and optimal prediction measure, unlike the real case, depend on the exterior point \(z_0.\) Moreover, this extreme value is rather larger than \(|T_n(ai)|.\) Indeed it is easy to show that

$$\begin{aligned} \sqrt{a^2+1}(|a|+\sqrt{a^2+1})^{n-1}-|T_n(ai)|=(\sqrt{a^2+1}-|a|)|T_{n-1}(ai)|. \end{aligned}$$
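The following sketch (with arbitrary choices of \(a>0\) and n) confirms numerically the extremal value of Proposition 4.9, the normalization \(\max _{[-1,1]}|Q_n|=1,\) and the difference formula above.

```python
import numpy as np

a, n = 0.75, 5                                 # arbitrary choices, a > 0
c = np.sqrt(a * a + 1.0)

def Q(z):                                      # Q_n via the recursion of Definition 4.2
    qprev, q = -(a * z + 1j) / c, (-(a + c) * z ** 2 - 1j * z + c) / c
    for _ in range(n - 2):
        qprev, q = q, 2 * z * q - qprev
    return q

def T(k, z):                                   # T_k via its three-term recursion
    t0, t1 = 1.0 + 0j, z
    for _ in range(k - 1):
        t0, t1 = t1, 2 * z * t1 - t0
    return t1

x = np.linspace(-1.0, 1.0, 20001) + 0j
print(abs(Q(a * 1j)), c * (a + c) ** (n - 1))  # the extremal value of Proposition 4.9
print(np.max(np.abs(Q(x))))                    # approximately 1 on [-1, 1]
print(c * (a + c) ** (n - 1) - abs(T(n, a * 1j)),
      (c - a) * abs(T(n - 1, a * 1j)))         # the two sides of the last display agree
```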

One may of course wonder if there are similar formulas for general points \(z_0\in \mathbb {C}\backslash [-1,1]\) (not just \(z_0=ai\)). However numerical experiments seem to indicate that in general there is no three-term recurrence for the extremal polynomials.

Note added in proof. It has recently come to our attention that the general extremal values problem has been studied by Yuditskii [8].