1 Introduction

The theory of (truncated) moment sequences is a field with diverse applications and connections to numerous other areas of mathematics, see e.g. [1, 22, 29,30,31, 33, 34, 36, 38, 48, 50, 52], and references therein. For more on recent advances in the reconstruction of measures from moments see e.g. [6, 10, 11, 14, 20, 21, 23, 26, 35, 40, 41], and references therein.

A crucial fact in the theory of truncated moment sequences is the Richter (Richter–Rogosinski–Rosenbloom) Theorem [43,44,45] which states that every truncated moment sequence is a convex combination of finitely many Dirac measures, see also Theorem 2.2. The Carathéodory number is the minimal number N such that every truncated moment sequence (with fixed truncation) is a sum of N atoms, i.e., Dirac measures. It has been studied in several contexts but in most cases the precise value of the Carathéodory number is not known [15, 16, 32, 39, 42, 43, 46, 53].

In this work we continue the study of Carathéodory numbers. We treat moment sequences with small gaps (see Sect. 3), moment sequences of measures supported on algebraic varieties (Sect. 4), and the multidimensional polynomial case on \(\mathbb {R}^n\) and \([0,1]^n\) (Sect. 5). For moment functionals with small gaps we find explicit lower and upper bounds for dimension \(n=1\) based on Descartes’ rule of signs, see Theorem 3.7. For moment functionals \(L:\mathbb {R}[\mathcal {X}]_{\le 2d}\rightarrow \mathbb {R}\) on polynomial functions on an algebraic set \(\mathcal {X}\subset \mathbb {R}^n\) and for sufficiently large d, Theorem 4.5 yields an upper bound of \(P(2d)-1\) and a lower bound of

$$\begin{aligned} P(2d)-k\cdot P(d)+\left( {\begin{array}{c}k\\ 2\end{array}}\right) , \end{aligned}$$

where P is the Hilbert polynomial and k the dimension of \(\mathcal {X}\). In the case \(\mathcal {X}=\mathbb {R}^n\) and \(L:\mathbb {R}[x_1,\dots ,x_n]_{\le 2d}\rightarrow \mathbb {R}\), this gives the lower bound

$$\begin{aligned} \begin{pmatrix} n+2d\\ n\end{pmatrix} - n\cdot \begin{pmatrix} n+d\\ n\end{pmatrix} + \begin{pmatrix} n\\ 2\end{pmatrix} \end{aligned}$$

(Theorem 5.2). We obtain similar bounds for odd degrees and the case \(\mathcal {X}=[0,1]^n\) in Sect. 5. In Sect. 6 we discuss implications of these bounds when \(n\rightarrow \infty \) and \(d\rightarrow \infty \). We show that there are moment functionals \(L:\mathbb {R}[x_1,\dots ,x_n]_{\le 2d}\rightarrow \mathbb {R}\) that behave as badly as possible under flat extensions, see Theorem 6.2 for the precise statement. For literature on flat extensions in this context see [8, 9, 36, 48] and the references therein.

2 Preliminaries

2.1 Truncated moment problem

Let \(\mathcal {A}\) be a (finite dimensional) real vector space of measurable functions on a measurable space \((\mathcal {X},\mathfrak {A})\). Denote by \(L:\mathcal {A}\rightarrow \mathbb {R}\) a continuous linear functional. If there is a (positive) measure \(\mu \) on \((\mathcal {X},\mathfrak {A})\) such that

$$\begin{aligned} L(a) = \int _\mathcal {X}a(x)~\mathrm {d}\mu (x),\ \text {for all}\ a\in \mathcal {A}, \end{aligned}$$
(1)

then L is called a moment functional. If \(\mathcal {A}\) is finite dimensional, it is a truncated moment functional. By \(\mathsf {A}= \{a_1,\dots ,a_m\}\) we denote a basis of the m-dimensional real vector space \(\mathcal {A}\) and by

$$\begin{aligned} s_i := L(a_i) \end{aligned}$$

the \(a_i\)-th (or simply i-th) moment of L (or \(\mu \) for a \(\mu \) as in (1)). Given a sequence \(s = (s_1,\dots ,s_m)\in \mathbb {R}^m\) we define the Riesz functional \(L_s\) by setting \(L_s(a_i) = s_i\) for all \(i=1,\dots ,m\) and extending it linearly to \(\mathcal {A}\), i.e., the Riesz functional induces a bijection between moment sequences \(s=(s_1,\dots ,s_m)\) and moment functionals \(L = L_s\). By \(\mathfrak {M}_\mathsf {A}\) we denote the set of all measures on \((\mathcal {X},\mathfrak {A})\) such that all \(a\in \mathcal {A}\) are integrable and by \(\mathfrak {M}_\mathsf {A}(s)\) or \(\mathfrak {M}_\mathsf {A}(L)\) we denote all representing measures of the moment sequence s resp. moment functional L. Even though moment sequences and moment functionals describe the same object, when we apply techniques from algebraic geometry it is easier to work with moment functionals \(L:\mathcal {A}\rightarrow \mathbb {R}\) on, e.g., \(\mathcal {A}= \mathbb {R}[x_1,\dots ,x_n]_{\le 2d}\) or \(\mathbb {R}[\mathcal {X}]_{\le 2d}\), while for Hankel matrices it is easier to work with moment sequences s in a fixed basis \(\mathsf {A}\) of \(\mathcal {A}\). Since the polynomials \(\mathbb {R}[x_1,\dots ,x_n]_{\le 2d}\) are of special importance, we denote by

$$\begin{aligned} \mathsf {A}_{n,d}:=\{x^\alpha \,|\,\alpha \in \mathbb {N}_0^n \;\wedge \;|\alpha |=\alpha _1 + \dots + \alpha _n \le d\} \end{aligned}$$

the monomial basis, where we have \(x^\alpha = x_1^{\alpha _1}\cdots x_n^{\alpha _n}\) with \(\alpha =(\alpha _1,\dots ,\alpha _n)\in \mathbb {N}_0^n\). On \(\mathbb {N}_0^n\) we work with the partial order \(\alpha =(\alpha _1,\dots ,\alpha _n) \le \beta = (\beta _1,\dots ,\beta _n)\) if \(\alpha _i \le \beta _i\) for all \(i=1,\dots ,n\).

Definition 2.1

Let \(\mathsf {A}= \{a_1,\dots ,a_m\}\) be a basis of the finite dimensional vector space \(\mathcal {A}\) of measurable functions on the measurable space \((\mathcal {X},\mathfrak {A})\). We define \(s_\mathsf {A}\) by

$$\begin{aligned} s_\mathsf {A}: \mathcal {X}\rightarrow \mathbb {R}^m,\quad x\mapsto s_\mathsf {A}(x):= \begin{pmatrix}a_1(x)\\ \vdots \\ a_m(x)\end{pmatrix}. \end{aligned}$$

Of course, \(s_\mathsf {A}(x)\) is the moment sequence of the Dirac measure \(\delta _x\) and the corresponding moment functional is the point evaluation \(l_x\) with \(l_x(a) := a(x)\). By a measure we always mean a positive measure unless it is explicitly denoted as a signed measure.

The fundamental theorem in the theory of truncated moments is the following.

Theorem 2.2

(Richter Theorem [43, Satz 11]) Let \(\mathsf {A}= \{a_1,\dots ,a_m\}\), \(m\in \mathbb {N}\), be finitely many measurable functions on a measurable space \((\mathcal {X},\mathfrak {A})\). Then every moment sequence \(s\in \mathcal {S}_\mathsf {A}\), where \(\mathcal {S}_\mathsf {A}\subseteq \mathbb {R}^m\) denotes the moment cone of all moment sequences, has a k-atomic representing measure

$$\begin{aligned} s = \sum _{i=1}^k c_i\cdot s_\mathsf {A}(x_i) \end{aligned}$$

with \(k\le m\), \(c_1,\dots ,c_k>0\), and \(x_1,\dots ,x_k\in \mathcal {X}\).

The theorem can also be called Richter–Rogosinski–Rosenbloom Theorem [43,44,45], see the discussion after Example 20 in [15] for more details. That every truncated moment sequence has a k-atomic representing measure ensures that the Carathéodory number \(\mathcal {C}_\mathsf {A}\) is well-defined.

Definition 2.3

Let \(\mathsf {A}= \{a_1,\dots ,a_m\}\) be linearly independent measurable functions on a measurable space \((\mathcal {X},\mathfrak {A})\). For \(s\in \mathcal {S}_\mathsf {A}\) we define the Carathéodory number \(\mathcal {C}_\mathsf {A}(s)\) of s by

$$\begin{aligned} \mathcal {C}_\mathsf {A}(s) := \min \{k\in \mathbb {N}_0 \,|\, \exists \mu \in \mathfrak {M}_\mathsf {A}(s)\ k\text {-atomic}\}. \end{aligned}$$

We define the Carathéodory number \(\mathcal {C}_\mathsf {A}\) of \(\mathcal {S}_\mathsf {A}\) by

$$\begin{aligned} \mathcal {C}_\mathsf {A}:= \max _{s\in \mathcal {S}_\mathsf {A}} \mathcal {C}_\mathsf {A}(s). \end{aligned}$$

The same definition holds for moment functionals \(L:\mathcal {A}\rightarrow \mathbb {R}\).

The following theorem turns out to be a convenient tool for proving lower bounds on the Carathéodory number \(\mathcal {C}_\mathsf {A}\).

Theorem 2.4

([16, Thm. 18]) Let \(\mathsf {A}= \{a_1,\dots ,a_m\}\) be measurable functions on a measurable space \((\mathcal {X},\mathfrak {A})\), \(s\in \mathcal {S}_\mathsf {A}\), and \(a\in \mathcal {A}\) with \(a\ge 0\) on \(\mathcal {X}\), \(\mathcal {Z}(a) = \{x_1,\dots ,x_k\}\) and \(L_s(a) = 0\). Then

$$\begin{aligned} \mathcal {C}_\mathsf {A}\quad \ge \quad \mathcal {C}_\mathsf {A}(s) \quad =\quad \dim \mathrm {lin}\,\{s_\mathsf {A}(x_i) \,|\, i=1,\dots ,k\}. \end{aligned}$$

Remark 2.5

Note that in Theorem 2.4 it is crucial that the zero set of a is finite: Take \(a=0\) and \(\mathcal {X}=\mathbb {R}^n\) for a simple example where the statement fails when the zero set is not finite.

It is well-known that in general not every sequence \(s\in \mathbb {R}^m\) or linear functional \(L:\mathcal {A}\rightarrow \mathbb {R}\) has a positive representing measure. But of course it always has a signed k-atomic representing measure with \(k\le m\).

Lemma 2.6

([15, Prop. 12]) Let \(\mathsf {A}= \{a_1,\dots ,a_m\}\) be a basis of the finite dimensional space \(\mathcal {A}\) of measurable functions on a measurable space \((\mathcal {X},\mathfrak {A})\). There exist points \(x_1,\dots ,x_m\in \mathcal {X}\) such that every vector \(s\in \mathbb {R}^m\) has a signed k-atomic representing measure \(\mu \) with \(k\le m\) and all atoms are from \(\{x_1,\dots ,x_m\}\), i.e., every functional \(L:\mathcal {A}\rightarrow \mathbb {R}\) is the linear combination \(L = c_1 l_{x_1} + \cdots + c_m l_{x_m}\), \(c_i\in \mathbb {R}\).

It is well-known that in dimension \(n=1\) the atom positions \(x_i\) of a moment sequence can be calculated from the generalized eigenvalue problem, see e.g. [24]. To formulate this and other results we introduce the following shift.

Definition 2.7

Let \(n,d\in \mathbb {N}\) and \(s = (s_\alpha )_{\alpha \in \mathbb {N}_0^n:|\alpha |\le d}\). For \(\beta \in \mathbb {N}_0^n\) with \(|\beta |\le d\) we define \(M_\beta s := (M_\beta s_\alpha )_{\alpha \in \mathbb {N}_0^n:|\alpha +\beta |\le d}\) by \(M_\beta s_\alpha := s_{\alpha +\beta }\), i.e., \((M_\beta L)(p) = L(x^\beta \cdot p)\).

For a space \(\mathcal {A}\) of measurable functions with basis \(\mathsf {A}= \{a_1,a_2,\dots \}\) the Hankel matrix \(\mathcal {H}_d(L)\) of a linear functional \(L:\mathcal {A}^2\rightarrow \mathbb {R}\) is given by \(\mathcal {H}_d(L) = (L(a_i a_j))_{i,j=1}^d\). The atom positions of a truncated moment sequence s (resp. moment functional L) are then determined, via a generalized eigenvalue problem, by the following result.

Lemma 2.8

Let \(n,d\in \mathbb {N}\), \(\mathcal {X}= \mathbb {C}\), and \(s = (s_0,s_1,\dots ,s_{2d+1})\in \mathbb {R}^{2d+2}\) with

$$\begin{aligned} s = \sum _{i=1}^k c_i\cdot s_{\mathsf {A}_{1,2d+1}}(z_i) \end{aligned}$$

for some \(z_i\in \mathbb {C}\), \(c_i\in \mathbb {C}\), and \(k\le d\). Then the \(z_i\) are unique and are the eigenvalues of the generalized eigenvalue problem

$$\begin{aligned} \mathcal {H}_{d}(M_1 s)v_i = z_i \mathcal {H}_{d}(s)v_i. \end{aligned}$$
(2)

Proof

That the \(z_i\) are the eigenvalues of (2), and hence unique, follows from

$$\begin{aligned} \mathcal {H}_{d}(s) = (s_{\mathsf {A}_{1,d}}(z_1),\dots ,s_{\mathsf {A}_{1,d}}(z_k))\cdot \mathrm {diag}\,(c_1,\dots ,c_k)\cdot (s_{\mathsf {A}_{1,d}}(z_1),\dots ,s_{\mathsf {A}_{1,d}}(z_k))^T \end{aligned}$$

and

$$\begin{aligned}&\mathcal {H}_{d}(M_1 s) \\&=(s_{\mathsf {A}_{1,d}}(z_1),\dots ,s_{\mathsf {A}_{1,d}}(z_k))\cdot \mathrm {diag}\,(c_1 z_1,\dots ,c_k z_k)\cdot (s_{\mathsf {A}_{1,d}}(z_1),\dots ,s_{\mathsf {A}_{1,d}}(z_k))^T. \end{aligned}$$

\(\square \)

We gave here only the 1-dimensional formulation, but a similar result also holds for \(n>1\). However, as seen from the Carathéodory numbers and the flat extensions in Sects. 5 and 6, the size of the Hankel matrix of the flat extension can become very large. For numerical reasons it is therefore advisable to reduce n-dimensional problems to 1-dimensional problems.
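
The reduction to a generalized eigenvalue problem is easy to carry out numerically; the following minimal sketch illustrates Lemma 2.8 in the simplest setting where the number k of atoms is known and the \(k\times k\) Hankel blocks are used (the concrete atoms, the weights, and the use of NumPy/SciPy are chosen purely for illustration).

```python
# Minimal numerical sketch of Lemma 2.8 (n = 1): the atom positions are the
# generalized eigenvalues of H(M_1 s) v = z H(s) v.  Atoms, weights, and the
# use of NumPy/SciPy are illustrative assumptions only.
import numpy as np
from scipy.linalg import eig

atoms = np.array([-1.5, 0.3, 2.0])     # assumed atom positions z_i
weights = np.array([0.7, 1.2, 0.5])    # assumed weights c_i > 0
k = len(atoms)

# moments s_j = sum_i c_i z_i^j for j = 0, ..., 2k-1
s = np.array([np.dot(weights, atoms**j) for j in range(2 * k)])

# k x k Hankel matrices H(s) = (s_{i+j}) and H(M_1 s) = (s_{i+j+1})
H = np.array([[s[i + j] for j in range(k)] for i in range(k)])
H1 = np.array([[s[i + j + 1] for j in range(k)] for i in range(k)])

# the generalized eigenvalues of (H1, H) are the atom positions
z, _ = eig(H1, H)
print(np.sort(z.real))                 # approximately [-1.5, 0.3, 2.0]
```

Once the atom positions are known, the weights can be recovered from the Vandermonde factorization used in the proof.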

2.2 Algebraic geometry

Consider the polynomial ring \(\mathbb {R}[x_0,\ldots ,x_n]\) with the natural grading and let \(I\subset \mathbb {R}[x_0,\ldots ,x_n]\) be a homogeneous ideal. Let

$$\begin{aligned} R=\mathbb {R}[x_0,\ldots ,x_n]/I \end{aligned}$$

be the quotient ring, which is itself a graded ring. Recall that the Hilbert function of R is given by \(HF_R(d)=\dim R_d\) where \(R_d\) is the degree d part of R. For d large enough one has \(HF_R(d)=HP_R(d)\) for some polynomial \(HP_R\), called the Hilbert polynomial of R, whose degree k equals the dimension of the projective variety defined by I.

In this article, we will always denote by \(\mathbb {P}^n=\mathbb {P}_{\mathbb {C}}^n\) the complex projective space. A real projective variety is the zero set \(V\subset \mathbb {P}^n\) of some homogeneous ideal \(I\subset \mathbb {R}[x_0,\ldots ,x_n]\). In particular, a real projective variety can contain nonreal points but it is defined by real polynomial equations. We will denote by \(V(\mathbb {R})\) the set of real points of V. The Zariski closure of any subset \(W\subset \mathbb {P}^n\) that consists only of real points is an example of a real projective variety V with the additional property that \(V(\mathbb {R})\) is Zariski dense in V. If \(V\subset \mathbb {P}^n\) is a real projective variety and I is its homogeneous vanishing ideal, then the Hilbert function/polynomial \(HF_V\) resp. \(HP_V\) of V is the Hilbert function/polynomial of \(\mathbb {R}[x_0,\ldots ,x_n]/I\). In this case, the leading coefficient of \(HP_V\) is \(\frac{e}{k!}\) where e is the degree of V.

Now we consider the dehomogenization map

$$\begin{aligned} \mathbb {R}[x_0,\ldots ,x_n]\rightarrow \mathbb {R}[x_1,\ldots ,x_n],\quad f\mapsto f|_{x_0=1}. \end{aligned}$$

Let \(I\subset \mathbb {R}[x_1,\ldots ,x_n]\) be an ideal and \(I^h\subset \mathbb {R}[x_0,\ldots ,x_n]\) the homogenization of I, i.e., the ideal generated by the homogenizations \(f^h\) of all \(f\in I\). Then the dehomogenization map induces an isomorphism of vector spaces

$$\begin{aligned} (\mathbb {R}[x_0,\ldots ,x_n]/I^h)_d \quad \rightarrow \quad (\mathbb {R}[x_1,\ldots ,x_n]/I)_{\le d} \end{aligned}$$

for all \(d\ge 0\). Here \((\mathbb {R}[x_1,\ldots ,x_n]/I)_{\le d}\) is the subspace of \(\mathbb {R}[x_1,\ldots ,x_n]/I\) consisting of the residue classes of polynomials of degree at most d. The main application of this observation will be the case when I is the vanishing ideal of finitely many points \(\Gamma \) in \(\mathbb {R}^n\). In this case the dimension

$$\begin{aligned} \dim \mathrm {lin}\,\{s_{\mathsf {A}_{n,d}}(x) \,|\, x\in \Gamma \} \end{aligned}$$

of the span of the point evaluations \(s_{\mathsf {A}_{n,d}}(x)\) in \(\mathbb {R}[x_1,\ldots ,x_n]_{\le d}^*\) at the points from \(\Gamma \) needed in Theorem 2.4 is

$$\begin{aligned} \dim (\mathbb {R}[x_1,\ldots ,x_n]/I)_{\le d} \quad =\quad \dim (\mathbb {R}[x_0,\ldots ,x_n]/I^h)_d \quad =\quad HF_I(d). \end{aligned}$$

The Hilbert function \(HF_I\) of an ideal I can be easily calculated if it is generated by a regular sequence.

Definition 2.9

Let A be a commutative ring. A sequence \(f_1,\ldots ,f_r\in A\) is a regular sequence if for all \(i=1,\ldots ,r\) the residue class of \(f_i\) is not a zero divisor in \(A/(f_1,\ldots ,f_{i-1})\).

The following is a consequence of Krull’s Principal Ideal Theorem. We include a proof since we are not aware of a good reference.

Lemma 2.10

Let \(I\subset \mathbb {R}[x_0,\ldots ,x_n]\) be a homogeneous radical ideal and \(V\subset \mathbb {P}^n\) its zero set. If each irreducible component of V has the same dimension \(d\ge 1\), then for any homogeneous \(f\in \mathbb {R}[x_0,\ldots ,x_n]\) the following are equivalent:

  1. (i)

    f is not a zero divisor in \(\mathbb {R}[x_0,\ldots ,x_n]/I\).

  2. (ii)

    f is not in a minimal prime ideal of \(\mathbb {R}[x_0,\ldots ,x_n]/I\).

  3. (iii)

    f is not identically zero on an irreducible component of V.

  4. (iv)

    Each irreducible component of \(V\cap \mathcal {V}(f)\) has dimension at most \(d-1\).

  5. (v)

    Each irreducible component of \(V\cap \mathcal {V}(f)\) has dimension \(d-1\).

Furthermore, if f is not constant and V nonempty, then \(V\cap \mathcal {V}(f)\) is nonempty.

Proof

The minimal prime ideals of the homogeneous coordinate ring

$$\begin{aligned} A=\mathbb {R}[x_0,\ldots ,x_n]/I \end{aligned}$$

of V are exactly the vanishing ideals of irreducible components of V. Thus we have \((ii)\Leftrightarrow (iii)\). If f is a zero divisor in A, then there is a nonzero \(g\in A\) such that \(fg=0\). Let \(V_i\) be an irreducible component of V on which g does not vanish identically. Then \(V_i\subset \mathcal {V}(f)\cup \mathcal {V}(g)\) implies \(V_i\subset \mathcal {V}(f)\) because \(V_i\) is irreducible. Thus (iii) implies (i). By [3, p. 44, Ex. 9] every minimal prime ideal contains only zero divisors. This shows \((i)\Rightarrow (ii)\). If f vanishes entirely on an irreducible component \(V_i\) of V, then \(V_i\) is an irreducible component of \(V\cap \mathcal {V}(f)\). By assumption we have \(\dim (V_i)=d\), so we cannot have (iv). Thus \((iv)\Rightarrow (iii)\).

Since (v) clearly implies (iv), it remains to show (v) under the assumptions \((i)-(iii)\). If f is a unit in A, then \(V\cap \mathcal {V}(f)=\emptyset \) and (v) is trivially true as there are no irreducible components. Thus we can assume that f is neither a zero divisor nor a unit in A. Then by Krull’s Principal Ideal Theorem [3, Cor. 11.17] every minimal prime ideal over \((f)\subset A\) has height one. This implies that every irreducible component W of \(V\cap \mathcal {V}(f)\) has codimension one. Since V is of pure dimension d, this means that the dimension of W is \(d-1\). The additional statement follows, for example, from [49, Cor. 1.7] because we have \(\dim (V)>0\). \(\square \)

Corollary 2.11

Let \(I_0\subset \mathbb {R}[x_0,\ldots ,x_n]\) be a homogeneous prime ideal such that \(\dim \mathcal {V}(I_0)\ge k\). Let \(f_1,\ldots ,f_k\in \mathbb {R}[x_0,\ldots ,x_n]\) be homogeneous elements of positive degree such that for all \(i=1,\ldots ,k\) we have:

  1. (i)

    \(I_i:=I_0+(f_1,\ldots ,f_i)\) is radical.

  2. (ii)

    \(\dim \mathcal {V}(I_i)=\dim \mathcal {V}(I_{i-1})-1\).

Then \(f_1,\ldots ,f_k\) is a regular sequence modulo \(I_0\).

Proof

For \(i=0,\ldots ,k\) let \(V_i=\mathcal {V}(I_i)\subset \mathbb {P}^n\) and let \(d=\dim (V_0)\). First we show that each irreducible component of \(V_i\) has dimension \(d-i\) by induction on i. The claim is clear for \(i=0\) because \(I_0\) is a prime ideal. Assume the claim is true for \(0\le i<k\). Then we can apply Lemma 2.10 to the ideal \(I_i\). By assumption we have \(\dim \mathcal {V}(I_{i+1})=\dim \mathcal {V}(I_{i})-1=d-i-1\), so we have (iv). Thus we also have (v), which says that each irreducible component of \(V_{i+1}\) has dimension \(d-i-1\). Then by the same lemma we also have that \(f_{i+1}\) is not a zero divisor modulo \(I_i\), which shows that \(f_1,\ldots ,f_k\) is a regular sequence modulo \(I_0\). \(\square \)

Lemma 2.12

Let \(I\subset \mathbb {R}[x_0,\ldots ,x_n]\) be a homogeneous ideal and \(R=\mathbb {R}[x_0,\ldots ,x_n]/I\) with Hilbert function \(HF_R\). Let \(f_1,\ldots , f_r\in R\) be a regular sequence of homogeneous elements of degree d. The Hilbert function \(HF_{R/(f_1,\ldots ,f_r)}\) of \(R/(f_1,\ldots ,f_r)\) is

$$\begin{aligned} HF_{R/(f_1,\ldots ,f_r)}(j) \quad =\quad \sum _{i=0}^r (-1)^i\cdot \left( {\begin{array}{c}r\\ i\end{array}}\right) \cdot HF_{R}(j-id). \end{aligned}$$

Proof

We prove the statement by induction on r. The case \(r=0\) is trivial. In order to prove the induction step, let \(R^i = R/(f_1,\ldots ,f_i)\) for \(i=0,\ldots ,r\). For all \(j\in \mathbb {Z}\) we have the exact sequence

$$\begin{aligned} 0\rightarrow R^{r-1}_{j-d}\rightarrow R^{r-1}_{j}\rightarrow R^r_j\rightarrow 0 \end{aligned}$$

where the first map is given by multiplication with \(f_r\). Therefore

$$\begin{aligned} HF_{R^r}(j)=HF_{R^{r-1}}(j)-HF_{R^{r-1}}(j-d). \end{aligned}$$

By induction hypothesis this implies that

$$\begin{aligned} HF_{R^{r}}(j)&= \sum _{i=0}^{r-1}(-1)^i\left( {\begin{array}{c}r-1\\ i\end{array}}\right) HF_{R}(j-i\cdot d)\\&\quad -\sum _{i=0}^{r-1}(-1)^i\left( {\begin{array}{c}r-1\\ i\end{array}}\right) HF_{R}(j-(i+1)d)\\&=\sum _{i=0}^{r}(-1)^i(\left( {\begin{array}{c}r-1\\ i\end{array}}\right) + \left( {\begin{array}{c}r-1\\ i-1\end{array}}\right) )HF_{R}(j-i\cdot d)\\&=\sum _{i=0}^r(-1)^i\left( {\begin{array}{c}r\\ i\end{array}}\right) HF_{R}(j-i\cdot d). \end{aligned}$$

\(\square \)
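
The formula of Lemma 2.12 can be checked on a small example. For the monomial regular sequence \(f_1=x_1^d,\dots ,f_r=x_r^d\) in \(\mathbb {R}[x_0,\dots ,x_n]\) the quotient has a monomial basis, namely the monomials whose exponents in \(x_1,\dots ,x_r\) are all less than d, so its Hilbert function can be computed by direct counting; the following purely illustrative sketch compares this count with the alternating sum.

```python
# Check Lemma 2.12 for the monomial regular sequence x_1^d, ..., x_r^d in
# K[x_0, ..., x_n]: count the monomials of degree j whose exponents of
# x_1, ..., x_r are below d and compare with the alternating-sum formula.
from math import comb
from itertools import product

def hf_poly_ring(n, j):
    """Hilbert function of K[x_0, ..., x_n]: dimension of the degree-j part."""
    return comb(n + j, n) if j >= 0 else 0

def hf_quotient_direct(n, d, r, j):
    """Count monomials x_0^{a_0} ... x_n^{a_n} of degree j with a_1, ..., a_r < d."""
    count = 0
    for exps in product(range(j + 1), repeat=n):           # exponents a_1, ..., a_n
        if sum(exps) <= j and all(e < d for e in exps[:r]):
            count += 1                                      # a_0 = j - sum(exps) >= 0
    return count

def hf_quotient_formula(n, d, r, j):
    """Alternating sum from Lemma 2.12."""
    return sum((-1)**i * comb(r, i) * hf_poly_ring(n, j - i * d) for i in range(r + 1))

n, d, r = 3, 2, 2
for j in range(8):
    assert hf_quotient_direct(n, d, r, j) == hf_quotient_formula(n, d, r, j)
print("Lemma 2.12 verified for n = 3, d = 2, r = 2 and j = 0, ..., 7")
```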

At various places we will make use of the following version of Bertini’s Theorem.

Theorem 2.13

Let \(\mathcal {X}\subset \mathbb {P}^n\) be a real projective variety of dimension k. Then the following statements hold for generic homogeneous forms \(f_1,\ldots ,f_r\in \mathbb {R}[x_0,\ldots ,x_n]\), \(r\le k\), of degree \(d>0\) in the sense that the set of exceptions is contained in a lower dimensional algebraic subset of \(\mathbb {R}[x_0,\ldots ,x_n]_d^r\).

  1. i)

    The homogeneous vanishing ideal of \(\mathcal {X}\cap \mathcal {V}(f_1,\ldots ,f_r)\) is generated by the homogeneous vanishing ideal of \(\mathcal {X}\) and \(f_1,\ldots ,f_r\).

  2. ii)

    If \(\mathcal {X}\) is irreducible and \(r<k\), then \(\mathcal {X}\cap \mathcal {V}(f_1,\ldots ,f_r)\) is irreducible as well.

  3. iii)

    We have \(\dim (\mathcal {X}\cap \mathcal {V}(f_1,\ldots ,f_r))=k-r\).

  4. iv)

    If the singular locus of \(\mathcal {X}\) has dimension at most \(r-1\), then \(\mathcal {X}\cap \mathcal {V}(f_1,\ldots ,f_r)\) is smooth.

Proof

Bertini’s Theorem in its usual formulation says that the above listed statements hold for generic homogeneous forms \(f_1,\ldots ,f_r\in \mathbb {C}[x_0,\ldots ,x_n]\), \(r\le k\), of degree \(d>0\). As a reference for this see for example [28, Thm. 6.10, Cor. 6.11]. This means that the set U of exceptions is contained in a lower dimensional algebraic subset \(W\subset \mathbb {C}[x_0,\ldots ,x_n]_d^r\). The set \(U'\) of tuples \((f_1,\ldots ,f_r)\in \mathbb {R}[x_0,\ldots ,x_n]_d^r\) of real polynomials for which one of our statements does not hold is thus contained in the algebraic subset \(W'=W\cap \mathbb {R}[x_0,\ldots ,x_n]_d^r\) of \(\mathbb {R}[x_0,\ldots ,x_n]_d^r\). Since the set of real points \(\mathbb {R}[x_0,\ldots ,x_n]_d^r\) is Zariski dense in the vector space \(\mathbb {C}[x_0,\ldots ,x_n]_d^r\), we see that W does not contain \(\mathbb {R}[x_0,\ldots ,x_n]_d^r\). Thus \(W'\) is a strict algebraic subset of \(\mathbb {R}[x_0,\ldots ,x_n]_d^r\). This shows the claim. \(\square \)

For more on Hilbert functions and polynomials see e.g. [51], or standard textbooks on commutative algebra like [18, 19], or [5].

3 Carathéodory numbers for moment sequences with small gaps

We want to start our investigation of the Carathéodory number in the 1-dimensional case with gaps, i.e., the case where not all monomials are present.

Let \(d_1,\dots ,d_r\in \mathbb {N}\) be some natural numbers whose greatest common divisor is one. We consider the subring \(R=\mathbb {R}[t^{d_1},\ldots ,t^{d_r}]\) of \(\mathbb {R}[t]\). By \(R_{\le d}\) we denote the vector space of polynomials in R of degree at most d. By the assumption on the greatest common divisor there is a constant c such that \(t^d\in R\) for all \(d\ge c\). We choose c minimal with this property and denote it by \(\mathfrak {c}\). We observe that one has

$$\begin{aligned} \dim R_{\le d}=d+1-g \quad \text {for}\quad d\ge \mathfrak {c}\end{aligned}$$

where \(\mathfrak {c}+1-{g}\) is the number of monomials in R of degree at most \(\mathfrak {c}\). In other words, g is the number of monomials that are not in R (i.e., the number of gaps).
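
For concrete exponents \(d_1,\dots ,d_r\) the constants \(\mathfrak {c}\) and g can be computed directly by generating the exponent semigroup; the following sketch is purely illustrative.

```python
# Compute the conductor c (the minimal c with t^e in R for all e >= c) and the
# number g of gaps for R = R[t^{d_1}, ..., t^{d_r}] (gcd of the d_i must be 1).
def conductor_and_gaps(gens):
    a = min(gens)
    reachable = {0}          # exponents e with t^e in R
    gaps = []
    e, run = 0, 0
    while run < a:           # a run of a consecutive exponents closes the semigroup
        e += 1
        if any(e - g in reachable for g in gens if e - g >= 0):
            reachable.add(e)
            run += 1
        else:
            gaps.append(e)
            run = 0
    return (max(gaps) + 1 if gaps else 0), len(gaps)

# R = R[t^4, t^6, t^7]: the gaps are 1, 2, 3, 5, 9, so c = 10 and g = 5
print(conductor_and_gaps([4, 6, 7]))    # (10, 5)
```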

Definition 3.1

The k-th Descartes number \(D_k\) of R is the maximal number of different real zeros that a polynomial \(f\in R_{\le k}\) can have.

Recall that Descartes’ rule of signs says that the number of positive real zeros (counted with multiplicities) of a polynomial \(f=\sum _{k=0}^n c_k t^k\) is bounded from above by the number \(\text {Var}(c_0,\ldots ,c_n)\) of sign changes in the sequence \(c_0,\ldots ,c_n\) after erasing all zeros. The number of negative zeros (again counted with multiplicities) of f is then bounded by \(\text {Var}(c_0,-c_1,\ldots ,(-1)^nc_n)\). Conversely, Grabiner [25] constructed, for every sequence of signs \((\sigma _0,\ldots ,\sigma _n)\), \(\sigma _i\in \{0,\pm 1\}\), a polynomial \(f=\sum _{k=0}^n c_k t^k\) with \(\text {sgn}(c_i)=\sigma _i\) and only simple positive and negative zeros that realizes both bounds. Thus Descartes’ rule of signs gives a purely combinatorial way to determine an upper bound on the k-th Descartes number from the numbers \(d_1,\ldots ,d_r\). This also shows that \(D_k\) is the maximal number of different real zeros that a polynomial \(f\in R_{\le k}\) with \(f(0)\ne 0\) can have, since adding a small constant of appropriate sign does not decrease the number of real zeros of a polynomial whose only possibly multiple real root is 0.

Example 3.2

  1. (a)

    Let \(R=\mathbb {R}[t^4,t^6,t^7]\). Then the Descartes number \(D_7\) is the maximal number of real roots that a polynomial of the form \(a+bt^4+ct^6+dt^7\) can have. By trying out all possible signs on the coefficients, we find by Descartes’ rule of signs that such a polynomial can have at most five real zeros and by [25] there actually is such a polynomial. Thus \(D_7=5\).

  2. (b)

    The Descartes number depends not only on the number of involved monomials but also on their parities. For example, if \(R=\mathbb {R}[t^5,t^6,t^9]\), then \(D_9=3\).
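
Since by Grabiner’s construction the Descartes bound is attained by a polynomial with only simple nonzero real zeros, the Descartes number \(D_k\) can be computed by brute force over all sign patterns on the exponents of R that are at most k. The following purely illustrative sketch reproduces the values of Example 3.2.

```python
# Brute-force computation of the Descartes number D_k: by Descartes' rule of
# signs and Grabiner's converse, D_k is the maximum of
# Var(sigma) + Var(sigma twisted by (-1)^exponent) over all sign patterns
# sigma on the exponents of R that are at most k, with sigma_0 != 0.
from itertools import product

def var(signs):
    """Number of sign changes after erasing zeros."""
    nz = [s for s in signs if s != 0]
    return sum(1 for a, b in zip(nz, nz[1:]) if a * b < 0)

def descartes_number(exponents, k):
    """D_k for the exponents of R that are at most k (0 must be among them)."""
    exps = sorted(e for e in exponents if e <= k)
    best = 0
    for signs in product([-1, 0, 1], repeat=len(exps)):
        if signs[0] == 0:                  # the maximum is attained with f(0) != 0
            continue
        pos = var(signs)                                        # positive zeros
        neg = var([s * (-1)**e for s, e in zip(signs, exps)])   # negative zeros
        best = max(best, pos + neg)
    return best

print(descartes_number([0, 4, 6, 7], 7))    # 5, as in Example 3.2 (a)
print(descartes_number([0, 5, 6, 9], 9))    # 3, as in Example 3.2 (b)
```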

Proposition 3.3

For all \(k\ge 0\) we have \(D_{\mathfrak {c}+k}=D_{\mathfrak {c}}+k\).

Proof

We prove the claim by induction on k. The case \(k=0\) is trivial. Let \(k\ge 1\) and assume that the claim is true for \(k-1\). Then there is a sequence of signs \((\sigma _0,\ldots ,\sigma _{\mathfrak {c}+k-1})\), \(\sigma _i\in \{0,\pm 1\}\), \(\sigma _0\ne 0\), of coefficients of a polynomial in R with \(D_{\mathfrak {c}}+k-1\) different real zeros. In particular,

$$\begin{aligned} {\text {Var}}(\sigma _0,\ldots ,\sigma _{\mathfrak {c}+k-1}) +{\text {Var}}(\sigma _0,\ldots ,(-1)^{\mathfrak {c}+k-1}\sigma _{\mathfrak {c}+k-1})=D_{\mathfrak {c}}+k-1. \end{aligned}$$

Letting \(\sigma _{\mathfrak {c}+k}=-\sigma _{\mathfrak {c}+k-1}\) we get that

$$\begin{aligned} \text {Var}(\sigma _0,\dots ,\sigma _{\mathfrak {c}+k-1}, \sigma _{\mathfrak {c}+k})+\text {Var}(\sigma _0,\ldots ,(-1)^{\mathfrak {c}+k-1} \sigma _{\mathfrak {c}+k-1},(-1)^{\mathfrak {c}+k}\sigma _{\mathfrak {c}+k})=D_\mathfrak {c}+k \end{aligned}$$

and another choice of \(\sigma _{\mathfrak {c}+k}\) would not result in a larger sum, so \(D_{\mathfrak {c}+k}=D_\mathfrak {c}+k\). \(\square \)

Proposition 3.4

Let \(k\ge \mathfrak {c}\). The maximal number of real zeros that a nonnegative polynomial \(f\in R_{\le 2k}\) can have is between \(k-(\mathfrak {c}-D_\mathfrak {c})\) and \(k-\left\lceil {\frac{\mathfrak {c}-D_\mathfrak {c}-1}{2}}\right\rceil \). Here the lower bound is realized by a polynomial that is the square of an element of R.

Proof

For the lower bound, just take the square of a polynomial of degree k with \(D_k=D_\mathfrak {c}+k-\mathfrak {c}\) real zeros. On the other hand, if \(f\in R\) is a nonnegative polynomial with N real zeros, then \(tf'\in R\) has at least \(2N-1\) zeros: every zero of f is a local minimum and hence a zero of \(f'\), and by Rolle’s theorem \(f'\) has another zero between any two consecutive zeros of f. Therefore, \(2N-1\le D_{2k}\) implies

$$\begin{aligned} N\le \left\lfloor {\frac{D_{2k}+1}{2}} \right\rfloor =\left\lfloor {\frac{D_{\mathfrak {c}}+2k-\mathfrak {c}+1}{2}} \right\rfloor =k-\left\lceil {\frac{\mathfrak {c}-D_{\mathfrak {c}}-1}{2}} \right\rceil . \end{aligned}$$

\(\square \)

Lemma 3.5

The point evaluations \(l_{p_1},\ldots ,l_{p_n}: R_{\le e}\rightarrow \mathbb {R}\) are linearly independent for any pairwise distinct points \(p_1,\ldots ,p_n\in \mathbb {R}\) and \(e\ge \mathfrak {c}+n-1\).

Proof

We consider the map \(\psi :R_{\le e}\rightarrow \mathbb {R}^n,\, g\mapsto (g(p_i))_{1\le i\le n}\). The polynomial \(t^\mathfrak {c}\prod _{i=1,i\ne j}^n(t-p_i)\) is mapped to a nonzero multiple of the j-th unit vector except in the case \(p_j=0\). Thus the image contains all unit vectors except possibly one, and the constant polynomial 1 is mapped to the vector \((1,\dots ,1)\). Hence \(\psi \) is surjective, which implies the claim. \(\square \)

The following lemma generalizes [12, Thm. 3.68] and [16, Thm. 45].

Lemma 3.6

Let \(\mathcal {A}\subset \mathbb {R}[x]\) be the vector space of polynomials on \(\mathbb {R}\) generated by the monomials \(\mathsf {A}= \{x^{d_1}, x^{d_2},\dots , x^{d_m}\}\), \(m,d_i\in \mathbb {N}\), such that \(d_1 = 0< d_2< \dots < d_m\) and \(d_m\) is even. If all non-negative polynomials in \(\mathcal {A}\) have at most C zeros, then

$$\begin{aligned} \mathcal {C}_\mathsf {A}\le C + 1. \end{aligned}$$

Proof

Let \(s\in \mathcal {S}_\mathsf {A}\) be a moment sequence.

Step i): If s is in the boundary of the moment cone, there exists a \(p\in \mathcal {A}\) with \(p\ge 0\) and \(L_s(p)=0\), i.e., the atoms of every representing measure of s are located at the zeros of p. Hence, s requires at most C point evaluations.

Step ii): Assume now s is in the interior of the moment cone.

Homogenize \(\mathsf {A}\), i.e., \(\mathsf {B}:= \{y^{d_m},x^{d_2} y^{d_m-d_2},\dots ,x^{d_m}\}\). Since s is a moment sequence, we have \(s = \sum _{i=1}^l c_i\cdot s_\mathsf {A}(x_i) = \sum _{i=1}^l c_i\cdot s_\mathsf {B}((x_i,1))\). Since \(x^{d_m},y^{d_m}\in \mathsf {B}\), we have \(x^{d_m}+y^{d_m}>0\) on \(\mathbb {P}^1\) and the moment cone \(\mathcal {S}_\mathsf {B}\) is closed. Hence, by [16, Prop. 8] there exists an \(\varepsilon >0\) such that

$$\begin{aligned} c_q((x,y)) := \sup \{c\ge 0 \,|\, q - c\cdot s_\mathsf {B}((x,y))\in \mathcal {S}_\mathsf {B}\} \end{aligned}$$
(\(*\))

is attained and continuous for all \(q\in B_\varepsilon (s)\), \((x,y)\in \mathbb {P}^1\), we have \(B_\varepsilon (s)\subset \mathrm {int}\,\mathcal {S}\), and

$$\begin{aligned} \Gamma := \sup _{q\in B_\varepsilon (s),\, (x,y)\in \mathbb {P}^1} c_q((x,y)) \;<\; \infty . \end{aligned}$$
(\(**\))

Let

$$\begin{aligned} T := \bigcup _{c\in [0,\Gamma +1]} \overline{B_\varepsilon (s-c\cdot s_\mathsf {B}(0,1))} \end{aligned}$$

be the \(\varepsilon \)-tube around the line \(s - [0,\Gamma +1]\cdot s_\mathsf {B}((0,1))\). Write \(T = T_1 \cup T_2 \cup T_3\) with \(T_1 := T\cap \mathrm {int}\,\mathcal {S}_\mathsf {B}\), \(T_2 := T\cap \partial \mathcal {S}_\mathsf {B}\), and \(T_3 := T\setminus (T_1\cup T_2)\). That is, \(T_1\) is the part of the \(\varepsilon \)-tube inside the moment cone, \(T_2\) is the intersection of the \(\varepsilon \)-tube with the boundary of the moment cone, and \(T_3\) is the part of the \(\varepsilon \)-tube outside the moment cone.

Since the moment cone is closed (and convex), \(T_2\) is also closed and every path starting in \(T_1\) and ending in \(T_3\) contains at least one point in \(T_2\). We define

$$\begin{aligned} t' := t - c_t((0,1))\cdot s_\mathsf {B}((0,1)) \end{aligned}$$

for all \(t\in B_\varepsilon (s)\). By (\(*\)) and (\(**\)) we have for all \(t\in B_\varepsilon (s)\) that \(t'\) is a moment sequence without an atom at (0, 1) (by maximality of \(c_t((0,1))\)), i.e., \(t'\) is a moment sequence on \(\mathbb {R}\) and in the boundary of \(\mathcal {S}_\mathsf {A}\). By step (i), \(t'\) requires at most C point evaluations. Since \(s_\mathsf {B}((x,y))\) is continuous in (x, y), there exists a \(\delta = \delta (\varepsilon )>0\) such that

$$\begin{aligned} s' := s - c_s((\delta ,1))\cdot s_\mathsf {B}((\delta ,1))\in T_2, \end{aligned}$$

i.e., \(s'\) is also a moment sequence on \(\mathbb {R}\) with at most C point evaluations. Hence, \(s = s' + c_s((\delta ,1))\cdot s_\mathsf {B}((\delta ,1))\) is a moment sequence on \(\mathbb {R}\) with \(\mathcal {C}_\mathsf {A}(s) \le C+1\).

Finally, since \(s\in \mathcal {S}_\mathsf {A}\) was arbitrary we have \(\mathcal {C}_\mathsf {A}\le C+1\). \(\square \)

Theorem 3.7

Let \(R=\mathbb {R}[t^{d_1},\dots ,t^{d_r}]\) and \(k\ge \mathfrak {c}\). Every moment functional \(L:R_{\le 2k}\rightarrow \mathbb {R}\) is a conic combination of at most \(k+1-\left\lceil {\frac{\mathfrak {c}-D_\mathfrak {c}-1}{2}}\right\rceil \) point evaluations. There are moment functionals \(L:R_{\le 2k}\rightarrow \mathbb {R}\) that are not a conic combination of less than \(k-(\mathfrak {c}-D_\mathfrak {c})\) point evaluations.

Proof

For the upper bound we combine Proposition 3.4 and Lemma 3.6. We have \(1\in R_{\le 2k}\) and since \(k\ge \mathfrak {c}\) we have by the minimality of \(\mathfrak {c}\) (see second paragraph at the beginning of this section) that \(x^{2k}\in R_{\le 2k}\). So the monomial basis \(\mathsf {A}\) of \(R_{\le 2k}\) fulfills the conditions in Lemma 3.6 and by Proposition 3.4 every non-negative polynomial in \(R_{\le 2k}\) has at most \(C = k-\left\lceil {\frac{\mathfrak {c}-D_\mathfrak {c}-1}{2}}\right\rceil \) zeros. Lemma 3.6 implies that every moment sequence/functional is represented by at most \(C + 1 = k+1-\left\lceil {\frac{\mathfrak {c}-D_\mathfrak {c}-1}{2}}\right\rceil \) point evaluations.

The lower bound follows from Proposition 3.4, Lemma 3.5, and Theorem 2.4. \(\square \)

Example 3.8

  1. a)

    Let \(R=\mathbb {R}[t^2,t^{2r+1}]\) with \(r\ge 0\). In this case we have \(\mathfrak {c}=D_\mathfrak {c}=2r+1\). Thus for \(k\ge 2r+1\) every moment functional \(L:R_{\le 2k}\rightarrow \mathbb {R}\) is a conic combination of at most \(k+1\) point evaluations and there are moment functionals which are not a conic combination of less than k point evaluations.

  2. b)

    Let \(R=\mathbb {R}[t^r,t^{r+1},t^{r+2},\ldots ]\). Then \(\mathfrak {c}=r\) and \(D_\mathfrak {c}=1\) if r is odd and \(D_\mathfrak {c}=2\) if r is even, so the difference between upper and lower bound in Theorem 3.7 grows linearly in r. This situation is in sharp contrast to the results from Sect. 4 on smooth curves.

4 Carathéodory numbers for measures supported on algebraic varieties

Now for any subset \(\mathcal {X}\subset \mathbb {R}^n\) we are interested in the ring \(\mathbb {R}[\mathcal {X}]\) of polynomial functions \(\mathcal {X}\rightarrow \mathbb {R}\). The finite dimensional vector space of all functions \(\mathcal {X}\rightarrow \mathbb {R}\) that can be represented by a polynomial of degree at most d is denoted by \(\mathbb {R}[\mathcal {X}]_{\le d}\). If \(I\subset \mathbb {R}[x_1,\ldots ,x_n]\) is the ideal of all polynomials vanishing on \(\mathcal {X}\), then \(\mathbb {R}[\mathcal {X}]=\mathbb {R}[V_0]=\mathbb {R}[x_1,\ldots ,x_n]/I\) where \(V_0\subset \mathbb {R}^n\) is the Zariski closure of \(\mathcal {X}\). Let \(V\subset \mathbb {P}^n\) be the Zariski closure of \(V_0\) in the complex projective space. Then one has

$$\begin{aligned} HF_V(d)\quad =\quad \dim \mathbb {R}[\mathcal {X}]_{\le d}. \end{aligned}$$
(3)

From Richter’s Theorem we thus immediately get the following.

Proposition 4.1

Every moment functional \(L:\mathbb {R}[\mathcal {X}]_{\le 2d}\rightarrow \mathbb {R}\) is a conic combination of at most \(HF_V(2d)\) point evaluations \(l_{x_i}\) with \(x_i\in \mathcal {X}\). If \(\mathcal {X}\) consists of fewer than \(HF_V(2d)\) path-connected components, then \(HF_V(2d)-1\) point evaluations are sufficient. In particular, for large d this upper bound grows like a polynomial whose degree is the dimension of the Zariski closure of \(\mathcal {X}\).

Proof

In (3) we already established that \(\dim \mathbb {R}[\mathcal {X}]_{\le 2d} = HF_V(2d)\). Since \(\mathcal {X}\) is a measurable space and monomials are measurable functions, Richter’s Theorem 2.2 implies that L can be represented by at most \(\dim \mathbb {R}[\mathcal {X}]_{\le 2d} = HF_V(2d)\) point evaluations.

Since \(\mathcal {X}\) is a topological space which consists of at most \(HF_V(2d) -1\) path-connected components, \(1\in \mathbb {R}[\mathcal {X}]\), and \(\mathbb {R}[\mathcal {X}]\) consists of continuous functions, we have that \(s_\mathsf {A}(\mathcal {X})\) consists of at most \(HF_V(2d)-1\) path-connected components. All conditions of [13, Thm. 12] are fulfilled, which implies the upper bound. \(\square \)

In order to provide lower bounds as well, we will need the following lemma.

Lemma 4.2

Assume that V is irreducible with homogeneous vanishing ideal I and that its singular locus has codimension at least 2. If \(k=\dim (V)\), then, for all d large enough, there are k real homogeneous polynomials \(f_1,\ldots ,f_k\) of degree d whose common zero set Z on V consists of \(d^k\cdot \deg (V)\) different points that are all real and contained in \(V_0\). Furthermore, one can choose the \(f_1,\ldots ,f_k\) to be a regular sequence with the property that they generate the homogeneous vanishing ideal of Z modulo I.

Proof

By Bertini’s theorem, for a generic choice of \(k-1\) real linear forms \(l_1,\ldots ,l_{k-1}\) the set \(V\cap \mathcal {V}(l_1,\cdots ,l_{k-1})\subset \mathbb {P}^n\) is a real smooth irreducible curve X. Since the real points of V are Zariski dense in V we can furthermore assume that \(X(\mathbb {R})\) is nonempty. Now by [47, Cor. 2.10, Rem. 2.14], for large enough d, there is a homogeneous polynomial f of degree d all of whose zeros on X are real, simple and do not lie on the hyperplane at infinity. Since \(\deg X=\deg V=:e\), these are \(d\cdot e\) many points. The same is true for the zeros of f on \(X'\) where \(X'\) is the intersection of V with linear forms \(l_1',\dots ,l_{k-1}'\) that are sufficiently small perturbations of \(l_1,\dots ,l_{k-1}\). Therefore, for sufficiently small \(\epsilon >0\), the common zero set on V of f with the polynomials \(f_i=\prod _{j=1}^d(l_i+j\epsilon \cdot x_0)\), \(i=1,\ldots ,k-1\), consists of exactly \(d^ke\) real, simple points that do not lie on the hyperplane at infinity. Thus these \(d^ke\) points lie in \(V_0\).

In order to obtain the additional properties, we can perturb \(f_1,\ldots ,f_k\) a little bit so that each \(I_i=I+(f_1,\ldots ,f_i)\) is a radical ideal by Bertini’s Theorem. Finally, since the dimension of \(\mathcal {V}(I_i)\) is exactly \(k-i\), the \(f_1,\ldots ,f_k\) have to form a regular sequence modulo I by Corollary 2.11. \(\square \)

Remark 4.3

The proof of the preceding lemma contains the reason why, in this section, we get lower bounds only for sufficiently large d. Namely, Scheiderer’s result in [47], which states that for every smooth algebraic curve X there are polynomials of degree d that have only real zeros on X, is only true for sufficiently large d. Finding an explicit lower bound on d that ensures the existence of such polynomials is an open problem, except for the case of M-curves, where a good lower bound has been provided by Huisman [27].

Example 4.4

The assumption on the singular locus in Lemma 4.2 is necessary. Consider, for example, the singular plane curve \(V_0=\mathcal {V}(x^4-y^3)\subset \mathbb {R}^2\). It is the image of the map \(\mathbb {R}\rightarrow \mathbb {R}^2,t\mapsto (t^3,t^4)\). Now the zeros of a polynomial \(f\in \mathbb {R}[x,y]\) of degree d on \(V_0\) correspond to the roots of the univariate polynomial \(f(t^3,t^4)\). But by Descartes’ rule of signs this cannot have 4d different real zeros. We dealt with curves of this kind in Sect. 3.

From this we get our main theorem on the Carathéodory numbers for measures supported on an algebraic set.

Theorem 4.5

Let \(\mathcal {X}=V_0\subset \mathbb {R}^n\) be Zariski closed of dimension \(k>0\) such that its projective closure \(V\subset \mathbb {P}^n\) is irreducible and its singular locus has codimension at least 2. Let \(P\in \mathbb {Q}[t]\) be the Hilbert polynomial of V. For large enough \(d>0\), every moment functional \(L:\mathbb {R}[\mathcal {X}]_{\le 2d}\rightarrow \mathbb {R}\) is a conic combination of at most \(P(2d) -1\) point evaluations \(l_{x_i}\) with \(x_i\in \mathcal {X}\). On the other hand, there are moment functionals \(L:\mathbb {R}[\mathcal {X}]_{\le 2d}\rightarrow \mathbb {R}\) that are not a conic combination of fewer than

$$\begin{aligned} P(2d)-k\cdot P(d)+\left( {\begin{array}{c}k\\ 2\end{array}}\right) \end{aligned}$$

point evaluations \(l_{x_i}\).

Proof

For large enough d we have \(P(2d)=HF_V(2d)\). Therefore, by Proposition 4.1, in order to prove the upper bound, it suffices to show that P(2d) exceeds the number m of path-connected components of \(\mathcal {X}\) for large enough d. Since m is finite by [4, Thm. 2.4.5, Prop. 2.5.13], this is clear because P has positive degree k. For the lower bound we consider the polynomials \(f_1,\ldots ,f_k\) of degree d from Lemma 4.2 whose common zero set Z on \(\mathcal {X}\) consists of \(d^k\cdot \deg V\) simple points. The polynomial

$$\begin{aligned} f_1(1,x_1,\ldots ,x_n)^2+\ldots +f_k(1,x_1,\ldots ,x_n)^2 \end{aligned}$$

is non-negative and has the same zero set Z on \(\mathcal {X}\). By Theorem 2.4 a lower bound is then given by the dimension of the span of the point evaluations of polynomials of degree at most 2d in Z. This is the same as the dimension of the vector space \((\mathbb {R}[x_0,\ldots ,x_n]/J)_{2d}\) where J is the homogeneous vanishing ideal of Z considered as a subset of \(\mathbb {P}^n\). This is by definition \(HF_Z(2d)\). Since J is given by \(I+(f_1,\ldots ,f_k)\) where I is the homogeneous vanishing ideal of V, and since the \(f_i\) form a regular sequence, we have

$$\begin{aligned} HF_Z(2d)=HF_{I+(f_1,\ldots ,f_k)}(2d)=\sum _{i=0}^k(-1)^i\left( {\begin{array}{c}k\\ i\end{array}}\right) HF_I(d\cdot (2-i)) \end{aligned}$$

by Lemma 2.12. It follows immediately from the definition of the Hilbert function that \(HF_I(m)=0\) for \(m<0\) and \(HF_I(0)=1\). Therefore, only the first three terms of the above sum are nonzero and we obtain:

$$\begin{aligned} HF_Z(2d)=HF_I(2d)-k HF_I(d)+\left( {\begin{array}{c}k\\ 2\end{array}}\right) . \end{aligned}$$

For large d the Hilbert function coincides with the Hilbert polynomial which shows the claim. \(\square \)

Example 4.6

This example is to demonstrate that both upper and lower bound from Theorem 4.5 are false when d is not large enough. Consider the plane curve \(\mathcal {X}\subset \mathbb {R}^2\) defined as the zero set of the polynomial \(x^8+y^8-1\). It is path-connected and its Zariski closure \(C\subset \mathbb {P}^2\) is smooth with Hilbert polynomial \(P(d)=8d-20\). Thus for \(d=1\) we have

$$\begin{aligned} P(2d)-1=P(2)-1=-5<0 \end{aligned}$$

which cannot be an upper bound. However, by Proposition 4.1 an upper bound in the case \(d=1\) is given by

$$\begin{aligned} HF_C(2d)-1=HF_C(2)-1=5. \end{aligned}$$

Finally, again for \(d=1\), we have

$$\begin{aligned} P(2d)-k\cdot P(d)+\left( {\begin{array}{c}k\\ 2\end{array}}\right) =P(2)-P(1)=8 \end{aligned}$$

which exceeds the upper bound and thus cannot be a lower bound.

Example 4.7

Let \(\mathcal {X}\subset \mathbb {R}^n\), \(n\ge 2\), be the boundary of the unit ball, i.e., the zero set of \(1-(x_1^2+\ldots +x_n^2)\). Its Zariski closure

$$\begin{aligned} V=\mathcal {V}(x_0^2-(x_1^2+\ldots +x_n^2))\subset \mathbb {P}^n \end{aligned}$$

is irreducible and smooth. A direct computation gives its Hilbert polynomial:

$$\begin{aligned} P(d)=\left( {\begin{array}{c}n+d-1\\ d\end{array}}\right) +\left( {\begin{array}{c}n+d-2\\ d-1\end{array}}\right) \end{aligned}$$

It agrees with its Hilbert function for \(d\ge 0\). By Theorem 4.5 there is a \(d_0\) such that for all \(d\ge d_0\), every moment functional \(L:\mathbb {R}[\mathcal {X}]_{\le 2d}\rightarrow \mathbb {R}\) is a conic combination of at most

$$\begin{aligned} \left( {\begin{array}{c}n+2d-1\\ 2d\end{array}}\right) +\left( {\begin{array}{c}n+2d-2\\ 2d-1\end{array}}\right) -1 \end{aligned}$$

point evaluations with points in \(\mathcal {X}\). Moreover, there are moment functionals \(L:\mathbb {R}[\mathcal {X}]_{\le 2d}\rightarrow \mathbb {R}\) that are not a conic combination of fewer than

$$\begin{aligned} \left( {\begin{array}{c}n+2d-1\\ 2d\end{array}}\right)+ & {} \left( {\begin{array}{c}n+2d-2\\ 2d-1\end{array}}\right) -(n-1)\left( \left( {\begin{array}{c}n+d-1\\ d\end{array}}\right) +\left( {\begin{array}{c}n+d-2\\ d-1\end{array}}\right) \right) \\+ & {} \left( {\begin{array}{c}n-1\\ 2\end{array}}\right) \end{aligned}$$

point evaluations. We claim that in this case we can even choose \(d_0=1\). Indeed, since \(\mathcal {X}\) is path-connected, P(2d) exceeds the number of connected components whenever \(d>0\). Moreover, letting \(l_i=x_i\), the curve X in the proof of Lemma 4.2 is the Zariski closure of the unit circle in the plane. Then clearly for any \(d\ge 1\) there is a polynomial of degree d all of whose zeros on X are real, simple and do not lie on the hyperplane at infinity: consider, for example, the union of d distinct lines through the origin. Thus the proof of Theorem 4.5 shows that we can choose \(d_0=1\). In the case \(n=3\) the Carathéodory number C is thus bounded by

$$\begin{aligned} 2d^2\le C\le 4d(d+1). \end{aligned}$$
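
These bounds can be evaluated directly from the binomial expressions for the Hilbert polynomial; a purely illustrative check for \(n=3\):

```python
# Evaluate the bounds of Example 4.7 from the Hilbert polynomial
# P(d) = C(n+d-1, d) + C(n+d-2, d-1) of the unit sphere in R^n; for n = 3
# they simplify to 2d^2 and 4d(d+1).
from math import comb

def P(n, d):
    return comb(n + d - 1, d) + comb(n + d - 2, d - 1)

def bounds(n, d):
    k = n - 1                                    # dimension of the sphere
    lower = P(n, 2 * d) - k * P(n, d) + comb(k, 2)
    upper = P(n, 2 * d) - 1
    return lower, upper

for d in range(1, 6):
    assert bounds(3, d) == (2 * d**2, 4 * d * (d + 1))
print("n = 3: lower bound 2d^2 and upper bound 4d(d+1) confirmed for d = 1, ..., 5")
```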

Let us examine the ratio of the lower and upper bound from Theorem 4.5 as d goes to infinity:

$$\begin{aligned} \frac{P(2d)-k\cdot P(d)+\left( {\begin{array}{c}k\\ 2\end{array}}\right) }{P(2d)} \quad =\quad 1-k\frac{P(d)}{P(2d)}+\frac{\left( {\begin{array}{c}k\\ 2\end{array}}\right) }{P(2d)} \quad \xrightarrow {d\rightarrow \infty }\quad 1-\frac{k}{2^k}. \end{aligned}$$
(4)

Thus if the dimension k of \(\mathcal {X}\) is not too small, our bounds are rather tight, at least for large d. On the other hand, if \(k=1\), i.e., \(\mathcal {X}\) is a smooth algebraic curve, a refined argument yields even better bounds, namely bounds that differ only by one.

Theorem 4.8

Let \(\mathcal {X}=V_0\subset \mathbb {R}^n\) be a compact algebraic set of dimension 1 such that its projective closure \(V\subset \mathbb {P}^n\) is a smooth irreducible curve of degree e. For large enough \(d>0\), every moment functional \(L:\mathbb {R}[\mathcal {X}]_{\le 2d}\rightarrow \mathbb {R}\) is a conic combination of at most \(d\cdot e+1\) point evaluations \(l_{x_i}\) with \(x_i\in \mathcal {X}\). On the other hand, there are moment functionals \(L:\mathbb {R}[\mathcal {X}]_{\le 2d}\rightarrow \mathbb {R}\) that are not a conic combination of fewer than \(d\cdot e\) point evaluations \(l_{x_i}\).

Proof

The Hilbert polynomial of V is of the form \(HP_V(t)=e\cdot t+a\). Thus the lower bound from Theorem 4.5 is just \(d\cdot e\).

In order to prove the upper bound we use a technique similar to that of Lemma 3.6. First we show that a non-negative polynomial f on \(\mathcal {X}\) of degree 2d can have at most \(d\cdot e\) different zeros on \(\mathcal {X}\) (or vanishes on all of \(\mathcal {X}\)). Indeed, the zero set of f on \(\mathcal {X}\) is contained in \(\mathcal {V}(f^h)\cap V\) where \(f^h\) is the homogenization of f. Since \(\mathcal {V}(f^h)\) is a hypersurface of degree 2d and V a curve of degree e, the intersection \(\mathcal {V}(f^h)\cap V\) consists of \(2d\cdot e\) points counted with multiplicity. But because f is non-negative on \(\mathcal {X}\), each zero of f on \(\mathcal {X}\) must have even multiplicity, as otherwise there would be a sign change. This shows that f has at most \(d\cdot e\) different zeros on \(\mathcal {X}\).

Now we show the upper bound \(d\cdot e + 1\). Let \(\mathsf {A}\) be a basis of \(\mathbb {R}[\mathcal {X}]_{\le 2d}\) and \(s\in \mathcal {S}_\mathsf {A}\) be the moment sequence of L. Since \(1\in \mathbb {R}[\mathcal {X}]_{\le 2d}\) and \(\mathcal {X}\) is compact, \(\mathcal {S}_\mathsf {A}\) is closed and pointed, i.e.,

$$\begin{aligned} c_s(x) := \sup \{c\ge 0 \,|\, s - c\cdot s_\mathsf {A}(x)\in \mathcal {S}_\mathsf {A}\} < \infty \end{aligned}$$

is attained for every \(x\in \mathcal {X}\). Hence, \(s' = s - c_s(x)\cdot s_\mathsf {A}(x)\in \partial \mathcal {S}_\mathsf {A}\) and there exists an \(f\in \mathbb {R}[\mathcal {X}]_{\le 2d}\) such that \(f\ge 0\) on \(\mathcal {X}\) and \(L_{s'}(f) = 0\) holds. Since f has at most \(d\cdot e\) zeros, \(s'\) is represented by at most \(d\cdot e\) point evaluations. \(s = s' + c_s(x)\cdot s_\mathsf {A}(x)\) requires therefore at most \(d\cdot e + 1\) point evaluations in \(\mathcal {X}\). \(\square \)

Remark 4.9

As we have seen in Sect. 3, the smoothness assumption in Theorem 4.8 is crucial.

In the next section we use the techniques and results from this and the preceding sections to obtain new lower bounds for the cases \(\mathcal {X}= \mathbb {R}^n\) and \(\mathcal {X}= [0,1]^n\).

5 Lower bounds on the Carathéodory number

Several lower bounds on the Carathéodory number are known, see e.g. [17]. For bivariate polynomials of odd degree \(\mathcal {A}= \mathbb {R}[x_1,x_2]_{\le 2d-1}\) Möller [39] proved

$$\begin{aligned} \begin{pmatrix} d+1\\ 2\end{pmatrix} + \left\lfloor \frac{d}{2}\right\rfloor \quad \le \quad \mathcal {C}_{\mathsf {A}_{2,2d-1}}. \end{aligned}$$

In [16] the first author and K. Schmüdgen gave a very general lower bound improving Möller’s lower bound to

$$\begin{aligned} \left\lceil \frac{1}{3}\begin{pmatrix} 2d+1\\ 2\end{pmatrix}\right\rceil \quad \le \quad \mathcal {C}_{\mathsf {A}_{2,2d-1}} \qquad \text {and}\qquad \left\lceil \frac{1}{3}\begin{pmatrix} 2d+2\\ 2\end{pmatrix}\right\rceil \quad \le \quad \mathcal {C}_{\mathsf {A}_{2,2d}}. \end{aligned}$$

In [46] C. Riener and M. Schweighofer further improved the lower bound to

$$\begin{aligned} (d-1)^2 \quad \le \quad \mathcal {C}_{\mathsf {A}_{2,2d-1}}. \end{aligned}$$
(5)

They used [46, Prop. 8.5], a polynomial version of Theorem 2.4, applied to \(f_1^2+f_2^2\) where

$$\begin{aligned} f_1(x) = (x-1)(x-2)\cdots (x-d) \quad \text {and}\quad f_2(y) = (y-1)(y-2)\cdots (y-d) \end{aligned}$$

and found \(\dim \mathbb {R}[x,y]/(f_1,f_2) = d^2\), i.e., \(\dim \mathrm {lin}\,\{s_\mathsf {A}(x_i,y_j) \,|\, x_i,y_j= 1,\dots ,d\} = d^2\) and therefore the moment functional \(L:\mathbb {R}[x,y]_{\le 2d}\rightarrow \mathbb {R}\) with \(L = \sum _{i,j=1}^d l_{(i,j)}\) has Carathéodory number \(d^2\). In [15] this was extended to higher dimensions by investigating the linear (in)dependence of \(s_\mathsf {A}(x_i)\) on the grid \(G = \{1,\dots ,d\}^n\) (for \(\mathcal {X}= \mathbb {R}^n\)) and \(G = \{0,1,\dots ,d\}^n\) (for \(\mathcal {X}= [0,d]^n\)). As in the previous section the main idea is that the dimension of point evaluations

$$\begin{aligned} \dim \mathrm {lin}\,\{s_{\mathsf {A}_{n,d}}(x) \,|\, x\in \mathcal {Z}(f)\} \end{aligned}$$
(6)

can be translated into

$$\begin{aligned} \dim (\mathbb {R}[x_0,\dots ,x_n]/I)_d, \end{aligned}$$
(7)

i.e., the dimension of the homogeneous part of \(\mathbb {R}[x_0,\dots ,x_n]/I\) of degree d for some homogeneous ideal I.
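
This translation can also be observed numerically: for the grid \(G=\{1,\dots ,d\}^2\) and the monomials of degree at most 2d, the rank of the matrix of point evaluations equals \(\dim \mathbb {R}[x,y]/(f_1,f_2)=d^2\). A purely illustrative sketch using exact rank computation:

```python
# Rank of the matrix of point evaluations s_A(x) for A = monomials of degree
# <= 2d in two variables, evaluated at the grid G = {1, ..., d}^2; it equals
# d^2, the dimension in (6) and (7) for the vanishing ideal I of the grid.
from itertools import product
from sympy import Matrix

def evaluation_rank(d):
    grid = list(product(range(1, d + 1), repeat=2))          # the d^2 grid points
    monos = [(a, b) for a in range(2 * d + 1)
                    for b in range(2 * d + 1) if a + b <= 2 * d]
    M = Matrix([[x**a * y**b for (a, b) in monos] for (x, y) in grid])
    return M.rank()                                          # exact rank over Q

for d in range(2, 5):
    print(d, evaluation_rank(d), d**2)                       # rank equals d^2 = |G|
```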

Lemma 5.1

Let \(n,d\in \mathbb {N}\) and set

$$\begin{aligned} p_i = (x_i - x_0)\cdots (x_i - dx_0) \quad \text {and}\quad q_i = x_i(x_i - x_0)\cdots (x_i - dx_0) \end{aligned}$$

for \(i=1,\dots ,n\). The following holds:

  1. (i)

    The sequences \(p_1,\dots ,p_n\) and \(q_1,\dots ,q_n\) are regular.

  2. (ii)

    The ideals generated by \(p_1,\dots ,p_n\) resp. \(q_1,\dots ,q_n\) are radical.

  3. (iii)

    Let \(f_1,\dots ,f_n\) be a regular sequence of homogeneous polynomials \(f_i\) of degree d. The Hilbert function \(HF_{R_n}\) of \(R_n := \mathbb {R}[x_0,\dots ,x_n]/(f_1,\dots ,f_n)\) is

    $$\begin{aligned} HF_{R_n}(k)=\sum _{i=0}^n (-1)^i\cdot \left( {\begin{array}{c}n\\ i\end{array}}\right) \cdot HF_{\mathbb {P}^n}(k-i\cdot d). \end{aligned}$$

    In particular, we have

    $$\begin{aligned} HF_{R_n}(2d-2)&= \begin{pmatrix} n+2d-2\\ n\end{pmatrix} - n\cdot \begin{pmatrix} n+d-2\\ n\end{pmatrix},\\ HF_{R_n}(2d-1)&= \begin{pmatrix} n+2d-1\\ n\end{pmatrix} - n\cdot \begin{pmatrix} n+d-1\\ n\end{pmatrix},\\ HF_{R_n}(2d)&= \begin{pmatrix} n+2d\\ n\end{pmatrix} - n\cdot \begin{pmatrix} n+d\\ n\end{pmatrix} + \begin{pmatrix} n\\ 2\end{pmatrix}, \end{aligned}$$

    and

    $$\begin{aligned} HF_{R_n}(2d+1)&= \begin{pmatrix} n+2d+1\\ n\end{pmatrix} - n\cdot \begin{pmatrix} n+d+1\\ n\end{pmatrix} + 3\cdot \begin{pmatrix} n+1\\ 3\end{pmatrix}. \end{aligned}$$

Proof

Part (i) follows directly from the fact that each \(p_i\) resp. \(q_i\) is a monic polynomial over \(\mathbb {R}[x_0]\) in the single variable \(x_i\). Part (ii) is a direct consequence of [2, Thm. 1.1]. Finally, since \(HF_{\mathbb {P}^n}(k)=\left( {\begin{array}{c}n+k\\ k\end{array}}\right) \) for \(k\ge 0\) and \(HF_{\mathbb {P}^n}(k)=0\) otherwise, Lemma 2.12 directly implies (iii). \(\square \)
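
The closed forms in (iii) arise from the alternating sum because all terms with negative argument vanish; they can be checked numerically with a purely illustrative sketch:

```python
# Check the closed forms of Lemma 5.1 (iii): the alternating sum
# sum_i (-1)^i C(n, i) HF_{P^n}(k - i d) agrees with the stated binomial
# expressions at k = 2d-2, 2d-1, 2d, 2d+1 (for d >= 2).
from math import comb

def hf_pn(n, k):                        # Hilbert function of P^n
    return comb(n + k, n) if k >= 0 else 0

def hf_quotient(n, d, k):               # alternating sum from Lemma 2.12
    return sum((-1)**i * comb(n, i) * hf_pn(n, k - i * d) for i in range(n + 1))

for n in range(1, 7):
    for d in range(2, 7):
        assert hf_quotient(n, d, 2*d - 2) == comb(n + 2*d - 2, n) - n * comb(n + d - 2, n)
        assert hf_quotient(n, d, 2*d - 1) == comb(n + 2*d - 1, n) - n * comb(n + d - 1, n)
        assert hf_quotient(n, d, 2*d) == comb(n + 2*d, n) - n * comb(n + d, n) + comb(n, 2)
        assert hf_quotient(n, d, 2*d + 1) == (comb(n + 2*d + 1, n) - n * comb(n + d + 1, n)
                                              + 3 * comb(n + 1, 3))
print("closed forms of Lemma 5.1 (iii) verified for n = 1, ..., 6 and d = 2, ..., 6")
```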

From this lemma we derive the following lower bounds for the Carathéodory number \(\mathcal {C}_{\mathsf {A}_{n,2d}}\) and \(\mathcal {C}_{\mathsf {A}_{n,2d+1}}\) on \(\mathcal {X}= \mathbb {R}^n\).

Theorem 5.2

Let \(n,d\in \mathbb {N}\) and \(\mathcal {X}\subseteq \mathbb {R}^n\) with non-empty interior. For even degree \(\mathcal {A}= \mathbb {R}[x_1,\dots ,x_n]_{\le 2d}\) we have

$$\begin{aligned} \mathcal {C}_{\mathsf {A}_{n,2d}} \quad \ge \quad \begin{pmatrix} n+2d\\ n\end{pmatrix} - n\cdot \begin{pmatrix} n+d\\ n\end{pmatrix} + \begin{pmatrix} n\\ 2\end{pmatrix} \end{aligned}$$

and for odd degree \(\mathcal {A}= \mathbb {R}[x_1,\dots ,x_n]_{\le 2d+1}\) we have

$$\begin{aligned} \mathcal {C}_{\mathsf {A}_{n,2d+1}} \quad \ge \quad \begin{pmatrix} n+2d+1\\ n\end{pmatrix} - n\cdot \begin{pmatrix} n+d+1\\ n\end{pmatrix} + 3\cdot \begin{pmatrix} n+1\\ 3\end{pmatrix}. \end{aligned}$$

Proof

Since \(\mathcal {X}\subseteq \mathbb {R}^n\) has non-empty interior there is an \(\varepsilon > 0\) and \(y\in \mathbb {R}^n\) such that \(y + \varepsilon \cdot \{1,\dots ,d\}^n\subset \mathcal {X}\). The affine map \(T:\mathcal {X}'\rightarrow \mathcal {X},\ x\mapsto y+\varepsilon \cdot x\) shifts the moment problem on \(\mathcal {X}\) to \(\mathcal {X}' = \varepsilon ^{-1}\cdot (\mathcal {X}-y)\) with \(\mathbb {R}[x_1,\dots ,x_n]_{\le D} = \mathbb {R}[x_1,\dots ,x_n]_{\le D}\circ T\), \(D = 2d, 2d+1\), and \(\{1,\dots ,d\}^n\subset \mathcal {X}'\). So w.l.o.g. we can assume that \(\{1,\dots ,d\}^n\subset \mathcal {X}\). Then we can proceed as in the proof of Theorem 4.5 by choosing the \(f_i\) to be the \(p_i\) from Lemma 5.1. We have already calculated the concrete resulting values of the Hilbert function in Lemma 5.1. \(\square \)

These lower bounds coincide with the numerical results in [15, Tab. 2]. Note that for \(n=1\) we get for the even and odd degree cases the bound d. This is the maximal number of zeros of a non-zero and non-negative univariate polynomial, i.e., the Carathéodory number of moment sequences on the boundary of the moment cone \(\mathcal {S}_{\mathsf {A}_{n,2d}}\) or \(\mathcal {S}_{\mathsf {A}_{n,2d+1}}\), respectively. In fact, we proved the following.

Proposition 5.3

Let \(n,d\in \mathbb {N}\), \(k\in \{0,1\}\), \(\mathcal {X}=\mathbb {R}^n\), and \(G=\{1,\dots ,d\}^n\). Then

$$\begin{aligned} s = \sum _{x\in G} s_{\mathsf {A}_{n,2d+k}}(x) \qquad \text {resp.}\qquad L = \sum _{x\in G} l_x:\mathbb {R}[x_1,\dots ,x_n]_{\le 2d+k}\rightarrow \mathbb {R}\end{aligned}$$

supported on the grid G with \(L(p)=0\), \(p=p_1^2+ \cdots + p_n^2\ge 0\) from Lemma 5.1, and the representing measure \(\mu = \sum _{x\in G} \delta _x\) has the Carathéodory number \(\mathcal {C}_{\mathsf {A}_{n,2d+k}}(s) = HF_{R_n}(2d+k)\), i.e., the corresponding lower bound in Theorem 5.2 is attained by s.

We get the following lower bounds for the case \(\mathcal {X}= [0,1]^n\) (or equivalently \(\mathcal {X}= [0,d]^n\)), which serves as an example of a compact set \(\mathcal {X}\).

Theorem 5.4

Let \(n,d\in \mathbb {N}\) and \(\mathcal {X}= [0,1]^n\). For even degree \(\mathcal {A}= \mathbb {R}[x_1,\dots ,x_n]_{\le 2d}\) we have

$$\begin{aligned} \mathcal {C}_{\mathsf {A}_{n,2d}} \quad \ge \quad \begin{pmatrix} n+2d\\ n\end{pmatrix} - n\cdot \begin{pmatrix} n+d-1\\ n\end{pmatrix} \end{aligned}$$

and for odd degree \(\mathcal {A}= \mathbb {R}[x_1,\dots ,x_n]_{\le 2d+1}\) we have

$$\begin{aligned} \mathcal {C}_{\mathsf {A}_{n,2d+1}} \quad \ge \quad \begin{pmatrix} n+2d+1\\ n\end{pmatrix} - n\cdot \begin{pmatrix} n+d\\ n\end{pmatrix}. \end{aligned}$$

Proof

The proof follows the same arguments as in Theorem 5.2. Since we work on \([0,1]^n\), we can choose the \(f_i\)’s to be the \(q_i\)’s from Lemma 5.1 and proceed as in Theorem 4.5. Lemma 5.1 then provides the explicit values of the Hilbert function, which are the stated lower bounds. \(\square \)

Additionally, note the difference between the lower bounds from Theorem 5.2 and Theorem 5.4. In the one-dimensional case a non-negative polynomial p of degree 2d has at most d zeros by the fundamental theorem of algebra:

$$\begin{aligned} p(x) = (x-x_1)^2\cdots (x-x_d)^2. \end{aligned}$$

However, on the interval [0, 1] a non-negative polynomial q of degree 2d can have up to \(d+1\) zeros

$$\begin{aligned} q(x) = (x-x_1)\cdot (x-x_2)^2\cdots (x-x_d)^2\cdot (x_{d+1}-x) \end{aligned}$$

when \(x_1 = 0\) and \(x_{d+1} = 1\) holds. So interior zeros count twice, zeros on the boundary only once. This concept already appeared in the classical works of Kreĭn and Nudel’man on T-systems, see [31, Ch. 2]. So in higher dimensions for \(\mathbb {R}^n\) all zeros are interior points, but for \([0,1]^n\) we can place zeros on the boundary.

Note that for \(n=1\) we get for the even and the odd case the lower bound \(d+1\). This is the maximal number of zeros of a non-zero and non-negative polynomial on [0, 1]. For \(n=2\) we get the following.

Corollary 5.5

For \(d\in \mathbb {N}\) and \(\mathcal {X}= [0,1]^2\) (\(n=2\)) we have

$$\begin{aligned} \mathcal {C}_{\mathsf {A}_{2,2d}} \;\ge \; (d+1)^2 \qquad \text {and}\qquad \mathcal {C}_{\mathsf {A}_{2,2d+1}} \;\ge \; (d+1)^2. \end{aligned}$$

Theorems 5.2 and 5.4 give lower bounds on the Carathéodory number of \(\mathcal {S}_{\mathsf {A}_{n,k}}\) by constructing one specific boundary moment sequence s and calculating its Carathéodory number \(\mathcal {C}_{\mathsf {A}_{n,k}}(s)\). The following considerations show that these bounds already imply that in higher dimensions and degrees the Carathéodory numbers behave very badly, see Theorem 5.6. Previous results in [16] and [46] show that for \(n=2\) we have

$$\begin{aligned} \frac{1}{2} \quad \le \quad \liminf _{d\rightarrow \infty } \frac{\mathcal {C}_{\mathsf {A}_{2,d}}}{|\mathsf {A}_{2,d}|}\quad \le \quad \limsup _{d\rightarrow \infty } \frac{\mathcal {C}_{\mathsf {A}_{2,d}}}{|\mathsf {A}_{2,d}|} \quad \le \quad \frac{3}{4}. \end{aligned}$$
(8)

From Theorems 5.2 and 5.4 we get the following limits.

Theorem 5.6

For \(\mathcal {X}\subseteq \mathbb {R}^n\) with non-empty interior we have

$$\begin{aligned} \liminf _{d\rightarrow \infty } \frac{\mathcal {C}_{\mathsf {A}_{n,d}}}{|\mathsf {A}_{n,d}|}\quad&\ge \quad 1-\frac{n}{2^n}&\text { for all } n&\in \mathbb {N}\end{aligned}$$

and

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{\mathcal {C}_{\mathsf {A}_{n,d}}}{|\mathsf {A}_{n,d}|}\quad&= \quad 1&\text { for all } d&\in \mathbb {N}. \end{aligned}$$

Proof

Follows by a direct calculation as in Equation (4). \(\square \)
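To make the two limits concrete, the following sketch evaluates the ratio of the Theorem 5.2 lower bound to \(|\mathsf {A}_{n,2d}| = \left( {\begin{matrix} n+2d\\ n\end{matrix}}\right) \) for growing d at fixed n and for growing n at fixed d; this is only a numerical illustration of the direct calculation referred to above.

```python
# Ratio of the Theorem 5.2 lower bound to |A_{n,2d}| = binom(n + 2d, n).
from math import comb

def ratio(n, d):
    lower = comb(n + 2*d, n) - n * comb(n + d, n) + comb(n, 2)
    return lower / comb(n + 2*d, n)

n = 3
print([round(ratio(n, d), 4) for d in (1, 5, 50, 500)])  # tends to 1 - 3/2**3 = 0.625
d = 3
print([round(ratio(n, d), 4) for n in (2, 5, 10, 50)])   # tends to 1
```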

In (8), i.e., in [16] and [46], we have seen that for \(n=2\) the Carathéodory number is asymptotically bounded above by \(\frac{3}{4}\cdot |\mathsf {A}_{2,d}|\), which is considerably smaller than \(|\mathsf {A}_{2,d}|\). Theorems 5.2, 5.4, and 5.6, however, confirm the concerns raised in [15] about the Carathéodory numbers and their limits. Note that for \(\mathcal {X}= [0,1]^n\) the following was already proved in [15].

Theorem 5.7

([15, Thm. 59]) For \(\mathbb {R}[x_1,\dots ,x_n]_{\le 2}\) on \(\mathcal {X}= [0,1]^n\) we have

$$\begin{aligned} \begin{pmatrix} n+2\\ 2\end{pmatrix} - n \quad \le \quad \mathcal {C}_{\mathsf {A}_{n,2}} \quad \le \quad \begin{pmatrix}n+2\\ 2\end{pmatrix} - 1. \end{aligned}$$

Thus in higher dimensions n, even for fixed degree d, it is not possible to give an upper bound \(\mathcal {C}_{\mathsf {A}_{n,d}}\le c\cdot |\mathsf {A}_{n,d}|\) with a constant \(c<1\) that holds for all n.

Corollary 5.8

Let \(\mathcal {X}\subseteq \mathbb {R}^n\) have non-empty interior and let \(d\in \mathbb {N}\). For every \(\varepsilon >0\) there is an \(N\in \mathbb {N}\) such that for every \(n\ge N\) there is a moment sequence \(s\in \mathcal {S}_{\mathsf {A}_{n,d}}\) resp. a moment functional \(L:\mathbb {R}[x_1,\dots ,x_n]_{\le d}\rightarrow \mathbb {R}\) with

$$\begin{aligned} \mathcal {C}_{\mathsf {A}_{n,d}}(s) \quad \ge \quad (1-\varepsilon )\cdot \begin{pmatrix} n+d\\ n\end{pmatrix}, \end{aligned}$$

i.e., every representation of \(L_s\) as a conic combination of point evaluations \(l_{x_i}\) requires at least \((1-\varepsilon )\cdot \big ({\begin{matrix} n+d\\ n\end{matrix}}\big )\) terms.

Proof

Choose s resp. L as in Proposition 5.3. By the limits in Theorem 5.6 it has the desired property for all sufficiently large n. \(\square \)

So even for what is arguably the most well-behaved moment problem, namely the polynomial case, the Carathéodory number suffers from the curse of high dimensionality. In the next section we study the consequences of these new lower bounds and their limits for Hankel matrices and flat extensions.

6 Hankel matrices and flat extension

Recall that for a finite dimensional space \(\mathcal {A}\) of measurable functions with basis \(\mathsf {A}= \{a_1,\dots ,a_m\}\) the Hankel matrix \(\mathcal {H}(L)\) of a linear functional \(L:\mathcal {A}^2\rightarrow \mathbb {R}\) is given by \(\mathcal {H}(L) = (L(a_i a_j))_{i,j=1}^m\), i.e.,

$$\begin{aligned} \mathcal {H}(L) = \int _\mathcal {X}\begin{pmatrix} a_1(x) a_1(x) & \cdots & a_1(x) a_m(x)\\ \vdots & \ddots & \vdots \\ a_m(x) a_1(x) & \cdots & a_m(x) a_m(x) \end{pmatrix}~\mathrm {d}\mu (x) = \int _\mathcal {X}s_\mathsf {A}(x)\cdot s_\mathsf {A}(x)^T~\mathrm {d}\mu (x) \end{aligned}$$
(9)

if \(\mu \) is a (signed) representing measure of L. Hence we have the following.

Lemma 6.1

Let \(\mathcal {A}\) be a finite dimensional vector space of measurable functions with basis \(\mathsf {A}=\{a_1,\dots ,a_m\}\). For \(L:\mathcal {A}^2\rightarrow \mathbb {R}\) with \(L = \sum _{i=1}^k c_i\cdot l_{x_i}\) (\(c_i\in \mathbb {R}\)) we have

$$\begin{aligned} \mathcal {H}(L)=\sum _{i=1}^k c_i\cdot s_\mathsf {A}(x_i)\cdot s_\mathsf {A}(x_i)^T \qquad \text {and}\qquad \mathrm {rank}\,\mathcal {H}(L)\le k. \end{aligned}$$

The following are equivalent:

(i) \(\mathrm {rank}\,\mathcal {H}(L) = k\).

(ii) \(s_\mathsf {A}(x_1),\dots ,s_\mathsf {A}(x_k)\) are linearly independent.

Proof

By replacing \(x^\alpha \), \(x^\beta \), and \(x^{\alpha +\beta }\) by \(a_i\), \(a_j\), and \(a_{i,j}=a_i\cdot a_j\), respectively, and using (9), the proof is verbatim the same as in [48, Prop. 17.21]. \(\square \)

Note that the previous result holds for signed representing measures.
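A small numeric sketch of Lemma 6.1, with the basis \(\mathsf {A}=\{1,x,y\}\) of \(\mathbb {R}[x,y]_{\le 1}\) chosen purely as an example: the Hankel matrix of an atomic functional is the corresponding sum of rank-one matrices, and its rank equals the number of atoms exactly as long as the vectors \(s_\mathsf {A}(x_i)\) are linearly independent.

```python
# Lemma 6.1 for A = span{1, x, y} (so A^2 = R[x,y]_{<=2}) and L = sum_i c_i l_{x_i}.
import numpy as np

def s_A(pt):
    x, y = pt
    return np.array([1.0, x, y])                # basis A = {1, x, y} evaluated at pt

atoms = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]    # s_A(x_1), s_A(x_2), s_A(x_3) are independent
c = [1.0, 2.0, 0.5]

H = sum(ci * np.outer(s_A(x), s_A(x)) for ci, x in zip(c, atoms))
print(np.linalg.matrix_rank(H))                 # 3 = number of atoms

# A fourth atom makes the s_A(x_i) linearly dependent (dim A = 3), so the rank
# stays below k = 4, in line with rank H(L) <= k.
H4 = H + np.outer(s_A((1.0, 1.0)), s_A((1.0, 1.0)))
print(np.linalg.matrix_rank(H4))                # 3 < 4
```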

It is very hard to check whether a linear functional L is a moment functional. If \(\mathcal {X}\) is compact and \(\mathcal {A}^2\) contains a function \(e\) with \(e>0\) on \(\mathcal {X}\), then one has to check the following condition:

$$\begin{aligned} L(p)\ge 0 \quad \text {for all}\ p\in \mathrm {Pos}(\mathcal {A}^2,\mathcal {X}) := \{a\in \mathcal {A}^2 \,|\, a\ge 0\ \text {on}\ \mathcal {X}\}. \end{aligned}$$

But the set \(\mathrm {Pos}(\mathcal {A}^2,\mathcal {X})\) of non-negative functions on \(\mathcal {X}\) is in general hard to describe. For example, deciding whether a polynomial \(p\in \mathbb {R}[x_1,\ldots ,x_n]_{\le 2d}\) is non-negative is an NP-hard problem (for fixed \(d\ge 2\), as a function of n), see e.g. [7, p. 56]. One approach to overcome this problem is to approximate non-negative polynomials by sums of squares (SOS): checking whether a given polynomial is a sum of squares is equivalent to deciding whether a certain semidefinite program (SDP) is feasible, see [7, §4.1.4], and (under some mild assumptions) one can solve an SDP up to a fixed precision in time that is polynomial in the program description size [7, §2.3.1]. The connection between the truncated moment problem and non-negative polynomials runs deep and can be found in a large number of publications on the truncated moment problem, see e.g. [1, 31, 34, 36, 38, 48].
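To illustrate the SOS approach, here is a minimal sketch (using cvxpy, which is not used in the paper and only serves as an example): \(p(x)=x^4+1\) is a sum of squares if and only if there is a positive semidefinite Gram matrix Q with \(p(x) = z(x)^T Q z(x)\) for \(z(x)=(1,x,x^2)^T\), which is an SDP feasibility problem.

```python
# SOS feasibility as an SDP (sketch): is p(x) = x^4 + 1 a sum of squares?
import cvxpy as cp

Q = cp.Variable((3, 3), PSD=True)           # Gram matrix w.r.t. z(x) = (1, x, x^2)
constraints = [
    Q[0, 0] == 1,                           # coefficient of x^0
    2 * Q[0, 1] == 0,                       # coefficient of x^1
    2 * Q[0, 2] + Q[1, 1] == 0,             # coefficient of x^2
    2 * Q[1, 2] == 0,                       # coefficient of x^3
    Q[2, 2] == 1,                           # coefficient of x^4
]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)                          # 'optimal': feasible, so p is SOS
                                            # (e.g. Q = diag(1, 0, 1): p = 1^2 + (x^2)^2)
```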

Flat extension is another method to determine whether a linear functional \(L:\mathbb {R}[x_1,\dots ,x_n]_{\le 2d}\rightarrow \mathbb {R}\) is a moment functional, see e.g. [8, 9, 36, 37, 48]. Let \(D\ge d\) and let \(L_0:\mathbb {R}[x_1,\dots ,x_n]_{\le 2D}\rightarrow \mathbb {R}\) be a linear functional that extends L. An extension \(L_1:\mathbb {R}[x_1,\dots ,x_n]_{\le 2D+2}\rightarrow \mathbb {R}\) of \(L_0\) is called flat (with respect to \(L_0\)) if \(\mathrm {rank}\,\mathcal {H}(L_1) = \mathrm {rank}\,\mathcal {H}(L_0)\). If such a flat extension exists, then by the flat extension theorem, see [8, 9] or e.g. [48, Thm. 17.35], there are linear functionals \(L_i:\mathbb {R}[x_1,\dots ,x_n]_{\le 2D+2i}\rightarrow \mathbb {R}\) with \(\mathrm {rank}\,\mathcal {H}(L_0)=\mathrm {rank}\,\mathcal {H}(L_i)\) such that \(L_{i}\) extends \(L_{i-1}\) for all \(i\in \mathbb {N}\). These determine a linear functional \(L_\infty :\mathbb {R}[x_1,\dots ,x_n]\rightarrow \mathbb {R}\), called a flat extension of \(L_0\) (to all of \(\mathbb {R}[x_1,\dots ,x_n]\)), i.e., every restriction \(L_\infty |_{\mathbb {R}[x_1,\dots ,x_n]_{\le 2D'+2}}\) is a flat extension of \(L_\infty |_{\mathbb {R}[x_1,\dots ,x_n]_{\le 2D'}}\) for all \(D'\ge D\). If such a flat extension \(L_\infty \) exists, then by the flat extension theorem \(L_0\) is a moment functional provided that \(L_0(a^2)\ge 0\) for all \(a\in \mathbb {R}[x_1,\dots ,x_n]_{\le D}\). In this case L is of course a moment functional as well. It was an open question up to which degree 2D the functional L must be extended in order to admit a flat extension. The upper bound \(D\le 2d\) follows immediately from the Carathéodory bound [8, 9]. Part one of the following theorem is due to Curto and Fialkow. Our new lower bounds on the Carathéodory number show that \(D = 2d\) is attained; this is stated in part two of the following theorem.

Theorem 6.2

(i) For every moment functional \(L:\mathbb {R}[x_1,\dots ,x_n]_{\le 2d}\rightarrow \mathbb {R}\) there is a \(D\le 2d\) and an extension to a moment functional \(L_0:\mathbb {R}[x_1,\dots ,x_n]_{\le 2D}\rightarrow \mathbb {R}\) that admits a flat extension \(L_{\infty }:\mathbb {R}[x_1,\dots ,x_n]\rightarrow \mathbb {R}\).

(ii) For every \(d\in \mathbb {N}\) there is an \(N\in \mathbb {N}\) such that for every \(n\ge N\) there is a moment functional L on \(\mathbb {R}[x_1,\dots ,x_n]_{\le 2d}\) for which \(D=2d\) is required in (i).

Proof

(i): By [16, Cor. 14] we have \(C:=\mathcal {C}_{\mathsf {A}_{n,2d}}(L)\le \left( {\begin{matrix} n+2d\\ n\end{matrix}}\right) -1\), i.e.,

$$\begin{aligned} L = \sum _{i=1}^{C} c_i\cdot {l}_{x_i} \quad \text {with}\quad c_i>0 \end{aligned}$$

and the \(l_{x_i}\) are linearly independent on \(\mathbb {R}[x_1,\dots ,x_n]_{\le 2d}\). Then \(L_{\infty }:\mathbb {R}[x_1,\dots ,x_n]\rightarrow \mathbb {R}\) defined by \(L_{\infty }(f):=\sum _{i=1}^{C} c_i\cdot {f}({x_i})\) is a flat extension of \(L_0:=L_\infty |_{\mathbb {R}[x_1,\dots ,x_n]_{\le 4d}}\), and \(L_0\) extends L.

(ii): Let \(s\in \mathcal {S}_{\mathsf {A}_{n,2d}}\) resp. \(L_s:\mathbb {R}[x_1,\dots ,x_n]_{\le 2d}\rightarrow \mathbb {R}\) be as in Proposition 5.3 and assume \(D = 2d - c\) for some \(c\in \mathbb {N}\). If \(L_s\) admits an extension \(L_0\) to degree 2D with a flat extension, then \(L_s\) has a representing measure with at most \(\mathrm {rank}\,\mathcal {H}(L_0)\) atoms, so the size of the Hankel matrix \(\mathcal {H}(L_0)\) must be at least the Carathéodory number of s, i.e., \(\mathcal {C}_{\mathsf {A}_{n,2d}}(s) \le \left( {\begin{matrix} n+D\\ n\end{matrix}}\right) \). Hence we find that

$$\begin{aligned}&1 \le \lim _{n\rightarrow \infty } \frac{\begin{pmatrix}n + 2d - c\\ n\end{pmatrix}}{\mathcal {C}_{\mathsf {A}_{n,2d}}(s)} = \lim _{n\rightarrow \infty } \frac{\begin{pmatrix}n + 2d - c\\ n\end{pmatrix}}{\begin{pmatrix} n + 2d\\ n\end{pmatrix}} \cdot \underbrace{\frac{\begin{pmatrix} n+2d\\ n\end{pmatrix}}{\mathcal {C}_{\mathsf {A}_{n,2d}}(s)}}_{\rightarrow 1\ \text {by Thm. 5.6}}\\&\quad = \lim _{n\rightarrow \infty } \frac{(2d-c+1)\cdots (2d)}{(n+2d-c+1)\cdots (n+2d)} = 0. \end{aligned}$$

This is a contradiction, i.e., no \(c\ge 1\) is possible and \(D=2d\) must hold. \(\square \)
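The mechanism behind part (i) can also be seen numerically: for an atomic functional the moment (Hankel) matrices indexed by the monomials of degree \(\le D\) eventually all have the same rank. The sketch below uses four arbitrarily chosen atoms in two variables, purely as an illustration.

```python
# Hankel matrices of an atomic functional L = sum_i l_{x_i} in two variables,
# indexed by the monomials of degree <= D: the rank stabilizes (flatness).
import itertools
import numpy as np

atoms = [(0.3, -1.0), (1.2, 0.4), (-0.7, 2.0), (0.0, 0.5)]   # arbitrary atoms, c_i = 1

def hankel(D):
    basis = [a for a in itertools.product(range(D + 1), repeat=2) if sum(a) <= D]
    vec = lambda p: np.array([p[0] ** i * p[1] ** j for i, j in basis])
    return sum(np.outer(vec(p), vec(p)) for p in atoms)

print([np.linalg.matrix_rank(hankel(D)) for D in (1, 2, 3)])  # [3, 4, 4]: rank is flat from D = 2 on
```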

Example 6.3

Consider the moment sequences \(s = (s_\alpha )_{\alpha \in \mathbb {N}_0^n:|\alpha |\le 2d}\) resp. functionals \(L:\mathbb {R}[x_1,\dots ,x_n]_{\le 2d}\rightarrow \mathbb {R}\) from Proposition 5.3 supported on the grid \(\{1,\dots ,d\}^n\). The condition \(\mathcal {C}_{\mathsf {A}_{n,2d}}(s) \le \left( {\begin{matrix} n+D\\ n\end{matrix}}\right) \), meaning that the size of the Hankel matrix must be at least the Carathéodory number of s, shows that \((n, d) = (9, 2)\), (7, 3), (6, 4), (6, 5), and \((n',6)\) for all \(n'\ge 6\) are small examples where the worst case extension to degree \(D=2d\) is attained. Even for \(d = 10^{15}\) the worst case extension is already necessary for \(n=51\).
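These small cases can be reproduced from the bounds alone. Using the lower bound of Theorem 5.2 in place of the exact Carathéodory number of the grid sequence (an assumption made only for this sketch), one looks for the smallest n such that the bound already exceeds \(\left( {\begin{matrix} n+2d-1\\ n\end{matrix}}\right) \), so that no extension degree \(D<2d\) can satisfy \(\mathcal {C}_{\mathsf {A}_{n,2d}}(s) \le \left( {\begin{matrix} n+D\\ n\end{matrix}}\right) \).

```python
# Smallest n (for each d) such that the Theorem 5.2 lower bound on C_{A_{n,2d}}(s)
# exceeds binom(n + 2d - 1, n), i.e., the worst case D = 2d is forced.
from math import comb

def lower_bound(n, d):
    return comb(n + 2*d, n) - n * comb(n + d, n) + comb(n, 2)

def smallest_n(d):
    n = 1
    while lower_bound(n, d) <= comb(n + 2*d - 1, n):
        n += 1
    return n

print({d: smallest_n(d) for d in range(2, 7)})   # {2: 9, 3: 7, 4: 6, 5: 6, 6: 6}
print(smallest_n(10**15))                        # 51
```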