1 Introduction

A classical result of Fefferman states that the class BMO, the space of functions of bounded mean oscillation, is the dual space of \(H^1\). This fact, originally proved in [2] in the analytic setting, was later extended to the probabilistic context by Getoor and Sharpe [3]. The purpose of this paper is to establish a certain sharp estimate related to this duality in the one-dimensional context.

We start with some background and notation. Throughout, for brevity, the symbol \({\mathcal {I}}\) will stand for the interval [0, 1]. Let \((h_n)_{n\ge 0}\) be the Haar system on [0, 1], i.e.,

$$\begin{aligned}&h_0=\chi _{[0,1]},&h_1=\chi _{[0,1/2)}-\chi _{[1/2,1)},\\&h_2=\chi _{[0,1/4)}-\chi _{[1/4,1/2)},&h_3=\chi _{[1/2,3/4)}-\chi _{[3/4,1)},\\&h_4=\chi _{[0,1/8)}-\chi _{[1/8,1/4)},&h_5=\chi _{[1/4,3/8)}-\chi _{[3/8,1/2)}, \end{aligned}$$

and so on. Let \({\mathcal {H}}\) be a separable real Hilbert space with scalar product \(\cdot \) and norm \(|\cdot |\). For any dyadic subinterval \(I\subseteq {\mathcal {I}}\) and an integrable function \(\varphi :{\mathcal {I}}\rightarrow {\mathcal {H}}\), we will write \(\langle \varphi \rangle _I\) for the average of \(\varphi \) over I: that is, \(\langle \varphi \rangle _I=\frac{1}{|I|}\int _I \varphi \text{ d }u\). Furthermore, for any such \(\varphi \) and any nonnegative integer n, we will use the notation

$$\begin{aligned} \varphi _n=\sum _{k=0}^n \frac{1}{|I_k|}\mathop {\int }\limits _{\mathcal {I}}\varphi (s)h_k(s)\text{ d }s\,h_k \end{aligned}$$

for the projection of \(\varphi \) on the subspace generated by the first \(n+1\) Haar functions (\(I_k\) denotes the support of \(h_k\)). The dyadic square function of \(\varphi \) is

$$\begin{aligned} S(\varphi )(x)=\left( \sum \left| \frac{1}{|I_n|}\mathop {\int }\limits _{\mathcal {I}} \varphi (s)h_n(s)\text{ d }s\right| ^2 \right) ^{1/2},\qquad x\in {\mathcal {I}}, \end{aligned}$$
(1.1)

where the summation runs over all nonnegative integers n such that \(x\in I_n\). The Hardy space \(H^1({\mathcal {I}})\) consists of those \(\varphi \) on \({\mathcal {I}}\) for which the norm \(\Vert \varphi \Vert _{H^1}=\Vert S(\varphi )\Vert _{L^1}\) is finite. All the above definitions carry over verbatim to the case when \({\mathcal {I}}\) is replaced by an arbitrary dyadic subinterval of \({\mathbb {R}}\). Actually, one easily defines \(S(\varphi )\) and \(H^1\) when the underlying space is the entire line \({\mathbb {R}}\). To this end, consider in (1.1) the sum over all dyadic intervals I contained in \({\mathbb {R}}\), and replace \(I_n\) with I and \(h_n\) with \(h_I=\chi _{I^-}-\chi _{I^+}\), where \(I^-\), \(I^+\) are the left and right halves of I. The definition of \(H^1\) remains unchanged.
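To make the definition concrete, here is a small numerical sketch (an illustration of ours, not part of the paper; the function name and grid convention are our own) which evaluates the dyadic square function (1.1) for a real-valued simple function represented by its values on the uniform grid of \(2^m\) points:

```python
import numpy as np

def square_function(phi):
    """Dyadic square function S(phi) from (1.1), for a real-valued simple
    function phi given by its constant values on the 2**m congruent dyadic
    subintervals of [0, 1)."""
    phi = np.asarray(phi, dtype=float)
    N = phi.size                                  # must be a power of two
    # contribution of h_0 = chi_[0,1): the normalized coefficient is the mean
    S2 = np.full(N, phi.mean() ** 2)
    # contribution of h_I = chi_{I^-} - chi_{I^+} for each dyadic I:
    # the normalized coefficient is half the difference of the half-averages
    length = N
    while length >= 2:
        for start in range(0, N, length):
            half = length // 2
            c = (phi[start:start + half].mean()
                 - phi[start + half:start + length].mean()) / 2.0
            S2[start:start + length] += c * c
        length //= 2
    return np.sqrt(S2)
```

For instance, \(\varphi =h_1\) gives \(S(\varphi )\equiv 1\) and hence \(\Vert h_1\Vert _{H^1}=1\), while a constant \(\varphi \equiv c\) gives \(S(\varphi )\equiv |c|\) (the \(n=0\) term in (1.1)).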

A function \(\psi :{\mathcal {I}}\rightarrow {\mathcal {H}}\) belongs to the (dyadic) class BMO if

$$\begin{aligned} ||\psi ||_{BMO}:=\sup _{I} \big (\langle |\psi -\langle \psi \rangle _I|^2\rangle _I\big )^{1/2}<\infty , \end{aligned}$$

where the supremum is taken over all dyadic subintervals I of \({\mathcal {I}}\). This space, introduced by John and Nirenberg in [6], plays a fundamental role in harmonic analysis and probability theory. It is well known that it often serves as a convenient substitute for \(L^\infty \): many important operators (e.g., singular integrals, area functions) are not bounded on \(L^\infty \), but send \(L^\infty \) into BMO instead (cf. [4]). In addition, BMO behaves nicely from the viewpoint of interpolation; see, e.g., [1, 4]. Our motivation comes from another property, already mentioned above: this space is dual to \(H^1\). Specifically, it follows from the above result of Fefferman that for \({\mathcal {H}}={\mathbb {R}}\), there is a finite constant C such that for any \(\varphi \in H^1\) and any \(\psi \in BMO\) with \(\langle \psi \rangle _{\mathcal {I}}=0\),

$$\begin{aligned} \mathop {\int }\limits _{\mathcal {I}} \varphi \cdot \psi \text{ d }u\le C\Vert \varphi \Vert _{H^1}\Vert \psi \Vert _{BMO}. \end{aligned}$$
(1.2)

We will identify the optimal constant in this estimate, in the context of general Hilbert spaces \({\mathcal {H}}\). Here is our main result.

Theorem 1.1

For any Hilbert space \({\mathcal {H}}\), the inequality (1.2) holds with the constant \(C=\sqrt{2}\). The constant is the best possible already for \({\mathcal {H}}={\mathbb {R}}\).

Obviously, this result remains valid if we replace the interval \({\mathcal {I}}\) by an arbitrary dyadic subinterval of \({\mathbb {R}}\). Actually, by a standard limiting argument, it also remains true if \({\mathcal {I}}\) is replaced with the real line \({\mathbb {R}}\). There is a natural question about the optimal constant in the higher-dimensional setting. Unfortunately, although our argument seems to work up to a certain point, we have not managed to push all the relevant calculations through.
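For the reader who wishes to experiment, Theorem 1.1 can be sanity-checked numerically. The sketch below (our own illustration, not part of the proof) tests (1.2) with \(C=\sqrt{2}\) on random real-valued simple functions constant on the dyadic intervals of length \(2^{-6}\), computing the dyadic BMO norm and the \(H^1\) norm directly from the definitions:

```python
import numpy as np

def dyadic_blocks(N):
    """All dyadic subintervals of [0, 1) as (start, length) index ranges
    of the uniform N-point grid, N a power of two."""
    length = N
    while length >= 1:
        for start in range(0, N, length):
            yield start, length
        length //= 2

def square_function(phi):
    """Dyadic square function (1.1) of a simple function phi."""
    N = phi.size
    S2 = np.full(N, phi.mean() ** 2)              # h_0 term
    for start, length in dyadic_blocks(N):
        if length >= 2:
            half = length // 2
            c = (phi[start:start + half].mean()
                 - phi[start + half:start + length].mean()) / 2.0
            S2[start:start + length] += c * c
    return np.sqrt(S2)

def bmo_norm(psi):
    """Dyadic BMO norm: the largest L^2 mean oscillation over dyadic I."""
    return max(np.sqrt(np.mean((psi[s:s + l] - psi[s:s + l].mean()) ** 2))
               for s, l in dyadic_blocks(psi.size))

rng = np.random.default_rng(0)
N = 64
for _ in range(200):
    phi = rng.normal(size=N)
    psi = rng.normal(size=N)
    psi -= psi.mean()                              # enforce <psi>_I = 0
    lhs = np.mean(phi * psi)                       # the integral in (1.2)
    rhs = np.sqrt(2.0) * np.mean(square_function(phi)) * bmo_norm(psi)
    assert lhs <= rhs + 1e-12
```

Since (1.2) is homogeneous in \(\psi \), no normalization of the BMO norm is needed here.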

We would also like to mention two related works from the literature. First, Getoor and Sharpe [3] obtained the probabilistic version of the above inequality (with the same constant \(\sqrt{2}\)) for continuous, real-valued martingales. That result can be deduced from Theorem 1.1: namely, one can rephrase the estimate (1.2) in the language of so-called dyadic martingales, which, by approximation, leads to the continuous case. The reverse implication does not seem to be valid. We would also like to refer the reader to the paper [11] by Slavin and Volberg, which contains a proof of a certain version of (1.2) in the context of Triebel–Lizorkin spaces.

A few words about the proof are in order. Our approach will exploit the so-called Bellman function method, which reduces the study of (1.2) to the existence of a certain special function. The efficiency of this technique in the study of various sharp BMO estimates has been confirmed in numerous papers: we refer the interested reader to the works [5, 7–10, 12] (the complete list is much longer). We should stress, however, that the estimate (1.2) is four-dimensional, i.e., the corresponding Bellman function involves four variables. Hence, it does not fall within the scope of the contexts studied in the aforementioned papers, which mostly handled two-dimensional problems.

The remainder of the paper is organized as follows. The general version of our approach is described in detail in the next section. The final part of the paper is devoted to the proof of Theorem 1.1.

2 On the method of proof

Fix a Hilbert space \({\mathcal {H}}\) and consider the four-dimensional domain

$$\begin{aligned} {\mathscr {D}}&=\Big \{(x,s,y,z)\in {\mathcal {H}}\times [0,\infty )\times {\mathcal {H}}\times [0,\infty )\,:\,|y|^2\le z\le |y|^2+1\Big \}. \end{aligned}$$

Next, let \(V:{\mathcal {H}}\times [0,\infty )\times {\mathcal {H}}\rightarrow {\mathbb {R}}\) be a given function and suppose that we want to establish the estimate

$$\begin{aligned} \mathop {\int }\limits _{\mathcal {I}} V(\varphi ,S(\varphi ),\psi )\,\text{ d }u\le 0 \end{aligned}$$
(2.1)

for all simple functions \(\varphi ,\,\psi :{\mathcal {I}}\rightarrow {\mathcal {H}}\) satisfying \(||\psi ||_{BMO}\le 1\) and \(\langle \psi \rangle _{\mathcal {I}}=0\). Here, by simple we mean that each of \(\varphi \), \(\psi \) can be written as a finite sum of the form \(\sum _{k=0}^m a_kh_{k}\) for some integer m and some coefficients \(a_0\), \(a_1\), \(\ldots \), \(a_m\in {\mathcal {H}}\). This assumption implies in particular that the integral appearing in (2.1) makes sense. To study this problem, consider the class \({\mathcal {U}}(V)\) consisting of all \(U:{\mathscr {D}}\rightarrow (-\infty ,\infty ]\) satisfying the following three requirements:

$$\begin{aligned} U(x,s,y,|y|^2)\ge & {} V(x,s,y) \qquad \quad \text{ for } \text{ all } (x,s,y,|y|^2)\in {\mathscr {D}}, \end{aligned}$$
(2.2)
$$\begin{aligned} U(x,|x|,0,z)\le & {} 0 \qquad \qquad \qquad \;\;\text{ for } \text{ all } (x,|x|,0,z)\in {\mathscr {D}}, \end{aligned}$$
(2.3)

and the further condition that for any \((x,s,y,z)\in {\mathscr {D}}\) and any \(d,\,e\in {\mathcal {H}}\), \(f\in {\mathbb {R}}\) satisfying \((x_\pm ,\sqrt{s^2+|d|^2},y_\pm ,z_\pm ):=(x\pm d,\sqrt{s^2+|d|^2},y\pm e,z\pm f)\in {\mathscr {D}}\),

$$\begin{aligned} \begin{aligned}&U(x,s,y,z)\\&\quad \ge \frac{1}{2}\left[ U\big (x_-,\sqrt{s^2+|d|^2},y_-,z_-\big )+U\big (x_+,\sqrt{s^2+|d|^2},y_+,z_+\big )\right] . \end{aligned} \end{aligned}$$
(2.4)

So, the functions from \({\mathcal {U}}(V)\) can be neither too big nor too small, and they must satisfy a midpoint-concavity-type property. The connection between the class \({\mathcal {U}}(V)\) and the estimate (2.1) is described in the two theorems below.

Theorem 2.1

If the class \({\mathcal {U}}(V)\) is nonempty, then (2.1) holds true for all simple \(\varphi ,\,\psi \) satisfying \(\Vert \psi \Vert _{BMO}\le 1\) and \(\langle \psi \rangle _{\mathcal {I}}=0\).

Proof

Fix \(\varphi \) and \(\psi \). The first step of our analysis is to prove that the sequence

$$\begin{aligned} \left( \mathop {\int }\limits _{\mathcal {I}} U\big (\varphi _n,S(\varphi _{n}),\psi _n,(|\psi |^2)_n\big )\text{ d }u\right) _{n\ge 0} \end{aligned}$$
(2.5)

is nonincreasing. (Let us comment here that \((|\psi |^2)_n\) is the projection of \(|\psi |^2\) on the subspace generated by the first \(n+1\) Haar functions.) Observe that \(0\le (|\psi |^2)_n-|\psi _n|^2\le 1\), where the left estimate follows from the Schwarz inequality, while the right is due to \(\Vert \psi \Vert _{BMO}\le 1\). So, in particular, the integrals in (2.5) are well-defined: the point \(\big (\varphi _n,S(\varphi _{n}),\psi _n,(|\psi |^2)_n\big )\) belongs to the domain of U. Fix \(n\ge 1\), denote by I the support of \(h_n\), and let \(I^-\), \(I^+\) be the left and the right halves of I. Observe that \(\varphi _{n-1}\), \(S(\varphi _{n-1})\), \(\psi _{n-1}\), and \((|\psi |^2)_{n-1}\) are constant on I; denote the corresponding values by x, s, y, and z, respectively. Furthermore, there exist \(d,\,e\in {\mathcal {H}}\) and \(f\in {\mathbb {R}}\) such that \(\varphi _n \equiv x\pm d\), \(S(\varphi _n)\equiv \sqrt{s^2+|d|^2}\), \(\psi _n\equiv y\pm e\), and \((|\psi |^2)_n\equiv z\pm f\) on \(I^\pm \), respectively. Therefore,

$$\begin{aligned}&\frac{1}{|I|}\mathop {\int }\limits _I U\big (\varphi _n,S(\varphi _n),\psi _n,(|\psi |^2)_n\big )\text{ d }u\\&\quad =\frac{1}{|I|}\!\mathop {\int }\limits _{I^-} U\big (\varphi _n,S(\varphi _n),\psi _n,(|\psi |^2)_n\big )\text{ d }u+\frac{1}{|I|}\!\mathop {\int }\limits _{I^+} U\big (\varphi _n,S(\varphi _n),\psi _n,(|\psi |^2)_n\big )\text{ d }u\\&\quad =\frac{U(x-d,\sqrt{s^2+|d|^2},y-e,z-f)+U(x+d,\sqrt{s^2+|d|^2},y+e,z+f)}{2}, \end{aligned}$$

which, by (2.4), does not exceed

$$\begin{aligned} U(x,s,y,z)=\frac{1}{|I|}\mathop {\int }\limits _I U\big (\varphi _{n-1},S(\varphi _{n-1}),\psi _{n-1},(|\psi |^2)_{n-1}\big )\text{ d }u. \end{aligned}$$

Since \((\varphi _n,S(\varphi _n), \psi _n,(|\psi |^2)_n)\) and \((\varphi _{n-1},S(\varphi _{n-1}),\psi _{n-1},(|\psi |^2)_{n-1})\) coincide on \({\mathcal {I}}\setminus I\), the monotonicity of the sequence (2.5) follows. Now, recall that \(\varphi \), \(\psi \) are simple, which implies that there is m such that \(\varphi _m=\varphi \), \(S(\varphi _m)=S(\varphi )\), \(\psi _m=\psi \), and \((|\psi |^2)_m=|\psi |^2\). Combining this with (2.2) and (2.3), we obtain

$$\begin{aligned} \begin{aligned} \mathop {\int }\limits _{\mathcal {I}} V(\varphi ,S(\varphi ),\psi )\text{ d }u&\le \mathop {\int }\limits _{\mathcal {I}} U(\varphi _m,S(\varphi _m),\psi _m,(|\psi |^2)_m)\text{ d }u\\&\le \mathop {\int }\limits _{\mathcal {I}} U(\varphi _0,S(\varphi _0),\psi _0,(|\psi |^2)_0)\text{ d }u\\&=\mathop {\int }\limits _{\mathcal {I}} U(\langle \varphi \rangle _{\mathcal {I}},|\langle \varphi \rangle _{\mathcal {I}}|,\langle \psi \rangle _{\mathcal {I}},\langle |\psi |^2\rangle _{\mathcal {I}})\text{ d }u\le 0. \end{aligned} \end{aligned}$$
(2.6)

This completes the proof. \(\square \)

A beautiful feature of the method is that the implication of the above theorem can be reversed. For \((y,z)\in {\mathcal {H}}\times [0,\infty )\) such that \(|y|^2\le z\le |y|^2+1\), let \({\mathcal {M}}(y,z)\) denote the class of all simple functions \(\psi :{\mathcal {I}}\rightarrow {\mathcal {H}}\) satisfying \(\Vert \psi \Vert _{BMO}\le 1\), \(\langle \psi \rangle _{\mathcal {I}}=y\), and \(\langle |\psi |^2\rangle _{\mathcal {I}}=z\). The class \({\mathcal {M}}(y,z)\) is nonempty: for example, it contains the function \(\psi =(y-e)\chi _{[0,1/2)}+(y+e)\chi _{[1/2,1)}\), where \(e\in {\mathcal {H}}\) is a vector satisfying \(|e|^2=z-|y|^2\). Define \(U^0:{\mathscr {D}}\rightarrow (-\infty ,\infty ]\) by

$$\begin{aligned} U^0(x,s,y,z)=\sup \left\{ \mathop {\int }\limits _{\mathcal {I}} V\left( \varphi ,\sqrt{s^2-|x|^2+S^2(\varphi )},\psi \right) \text{ d }u \right\} , \end{aligned}$$
(2.7)

where the supremum is taken over all simple \(\varphi \) with \(\langle \varphi \rangle _{\mathcal {I}}=x\) and all \(\psi \in {\mathcal {M}}(y,z)\). Considering the constant function \(\varphi \equiv x\) and the above “two-point” function \(\psi =(y-e)\chi _{[0,1/2)}+(y+e)\chi _{[1/2,1)}\in {\mathcal {M}}(y,z)\), we derive that

$$\begin{aligned} U^0(x,s,y,z)\ge \frac{1}{2}\Big [V(x,s,y-e)+V(x,s,y+e)\Big ]. \end{aligned}$$
(2.8)

Theorem 2.2

If (2.1) holds for all simple \(\varphi ,\,\psi :{\mathcal {I}}\rightarrow {\mathcal {H}}\) satisfying \(||\psi ||_{BMO}\le 1\) and \(\langle \psi \rangle _{\mathcal {I}}=0\), then the class \({\mathcal {U}}(V)\) is nonempty and \(U^0\) is its least element.

Proof

Let us first handle the minimality of \(U^0\). Pick \(U \in {\mathcal {U}}(V)\), \((x,s,y,z)\in {\mathscr {D}}\) and an arbitrary pair \(\varphi \), \(\psi \) as in the definition of \(U^0(x,s,y,z)\). Repeating the arguments in (2.6), one gets

$$\begin{aligned} \mathop {\int }\limits _{\mathcal {I}}&V\left( \varphi ,\sqrt{s^2-|x|^2+S^2(\varphi )},\psi \right) \text{ d }u\\&\le \mathop {\int }\limits _{\mathcal {I}} U\left( \varphi _0,\sqrt{s^2-|x|^2+S^2(\varphi _0)},\psi _0,(|\psi |^2)_0\right) \text{ d }u=U(x,s,y,z). \end{aligned}$$

So, taking the supremum over \(\varphi \) and \(\psi \), we get \(U^0(x,s,y,z)\le U(x,s,y,z)\) and therefore \(U^0\) is indeed the smallest element.

Now we check that \(U^0\) belongs to the class \({\mathcal {U}}(V)\). The majorization (2.2) is a direct consequence of (2.8): if \(z=|y|^2\), then \(e=0\) and hence the bound follows. The condition (2.3) is also easy: by (2.1), for any \(\varphi \) of average x and any \(\psi \in {\mathcal {M}}(0,z)\), we have

$$\begin{aligned} \mathop {\int }\limits _{\mathcal {I}} V(\varphi ,S(\varphi ),\psi )\text{ d }u\le 0. \end{aligned}$$

Taking the supremum over all such \(\varphi \) and \(\psi \), we get \(U^0(x,|x|,0,z)\le 0\). To prove (2.4), fix \(x,\,s,\,y,\,z,\,d,\,e,\,f\) as in the statement of this condition. Pick two simple functions \(\varphi _\pm :{\mathcal {I}}\rightarrow {\mathcal {H}}\) satisfying \(\langle \varphi _\pm \rangle _{\mathcal {I}}=x_\pm \), and two functions \(\psi _\pm \in {\mathcal {M}}(y_\pm ,z_\pm )\). Next, splice these objects by the formula

$$\begin{aligned} \varphi (t)={\left\{ \begin{array}{ll} \varphi _-(2t) &{} \text{ if } t\in [0,1/2),\\ \varphi _+(2t-1) &{} \text{ if } t\in [1/2,1], \end{array}\right. } \qquad \psi (t)={\left\{ \begin{array}{ll} \psi _-(2t) &{} \text{ if } t\in [0,1/2),\\ \psi _+(2t-1) &{} \text{ if } t\in [1/2,1]. \end{array}\right. } \end{aligned}$$

Then \(\varphi \) has the average x since

$$\begin{aligned} \langle \varphi \rangle _{\mathcal {I}}=\mathop {\int }\limits _0^{1/2} \varphi _-(2t)\text{ d }t+\mathop {\int }\limits _{1/2}^1 \varphi _+(2t-1)\text{ d }t=\frac{\langle \varphi _-\rangle _{\mathcal {I}}+\langle \varphi _+\rangle _{\mathcal {I}}}{2}=x. \end{aligned}$$

Similarly, we have

$$\begin{aligned} \langle \psi \rangle _{\mathcal {I}}=\frac{\langle \psi _-\rangle _{\mathcal {I}}+\langle \psi _+\rangle _{\mathcal {I}}}{2}=y,\qquad \langle |\psi |^2 \rangle _{\mathcal {I}}=\frac{\langle |\psi _-|^2 \rangle _{\mathcal {I}}+\langle |\psi _+|^2 \rangle _{\mathcal {I}}}{2}=z \end{aligned}$$

and hence \(\psi \in {\mathcal {M}}(y,z)\) (the inequality \(\Vert \psi \Vert _{BMO}\le 1\) is directly inherited from the analogous estimate for \(\psi _\pm \)). Finally, we easily check that

$$\begin{aligned} -|\langle \varphi \rangle _{\mathcal {I}}|^2+S^2(\varphi )(t)={\left\{ \begin{array}{ll} |d|^2-|\langle \varphi _-\rangle _{\mathcal {I}}|^2 +S^2(\varphi _-)(2t) &{}\text{ for } t\in [0,1/2),\\ |d|^2-|\langle \varphi _+\rangle _{\mathcal {I}}|^2 +S^2(\varphi _+)(2t-1)&{} \text{ for } t\in [1/2,1]. \end{array}\right. } \end{aligned}$$

Therefore, by the very definition of \(U^0\),

$$\begin{aligned} U^0(x,s,y,z)&\ge \mathop {\int }\limits _{\mathcal {I}} V\left( \varphi ,\sqrt{s^2-|x|^2+S^2(\varphi )},\psi \right) \text{ d }u\\&= \mathop {\int }\limits _{0}^{1/2} V\left( \varphi ,\sqrt{s^2-|\langle \varphi \rangle _{\mathcal {I}}|^2+S^2(\varphi )},\psi \right) \text{ d }u\\&\quad +\mathop {\int }\limits _{1/2}^1 V\left( \varphi ,\sqrt{s^2-|\langle \varphi \rangle _{\mathcal {I}}|^2+S^2(\varphi )},\psi \right) \text{ d }u\\&=\frac{1}{2}\bigg [\mathop {\int }\limits _{{\mathcal {I}}} V\left( \varphi _-,\sqrt{s^2+|d|^2-|\langle \varphi _-\rangle _{\mathcal {I}}|^2+S^2(\varphi _-)},\psi _-\right) \text{ d }u\\&\qquad +\mathop {\int }\limits _{{\mathcal {I}}} V\left( \varphi _+,\sqrt{s^2+|d|^2-|\langle \varphi _+\rangle _{\mathcal {I}}|^2+S^2(\varphi _+)},\psi _+\right) \text{ d }u\bigg ]. \end{aligned}$$

Taking the supremum over all \(\varphi _\pm \) and \(\psi _\pm \) as above yields (2.4). \(\square \)
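The square-function identity for the spliced function \(\varphi \) used in the proof above can be confirmed numerically. The following sketch (our own illustration, for real-valued simple functions on a 16-point grid; the names are not from the paper) splices two random simple functions and checks the left-half case of the identity:

```python
import numpy as np

def square_function(phi):
    """Dyadic square function (1.1) of a simple function phi,
    given by its values on a uniform grid of [0, 1)."""
    phi = np.asarray(phi, dtype=float)
    N = phi.size
    S2 = np.full(N, phi.mean() ** 2)              # h_0 term
    length = N
    while length >= 2:
        for start in range(0, N, length):
            half = length // 2
            c = (phi[start:start + half].mean()
                 - phi[start + half:start + length].mean()) / 2.0
            S2[start:start + length] += c * c
        length //= 2
    return np.sqrt(S2)

rng = np.random.default_rng(1)
phi_minus = rng.normal(size=8)                    # phi_-, of average x - d
phi_plus = rng.normal(size=8)                     # phi_+, of average x + d
phi = np.concatenate([phi_minus, phi_plus])       # phi(t) = phi_-(2t) or phi_+(2t-1)

x = phi.mean()
d = (phi_plus.mean() - phi_minus.mean()) / 2.0
# for t in [0, 1/2):  -x^2 + S^2(phi)(t) = d^2 - <phi_->^2 + S^2(phi_-)(2t)
lhs = -x ** 2 + square_function(phi)[:8] ** 2
rhs = d ** 2 - phi_minus.mean() ** 2 + square_function(phi_minus) ** 2
assert np.allclose(lhs, rhs)
```

The right-half case is checked in the same way, with \(\varphi _+\) in place of \(\varphi _-\).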

3 Proof of Theorem 1.1

3.1 Proof of (1.2)

The desired estimate is of the form (2.1) with \(V(x,s,y)=x\cdot y-\sqrt{2}s\), and by a straightforward approximation, it is enough to establish it for simple functions only. By virtue of Theorem 2.1, the estimate will be proved if we find an element of \({\mathcal {U}}(V)\). Consider the function \(U:{\mathscr {D}}\rightarrow {\mathbb {R}}\) given by

$$\begin{aligned} U(x,s,y,z)=x\cdot y+\frac{s}{\sqrt{2}}(z-|y|^2-2). \end{aligned}$$

Some of the steps which led to the discovery of this object can be found in the next subsection. Let us verify that the function enjoys all the required properties. The majorization (2.2) is equivalent to \(z-|y|^2\ge 0\), which is guaranteed in the definition of the domain \({\mathscr {D}}\). The condition (2.3), after straightforward manipulations, transforms into \(s(z-2)\le 0\), which follows from the estimates \(s\ge 0\) and \(z\in [0,1]\) (the latter being a consequence of \((x,|x|,0,z)\in {\mathscr {D}}\)). It remains to establish the concavity-type condition (2.4). Note that

$$\begin{aligned}&\frac{1}{2}\left[ U\big (x_-,\sqrt{s^2+|d|^2},y_-,z_-\big )+U\big (x_+,\sqrt{s^2+|d|^2},y_+,z_+\big )\right] \\&\qquad \qquad =x\cdot y+d\cdot e+\frac{\sqrt{s^2+|d|^2}}{\sqrt{2}}\big (z-|y|^2-|e|^2-2\big ) \end{aligned}$$

and hence (2.4) can be rewritten in the form

$$\begin{aligned} d\cdot e+\frac{\sqrt{s^2+|d|^2}-s}{\sqrt{2}}(z-|y|^2-2)-\frac{\sqrt{s^2+|d|^2}}{\sqrt{2}}|e|^2\le 0. \end{aligned}$$

We have \(z-|y|^2\le 1\) (since \((x,s,y,z)\in {\mathscr {D}}\)) and \(d\cdot e\le |d||e|\), so it is enough to show that

$$\begin{aligned} -\frac{\sqrt{s^2+|d|^2}-s}{\sqrt{2}}+|d||e|-\frac{\sqrt{s^2+|d|^2}}{\sqrt{2}}|e|^2\le 0. \end{aligned}$$
(3.1)

Consider the left-hand side above as a quadratic function of |e|; since its leading coefficient is nonpositive, it suffices to verify that its discriminant is nonpositive. The discriminant is equal to

$$\begin{aligned} |d|^2-2\sqrt{s^2+|d|^2}(\sqrt{s^2+|d|^2}-s)=|d|^2\left( 1-\frac{2\sqrt{s^2+|d|^2}}{\sqrt{s^2+|d|^2}+s}\right) \le 0. \end{aligned}$$

Therefore (3.1) holds and hence \(U\in {\mathcal {U}}(V)\). This completes the proof of (1.2).
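As a sanity check (our own, not part of the proof), one can verify the three conditions for this U numerically in the scalar case \({\mathcal {H}}={\mathbb {R}}\): the majorization (2.2) holds with equality, (2.3) reduces to a sign condition, and the midpoint-concavity (2.4) is tested at random admissible points:

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def U(x, s, y, z):
    # the candidate Bellman function from Section 3.1, with H = R
    return x * y + s / SQRT2 * (z - y * y - 2.0)

def in_domain(s, y, z):
    return s >= 0.0 and y * y <= z <= y * y + 1.0

# (2.2): at z = y^2 the majorization holds with equality
x, s, y = 0.7, 2.0, -1.3
assert abs(U(x, s, y, y * y) - (x * y - SQRT2 * s)) < 1e-12
# (2.3): U(x, |x|, 0, z) = |x| (z - 2) / sqrt(2) <= 0 for z in [0, 1]
assert U(x, abs(x), 0.0, 0.9) <= 0.0

# (2.4): midpoint concavity at random admissible configurations
rng = np.random.default_rng(2)
checked = 0
for _ in range(20000):
    x, y = rng.normal(size=2)
    s = rng.uniform(0.0, 3.0)
    z = y * y + rng.uniform(0.0, 1.0)
    d, e, f = 0.3 * rng.normal(size=3)
    s_new = np.hypot(s, d)                        # sqrt(s^2 + d^2)
    if in_domain(s_new, y - e, z - f) and in_domain(s_new, y + e, z + f):
        mid = 0.5 * (U(x - d, s_new, y - e, z - f)
                     + U(x + d, s_new, y + e, z + f))
        assert U(x, s, y, z) >= mid - 1e-9
        checked += 1
assert checked > 1000
```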

3.2 Sharpness

The examples showing that the constant \(\sqrt{2}\) is optimal in (1.2) have a very complicated structure, and their analysis is quite involved. Most of these technicalities can be avoided by an appropriate use of Theorem 2.2, as we shall see below. Throughout this subsection, we let \({\mathcal {H}}={\mathbb {R}}\) and assume that the inequality (1.2) holds with some constant C. This estimate is of the form (2.1), with \(V(x,s,y)=xy-Cs\). Recall the definition (2.7) of the associated special function \(U^0\):

$$\begin{aligned} U^0(x,s,y,z)&=\sup \mathop {\int }\limits _{\mathcal {I}} \Big (\varphi \psi -C \sqrt{s^2-|x|^2+S^2(\varphi )}\Big )\text{ d }u, \end{aligned}$$

the supremum taken over all simple \(\varphi \) of average x and all \(\psi \in {\mathcal {M}}(y,z)\). We start with some homogeneity-type properties of \(U^0\) (which also give some hint as to how the function U used in the previous subsection was discovered).

Lemma 3.1

For any \(\lambda >0\) and \(a\in {\mathbb {R}}\), we have

$$\begin{aligned} U^0(\lambda x,\lambda s,y,z)= & {} \lambda U^0(x,s,y,z), \end{aligned}$$
(3.2)
$$\begin{aligned} U^0(x+a, s,y,z)= & {} ay+ U^0(x,s,y,z), \end{aligned}$$
(3.3)

and

$$\begin{aligned} U^0(x,s,y+a,z+2ya+a^2)= U^0(x,s,y,z)+ax. \end{aligned}$$
(3.4)

Consequently, \(U^0(x,s,y,z)=xy+s\zeta (z-y^2)\) for some function \(\zeta :[0,1]\rightarrow {\mathbb {R}}\).

Proof

Fix an arbitrary \(\varphi \) of average x and any \(\psi \in {\mathcal {M}}(y,z)\). Then \(\lambda \varphi \) is also simple and has average \(\lambda x\), so by the definition of \(U^0(\lambda x,\lambda s,y,z)\),

$$\begin{aligned} U^0(\lambda x,\lambda s,y,z)&\ge \mathop {\int }\limits _{\mathcal {I}} \Big (\lambda \varphi \psi -C \sqrt{\lambda ^2s^2-|\lambda x|^2+S^2(\lambda \varphi )}\Big )\text{ d }u\\&=\lambda \mathop {\int }\limits _{\mathcal {I}} \Big (\varphi \psi -C \sqrt{s^2-|x|^2+S^2(\varphi )}\Big )\text{ d }u. \end{aligned}$$

Taking the supremum over all \(\varphi \) and \(\psi \) as above, we get \(U^0(\lambda x,\lambda s,y,z)\ge \lambda U^0(x,s,y,z)\). Putting \({\bar{x}}=\lambda x\), \({\bar{s}}=\lambda s\), and \(\bar{\lambda }=\lambda ^{-1}\), we get the reverse bound (with x, s, \(\lambda \) replaced by \({\bar{x}}\), \({\bar{s}}\), \({\bar{\lambda }}\)). The identity (3.3) is shown similarly: for any \(a\in {\mathbb {R}}\) and any simple \(\varphi \) of average x, we have \(\langle \varphi +a\rangle _{\mathcal {I}}=x+a\) and \(S^2(\varphi +a)-|x+a|^2=S^2(\varphi )-|x|^2\) (adding a constant changes only the coefficient of \(h_0\)); hence for any \(\psi \in {\mathcal {M}}(y,z)\),

$$\begin{aligned} U^0(x+a,s,y,z)&\ge \mathop {\int }\limits _{\mathcal {I}} \Big ((\varphi +a) \psi -C \sqrt{s^2-|x+a|^2+S^2(\varphi +a)}\Big )\text{ d }u\\&=ay+ \mathop {\int }\limits _{\mathcal {I}} \Big (\varphi \psi -C \sqrt{s^2-|x|^2+S^2(\varphi )}\Big )\text{ d }u. \end{aligned}$$

Consequently, we get \(U^0(x+a,s,y,z)\ge ay+U^0(x,s,y,z)\) and the reverse bound follows by replacing a with \(-a\). The proof of the identity (3.4) is similar and is left to the reader. To show the final claim, we use (3.3) (with \(a=-x\)), (3.4) (with \(a=-y\)), and then (3.2) (with \(\lambda =s^{-1}\)) to obtain

$$\begin{aligned} \begin{aligned} U^0(x,s,y,z)&=xy+U^0(0,s,y,z)\\&=xy+U^0(0,s,0,z-y^2)=xy+sU^0(0,1,0,z-y^2). \end{aligned} \end{aligned}$$
(3.5)

Thus we let \(\zeta (u)=U^0(0,1,0,u)\). It remains to show that \(\zeta \) is finite (recall that the elements of \({\mathcal {U}}(V)\), in general, are allowed to take infinite values). But this is simple: applying (3.5) to \(x=s=1\), \(y=0\), and \(z=u\) gives \(\zeta (u)=U^0(1,1,0,u)\), which, by (2.3), is nonpositive. \(\square \)

We are ready to prove that \(C\ge \sqrt{2}\). Fix \(\varepsilon \in {\mathbb {R}}\) and apply the inequality (2.4) to \((x,s,y,z)=(1,1,0,1)\) and \((d,e,f)=(\sqrt{2}\varepsilon ,\varepsilon ,0)\) to obtain

$$\begin{aligned}&U^0(1,1,0,1)\nonumber \\&\quad \ge \frac{1}{2}\left[ U^0(1-\sqrt{2}\varepsilon ,\sqrt{1+2\varepsilon ^2},-\varepsilon ,1)+U^0(1+\sqrt{2}\varepsilon ,\sqrt{1+2\varepsilon ^2},\varepsilon ,1) \right] .\nonumber \\ \end{aligned}$$
(3.6)

Now it follows from (2.4) (applied with \(d=e=0\)) that the function \(z\mapsto U^0(1-\sqrt{2}\varepsilon ,\sqrt{1+2\varepsilon ^2},-\varepsilon ,z)\) is midpoint concave on \([\varepsilon ^2,1+\varepsilon ^2]\). Since it is bounded from below (by (2.8), we have \(U^0(x,s,y,z)\ge xy-Cs\) for all \((x,s,y,z)\in {\mathscr {D}}\)), it is actually concave and therefore

$$\begin{aligned} U^0(1-\sqrt{2}\varepsilon ,&\sqrt{1+2\varepsilon ^2},-\varepsilon ,1)\nonumber \\&\quad \ge \varepsilon ^2U^0(1-\sqrt{2}\varepsilon ,\sqrt{1+2\varepsilon ^2},-\varepsilon ,\varepsilon ^2)\nonumber \\&\qquad +(1-\varepsilon ^2)U^0(1-\sqrt{2}\varepsilon ,\sqrt{1+2\varepsilon ^2},-\varepsilon ,1+\varepsilon ^2). \end{aligned}$$
(3.7)

However, the condition (2.2) implies

$$\begin{aligned} U^0(1-\sqrt{2}\varepsilon ,\sqrt{1+2\varepsilon ^2},-\varepsilon ,\varepsilon ^2)\ge -(1-\sqrt{2}\varepsilon )\varepsilon -C\sqrt{1+2\varepsilon ^2}, \end{aligned}$$

and by (3.4), we get

$$\begin{aligned} U^0(1-\sqrt{2}\varepsilon ,\sqrt{1+2\varepsilon ^2},-\varepsilon ,1+\varepsilon ^2)= & {} -(1-\sqrt{2}\varepsilon )\varepsilon \\&+U^0(1-\sqrt{2}\varepsilon ,\sqrt{1+2\varepsilon ^2},0,1). \end{aligned}$$

Plugging these two observations into (3.7) gives

$$\begin{aligned}&U^0(1-\sqrt{2}\varepsilon ,\sqrt{1+2\varepsilon ^2},-\varepsilon ,1)\\&\quad \ge -(1-\sqrt{2}\varepsilon )\varepsilon -C\varepsilon ^2\sqrt{1+2\varepsilon ^2}+(1-\varepsilon ^2)U^0(1-\sqrt{2}\varepsilon ,\sqrt{1+2\varepsilon ^2},0,1)\\&\quad = -(1-\sqrt{2}\varepsilon )\varepsilon -C\varepsilon ^2\sqrt{1+2\varepsilon ^2}+(1-\varepsilon ^2)\sqrt{1+2\varepsilon ^2}\zeta (1). \end{aligned}$$

Replacing \(\varepsilon \) with \(-\varepsilon \), we see that we also have

$$\begin{aligned}&U^0(1+\sqrt{2}\varepsilon ,\sqrt{1+2\varepsilon ^2},\varepsilon ,1)\\&\qquad \ge (1+\sqrt{2}\varepsilon )\varepsilon -C\varepsilon ^2\sqrt{1+2\varepsilon ^2}+(1-\varepsilon ^2)\sqrt{1+2\varepsilon ^2}\zeta (1). \end{aligned}$$

Combining the last two inequalities with (3.6) gives

$$\begin{aligned} \zeta (1)\ge \sqrt{2}\varepsilon ^2-C\varepsilon ^2\sqrt{1+2\varepsilon ^2}+(1-\varepsilon ^2)\sqrt{1+2\varepsilon ^2}\zeta (1). \end{aligned}$$

Now observe that \((1-\varepsilon ^2)\sqrt{1+2\varepsilon ^2}=1+o(\varepsilon ^2)\) as \(\varepsilon \rightarrow 0\). Therefore, moving the term \((1-\varepsilon ^2)\sqrt{1+2\varepsilon ^2}\zeta (1)\) to the left, dividing by \(\varepsilon ^2\) and letting \(\varepsilon \rightarrow 0\) gives \(0\ge \sqrt{2}-C\), which is equivalent to the desired bound \(C\ge \sqrt{2}\).
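The limiting step can be illustrated numerically (our own sketch, not part of the argument): the factor \((1-\varepsilon ^2)\sqrt{1+2\varepsilon ^2}\) equals \(1-\tfrac{3}{2}\varepsilon ^4+O(\varepsilon ^6)\), so the \(\zeta (1)\)-terms disappear after division by \(\varepsilon ^2\), and for any hypothetical constant \(C<\sqrt{2}\) the final inequality fails once \(\varepsilon \) is small:

```python
import math

# (1 - eps^2) * sqrt(1 + 2 eps^2) = 1 - (3/2) eps^4 + O(eps^6)
for eps in [1e-1, 1e-2, 1e-3]:
    factor = (1 - eps ** 2) * math.sqrt(1 + 2 * eps ** 2)
    assert abs(factor - 1.0) < 2 * eps ** 4

# rearranged final inequality:
#   zeta(1) * (1 - factor) >= eps^2 * (sqrt(2) - C * sqrt(1 + 2 eps^2));
# for C < sqrt(2) and any finite zeta(1) <= 0 this fails for small eps
C = 1.40                                  # a hypothetical constant below sqrt(2)
eps = 1e-3
factor = (1 - eps ** 2) * math.sqrt(1 + 2 * eps ** 2)
for zeta1 in [0.0, -1.0, -10.0]:          # zeta(1) is nonpositive by Lemma 3.1
    lhs = zeta1 * (1.0 - factor)          # of order eps^4, nonpositive
    rhs = eps ** 2 * (math.sqrt(2) - C * math.sqrt(1 + 2 * eps ** 2))
    assert lhs < rhs                      # contradiction, so C >= sqrt(2)
```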