1 Introduction

The Muskat problem was first introduced by Morris Muskat [30] as a model for the flow of two immiscible fluids through a porous medium. Since its introduction, this problem has received sustained attention in a variety of fields. It is used to model flows in oil reservoirs (water is injected into the oil well to drive oil extraction), and in hydrology to model flows of groundwater through aquifers.

In this paper we are interested in obtaining the global existence of weak solutions for the Muskat problem with surface tension, based on its gradient flow structure. We begin by introducing a variational formulation of the problem, which will motivate our subsequent analysis. The fluid evolution can be written as Darcy’s law

$$\begin{aligned} v_i+b^{-1}_i\nabla \delta _{\rho _i}{\mathcal {E}}(\varvec{\rho })=0, \end{aligned}$$
(1)

coupled with the continuity equation

$$\begin{aligned} \partial _t \rho _i +\nabla \cdot (\rho _i v_i)=0, \end{aligned}$$
(2)

where \(v_i\) is the velocity of phase i, \(\varvec{\rho }=(\rho _1,\rho _2)\) is the collection of relative concentrations for each phase, \(\nabla (\delta _{\rho _i}{\mathcal {E}})\) denotes the spacial gradient of the classical first variation of the free energy with respect to \(\rho _i\), and \(b_i>0\) (\(i=1,2\)) denotes constant mobilities. For convenience, throughout the rest of the paper, we will refer to \(\varvec{\rho }\) as a collection of density functions; however, one should note that \(\varvec{\rho }\) only encodes information about the volume occupied by the fluids and nothing about their mass.

The physical setting for our problem is a bounded, convex open domain \(\Omega \subset {\mathbb {R}}^d\) with smooth boundary. We shall suppose that the two fluids fill the entire domain, and that they are confined to \(\Omega \) for all time. We then take the internal energy to be a sum of three distinct terms:

$$\begin{aligned} {\mathcal {E}}(\varvec{\rho })={\mathcal {E}}_p(\varvec{\rho })+{\mathcal {E}}_s(\varvec{\rho })+\Phi (\varvec{\rho }). \end{aligned}$$
(3)

The first term in the energy, describing incompressibility and containment of the fluids, is given by

$$\begin{aligned} {\mathcal {E}}_p(\varvec{\rho })= {\left\{ \begin{array}{ll} 0, &{}\quad \text {if} \; \rho _1(x)+\rho _2(x)=1 \; \text {for a.e.} \, x\in \Omega \\ +\infty , &{}\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(4)

The immiscibility of the fluids and the surface tension force arise from the highly non-convex interaction energy

$$\begin{aligned} {\mathcal {E}}_s(\varvec{\rho })= {\left\{ \begin{array}{ll} \frac{\sigma }{2}|D\rho _1|(\Omega )+\frac{\sigma }{2}| D\rho _2|(\Omega ), &{}\quad \text {if} \; \rho _1,\rho _2\in BV(\Omega ;\{0,1\})\ {{\mathrm{and}}}\ \rho _1(x)\rho _2(x)=0 \; \text {for a.e.} \, x\in \Omega \\ +\infty , &{}\quad \text {otherwise}, \end{array}\right. } \end{aligned}$$
(5)

where \(|D\rho _i|(\Omega )\) denotes the total variation of \(\rho _i\) in \(\Omega \) and \(\sigma >0\) is a surface tension constant. Finally, \(\Phi (\varvec{\rho })\) denotes the potential energy of the fluid configuration, i.e.

$$\begin{aligned} \Phi (\varvec{\rho })=\int _\Omega \Phi _1\,{\mathrm{d}}\rho _1+\int _\Omega \Phi _2\,{\mathrm{d}}\rho _2, \end{aligned}$$

where \(\Phi _1,\Phi _2:\Omega \rightarrow {\mathbb {R}}\) are given Lipschitz continuous potentials. A typical example is when one assumes these to be gravitational potentials, i.e.

$$\begin{aligned} \Phi _i= g_ix\cdot e_d, \quad g_i >0,\ \ i=1,2, \end{aligned}$$
(6)

where \(g_i\)’s are proportional to the specific gravity of each fluid.

Although the internal energy is singular, when \(\rho _1\) and \(\rho _2\) are separated by a smooth interface \(\Gamma := \partial \{\rho _1>0\}\cap \partial \{\rho _2>0\}\), one can formulate a classical solution to the Muskat problem equations (12). In the classical solution, the flow is driven by the pressure variables \(p_i\) for each phase, which are Lagrange multipliers generated by \({\mathcal {E}}_p\) above. The continuity equation becomes

figure a

and the pressure is determined by solving the free boundary problem

figure b

where \(\kappa \) denotes the mean curvature of \(\Gamma \), oriented to be positive when \(\{\rho _2>0\}\) is convex at the point, n denotes the outer normal along \(\partial \Omega \) and along \(\Gamma \), and \({\tilde{n}}\) denotes the co-normal vector orthogonal to \(\partial \Gamma \) and tangential to \(\Gamma \). Note that the final condition relating the co-normal vector at \(\partial \Gamma \cap \partial \Omega \) to the normal vector of \(\partial \Omega \) implies that \(\Gamma \) must meet \(\partial \Omega \) orthogonally (see Lemma 3.1 and Remark 3.2 for the weak formulation of this condition). To summarize the ideas in the formal derivation of (\(\hbox {MP}_1\))-(\(\hbox {MP}_2\)) from (1)–(2) using the definition of \({\mathcal {E}}\), we heuristically have \(\nabla \delta _{\rho _i}{\mathcal {E}}_p=\nabla p_i\), \(\nabla \delta _{\rho _i}\Phi =\nabla \Phi _i\) while the contribution of \(\nabla \delta _{\rho _i}{\mathcal {E}}_s\) will act only on \(\Gamma \) in the form of the curvature \(\kappa \).

Problems like (\(\hbox {MP}_2\)) received a lot of attention in the past decades. Most of the works focus on the zero surface tension model (\(\sigma =0\)) and well-posedness of regular solutions with graph property [2, 6, 8,9,10]. In the presence of surface tension, the problem has stronger regularity properties in stable settings [34], but still, topological singularities can occur in finite time, for instance when heavier fluid is placed on top of the lighter one [14]. Thus, our aim is to construct global-in-time weak solutions to the Muskat problem (\(\hbox {MP}_2\)), which exist past the formation of singularities.

To construct global-in-time solutions, we exploit the gradient flow structure of the Muskat problem. As noted by Otto in [31, 32], Darcy’s law can be approximated by the Euler–Lagrange equation for the minimizing movements scheme (or JKO [19] scheme) with time step size \(\tau >0\),

$$\begin{aligned} \varvec{\rho }^{n+1}=\mathop {{\mathrm{arg\,min}}}\limits _{\varvec{\rho }} \left\{ {\mathcal {E}}(\varvec{\rho })+\sum _{i=1}^2\frac{b_i}{2\tau } W_2^2(\rho _i, \rho _i^n)\right\} , \end{aligned}$$
(7)

where \(W_2(\rho _i, \rho _i^n)\) denotes the 2-Wasserstein or 2-Monge–Kantorovich distance. In this context, the squared \(W_2\) distance has a physical interpretation as the energy dissipated by friction as the fluids flow through the porous media.

Let us note that Wasserstein gradient flows of energies involving total variation terms have been considered before in the literature, though only in the case of one phase models (see e.g. [5, 26]), and hence with no incompressibility or interaction constraints. As a result, the techniques developed in those papers do not appear to be applicable here—the constrained two phase setting adds many additional difficulties.

Indeed, it is not easy to obtain a complete characterization of the solutions to the minimizing movements problem (7). The interaction energy (5) is sufficiently non-convex that problem (7) is non-convex for any \(\tau >0\). As a result, one must be careful in using duality to introduce the pressure as a Lagrange multiplier. Furthermore, we are interested in developing a scheme which could be used for numerical implementations. The formulation (7) is poorly suited for numerical methods. Optimizing over the non-convex constraint set \(\{\rho _1, \rho _2\in BV(\Omega ; \{0,1\}): \rho _1(x)\rho _2(x)=0\; \text {a.e.}\}\) is extremely difficult. For these reasons, we instead consider a relaxed version of minimizing movements scheme inspired by [13] and [22].

Approximation of the perimeter by the Heat Content

In our analysis, we replace the interaction energy \({\mathcal {E}}_s\) in (7) by the heat content energy

$$\begin{aligned} \text {HC}_{\varepsilon }(\varvec{\rho })&:= \sigma \sqrt{\frac{2\pi }{\varepsilon }}\int _{\Omega } (G_{\varepsilon } \star \rho _1)(x)\,{\mathrm{d}}\rho _2(x)\nonumber \\&=\sigma \sqrt{\frac{2\pi }{\varepsilon }}\int _{\Omega } \int _{{\mathbb {R}}^d} G(z) \,{\mathrm{d}}\rho _1(x+\sqrt{\varepsilon } z)\,{\mathrm{d}}\rho _2(x). \end{aligned}$$
(8)

Here \(G_\varepsilon :{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) stands for the standard heat kernel (with mean 0 and variance \(\varepsilon >0\)) and the densities \(\rho _i\) are assumed to be defined on all of \({\mathbb {R}}^d\) by extending them to zero off of \(\Omega \). Let us notice that in [13] and [22], for similar purposes the authors use periodic extensions. The analysis in both cases and the validity of results using both kinds of extensions is essentially the same.

Dating back to the work of De Giorgi, approximate perimeter energies have been used in the literature to study geometric variational problems (see for instance [29] and [1]). The use of the heat content energy to study the multi-phase mean curvature flow was first introduced by Esedoḡlu and Otto in [13]. It was observed in [13] that the threshold dynamics, a well known numerical scheme for mean curvature motion introduced by Merriman et al. [28], is precisely a minimizing movements scheme for the heat content energy. Esedoḡlu and Otto also showed that \(\text {HC}_\varepsilon \) \(\Gamma \)-converges (with respect to the \(L^1\) topology) to \({\mathcal {E}}_s\) as \(\varepsilon \rightarrow 0\). Building off of these results, Laux and Otto showed in [22] that under an energy convergence assumption, the threshold dynamics scheme produces weak solutions to the multi-phase motion by mean curvature in the limit \(\varepsilon \rightarrow 0\).

Our goal is to consider such a framework in the context of the Muskat problem by studying the minimizing movements scheme

$$\begin{aligned} \varvec{\rho }^{n+1}=\mathop {{\mathrm{arg\,min}}}\limits _{\varvec{\rho }} \left\{ {\mathcal {E}}_\varepsilon (\varvec{\rho })+\sum _{i=1}^2\frac{b_i}{2\tau } W_2^2(\rho _i, \rho _i^n)\right\} , \end{aligned}$$
(9)

where we used the notation

$$\begin{aligned} {\mathcal {E}}_\varepsilon (\varvec{\rho }):={\mathcal {E}}_p(\varvec{\rho })+ {{\mathrm{HC}}}_{\varepsilon }(\varvec{\rho })+\Phi (\varvec{\rho }). \end{aligned}$$

As we alluded above, the scheme (9) has a number of numerical advantages over (7). Unlike \({\mathcal {E}}_s\) which is neither convex nor concave, the heat content is a strictly concave functional of the densities. This concavity can be exploited to simplify numerical implementations, along the same lines as the linearization trick noted in Subsection 5.1 of [13]. After applying this trick, the resulting variational problem becomes convex, and thus, can be efficiently solved using the recently introduced back-and-forth method [18]. See Figs. 1, 2 and 3, for a demonstration of the numerical performance of the scheme.

Although the heat content in principle allows mixing of the phases, we shall show that the discrete in time solutions constructed by the JKO scheme always stay unmixed with a sharp interface between the phases for all time (see Proposition 2.3 below). This phenomenon is due to the fact that \({{\mathrm{HC}}}_\varepsilon \) behaves like a strictly concave functional (see Lemma 2.2). Thus, one retains the essential properties of the Muskat problem evolution.

In the context of the Muskat problem, the heat content also has a natural physical interpretation. In a discrete statistical mechanics model with N particles, surface tension can be seen to arise from short range interactions between particles in different phases (cf. [17]). Typically, the discrete surface tension takes the form

$$\begin{aligned} \frac{1}{N^2}\sum _{i\in P_1, j\in P_2}V(|x_i-x_j|)=\frac{1}{N^2}\sum _{i\in P_1,j\in P_2}\int _{{\mathbb {R}}^d}\int _{{\mathbb {R}}^d} V(|x-x'|)\delta (x-x_i)\delta (x'-x_j), \end{aligned}$$

where \(P_r\) is the particle index in each phase r, V is some decreasing function, and \(x_i\) is the location of particle i [17]. By taking the limit \(N\rightarrow \infty \) in above formula, one obtains an analogue of the heat content energy where the kernel \(G_{\varepsilon }\) is replaced with V. Here it is worth noting that we choose to work with the heat kernel for computational convenience, indeed a different choice of kernel may be more physically relevant.

Finally, let us also emphasize that the heat content approach can be naturally extended to the multiphase Muskat problem evolution (with any number of phases), without incurring any additional difficulties (just as in the case of [13] and [22]). This includes scenarios where the surface tension force depends on the phases that are interacting [13]. To present our ideas in the simplest possible way, we do not pursue the multiphase case in this paper.

Statement of our main results

From the relaxed minimizing movements scheme (9), we obtain a sequence of discrete in time approximations to the Muskat flow. When we take the time step \(\tau \) and the heat content approximation parameter \(\varepsilon \) to zero together, we hope to recover weak solutions to the Muskat problem. Our main results show that this is indeed the case under the assumption that there is no loss of perimeter when passing from discrete to continuous solutions. For the precise statements of the convergence results we refer to Theorems 3.1 and 3.2. In addition, we show that our weak formulation (see Definition 3.1) encodes all of the conditions in (\(\hbox {MP}_1\))–(\(\hbox {MP}_2\)) (see Lemma 3.1 and Remark 3.2).

Theorem 1.1

Let \(T>0\) be a fixed time horizon, let \(\varepsilon ,\tau >0\) be fixed and let \((\rho _1^{\varepsilon ,\tau },\rho _{2}^{\varepsilon ,\tau })\) be the discrete in time interpolations of the densities obtained from the minimizing movements scheme (9). Let moreover \(p^{\varepsilon ,\tau }\) stand for the discrete in time interpolations between the scalar pressure fields, obtained as Lagrange multipliers associated to the incompressibility constraint in (9). Then

  • There exists a family \((A^{\varepsilon ,\tau }_t)_{t\in [0,T]}\subseteq \Omega \) of measurable sets such that \(\rho _{1}^{\varepsilon ,\tau }(t,\cdot )=\chi _{A^{\varepsilon ,\tau }_t}\) and \(\rho _{2}^{\varepsilon ,\tau }(t,\cdot )=\chi _{\Omega \setminus A^{\varepsilon ,\tau }_t}\).

  • There exists \(\rho _i\in L^1([0,T];BV(\Omega ;\{0,1\}))\cap {{\mathrm{AC}}}^2([0,T];{{\mathscr {P}}}(\Omega ))\) (\(i=1,2\)) such that as \(\max \{\varepsilon ,\tau \}\downarrow 0\) and along a subsequence \((\rho _{1}^{\varepsilon ,\tau }, \rho _{2}^{\varepsilon ,\tau })\rightarrow (\rho _1,\rho _2)\) strongly in \(L^1([0,T]\times \Omega )\times L^1([0,T]\times \Omega )\). Moreover, \(\rho _1, \rho _2\in L^1([0,T];BV(\Omega ))\) and they are also characteristic functions that sum up to one, i.e.

    $$\begin{aligned} \rho _1(t,\cdot ) =\chi _{A_t\cap \Omega }\hbox { and }\rho _2(t,\cdot ) = \chi _{\Omega \setminus A_t}, \end{aligned}$$
    (10)

    for a measurable family of sets \((A_t)_{t\in [0,T]}\) which are of finite perimeter.

  • There exists a scalar pressure field \(p\in L^2([0,T]; (C^{0,\alpha }(\Omega ))^*)\) such that along a subsequence \(\nabla p^{\varepsilon ,\tau }{\mathop {\rightharpoonup }\limits ^{\star }}\nabla p\) weakly-\(\star \) in \(L^2([0,T]; (C^{1}(\Omega ))^*)\) as \(\max \{\varepsilon ,\tau \}\downarrow 0\).

  • Finally, under the assumption of energy convergence

    $$\begin{aligned} \int _0^T {\mathrm{HC}}_\varepsilon (\rho _{1}^{\varepsilon ,\tau }, \rho _{2}^{\varepsilon ,\tau })\,{\mathrm{d}}t \rightarrow \int _0^T \left( \sum _i \frac{\sigma }{2}\int _{\Omega } |D\rho _i|\right) \,{\mathrm{d}}x\,{\mathrm{d}}t, \end{aligned}$$
    (EC)

    \((\rho _i, v_i,p)\) with \(i=1,2\) solves, in the weak sense (see Definition 3.1), the problem (\(\hbox {MP}_1\))–(\(\hbox {MP}_2\)). Here formally \(v_i\) corresponds to \(-b_i^{-1}(\nabla p + \nabla \Phi _i)\).

It is challenging to study qualitative or geometric properties of our solutions. We will illustrate heuristically in Section 4 that, even for global minimizers of the energy, there are diverse possibilities depending on the values of the specific gravity and volumes of the two phases. This is verified with several numerical simulations in Section 4.1.

Remarks on our results

  1. (1)

    Let us underline the fact that our discrete-time scheme (9) produces minimizers that are characteristic functions of a partition of \(\Omega \). To prove this fact, we exploit the strict concavity of the heat content along the admissible set. This seems to be an interesting property in its own right, and ensures that numerical implementations of the scheme maintain a sharp interface at every time step.

  2. (2)

    From the point of view of Wasserstein gradient flows, an interesting remark on our results is that while one cannot expect strong compactness for the interpolated densities \((\rho _{1}^{\varepsilon ,\tau },\rho _{2}^{\varepsilon ,\tau })\) in the case when \(\varepsilon >0\) is fixed and \(\tau \downarrow 0\), when sending both parameters to 0 in the same time, we regain the strong compactness. This phenomenon is mainly due to the fact that in the limit as \(\max \{\tau ,\varepsilon \}\downarrow 0\) we recover total variation estimates on the densities. This compactness is obtained via a standard Aubin–Lions type argument.

  3. (3)

    The energy convergence assumption (EC) is rather natural and the same as the ones given in [22] and [23]. This assumption ensures that there is no sudden loss of boundary between phases in the limit \(\varepsilon \rightarrow 0\). If there is a loss of interface then one cannot obtain weak solutions. Indeed, if two components of the support of \(\rho _1\) merge into each other and remove a sizable part of its boundary, one can expect a discontinuous change of the pressure term p in the entire domain \(\Omega \), creating an inconsistency between the discrete and limiting evolutions.

Replacing the assumption with a direct argument has been discussed for mean curvature flows [11, 40]. Unfortunately, these results rely strongly on certain properties of the mean curvature flow (especially the comparison principle), which do not hold for fourth order equations like the Muskat problem.

There are also results in the literature [24, 35] which eliminate the assumption for third order curvature driven flows (specifically the Stefan problem and the Mullins–Sekerka flow respectively). However, the solutions constructed in [24] are discontinuous in time (the interface may experience sudden jumps in time) and the formulation in [35] only keeps track of the regular part of the interface. Let us also note that the Stefan problem and the Mullins–Sekerka flow are not similar to the Muskat problem. In particular, the jump condition for the pressure across the interface is different for the Muskat problem, which leads to qualitatively different behavior.

Paper summary

The rest of the paper is organized as follows: in Section 2, we derive the basic properties of the minimizing movements scheme (9) and construct our discrete-time quantities. We begin by showing that solutions to the minimization problem are characteristic functions at every time step of the discrete scheme. We then derive the existence of pressure as a Lagrange multiplier for the incompressibility constraint and obtain the Euler–Lagrange equation for the minimization problem. Our derivation of these equations were inspired by previous results from [12, 20, 21, 27]. In particular, the definition of the discrete in time pressure variable is very much inspired by [27].

In Section 3, we take \(\tau \) and \(\varepsilon \) to zero together to obtain weak solutions to the Muskat problem, under the assumption that the internal energy of the discrete solutions converges to the internal energy of the limiting solutions. The main task in this section amounts to showing that one can pass to the limit in the Euler–Lagrange equation obtained in Section 2. This can be done using the standard theory for Wasserstein gradient flows if \(\varepsilon \) is held fixed. However, the joint limit, \(\tau , \varepsilon _{\tau }\rightarrow 0\), requires an adaptation of the arguments of [22] to the case of Wasserstein gradient flows. Let us underline that one can rely entirely on the results of [22] to pass to the limit the weak curvature equation. An adaptation of the Aubin–Lions type argument from [22] can be done to get compactness of the density terms. The only difference here is that we are using \(W_2\) as metric while in [22] a different metric is used, but this does not impose crucial difficulties. An interesting link to the flow-exchange technique introduced in [26], which is a typical tool for \(W_2\) gradient flows, is also pointed out. Last, we developed the necessary estimates and compactness results on the pressure terms. These are new and clearly were not present in the setting of mean curvature flows. The compactness that we get on the pressure is in the sense of distributions.

Finally, in Section 4, we conclude the paper with a demonstration of the numerical method on several examples and a discussion on the global minimizers of the approximated internal energy associated to the Muskat problem. While this discussion remains at the heuristic level, our conjectures are supported by the equilibrium states attained in our numerical experiments (c.f. Figs. 1, 2 and 3). We end the paper with an appendix section, where we recall the results from [22] that are used when passing to the limit the weak curvature equation.

2 The Wasserstein Minimizing Movements Scheme for the Heat Content

2.1 Some Preliminary Results

Recall that the setting for our problem is a smooth convex domain \(\Omega \subset {\mathbb {R}}^d\) and without loss of generality by scaling we assume that \({\mathscr {L}}^d(\Omega )=2\). By \({{\mathscr {P}}}(\Omega )\) we denote the space of Borel probability measures on \({\mathbb {R}}^d\) supported on \({\overline{\Omega }}\). \({{\mathscr {P}}}^{{\mathrm{ac}}}(\Omega )\) stands for the elements of \({{\mathscr {P}}}(\Omega )\) that are absolutely continuous with respect to .

Let \(\Phi _1,\Phi _2:\Omega \rightarrow {\mathbb {R}}\) be given Lipschitz potentials and let us recall the definition of the potential energy \(\Phi :{{\mathscr {P}}}(\Omega )\times {{\mathscr {P}}}(\Omega )\rightarrow {\mathbb {R}}\), given as

$$\begin{aligned} \Phi (\varvec{\rho }):=\int _{\Omega }\Phi _1\,{\mathrm{d}}\rho _1+\int _{\Omega }\Phi _2\,{\mathrm{d}}\rho _2. \end{aligned}$$

Let \(\varepsilon >0\). We consider the heat content \({{\mathrm{HC}}}_\varepsilon :{{\mathscr {P}}}(\Omega )\times {{\mathscr {P}}}(\Omega )\rightarrow {\mathbb {R}}\) defied in (8), using the standard heat kernel \(G_\varepsilon :{\mathbb {R}}^d\rightarrow {\mathbb {R}}\), i.e.

$$\begin{aligned} G_\varepsilon (x)=\frac{1}{(4\pi \varepsilon )^{d/2}}e^{-\frac{|x|^2}{4\varepsilon }}. \end{aligned}$$

We also use the notations \(K_\varepsilon ,G:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) to denote \(K_\varepsilon (x)=\sigma \sqrt{\frac{2\pi }{\varepsilon }}G_\varepsilon (x)\) and \(G(x)=G_1(x).\) We have the following preliminary results:

Lemma 2.1

Let \({{\mathrm{HC}}}_\varepsilon \) be defined as in (8). We have the following properties:

  1. (1)

    \({{\mathrm{HC}}}_\varepsilon \) is bounded from below and continuous w.r.t. the weak-\(\star \) convergence on \({{\mathscr {P}}}(\Omega )\times {{\mathscr {P}}}(\Omega ).\)

  2. (2)

    \({{\mathrm{HC}}}_\varepsilon \) displacement \(\lambda \)-convex on \({{\mathscr {P}}}(\Omega )\times {{\mathscr {P}}}(\Omega )\), with \(\lambda =-\frac{\sigma \sqrt{2\pi }}{(4\pi )^{d/2}}\frac{1}{\varepsilon ^{(d+3)/2}}\).

Proof

(1) Is immediate by the definition of \({{\mathrm{HC}}}_\varepsilon \).

To show (2), it is enough to show that the function \(z\mapsto K_\varepsilon (z)\) is \(\lambda \)-convex in the classical sense for some \(\lambda \in {\mathbb {R}}\). We have

$$\begin{aligned} D^2 K_\varepsilon (x)=\frac{\sigma \sqrt{2\pi }}{2(4\pi )^{d/2}}\frac{1}{\varepsilon ^{(d+3)/2}} e^{-\frac{|x|^2}{4\varepsilon }}\left( \frac{1}{2\varepsilon }x\otimes x- I_d\right) . \end{aligned}$$

Since the matrix \(x\otimes x\) is positive semidefinite for any \(x\in {\mathbb {R}}^d\), setting \(\lambda =-\frac{1}{2(4\pi )^{d/2}}\frac{\sigma \sqrt{2\pi }}{\varepsilon ^{(d+3)/2}}\), we have that the matrix \(D^2K_\varepsilon (x)-\lambda I_d\) is positive semidefinite for any \(x\in {\mathbb {R}}^d,\) which implies in particular that \(K_\varepsilon \) is \(\lambda \)-convex.

We conclude similarly as in [12, Lemma 2.1] the displacement \(2\lambda \)-convexity of \({{\mathrm{HC}}}_\varepsilon \) on \({{\mathscr {P}}}(\Omega )\times {{\mathscr {P}}}(\Omega ).\) \(\square \)

Lemma 2.2

If \(P\subset {{\mathscr {P}}}(\Omega )\times {{\mathscr {P}}}(\Omega )\) denotes the set of pairs \((\rho _1, \rho _2)\) such that \(\rho _1+\rho _2=1\) a.e. in \(\Omega \), then the heat content is strictly concave along line segments in P.

Proof

For any pair \((\rho _1, \rho _2)\in P\) we may write \(\rho _2(x)=1-\rho _1(x)\). Thus, extending the densities by 0 outside of \(\Omega \), we have

$$\begin{aligned} {{\mathrm{HC}}}_{\varepsilon }(\rho _1,\rho _2)= & {} \sigma \sqrt{\frac{2\pi }{\varepsilon }}\int _{{\mathbb {R}}^d} (G_{\varepsilon }\star \rho _1)(x)(1-\rho _1(x))\,{\mathrm{d}}x\\= & {} \sigma \sqrt{\frac{2\pi }{\varepsilon }}\int _{{\mathbb {R}}^d} (G_{\varepsilon }\star \rho _1)(x)\,{\mathrm{d}}x-\sigma \sqrt{\frac{2\pi }{\varepsilon }}\int _{{\mathbb {R}}^d} (G_{\varepsilon }\star \rho _1)(x)\rho _1(x)\,{\mathrm{d}}x \end{aligned}$$

Ignoring the constant multiples, both terms can be expressed conveniently in the Fourier domain:

$$\begin{aligned} {\hat{\rho }}_1(0)-\int _{{\mathbb {R}}^d} {\hat{G}}(\xi \sqrt{\varepsilon })|{\hat{\rho }}_1(\xi )|^2\,{\mathrm{d}}\xi . \end{aligned}$$

Now the strict concavity follows immediately as \({\hat{G}}(\xi \sqrt{\varepsilon })>0\) for all \(\xi \in {\mathbb {R}}^d\). \(\square \)

2.2 The Minimizing Movements Scheme

Now we are ready to discuss the minimizing movements scheme. Our first result confirms the existence of minimizers, and shows that any minimizing configuration \(\varvec{\rho }=(\rho _1,\rho _2)\) is a completely unmixed partition of the domain. As we will see, the phases stay unmixed thanks to the concavity of the heat content.

Proposition 2.3

Suppose that \(\rho _1^n, \rho _2^n\in {{\mathscr {P}}}(\Omega )\) and let \(\tau >0\) and \(b_1,b_2>0\). Then the set of minimizers of the problem

$$\begin{aligned}&\inf \left\{ {{\mathrm{HC}}}_{\varepsilon }(\varvec{\rho })+\Phi (\varvec{\rho })+\frac{b_1}{2\tau }W_2^2(\rho _1, \rho _1^n)+\frac{b_2}{2\tau }W_2^2(\rho _2, \rho _2^n):\right. \nonumber \\&\quad \left. \rho _1,\rho _2\in {{\mathscr {P}}}^{{\mathrm{ac}}}(\Omega ), \rho _1+\rho _2=1\ {\mathrm{a.e.}}\right\} \end{aligned}$$
(11)

is non-empty and any solution \((\rho _1^*,\rho _2^*) \in {{\mathscr {P}}}(\Omega )\times {{\mathscr {P}}}(\Omega )\) is the characteristic function of a partition of \(\Omega \).

Proof

The existence of a solution of the optimization problem is an easy consequence of the weak lower semicontinuity of the objective functional and the weak-\(\star \) compactness of \({{\mathscr {P}}}(\Omega )\times {{\mathscr {P}}}(\Omega )\). Let us remark that the constraint \(\rho _1+\rho _2=1\) a.e. is closed under weak convergence, since \(\int _{\Omega }\rho _1\,{\mathrm{d}}x+\int _{\Omega }\rho _2\,{\mathrm{d}}x={\mathscr {L}}^d(\Omega ).\)

To show that an arbitrary solution \((\rho _1^*,\rho _2^*)\) is the characteristic functions of a partition of \(\Omega \), let us rewrite equivalently the minimization problem in terms of transport plans \(\pi _i\in {{\mathscr {P}}}(\Omega \times \Omega )\). Recall that \(\pi _i\) is a plan between \(\rho _i^n\) and \(\rho _i\), whenever

$$\begin{aligned} \int _{\Omega \times \Omega }\varphi (x)\,{\mathrm{d}}\pi _i(x,y)= & {} \int _{\Omega }\varphi (x)\,{\mathrm{d}}\rho _i^n(x)\ \ {{\mathrm{and}}}\\ \int _{\Omega \times \Omega }\psi (y)\,{\mathrm{d}}\pi _i(x,y)= & {} \int _{\Omega }\psi (y)\,{\mathrm{d}}\rho _i(y) \end{aligned}$$

for any \(\varphi ,\psi \in C(\Omega ).\) Since we are always working with measures \(\rho _i^n,\rho _i\) that are absolutely continuous w.r.t. , in the new minimization problem below, as we will see, we can restrict our search to plans that have absolutely continuous marginals w.r.t. . For a measure \(\theta \in {{\mathscr {P}}}(\Omega \times \Omega )\), we use the notation \(\theta ^1:=(P^x)_\#\theta \) and \(\theta ^2:=(P^y)_\#\theta \) to denote its marginals (here \(P^x,P^y:\Omega \times \Omega \rightarrow \Omega \) stand for the canonical projections from \(\Omega \times \Omega \) onto \(\Omega \)).

Thus, we aim to solve

$$\begin{aligned} \begin{aligned}&\displaystyle \inf \left\{ {\mathcal {F}}(\pi _1,\pi _2)+{\mathcal {G}}(\pi _1,\pi _2) +\sum _{i=1}^2\int _{\Omega \times \Omega }\frac{b_i}{2\tau }|x-y|^2\,{\mathrm{d}}\pi _i(x,y)\right\} =:\inf {\mathcal {S}}(\pi _1,\pi _2)\\&\displaystyle {{\mathrm{subject\ to\ }}} \pi _i^1=\rho ^n_i \ {{\mathrm{and}}}\ \sum _{i=1}^2 \pi _i^2=1\ {\mathrm{a.e.}}. \end{aligned} \end{aligned}$$
(12)

Here we denote

$$\begin{aligned} {\mathcal {G}}(\pi _1,\pi _2):=\int _{\Omega \times \Omega }\sum _{i=1}^2\Phi _i(y)\,{\mathrm{d}}\pi _i(x,y). \end{aligned}$$

We define, moreover,

$$\begin{aligned} {\mathcal {F}}(\pi _1,\pi _2)&:=\sigma \sqrt{\frac{2\pi }{\varepsilon }}\int _{\Omega } \int _{\Omega \times {\mathbb {R}}^d}G(x_1-\sqrt{\varepsilon }y)\,{\mathrm{d}}\pi _1(x,y)\,{\mathrm{d}}x_1\\&\quad -\sigma \sqrt{\frac{2\pi }{\varepsilon }}\int _{\Omega \times \Omega } \int _{\Omega \times {\mathbb {R}}^d} G(y_1-\sqrt{\varepsilon }y_2)\,{\mathrm{d}}\pi _1(x_2,y_2)\,{\mathrm{d}}\pi _1(x_1,y_1) \end{aligned}$$

and

$$\begin{aligned} {\mathcal {S}}(\pi _1,\pi _2):={\mathcal {F}}(\pi _1,\pi _2)+{\mathcal {G}}(\pi _1,\pi _2) +\sum _{i=1}^2\int _{\Omega \times \Omega }\frac{b_i}{2\tau }|x-y|^2\,{\mathrm{d}}\pi _i(x,y), \end{aligned}$$

where we have extended the second marginals of \(\pi _i\) by 0 outside of \(\Omega \).

The minimization is carried out over a weakly compact set and \({\mathcal {S}}\) is weakly lower semicontinuous and bounded below, thus minimizers exist. If \(\varvec{\pi }^*=(\pi _1,\pi _2)\) is a minimizer of (12) then we can construct a minimizer \(\varvec{\rho }^*=(\rho _1^*,\rho _2^*)\) of the original problem by taking \(\rho _i^*=(P^y)_\#\pi ^*_i\).

Now we consider the properties of minimizers. Clearly, \({\mathcal {F}}\) is Gâteaux differentiable at \(\varvec{\pi }^*\) in the sense that there exists \(\delta {\mathcal {F}}(\varvec{\pi }^*)\in C(\Omega \times \Omega )\) such that

$$\begin{aligned} \langle \delta {\mathcal {F}}(\varvec{\pi }^*), \varvec{\theta }\rangle&:=\sigma \sqrt{\frac{2\pi }{\varepsilon }} \int _{\Omega }\int _{\Omega \times {\mathbb {R}}^d}G(x_1-\sqrt{\varepsilon }y)\,{\mathrm{d}}\theta _1(x,y)\,{\mathrm{d}}x_1\nonumber \\&\quad -\sigma \sqrt{\frac{2\pi }{\varepsilon }}\int _{\Omega \times \Omega }\int _{\Omega \times {\mathbb {R}}^d} G(y_1 -\sqrt{\varepsilon }y_2)\,{\mathrm{d}}\pi _1(x_2,y_2)\,{\mathrm{d}}\theta _1(x_1,y_1)\nonumber \\&\quad -\sigma \sqrt{\frac{2\pi }{\varepsilon }}\int _{\Omega \times \Omega }\int _{\Omega \times {\mathbb {R}}^d} G(y_1-\sqrt{\varepsilon }y_2)\,{\mathrm{d}}\theta _1(x_2,y_2)\,{\mathrm{d}}\pi _1(x_1,y_1), \end{aligned}$$
(13)

where \(\varvec{\pi }+t\varvec{\theta }\) is any admissible perturbation of \(\varvec{\pi }.\) Similarly, as the other terms in the definition of \({\mathcal {S}}\) are linear in \(\pi \), these are in the same way differentiable, therefore \({\mathcal {S}}\) is Gâteaux differentiable in this sense.

From Lemma 2.2 it follows that \({\mathcal {S}}\) is concave along line segments \(\varvec{\pi }+t\varvec{\theta }\) (\(t\in (-1,1)\)), where \(\varvec{\pi }\) is a feasible point and \(\varvec{\theta }=(\theta _1,\theta _2)\) is a feasible direction at \(\varvec{\pi }\) i.e.

$$\begin{aligned} \theta _i^1(x)=0\ {\mathrm{a.e.}}\ x\in \Omega ,\ \ \int _{\Omega \times \Omega }\,{\mathrm{d}}\theta _i(x,y)=0\ \ {{\mathrm{and}}}\ \ \sum _{i=1}^2 \theta _i^2(y)=0 \ {\mathrm{a.e.}}\ y\in \Omega , \end{aligned}$$
(14)

and for some \(\delta >0\)

$$\begin{aligned} \pi _i+t\theta _i\ge 0, \end{aligned}$$
(15)

in the sense of signed measures, for all \(t\in [0,\delta )\). Furthermore, it follows from Lemma 2.2 that \({\mathcal {S}}\) is strictly concave on line segments \(\varvec{\pi }+t\varvec{\theta }\), if for some i the marginal \(\theta _i^2(y) \) is not 0 for almost every y. Therefore, if \(\varvec{\pi }^*=(\pi _1^*, \pi _2^*)\) is a minimizer and \(\varvec{\theta }\) is a feasible direction at \(\varvec{\pi ^*}\) with at least one non-trivial marginal then

$$\begin{aligned} \langle \delta {\mathcal {S}}(\varvec{\pi }^*), \varvec{\theta }\rangle >0, \end{aligned}$$

where \(\delta {\mathcal {S}}(\varvec{\pi }^*)\) stands for the first variation of \({\mathcal {S}}\) at \(\varvec{\pi }^*\) defined as is (13).

Let \(\varvec{\pi }\) be a feasible solution and let \(\rho _i(y)= \pi _i^2(y)\). Now suppose that there exists \(0<\alpha <1\) such that the set \(\Omega _{\alpha }=\{y\in \Omega : \rho _1(y), \rho _2(y) \in (\alpha , 1-\alpha )\}\) has positive measure. Partition \(\Omega _{\alpha }\) into two sets \(E_1, E_2\) of equal measure. Then there exist measure preserving maps \(T:E_1\rightarrow E_2\) and \(S:E_2\rightarrow E_1\) such that \(S\circ T\) and \(T\circ S\) are the identity almost everywhere on their respective domains (for example one may choose \(T=\nabla \psi \) to be the optimal transport map between the densities \(\varvec{1}_{E_1}\) and \(\varvec{1}_{E_2}\) and \(S=\nabla \psi ^*\)). Now we construct a feasible direction \(\varvec{\theta }\) at \(\varvec{\pi }\) as follows: let \(r(y)=\frac{\rho _1(T(y))}{\rho _2(y)}\) for \(y\in E_1\) and we define the signed measures \(\theta _1,\theta _2\in {{\mathscr {M}}}(\Omega \times \Omega )\) as

i.e.

$$\begin{aligned} \int _{\Omega \times \Omega }\varphi (x,y)\,{\mathrm{d}}\theta _1(x,y)&:=\int _{\Omega \times E_1}\varphi (x,y)\,{\mathrm{d}}\rho _1^n(x)\,{\mathrm{d}}\rho _1(T(y))\\&\quad -\int _{\Omega \times E_2}\varphi (x,y)\,{\mathrm{d}}\rho _1^n(x)\,{\mathrm{d}}\rho _1(y) \end{aligned}$$

and

i.e.

$$\begin{aligned} \int _{\Omega \times \Omega }\varphi (x,y)\,{\mathrm{d}}\theta _2(x,y)&:=-\int _{\Omega \times E_1}\varphi (x,y)r(y)\,{\mathrm{d}}\rho _2^n(x)\,{\mathrm{d}}\rho _2(y)\nonumber \\&\quad +\int _{\Omega \times E_2}\varphi (x,y)r(S(y))\,{\mathrm{d}}\rho _2^n(x)\,{\mathrm{d}}\rho _2(S(y)) \end{aligned}$$

for any \(\varphi \in C(\Omega \times \Omega ).\)

Since \(\theta _1^2(y)=\rho _1(T(y))>\alpha \) a.e. on \(E_1\) we see that \(\varvec{\theta }\) has a nontrivial marginal.

Let us now check that \(\varvec{\theta }\) is feasible, i.e. it satisfies (14) and (15). If \(\beta <\min \{1-\alpha ,\alpha \}\) then \(\varvec{\pi }\pm \beta \varvec{\theta }\) defines a non-negative measure. Next, we check that \(\varvec{\theta }\) satisfies \(\theta _i^1(x)=0\) for a.e. \(x\in \Omega \) and \(i=1,2\) and \(\sum _{i=1}^2 \theta _i^2(y) =0\) for a.e. \(y\in \Omega \). For \(\varphi \in C(\Omega )\), we have

$$\begin{aligned} \int _\Omega \varphi (x)\,{\mathrm{d}}\theta ^1_1(x)= & {} \int _{\Omega \times \Omega }\varphi (x) \,{\mathrm{d}}\theta _1(x,y)\\= & {} \int _\Omega \varphi (x)\,{\mathrm{d}}\rho _1^n(x)\int _{E_1}\rho _1(T(y))\,{\mathrm{d}}y - \int _\Omega \varphi (x)\rho _1^n(x)\int _{E_2} \rho _1(y)\,{\mathrm{d}}y=0 \end{aligned}$$

and

$$\begin{aligned} \int _{\Omega }\varphi (x) \,{\mathrm{d}}\theta _2^1(x)&= \int _{\Omega \times \Omega }\varphi (x)\,{\mathrm{d}}\theta _1(x,y)\\&=-\int _{\Omega }\varphi (x)\,{\mathrm{d}}\rho _2^n(x)\int _{E_1} r(y)\rho _2(y)\,{\mathrm{d}}y \\&\quad + \int _\Omega \varphi (x)\,{\mathrm{d}}\rho _2^n(x)\int _{E_2} r(S(y))\rho _2(S(y))\,{\mathrm{d}}y=0, \end{aligned}$$

where we have used that both T and S are measure preserving between \(E_1\) and \(E_2\) and vice-versa, respectively. Let us notice that these arguments also show (by taking \(\varphi \equiv 1\)) that

$$\begin{aligned} \int _{\Omega \times \Omega } \,{\mathrm{d}}\theta _i(x,y) =0,\ \ i=1,2. \end{aligned}$$

Now, for \(\varphi \in C(\Omega ),\) we have

$$\begin{aligned} \sum _{i=1}^2\int _{\Omega }\varphi (y)\,{\mathrm{d}}\theta _i^2(y)&=\sum _{i=1}^2\int _{\Omega \times \Omega }\varphi (y)\,{\mathrm{d}}\theta _i(x,y)\\&=\int _{\Omega }\,{\mathrm{d}}\rho _1^n(x)\int _{E_1}\varphi (y)\rho _1(T(y))\,{\mathrm{d}}y\\&\quad -\int _{\Omega }\,{\mathrm{d}}\rho _1^n(x)\int _{E_2}\varphi (y)\,{\mathrm{d}}\rho _1(y)\\&\quad -\int _\Omega \,{\mathrm{d}}\rho _2^n(x)\int _{E_1}\varphi (y)r(y)\,{\mathrm{d}}\rho _2(y)\\&\quad +\int _\Omega \,{\mathrm{d}}\rho _2^n(x)\int _{E_2}\varphi (y)r(S(y))\rho _2(S(y))\,{\mathrm{d}}y\\&=\int _{E_1}\phi (y)[\rho _1(T(y))-r(y)\rho _2(y)]\,{\mathrm{d}}y \\&\quad + \int _{E_2}\phi (y)[r(S(y))\rho _2(S(y))-\rho _1(y)]\,{\mathrm{d}}y = 0. \end{aligned}$$

Note that our arguments in fact show that \(-\varvec{\theta }\) is also a feasible direction at \(\varvec{\pi }\). \(\pm \varvec{\theta }\) have nontrivial marginals, and it is not possible to have both \(\langle \delta {\mathcal {S}}(\varvec{\pi }^*), \varvec{\theta }\rangle >0\) and \(-\langle \delta {\mathcal {S}}(\varvec{\pi }^*), \varvec{\theta }\rangle >0\). Therefore, \(\varvec{\pi }\) cannot be a minimizer. This allows us to conclude that for any minimizer \(\varvec{\rho }^*\) of the original problem each density \(\rho _i^*\) takes values \(\{0,1\}\) almost everywhere. \(\square \)

2.3 Optimality Conditions and Construction of the Pressure Variables

In the next Lemma, we give a more complete characterization of the minimizers in terms of certain necessary inequalities. In particular, this is the first place where we see the appearance of the pressure variable, which plays an essential role in all of the subsequent analysis. Note that for convenience we express this result using the notation \(K_\varepsilon :=\sigma \sqrt{\frac{2\pi }{\varepsilon }}G_{\varepsilon }.\)

Lemma 2.4

Let \((\rho _1^*,\rho _2^*)\) be an optimizer in (11) and let \((\rho _1,\rho _2)\) a pair of probability measures such that \(\rho _1+\rho _2=1\) a.e. Then,

  1. (i)

    We have the optimality condition

    $$\begin{aligned}&\int _{\Omega }\left[ K_\varepsilon \star \rho _2^*+\Phi _1+b_1\varphi _1/\tau \right] (\rho _1-\rho _1^*)\,{\mathrm{d}}x\nonumber \\&\quad +\int _{\Omega }\left[ K_\varepsilon \star \rho _1^*+\Phi _2+b_2\varphi _2/\tau \right] (\rho _2-\rho _2^*)\,{\mathrm{d}}x\ge 0, \end{aligned}$$
    (16)

    for a suitable pair of Kantorovich potentials \((\varphi _1,\varphi _2)\) in the optimal transport of \(\rho _1^*\) onto \(\rho _1^n\) and \(\rho _2^*\) onto \(\rho _2^n\), respectively.

  2. (ii)

    There exists a function \(p:\Omega \rightarrow {\mathbb {R}}\) that is Lipschitz continuous on \({{\,\mathrm{spt}\,}}(\rho ^*_i)\), \(i=1,2\) and is such that for any probability densities \(\rho _1, \rho _2\) with \(\rho _1+\rho _2=1\) a.e. we have

    $$\begin{aligned}&\int _{\Omega }\left[ K_\varepsilon \star \rho _2^*+\Phi _1+b_1\varphi _1/\tau +p \right] (\rho _1-\rho _1^*)\,{\mathrm{d}}x\nonumber \\&\quad +\int _{\Omega }\left[ K_\varepsilon \star \rho _1^*+\Phi _2+b_2\varphi _2/\tau +p \right] (\rho _2-\rho _2^*)\,{\mathrm{d}}x\ge 0, \end{aligned}$$
    (17)

    moreover, we have

    $$\begin{aligned} \begin{array}{l} \nabla p=-\nabla K_\varepsilon \star \rho _2^*-\nabla \Phi _1-b_1\nabla \varphi _1/\tau \ \ {\mathrm{a.e.\ in \ spt}}(\rho _1^*)\\ {{\mathrm{and}}}\\ \nabla p=-\nabla K_\varepsilon \star \rho _1^*-\nabla \Phi _2-b_2\nabla \varphi _2/\tau \ \ {\mathrm{a.e.\ in \ spt}}(\rho _2^*). \end{array} \end{aligned}$$
    (18)

Proof

Let \((\rho _1,\rho _2)\) be a pair of probability measures such that \(\rho _1+\rho _2=1\) a.e. For \(\delta \in [0,1]\) let us consider the competitors \((\rho _1^\delta ,\rho _2^\delta ):=(\rho _1^*+\delta (\rho _1-\rho _1^*),\rho _2^*+\delta (\rho _2-\rho _2^*))\), which by construction satisfy the constraint.

By optimality, we have

$$\begin{aligned}&{{\mathrm{HC}}}_\varepsilon (\rho _1^\delta ,\rho _2^\delta )-{{\mathrm{HC}}}_\varepsilon (\rho _1^*,\rho _2^*) +\Phi (\rho _1^\delta ,\rho _2^\delta )-\Phi (\rho _1^*,\rho _2^*)\\&\quad +\frac{1}{2\tau }\left( W_2^2(\rho _1^\delta ,\rho _1^n)-W_2^2 (\rho _1^*,\rho _1^n)+W_2^2(\rho _2^\delta ,\rho _2^n)-W_2^2(\rho _2^*,\rho _2^n)\right) \ge 0. \end{aligned}$$

Using the exact same argument as in [27, Lemma 3.1] to develop the Wasserstein part on the one hand and the first variations of \({{\mathrm{HC}}}_\varepsilon \) and \(\Phi \) on the other hand, we find (16).

For (ii) (similarly as in [21, Proposition 4.7]), let us notice first that (16) can be written in the form

$$\begin{aligned} \int _{\Omega }\left[ K_\varepsilon \star \rho _2^*+\Phi _1+b_1\varphi _1/\tau \right] h_1\,{\mathrm{d}}x+\int _{\Omega }\left[ K_\varepsilon \star \rho _1^*+\Phi _2+b_2\varphi _2/\tau \right] h_2\,{\mathrm{d}}x\ge 0, \end{aligned}$$

where \((h_1,h_2)\in L^\infty (\Omega )\times L^\infty (\Omega )\) is such that \(\rho _1^*+\delta h_1+\rho _2^*+\delta h_2=1\) and \(\rho _i^*+\delta h_i\in [0,1]\) for \(i=1, 2\). We know from Proposition 2.3 that \(\rho _1^*, \rho _2^*\) forms a partition of \(\Omega \). Therefore, we must take \(h_1\le 0\) on \({{\,\mathrm{spt}\,}}(\rho _1^*)\), and \(h_1\ge 0\) on \({{\,\mathrm{spt}\,}}(\rho _2^*)\) (and vice-versa for \(h_2\)). To preserve the constraint \(\rho _1^*+\delta h_1+\rho _2^*+\delta h_2=1\) and the mass of each density, we must also take \(h_1+h_2=0\) a.e. and \(\int _{\Omega }h_1\,{\mathrm{d}}x=\int _{\Omega }h_2\,{\mathrm{d}}x=0\).

Now if we set \(h_2=-h_1\), we find that

$$\begin{aligned} \int _{\Omega }\left[ K_\varepsilon \star \rho _2^*+\Phi _1+b_1\varphi _1/\tau - \Big (K_\varepsilon \star \rho _1^*+\Phi _2+b_2\varphi _2/\tau \Big ) \right] h_1\,{\mathrm{d}}x \ge 0, \end{aligned}$$

for any \(h_1\in L^\infty (\Omega )\) with 0 mean such that \(h_1\le 0\) a.e. on \({{\,\mathrm{spt}\,}}(\rho _1)\). This implies that there exist constants \(C_1, C_2\in {\mathbb {R}}\) such that

$$\begin{aligned} K_\varepsilon \star \rho _2^*+\Phi _1+b_1\varphi _1/\tau -C_1\le & {} K_\varepsilon \star \rho _1^*+\Phi _2+b_2\varphi _2/\tau -C_2\ \ {\mathrm{a.e.}}\; {{\,\mathrm{spt}\,}}(\rho _1)\\ K_\varepsilon \star \rho _1^*+\Phi _2+b_2\varphi _2/\tau -C_2\le & {} K_\varepsilon \star \rho _2^*+\Phi _1+b_1\varphi _1/\tau -C_1 \ \ {\mathrm{a.e.}}\; {{\,\mathrm{spt}\,}}(\rho _2) \end{aligned}$$

From here, we can define the pressure variable as

$$\begin{aligned} p&:=C_1-K_\varepsilon \star \rho _2^*-\Phi _1-b_1\varphi _1/\tau \quad {\text {a.e. on}}\ {{\,\mathrm{spt}\,}}(\rho _1^*)\ \ {{\mathrm{and}}}\nonumber \\ p&:=C_2-K_\varepsilon \star \rho _1^*-\Phi _2-b_2\varphi _2/\tau \quad {\text {a.e. on}}\ {{\,\mathrm{spt}\,}}(\rho _2^*). \end{aligned}$$
(19)

Since the Kantorovich potentials are Lipschitz continuous and the other terms are smooth, we find that p is Lipschitz continuous (note this regularity may degenerate as \(\tau \downarrow 0\)). By construction, p clearly satisfies the inequality in (17), and in particular the value of the l.h.s. is equal to zero. Since the functions under consideration are all Lipschitz continuous, we obtain (18). \(\square \)

2.4 Continuous in Time Solutions for \(\varepsilon >0\) Fixed

Now we are ready to begin constructing time interpolations from the discrete scheme. In this section we restrict ourselves to the case where \(\varepsilon \) is held fixed. In this special case, we can use standard arguments from the theory of Wasserstein gradient flows to obtain continuous in time equations in the limit \(\tau \downarrow 0\). When \(\varepsilon \) is held fixed, we have strong compactness on the pressure term. However, similarly to the models in [20, 21], we will lack strong compactness on the density variables, which would mean that in the limit when \(\tau \downarrow 0\), only a weaker version of the system will be available (see (25)). Let us also note that later on in Section 3, we will need some of the pressure estimates provided below when we take \(\varepsilon \) to zero along with \(\tau \).

Let \(T>0\) be a given time horizon and \(N\in {\mathbb {N}}\) and \(\tau >0\) such that \(N\tau =T.\) Let \(\varepsilon >0\) be fixed. By now, it is standard how to construct weak solutions to PDEs that have gradient flow structure, using minimizing movement schemes as

$$\begin{aligned} \begin{aligned}&\displaystyle \left( {\rho }^{n+1, \tau }_1,{\rho }^{n+1, \tau }_2\right) \\&\quad \displaystyle \in \mathop {{\mathrm{arg\,min}}}\limits \left\{ {{\mathrm{HC}}}_{\varepsilon }(\rho _1,\rho _2)+\Phi (\rho _1,\rho _2)+\frac{b_1}{2\tau }W_2^2(\rho _1, \rho _1^n)+\frac{b_2}{2\tau }W_2^2(\rho _2, \rho _2^n):\right. \\&\qquad \left. \rho _1,\rho _2\in {{\mathscr {P}}}^{{\mathrm{ac}}}(\Omega ), \rho _1+\rho _2=1\ {\mathrm{a.e.}}\right\} . \end{aligned} \end{aligned}$$
(20)

Let \(\rho _i^\tau :[0,T]\rightarrow {\mathcal {P}}(\Omega )\) be defined as

$$\begin{aligned} \rho _i^\tau (t):=\rho _i^{n+1},\ {{\mathrm{if}}}\ t\in [n\tau ,(n+1)\tau ). \end{aligned}$$
(21)

We define the corresponding velocities, pressures and momentum variables as

$$\begin{aligned} \begin{aligned} v_i^\tau (t,\cdot )&:=\frac{\nabla \varphi _i^{n+1}}{\tau },\ {{\mathrm{if}}}\ t\in [n\tau ,(n+1)\tau ),\\ p_i^\tau (t,\cdot )&:=p_i^{n+1},\ {{\mathrm{if}}}\ t\in [n\tau ,(n+1)\tau ),\\ E_i^\tau (t,\cdot )&:=v_i^\tau (t,\cdot )\rho _i^\tau ,\ {{\mathrm{if}}}\ t\in [n\tau ,(n+1)\tau ), \end{aligned} \end{aligned}$$
(22)

where \(\varphi _i^{n+1}\) and \(p_i^{n+1}\) are defined in Lemma 2.4. Following the same steps as in [20, 27], the analysis boils down to obtain a sufficient amount of uniform (in \(\tau \)) estimates and compactness for the previously obtained functions, then pass to the limit \(\tau \downarrow 0.\)

Also, as additional tools we construct the corresponding continuous in time (geodesic) interpolations between the densities and the corresponding velocities and momentum variables, \(({{\tilde{\rho }}}_i^\tau ,{{\tilde{v}}}_i^\tau ,{{\tilde{E}}}_i^\tau )\). We refer to [20, Section 3.1] (see also [27, 37]) for the precise construction. In particular, as a consequence of this construction, we have

$$\begin{aligned} \partial _t{{\tilde{\rho }}}_i^\tau +\nabla \cdot {{\tilde{E}}}_i^\tau =0, \end{aligned}$$

in the sense of distribution on \((0,T)\times \Omega \). Let us comment on the role of the geodesic interpolations. These interpolations (beside the more standard piecewise constant ones) in the context of \(W_2\)-gradient flows were first used by Santambrogio (see the discussion in [37, Section 8.3]). Their role is the following: for any \(\tau \), these interpolations, since pieces of geodesic curves, by definition solve continuity equations. Since the piecewise constant interpolations match the geodesic ones at node points \(n\tau \), if both of them converge, they need to converge to the same limit. Therefore, one automatically has a continuity equation as the limit of piecewise constant interpolations. An alternative way (which is more often used in the literature) would be to say that the piecewise constant interpolations solve a continuity equation up to an error term, then one would need to show that this error term is converging to zero as the time discretization parameter tends to zero.

Based on the same techniques as in [20, 27], it is easy to obtain the following estimates:

Lemma 2.5

Let \((\rho _i^\tau , v_i^\tau ,E_i^\tau )\) and \(({{\tilde{\rho }}}_i^\tau ,{{\tilde{v}}}_i^\tau ,{{\tilde{E}}}_i^\tau )\) be the previously constructed piecewise constant and continuous in time interpolations, respectively. Then there exists \(C>0\) independent of \(\tau >0\) and depending only on \({{\mathrm{HC}}}_\varepsilon (\rho _{1,0},\rho _{2,0})\) such that

  1. (i)

    \(W_2(\rho _{i}^\tau (t),\rho _{i}^\tau (s))\le C\sqrt{t-s+\tau }\) and \(W_2({{\tilde{\rho }}}_{i}^\tau (t),{{\tilde{\rho }}}_{i}^\tau (s))\le C\sqrt{t-s}\) for any \(0\le s\le t\le T.\) Moreover, up to passing to subsequences \((\rho _i^\tau )_{\tau >0}\) and \(({{\tilde{\rho }}}_i^\tau )_{\tau >0}\) converge (uniformly with respect to \(W_2\)) as \(\tau \downarrow 0\) to the same limit.

  2. (ii)

    \((v_i^\tau )_{\tau >0}\) is uniformly bounded in \(L^2([0,T]; L^2_{\rho _i^\tau })\).

  3. (iii)

    \((E_i^\tau )_{\tau >0}\) and \(({{\tilde{E}}}_i^\tau )_{\tau >0}\) are uniformly bounded in \({{\mathscr {M}}}^d([0,T]\times \Omega )\) and up to passing to subsequences they have the same distributional limits as \(\tau \downarrow 0\).

  4. (iv)

    \(\displaystyle \int _0^T\int _{{\mathbb {R}}^d}|\nabla G_\varepsilon \star \rho _i^\tau |\,{\mathrm{d}}x\,{\mathrm{d}}t\le CT\,{{\mathrm{HC}}}_{\varepsilon }(\rho _{1,0},\rho _{2,0})\).

  5. (v)

    \(\displaystyle \int _0^T\int _{\Omega }|\rho _i^\tau (t,x+\delta d)-\rho _i^\tau (t,x)|\,{\mathrm{d}}x\,{\mathrm{d}}t\le CT(\delta +\sqrt{\varepsilon }){{\mathrm{HC}}}_{\varepsilon }(\rho _{1,0},\rho _{2,0})\) for any \(d\in {\mathbb {R}}^d\) with \(|d|=1\) and any \(\delta >0\).

  6. (vi)

    \(\displaystyle \int _0^T\int _{\Omega }|\nabla p^\tau |^2\,{\mathrm{d}}x\,{\mathrm{d}}t\le C\int _0^T\int _{\Omega }\frac{1}{\varepsilon ^2}\left( \left( G_{2\varepsilon }\star \rho _2^\tau \right) ^2\rho _1^\tau +\left( G_{2\varepsilon }\star \rho _1^\tau \right) ^2\rho _1^\tau \right) \,{\mathrm{d}}x\,{\mathrm{d}}t + C\), and as a consequence, \((\nabla p^\tau )_{\tau >0}\) is uniformly bounded in \(L^2([0,T]\times \Omega ;{\mathbb {R}}^d)\) and \(\left( p^\tau -\fint _{\Omega }p^\tau (\cdot ,x)\,{\mathrm{d}}x\right) _{\tau >0}\) is uniformly bounded in \(L^2([0,T]\times \Omega )\) by a constant of the form \(TC(1+1/\varepsilon ^2).\)

  7. (vii)

    \((\nabla p^\tau )_{\tau >0}\) is uniformly bounded in \(L^2([0,T]; (C^1(\Omega ))^*),\) independently of \(\tau \) and \(\varepsilon \). In particular, \(\nabla p^\tau \) is uniformly bounded in \({\mathscr {D}}'((0,T)\times \Omega ;{\mathbb {R}}^d)\) and \(\nabla p^\tau (t,\cdot )\) defines a uniformly bounded distribution of order one, for a.e. \(t\in [0,T]\).

Proof

Let us notice that the proofs of point (i), (ii) and (iii) follow the exact same lines of the proofs of [27, Lemma 3.3] and [20, Lemma 3.3], so we omit them.

From Proposition 2.3 we know that that \(\rho ^n_i\)’s are characteristic functions, therefore the proofs of (iv) and (v) follow the exact same lines as the proof of [22, Lemma 2.4.].

Let us give the details on (vi). From the identities (18) from Lemma 2.4, we have that there exists \(C>0\) (independent of \(\tau >0\), which might increase from one inequality to the next) such that

$$\begin{aligned}&\int _0^T\int _{\Omega }|\nabla p^\tau |^2(\rho _1^\tau +\rho _2^\tau )\,{\mathrm{d}}x\,{\mathrm{d}}t\\&\quad \le C\sum _{n=1}^N\tau \int _{\Omega }\left( (|\nabla K_\varepsilon \star \rho _2^n|^2+|\nabla \Phi _1|^2)\rho _1^n+(|\nabla K_\varepsilon \star \rho _1^n|^2+|\nabla \Phi _2|^2)\rho _2^n\right) \,{\mathrm{d}}x\\&\qquad +\frac{C\tau }{\tau ^2}\sum _{n=1}^N\sum _{i=1}^2 b_iW_2^2(\rho _i^n,\rho _i^{n-1})\\&\quad \le C\int _0^T\int _{\Omega }\frac{1}{\varepsilon ^2}\left( \left( G_{2\varepsilon }\star \rho _2^\tau \right) ^2\rho _1^\tau +\left( G_{2\varepsilon }\star \rho _1^\tau \right) ^2\rho _1^\tau \right) \,{\mathrm{d}}x\,{\mathrm{d}}t + C, \end{aligned}$$

where we have used the uniform bounds on \((\rho _i^\tau )_{\tau >0}\) from the previous points and

$$\begin{aligned} |\nabla K_\varepsilon \star \rho _i^n|^2=\frac{2\pi \sigma }{\varepsilon }\left( |\nabla G_\varepsilon |\star \rho _i^n\right) ^2\le \frac{2\pi \sigma }{\varepsilon }\left( \frac{1}{\sqrt{\varepsilon }} G_{2\varepsilon }\star \rho _i^n\right) ^2=\frac{2\pi \sigma }{\varepsilon ^2}\left( G_{2\varepsilon }\star \rho _i^n\right) ^2 \end{aligned}$$

Since \(\rho _1^n+\rho _2^n=1\) a.e., this previous bound implies that \((\nabla p^\tau )_{\tau >0}\) is uniformly bounded in \(L^2([0,T]\times \Omega ;{\mathbb {R}}^d).\) As a consequence of Poincaré’s inequality, we have that \(\left( p^\tau -\fint _{\Omega }p^\tau (\cdot ,x)\,{\mathrm{d}}x\right) _{\tau >0}\) is uniformly bounded in \(L^2([0,T]\times \Omega )\). Both uniform bounds have the form \(TC(1+1/\varepsilon ^2).\)

To show (vii), let \(\xi \in C^1([0,T]\times \Omega ;{\mathbb {R}}^d)\). Fix \(t\in [n\tau ,(n+1)\tau )\). Taking the inner product of both sides in (18) with \(\xi (t,\cdot )\) and integrating on \(\Omega \) w.r.t. \(\rho _i^{n+1}\) (we drop the dependence on t in the notation of \(\xi \)), we obtain

$$\begin{aligned} \int _{\Omega }\nabla p^{n+1}\cdot \xi \rho _1^{n+1}\,{\mathrm{d}}x= & {} -\sigma \sqrt{2\pi }\int _{\Omega }\frac{1}{\sqrt{\varepsilon }}\nabla G_\varepsilon \star \rho _2^{n+1}\cdot \xi \rho _1^{n+1}\,{\mathrm{d}}x\nonumber \\&-\int _{\Omega }\nabla \Phi _1\cdot \xi \rho _1^{n+1}\,{\mathrm{d}}x - \int _{\Omega }\frac{b_1}{\tau }\nabla \varphi _1^{n+1}\cdot \xi \rho _1^{n+1}\,{\mathrm{d}}x, \end{aligned}$$
(23)

and interchanging the roles of \(\rho _1^{n+1}\) and \(\rho _2^{n+1}\), we get

$$\begin{aligned} \int _{\Omega }\nabla p^{n+1}\cdot \xi \rho _2^{n+1}\,{\mathrm{d}}x= & {} -\sigma \sqrt{2\pi }\int _{\Omega }\frac{1}{\sqrt{\varepsilon }}\nabla G_\varepsilon \star \rho _1^{n+1}\cdot \xi \rho _2^{n+1}\,{\mathrm{d}}x\nonumber \\&-\int _{\Omega }\nabla \Phi _2\cdot \xi \rho _2^{n+1}\,{\mathrm{d}}x - \int _{\Omega }\frac{b_2}{\tau }\nabla \varphi _2^{n+1}\cdot \xi \rho _2^{n+1}\,{\mathrm{d}}x. \end{aligned}$$
(24)

First, we have

$$\begin{aligned}&\int _{\Omega }\frac{1}{\sqrt{\varepsilon }}\nabla G_\varepsilon \star \rho _2^{n+1}\cdot \xi \rho _1^{n+1}\,{\mathrm{d}}x+\int _{\Omega }\frac{1}{\sqrt{\varepsilon }}\nabla G_\varepsilon \star \rho _1^{n+1}\cdot \xi \rho _2^{n+1}\,{\mathrm{d}}x\\&\quad =\frac{1}{2\varepsilon }\int _{{\mathbb {R}}^d}\int _{\Omega }z G(z) \left[ \rho _2^{n+1}(x+\sqrt{\varepsilon }z)\cdot \xi (x)\rho _1^{n+1}(x)\right. \\&\qquad \left. +\rho _1^{n+1}(x+\sqrt{\varepsilon }z)\cdot \xi (x)\rho _2^{n+1}(x)\right] \,{\mathrm{d}}x\,{\mathrm{d}}z\\&\quad =\frac{1}{2\varepsilon }\int _{{\mathbb {R}}^d}\int _{\Omega }z G(z) \left[ \rho _2^{n+1}(x+\sqrt{\varepsilon }z)\cdot \xi (x)\rho _1^{n+1}(x)\right. \\&\qquad \left. +\rho _1^{n+1}(x)\cdot \xi (x-\sqrt{\varepsilon }z)\rho _2^{n+1}(x-\sqrt{\varepsilon }z)\right] \,{\mathrm{d}}x\,{\mathrm{d}}z\\&\quad =\frac{1}{2\varepsilon }\int _{\Omega }\int _{{\mathbb {R}}^d}z G(z) \left[ \rho _2^{n+1}(x+\sqrt{\varepsilon }z)\cdot \xi (x)\rho _1^{n+1}(x)\right. \\&\qquad \left. -\rho _1^{n+1}(x)\cdot \xi (x+\sqrt{\varepsilon }z)\rho _2^{n+1}(x+\sqrt{\varepsilon }z)\right] \,{\mathrm{d}}z\,{\mathrm{d}}x\\&\quad =\frac{1}{2\varepsilon }\int _{\Omega }\int _{{\mathbb {R}}^d}z G(z)\rho _2^{n+1} (x+\sqrt{\varepsilon }z)\rho _1^{n+1}(x)\cdot \left[ \xi (x)-\xi (x+\sqrt{\varepsilon }z)\right] \,{\mathrm{d}}z\,{\mathrm{d}}x\\&\quad =\frac{1}{2\sqrt{\varepsilon }}\int _{\Omega }\int _{{\mathbb {R}}^d}\int _1^0 z G(z)\rho _2^{n+1}(x+\sqrt{\varepsilon }z)\rho _1^{n+1}(x)\cdot (D\xi (x+s\sqrt{\varepsilon }z)z)\,{\mathrm{d}}s\,{\mathrm{d}}z\,{\mathrm{d}}x, \end{aligned}$$

where in the second and third equalities we have used the change of variables \(x\mapsto x-\sqrt{\varepsilon }z\) and \(z\mapsto -z\), respectively and the fundamental theorem of calculus in the last equality. Now, since

$$\begin{aligned} |z|^2 G(z)\le \alpha G(z/\beta ),\ \forall z\in {\mathbb {R}}^d, \end{aligned}$$

for some universal constants \(\alpha ,\beta >0\), by the previous chain of equalities, we obtain

$$\begin{aligned}&\Bigg |\int _{\Omega }\frac{1}{\sqrt{\varepsilon }}\nabla G_\varepsilon \star \rho _2^{n+1}\cdot \xi \rho _1^{n+1}\,{\mathrm{d}}x+\int _{\Omega }\frac{1}{\sqrt{\varepsilon }}\nabla G_\varepsilon \star \rho _1^{n+1}\cdot \xi \rho _2^{n+1}\,{\mathrm{d}}x\Bigg |\\&\quad \le C{{\mathrm{HC}}}_{2\varepsilon }(\rho _1^{n+1},\rho _2^{n+1})\Vert D\xi (t,\cdot )\Vert _{L^\infty }\\&\quad \le C{{\mathrm{HC}}}_{\varepsilon }(\rho _1^0,\rho _2^0)\Vert D\xi (t,\cdot )\Vert _{L^\infty }, \end{aligned}$$

where, in the last inequality we used the monotonicity of \({{\mathrm{HC}}}_\varepsilon \) along the sequence \(\left( \rho _1^n,\rho _2^n\right) _n\).

Furthermore, we have that

$$\begin{aligned} \int _{\Omega }\frac{1}{\tau }|\nabla \varphi _i^{n+1}| |\xi |\rho _i^{n+1}\,{\mathrm{d}}x=\int _{\Omega }|v_i^{n+1}|\rho _i^{n+1}|\xi |\,{\mathrm{d}}x \end{aligned}$$

and

$$\begin{aligned} \Bigg |\int _{\Omega }\nabla \Phi _1\cdot \xi \rho _1^{n+1}\,{\mathrm{d}}x+\int _{\Omega }\nabla \Phi _1\cdot \xi \rho _1^{n+1}\,{\mathrm{d}}x\Bigg |\le C\Vert \xi \Vert _{L^1}. \end{aligned}$$

Using the fact that we have piecewise constant interpolations, integrating the last equality in time on [0, T], we get

$$\begin{aligned} \int _0^T\int _{\Omega }|v_i^\tau |\rho _i^\tau |\xi |\,{\mathrm{d}}x\,{\mathrm{d}}t \le \left( \int _0^T\int _{\Omega }|v_i^\tau |^2\rho _i^\tau \,{\mathrm{d}}x\,{\mathrm{d}}t\right) ^{\frac{1}{2}}\left( \int _0^T\int _{\Omega }|\xi |^2\,{\mathrm{d}}x\,{\mathrm{d}}t\right) ^{\frac{1}{2}}. \end{aligned}$$

Now, adding together (23) and (24), then integrating in time on [0, T] and using (ii), we obtain

$$\begin{aligned} \Bigg |\int _0^T\int _{\Omega }\nabla p^\tau \cdot \xi \,{\mathrm{d}}x\,{\mathrm{d}}t\Bigg |\le C\Vert D\xi \Vert _{L^1(C^0)}+ C\Vert \xi \Vert _{L^2(L^2)}\le C\Vert \xi \Vert _{L^2(C^{1})}, \end{aligned}$$

where the constant \(C>0\) is independent of \(\tau >0\) and \(\varepsilon >0\). The thesis of follows. \(\square \)

Now we can use the above pressure estimates to derive a system of continuous in time equations in the limit \(\tau \rightarrow 0\) when \(\varepsilon \) is held fixed.

Theorem 2.1

Let \(\varepsilon >0\), \(\rho _{1,0},\rho _{2,0}\in {{\mathscr {P}}}(\Omega )\) such that \(\rho _{1,0}+\rho _{2,0}=1\) a.e. There exists \(\rho _i\in {{\mathrm{AC}}}^2([0,T];{{\mathscr {P}}}(\Omega ))\), \(i=1,2\), \(\overline{p}\in L^2([0,T];H^1(\Omega ))\) and \(\zeta _1,\zeta _2\in L^2([0,T]\times \Omega ;{\mathbb {R}}^d)\) such that \(\rho _1+\rho _2=1\) a.e. in \([0,T]\times \Omega \), \(b_1\zeta _1+b_2\zeta _2=\nabla \overline{p}\) a.e. in \([0,T]\times \Omega \) and the system

$$\begin{aligned} \left\{ \begin{array}{ll} \partial _t\rho _1-\nabla \cdot \left[ b_1^{-1}\rho _1(\nabla K_\varepsilon \star \rho _2+\nabla \Phi _1)+\zeta _1\right] =0, &{} (0,T)\times \Omega ,\\ \partial _t\rho _2-\nabla \cdot \left[ b_2^{-1}\rho _2(\nabla K_\varepsilon \star \rho _1+\nabla \Phi _2)+\zeta _2\right] =0, &{} (0,T)\times \Omega ,\\ \rho _1(0,\cdot )=\rho _{1,0}, \quad \rho _2(0,\cdot )=\rho _{2,0}, &{} \Omega . \end{array} \right. \end{aligned}$$
(25)

is satisfied in the sense of distributions on \((0,T)\times \Omega .\)

It remains open whether in the previous theorem we have strong convergence \(\rho _i^\tau \rightarrow \rho _i\) in \(L^2([0,T]\times \Omega )\) as \(\tau \downarrow 0\), and in particular whether one can claim that \(\zeta _i=b_i^{-1}\rho _i\nabla \overline{p}\).

Proof of Theorem 2.1

Using the uniform (in \(\tau \)) estimates in Lemma 2.5, we have the existence of \(\rho _i\in {{\mathrm{AC}}}^2([0,T];{{\mathscr {P}}}(\Omega ))\), \(i=1,2\) such that up to passing to a subsequence, both \((\rho _1^\tau ,\rho _2^\tau )\) and \(({{\tilde{\rho }}}_1^\tau ,{{\tilde{\rho }}}_2^\tau )\) converge to them uniformly in time, w.r.t. \(W_2\).

Since \(\left( p^\tau -\fint _{\Omega }p^\tau (\cdot ,x)\,{\mathrm{d}}x\right) _{\tau >0}\) is uniformly bounded in \(L^2([0,T];H^1(\Omega ))\), there exists \(\overline{p}\in L^2([0,T];H^1(\Omega ))\) such that up to passing to a subsequence, \(p^\tau -\fint _{\Omega }p^\tau (\cdot ,x)\,{\mathrm{d}}x\rightharpoonup \overline{p}\), weakly in \(L^2([0,T]\times \Omega )\) and \(\nabla p^\tau \rightharpoonup \nabla \overline{p}\), weakly in \(L^2([0,T]\times \Omega ;{\mathbb {R}}^d)\) as \(\tau \downarrow 0\).

We only need to identify the limits of the momentum variables \((E_i^\tau )_{\tau >0}\). From the weak convergence of the density variables, we have that up to passing to a subsequence, \(\rho _1^\tau \nabla K_\varepsilon \star \rho _2^\tau {\mathop {\rightharpoonup }\limits ^{\star }}\rho _1\nabla K_\varepsilon \star \rho _2\), as \(\tau \downarrow 0\) and similarly \(\rho _2^\tau \nabla K_\varepsilon \star \rho _1^\tau {\mathop {\rightharpoonup }\limits ^{\star }}\rho _2\nabla K_\varepsilon \star \rho _1\), and \(\rho _i^\tau \nabla \Phi _i{\mathop {\rightharpoonup }\limits ^{\star }}\rho _i\nabla \Phi _i\), as \(\tau \downarrow 0\) as vector measures.

Furthermore, since \((\rho _i^\tau \nabla p^\tau )_{\tau >0}\) is uniformly bounded in \(L^2([0,T]\times \Omega ;{\mathbb {R}}^d)\), there exists \(\zeta _i\in L^2([0,T]\times \Omega ;{\mathbb {R}}^d)\), \(i=1,2\) such that up to passing to a subsequence \(b_i^{-1}\rho _i^\tau \nabla p^\tau \rightharpoonup \zeta _i\) weakly in \(L^2([0,T]\times \Omega ;{\mathbb {R}}^d)\), as \(\tau \downarrow 0\).

Last, by the previous arguments, we have

$$\begin{aligned} b_1 b_1^{-1}\rho _1^\tau \nabla p^\tau +b_2 b_2^{-1}\rho _2^\tau \nabla p^\tau =\nabla p^\tau \rightharpoonup \nabla \overline{p}=b_1\zeta _1+b_2\zeta _2, \end{aligned}$$

as \(\tau \downarrow 0\) in \(L^2([0,T]\times \Omega ;{\mathbb {R}}^d).\) Therefore, the thesis of the theorem follows. \(\square \)

It is open whether the continuum in time densities in Theorem 2.1 are characteristic functions. Indeed, since we can only guarantee that the discrete densities converge weakly to the continuum densities, the characteristic function property may be lost in the limit. We point out that due to the energy bounds, the densities are “almost" characteristic functions, i.e. we have

Lemma 2.6

For \((\rho _1^\tau ,\rho _2^\tau )\) given as above with \(\varepsilon >0\) fixed, for any \(\alpha \in (0,1)\) we have

$$\begin{aligned} {\mathscr {L}}^d( \alpha \le \rho _i^\tau \le 1-\alpha ) \le \frac{3\sqrt{\varepsilon }}{\sigma {\sqrt{2\pi }} \alpha (1-\alpha )}{{\mathrm{HC}}}_{\varepsilon }(\rho _1^{\tau }, \rho _2^{\tau }) \hbox { for all } \tau . \end{aligned}$$

Proof

Let us set \(i=1\), the other case will be parallel. We have

$$\begin{aligned} {\mathscr {L}}^d(\alpha \le \rho _1^\tau \le 1-\alpha )\le & {} \frac{1}{\alpha (1-\alpha )}\int _{\Omega } \rho ^{\tau }_1(x)(1-\rho ^{\tau }_1(x))\,{\mathrm{d}}x\\= & {} \frac{1}{\alpha (1-\alpha )}\int _{\Omega } \left[ \rho ^{\tau }_1(x)-\rho ^{\tau }_1(x)^2\right] \,{\mathrm{d}}x. \end{aligned}$$

By Jensen’s inequality, the above is smaller than

$$\begin{aligned}&\frac{1}{\alpha (1-\alpha )}\int _{\Omega }\left[ \rho ^{\tau }_1(x)-(G_{\varepsilon }\star \rho ^{\tau }_1)(x)^2\right] \,{\mathrm{d}}x\\&\quad \le \frac{\sqrt{\varepsilon }}{\sigma {\sqrt{2\pi }}\alpha (1-\alpha )}{{\mathrm{HC}}}_{\varepsilon }(\rho _1^{\tau }, \rho _2^{\tau })\\&\qquad +\frac{1}{\alpha (1-\alpha )}\int _{\Omega } (G_{\varepsilon }\star \rho ^{\tau }_1)(x)\Big (\rho ^{\tau }_1(x) -(G_{\varepsilon }\star \rho ^{\tau }_1)(x)\Big )\,{\mathrm{d}}x. \end{aligned}$$

We then estimate

$$\begin{aligned}&\frac{1}{\alpha (1-\alpha )}\int _{\Omega } (G_{\varepsilon }\star \rho ^{\tau }_1)(x)\Big (\rho ^{\tau }_1(x) -(G_{\varepsilon }\star \rho ^{\tau }_1)(x)\Big )\,{\mathrm{d}}x\\&\quad \le \frac{1}{\alpha (1-\alpha )}\int _{\Omega } \int _{{\mathbb {R}}^d} G(z) |\rho _1^{\tau }(x)-\rho _1^{\tau }(x-\sqrt{\varepsilon } z)|\,{\mathrm{d}}z\,{\mathrm{d}}x \end{aligned}$$

Since \(\rho ^{\tau }_1\) and \(\rho ^{\tau }_2\) take values in \(\{0,1\}\), we have \( |\rho _1^{\tau }(x)-\rho _1^{\tau }(x-\sqrt{\varepsilon } z)|\le \rho ^{\tau }_1(x)\rho ^{\tau }_2(x-\sqrt{\varepsilon }z)+\rho ^{\tau }_1(x-\sqrt{\varepsilon }z)\rho ^{\tau }_2(x)\). Applying these inequalities, the result follows. \(\square \)

3 Muskat Flow with Surface Tension

In this section, we complete the proof of Theorem 1.1 and show that when \(\varepsilon =\varepsilon _{\tau }\) goes to zero along with \(\tau \), the time interpolated minimizing movements scheme constructed in (21)–(22) converges to a weak formulation of the Muskat problem with surface tension under the energy convergence assumption (EC). This amounts to showing that each quantity in the system of Euler–Lagrange equations given in (18) converges (in an appropriate sense) to the correct limiting object. In particular, we will need strong \(L^1\) convergence of the density functions in \([0,T]\times \Omega \). To obtain the necessary compactness for strong \(L^1\) convergence, we develop an adaptation of an Aubin–Lions type lemma in Proposition 3.2. We then conclude our result by verifying the convergence of the Euler–Lagrange equations to the weak formulation of the Muskat problem in a similar manner to the approach in [22].

Before we introduce the weak formulation of the Muskat problem, let us recall the classical formulation of the problem. When the two phases are separated by a smooth interface \(\Gamma \), the Muskat problem is given by the continuity equation

figure c

along with the free boundary problem for the pressure

figure d

where \(\kappa \) is the mean curvature of \(\Gamma \), oriented to be positive when \({{\,\mathrm{spt}\,}}(\rho _2)\) is convex at the point. The weak formulation of the Muskat problem with surface tension is provided in Definition 3.1 below.

Definition 3.1

We say that \((\rho _i,v_i,p)\) is a weak solution to the Muskat problem with surface tension, if for a.e. \(T>0\), \(\rho _i\in L^1([0,T];BV(\Omega ;\{0,1\}))\cap {{\mathrm{AC}}}^2([0,T];{{\mathscr {P}}}(\Omega ))\), \(\rho _1\rho _2=0\) and \(\rho _1+\rho _2=1\) a.e., \(v_i\in L^2([0,T];L^2_{\rho _{i,t}}(\Omega ;{\mathbb {R}}^d))\), \(p\in L^2([0,T]; (C^{0,\alpha }(\Omega ))^*)\) and for any \(\psi \in C^\infty ([0,T]\times \overline{\Omega })\) and for any vector field \(\xi \in C^\infty ([0,T]\times \Omega ;{\mathbb {R}}^d)\) with zero normal component on \(\partial \Omega \), we have

$$\begin{aligned} \int _0^T\int _{\Omega } \left( \rho _i v_i \cdot \nabla \psi + \rho _i \partial _t\psi \right) \,{\mathrm{d}}x \,{\mathrm{d}}t = \left[ \int _{\Omega }\rho _i \psi \,{\mathrm{d}}x \right] ^T_0 \end{aligned}$$
(26)

with

$$\begin{aligned}&\int _0^T\int _{\Omega } \sum _i \rho _i(b_iv_i+\nabla \Phi _i)\cdot \xi - p\nabla \cdot \xi \nonumber \\&\quad + \frac{\sigma }{2}\left( \frac{D\rho _1}{|D\rho _1|}\otimes \frac{D\rho _1}{|D\rho _1|} : \nabla \xi + \nabla \cdot \xi \right) \left( |D \rho _1| +|D \rho _2| \right) \,{\mathrm{d}}x\,{\mathrm{d}}t =0. \end{aligned}$$
(27)

Remark 3.1

Let us remark here that \(\frac{D\rho _1}{|D\rho _1|}\) stands for the \(L^\infty \) density of \(D\rho _1\) with respect to the total variation measure \(|D\rho _1|\) (after using the Radon–Nikodym differentiation). Furthermore, let us notice that the term \(\left( \frac{D\rho _1}{|D\rho _1|}\otimes \frac{D\rho _1}{|D\rho _1|} : \nabla \xi \right) \left( |D \rho _1| +|D \rho _2| \right) \) is meaningful in the sense that \(f(\nu /|\nu |)\,{\mathrm{d}}|\nu |\) defines a matrix valued element of \({{\mathscr {M}}}(\Omega )\) for any \(f:{\mathbb {R}}^d\rightarrow {\mathbb {R}}^{d\times d}\) which is continuous and 1-homogeneous (see for instance [3, Proposition 3.15] in the case when \(\nu =D\rho \) for some \(\rho \in BV(\Omega )\)). In particular, if \(\rho \in BV(\Omega ;\{0,1\})\), then \(D\rho /|D\rho |\) stands for the measure theoretic normal to the boundary of the set \({{\,\mathrm{spt}\,}}(\rho )\) and by \(\nu /|\nu |\) we mean the density of \(\nu \) with respect to its total variation measure \(|\nu |\).

Remark 3.2

Although we restrict our attention in the weak curvature equation (27) to test functions with vanishing normal component on \(\partial \Omega \), we do not lose information at the boundary. The weak continuity equation (26) encodes the zero normal boundary condition for the velocities, and (27) still encodes the condition that \(\Gamma \) and \(\partial \Omega \) meet orthogonally (c.f. the proof of Lemma 3.1).

Now we are ready to state the main result of our paper.

Theorem 3.1

Given initial data \(\rho _{1,0},\rho _{2,0}\in BV(\Omega )\) such that \(\rho _{1,0}\rho _{2,0}=0\) and \(\rho _{1,0}+\rho _{2,0}=1\) a.e. in \(\Omega \) and Lipschitz continuous potential functions \(\Phi _1,\Phi _2:\Omega \rightarrow {\mathbb {R}}\), there exists \((\rho _i,v_i, p)\) with \(i=1,2\) such that under the energy convergence assumption (EC), it solves (\(\hbox {MP}_1\))–(\(\hbox {MP}_2\)) in the sense of Definition 3.1.

We postpone the proof of the previous theorem to the end of this section. First, let us show a consistency result, i.e. that classical solutions of the Muskat problem satisfy the weak formulation (26) and (27).

Lemma 3.1

If smooth solutions of (\(\hbox {MP}_1\)) and (\(\hbox {MP}_2\)) exist with \(\rho _i = \chi _{A_i}\) and with a \(C^2\) hypersurface \(\Gamma = \partial A_1 = \partial A_2\), then \(\rho _i\) satisfies (26) and (27) with the choice of

$$\begin{aligned} v_i := -b_i^{-1}(\nabla p_i+\nabla \Phi _i) \hbox { and } p = p_1\rho _1 + p_2\rho _2 + \sigma \,{\mathrm{d}}\mu , \end{aligned}$$
(28)

where \(\mu \in (C([0,T]\times \Omega ))^*\) is the surface measure of \(\Gamma =\cup _{t>0}(\Gamma _t\times \{t\})\), i.e. after disintegration

or using test functions

$$\begin{aligned} \int _0^T \int _{\Omega } f \,{\mathrm{d}}\mu = \int _0^T\int _{\Gamma _t} f \,{\mathrm{d}}{{\mathscr {H}}}^{d-1}\,{\mathrm{d}}t \hbox { for any } f\in C([0,T]\times \Omega ). \end{aligned}$$

Remark 3.3

Note that our notion of solution requires adding the surface measure \(\sigma \,{\mathrm{d}}\mu \) to the classical pressure variable. This singular term appears from the minimizing movement scheme, where it ensures that a vacuum does not form at the interface. In general, we expect that the singular part in the weak pressure variable corresponds to the surface measure in (28).

Proof of Lemma 3.1

(26) is a standard weak expression of (\(\hbox {MP}_1\)) with \(b_iv_i = -\nabla p_i-\nabla \Phi _i\). Let us write again \(\Gamma = \cup _{t>0}(\Gamma _t\times \{t\})\). Then we have

$$\begin{aligned}&\int _0^T\int _{\Omega } \left[ \sum _i \rho _i(b_iv_i+\nabla \Phi _i)\cdot \xi - p\nabla \cdot \xi \right] \,{\mathrm{d}}x\,{\mathrm{d}}t \\&\quad =\int _0^T\int _{\Omega } \left[ \sum _i \rho _i(b_iv_i+\nabla \Phi _i) \cdot \xi - (p_1\rho _1 + p_2\rho _2) \nabla \cdot \xi \right] \,{\mathrm{d}}x\,{\mathrm{d}}t\\&\qquad - \int _0^T\int _{\Omega } \sigma (\nabla \cdot \xi )\,{\mathrm{d}}\mu \\&\quad = \int _0^T \int _{\Gamma _t }(p_1-p_2) \xi \cdot \nu \,{\mathrm{d}}{{\mathscr {H}}}^{d-1}\,{\mathrm{d}}t - \sigma \int _0^T\int _{\Gamma _t}\nabla \cdot \xi \,{\mathrm{d}}{{\mathscr {H}}}^{d-1}\,{\mathrm{d}}t \\&\qquad {-\int _0^T\int _{\partial \Omega }(p_1\rho _1+p_2\rho _2)\xi \cdot n\,{\mathrm{d}}{{\mathscr {H}}}^{d-1}}\\&\quad = \frac{\sigma }{2}\int _0^T\int _{\Gamma _t} \kappa \xi \cdot \nu \,{\mathrm{d}}{{\mathscr {H}}}^{d-1}\,{\mathrm{d}}t -\sigma \int _0^T\int _{\Gamma _t}\nabla \cdot \xi \,{\mathrm{d}}{{\mathscr {H}}}^{d-1}\,{\mathrm{d}}t\\&\quad =-\frac{\sigma }{2}\int _0^T \int _{\Gamma _t} [\nabla \cdot \xi + \nu \cdot \nabla \xi \nu ] \,{\mathrm{d}}{{\mathscr {H}}}^{d-1}. \end{aligned}$$

Here for the second equality we used integration by parts for the first integral, using the fact that

and for the third equality we used the curvature jump condition at the interface, and the fact that \(\xi \) has zero normal component on \(\partial \Omega \). To conclude, let us recall that \(\kappa \) is oriented to be positive when \({{\,\mathrm{spt}\,}}(\rho _2)\) is convex at the point. With this orientation, observe that for any \(C^1\) vector field \(\xi \) we have

$$\begin{aligned} \int _{\Gamma _t} \kappa \xi \cdot \nu \,{\mathrm{d}}{{\mathscr {H}}}^{d-1} = \int _{\Gamma _t} [\nabla \cdot \xi -\nu \cdot \nabla \xi \nu ] \,{\mathrm{d}}{{\mathscr {H}}}^{d-1}-\int _{\partial \Gamma _t\cap \partial \Omega }\xi \cdot {\tilde{n}}\,{\mathrm{d}}{{\mathscr {H}}}^{d-2}, \end{aligned}$$

where \(\nu = D\rho _1/|D\rho _1|\) is normal to \(\Gamma \) toward the support of \(\rho _1\), \({\tilde{n}}\) stands for the co-normal vector (orthogonal to \(\partial \Gamma _t\) and tangential to \(\Gamma _t\)). Note that the lower dimensional term \(\int _{\partial \Gamma _t\cap \partial \Omega }\xi \cdot {\tilde{n}}\,{\mathrm{d}}{{\mathscr {H}}}^{d-2}\) must vanish since we know that the co-normal \({\tilde{n}}\) coincides with the boundary normal n. Indeed, we can write

$$\begin{aligned} \int _{\partial \Gamma _t\cap \partial \Omega }\xi \cdot {\tilde{n}}\,{\mathrm{d}}{{\mathscr {H}}}^{d-2}=\int _{\partial \Gamma _t\cap \partial \Omega }\xi \cdot n\,{\mathrm{d}}{{\mathscr {H}}}^{d-2}=0, \end{aligned}$$

where the final equality follows from the fact that \(\xi \) has zero normal component on \(\partial \Omega \). \(\square \)

3.1 Preliminary Estimates

We present below a compactness result on the piecewise constant interpolations of the density variables \((\rho _1^\tau ,\rho _2^\tau )_{\tau >0}\), when the parameter \(\varepsilon \) is vanishing together with \(\tau .\)

Proposition 3.2

Let \((\rho _1^\tau ,\rho _2^\tau )_{\tau >0}\) be the piecewise constant interpolations constructed in Section 2 using the minimizing movements scheme (20). Let moreover \(\rho _{1,0},\rho _{2,0}\in {{\mathscr {P}}}(\Omega )\) be such that \(\rho _{i,0}\in BV(\Omega ;\{0,1\})\) with \(\rho _{1,0}+\rho _{2,0}=1\) a.e. in \(\Omega \) and in particular \({\mathcal {E}}(\rho _{1,0},\rho _{2,0})<+\infty .\)

Then, there exists \(\rho _1,\rho _2\in L^1([0,T];BV(\Omega ;\{0,1\}))\) such that \(\rho _1+\rho _2=1\) a.e. in \([0,T]\times \Omega \) and up to passing to a subsequence, \(\rho _i^\tau \rightarrow \rho _i\) strongly in \(L^1([0,T]\times \Omega )\) as \(\varepsilon _\tau :=\max \{\tau ,\varepsilon \}\downarrow 0\).

Proof

First, let us notice that by the uniform quasi-Hölder estimate from Lemma 2.5(i), we have that there exists a subsequence of \((\rho _i^\tau )_{\tau >0}\) (that we do not relabel) and \(\rho _i\in {{\mathrm{AC}}}^2([0,T];{{\mathscr {P}}}(\Omega ))\) such that \(W_2(\rho _{i,t}^\tau ,\rho _{i,t})\rightarrow 0\) as \(\varepsilon _\tau \downarrow 0\), uniformly in t. We shall work with this subsequence from now on.

The rest of the proof relies on a careful adaptation of the Aubin–Lions lemma, developed in [36] and used in a similar context for instance in [20, 21].

Let us notice that in order to use the Aubin–Lions argument, we need to show a tightness condition (cf. [36, Definition 1.3, Remark 1.5]) for the time-dependent family \((\rho _i^\tau )_{\tau >0}\). This is typically done by providing a compact set (of the space of probability measures), where ‘most’ of the sequences \((\rho _i^\tau (t))_{\tau >0}\) lie. Note that the estimate in Lemma 2.5(v) does not provide such a compact set. Indeed, for \(\tau >0\), the densities \((\rho _i^\tau (t))_{\tau >0}\) are not actually BV in space. Inspired by arguments in [13], we will work first with an auxiliary sequence \(\overline{\rho }_i^\tau :[0,T]\times \Omega \rightarrow [0,1]\), defined as

$$\begin{aligned} \overline{\rho }_i^\tau :=G_{\varepsilon }\star \rho _i^\tau , \end{aligned}$$

where we have performed a convolution with the heat kernel \(G_\varepsilon \) only in the spacial variable. It worth noticing that this ‘perturbation’ of \(\rho _i^\tau \) is reminiscent to the one obtained via the so-called flow interchange technique introduced in [26]. We have the following properties for this new sequence.

Claim 1

\((\overline{\rho }_i^\tau )_{\tau >0}\) is uniformly bounded (w.r.t. \(\varepsilon \) and \(\tau \)) in \(L^1([0,T]; BV(\Omega ))\).

Proof of Claim 1

The uniform \(L^\infty \) bounds on \((\rho _i^\tau )_{\tau >0}\) are also preserved for \((\overline{\rho }_i^\tau )_{\tau >0}\). Now, as in the proof of [22, Lemma 2.4] we conclude that

$$\begin{aligned} \int _0^T\int _{\Omega }|\nabla \overline{\rho }_{i,t}^\tau |\,{\mathrm{d}}x\,{\mathrm{d}}t\le C T{{\mathrm{HC}}}_\varepsilon (\rho _{1,0},\rho _{2,0}) \end{aligned}$$
(29)

for a uniform constant C, which proves our claim. \(\square \)

Claim 2

There exists a subsequence of \((\overline{\rho }_i^\tau )_{\tau >0}\) (that we do not relabel) and \(\overline{\rho }_i\in L^1([0,T]\times \Omega )\) such that \(\overline{\rho }_i^\tau \rightarrow \overline{\rho }_i\) strongly in \(L^1([0,T]\times \Omega )\) as \(\varepsilon _\tau \downarrow 0.\)

Proof of Claim 2

For \(t\in (0,T)\) fixed, let us notice that \(\overline{\rho }_{i,\tau }^\tau \) actually can be seen as the evolution of \(\rho _{i,t}^\tau \) via the heat flow for time \(\varepsilon >0\). It is well-known that \(W_2\) is contractive along the heat flow, so we have

$$\begin{aligned} W_2(\overline{\rho }_{i,t}^\tau ,\overline{\rho }_{i,s}^\tau )\le W_2(\rho _{i,t}^\tau ,\rho _{i,s}^\tau ),\ \forall \ 0\le s\le t\le T. \end{aligned}$$
(30)

Now let us set \(g:L^1(\Omega )\times L^1(\Omega )\rightarrow [0,+\infty ]\) and \({\mathcal {F}}:L^1(\Omega )\rightarrow [0,+\infty ]\) defined as

$$\begin{aligned} g(\mu ,\nu ):=W_2(\mu ,\nu ) \end{aligned}$$

and

$$\begin{aligned} {\mathcal {F}}(\mu ):=\left\{ \begin{array}{ll} |D\mu |(\Omega ), &{} {{\mathrm{if}}}\ \mu \in BV(\Omega ),\\ +\infty , &{} {{\mathrm{otherwise}}}. \end{array} \right. \end{aligned}$$

By construction, \({\mathcal {F}}\) is convex, l.s.c. in \(L^1(\Omega )\) and it sublevel sets are compact in \(L^1(\Omega ),\) therefore it defines a normal coercive integrand.

From (29) and from the definition of \({\mathcal {F}}\), we have

$$\begin{aligned} \sup _{\varepsilon _\tau }\int _0^T{\mathcal {F}}(\overline{\rho }_{i,t}^\tau )\,{\mathrm{d}}t<+\infty , \end{aligned}$$

and similarly, from Lemma 2.5(i) and from (30), we have

$$\begin{aligned} \lim _{h\downarrow 0}\sup _{\varepsilon _\tau }\int _0^{T-h}g(\overline{\rho }_{i,t+h}^\tau ,\overline{\rho }_{i,t}^\tau )\,{\mathrm{d}}t<+\infty , \end{aligned}$$

therefore the assumptions of [36, Theorem 2] are fulfilled and one can conclude that there exists \(\overline{\rho }_i\in L^1([0,T]\times \Omega )\) and a subsequence of \((\overline{\rho }_i^\tau )_{\tau >0}\) (that we do not relabel) which is converging in measure, and in particular pointwise a.e. to \(\overline{\rho }_i\) as \(\varepsilon _\tau \downarrow 0\). The strong convergence in \(L^1([0,T]\times \Omega )\) follows from Lebesgue’s dominated convergence theorem, since \((\overline{\rho }_i^\tau )_{\tau >0}\) is uniformly bounded. The claim follows. \(\square \)

Claim 3

There exists a subsequence of the original sequence \((\rho _{i}^\tau )_{\tau >0}\) which is converging to \(\overline{\rho }_i\) strongly in \(L^1([0,T]\times \Omega )\) as \(\varepsilon _\tau \downarrow 0\).

Proof of Claim 3

Passing to the same subsequence in the original sequence \((\rho _{i}^\tau )_{\tau >0}\) (that we do not relabel) as in the last step of the proof of Claim 2, we have

$$\begin{aligned} \Vert \rho _{i}^\tau -\overline{\rho }_i\Vert _{L^1([0,T]\times \Omega )}\le & {} \Vert \rho _i^\tau -\overline{\rho }_i^\tau \Vert _{L^1([0,T]\times \Omega )}+\Vert \overline{\rho }_i^\tau -\overline{\rho }_i\Vert _{L^1([0,T]\times \Omega )}\\\le & {} C\sqrt{\varepsilon }+\Vert \overline{\rho }_i^\tau -\overline{\rho }_i\Vert _{L^1([0,T]\times \Omega )}, \end{aligned}$$

for a constant \(C>0\) (independent of \(\tau \) and \(\varepsilon \)) and the claim follows by taking \(\varepsilon _\tau \downarrow 0.\) In the last inequality we have used

$$\begin{aligned} \Vert \rho _1^\tau -\overline{\rho }_1^\tau \Vert _{L^1([0,T]\times \Omega )}&=\int _0^T\int _\Omega |\rho _1^\tau -G_\varepsilon \star \rho _1^\tau |\,{\mathrm{d}}x \,{\mathrm{d}}t\\&\le \int _0^T\int _{\Omega } \int _{{\mathbb {R}}^d} G(z) |\rho _1^{\tau }(x)-\rho _1^{\tau }(x-\sqrt{\varepsilon } z)|\,{\mathrm{d}}z\,{\mathrm{d}}x\,{\mathrm{d}}t\\&\le CT\sqrt{\varepsilon }{{\mathrm{HC}}}_\varepsilon (\rho _{1,0},\rho _{2,0}), \end{aligned}$$

where we relied on the fact that since \(\rho ^{\tau }_1\) and \(\rho ^{\tau }_2\) take values in \(\{0,1\}\), we have \( |\rho _1^{\tau }(x)-\rho _1^{\tau }(x-\sqrt{\varepsilon } z)|\le \rho ^{\tau }_1(x)\rho ^{\tau }_2(x-\sqrt{\varepsilon }z)+\rho ^{\tau }_1(x-\sqrt{\varepsilon }z)\rho ^{\tau }_2(x)\).

To conclude, let us notice that since \((\rho _i^\tau )_{\tau >0}\) converges uniformly w.r.t. \(W_2\) to \(\rho _i\), \(\overline{\rho }_i\) and \(\rho _i\) have to coincide. Also, for this last subsequence, we can pass to the limit as \(\varepsilon _\tau \downarrow 0\) in the estimation of Lemma 2.5(v) to obtain that actually \(\rho _i\in L^1([0,T];BV(\Omega ;\{0,1\}))\).

\(\square \)

Let \(\varvec{\rho }=(\rho _1,\rho _2)\) the limit point obtained in Proposition 3.2. Later in this section, we will profit from the assumption

$$\begin{aligned} \int _0^T{{\mathrm{HC}}}_{\varepsilon _{\tau }}(\varvec{\rho }^{\tau })\,{\mathrm{d}}t \rightarrow \int _0^T{\mathcal {E}}_s(\varvec{\rho })\,{\mathrm{d}}t, \hbox { as } \varepsilon \downarrow 0. \end{aligned}$$
(EC)

3.2 Derivation of the Weak Curvature Equation for \(\varepsilon \) Going to Zero Together With \(\tau \)

Let \(\xi \in C^2([0,T]\times \Omega ;{\mathbb {R}}^d)\). Fix \(t\in [n\tau ,(n+1)\tau ).\) We consider the piecewise constant interpolations \((\rho _i^\tau ,v_i^\tau ,p^\tau )\). Let us take the inner product of the equations (18) with \(\xi (t,\cdot )\), multiply with the corresponding \(\rho _i^\tau \) and integrate over \([0,T]\times \Omega \). Adding up the two equations we get

$$\begin{aligned}&\int _0^T\int _{\Omega }\left[ \nabla K_\varepsilon \star \rho _2^\tau +\nabla \Phi _1+v_1^\tau +\nabla p^\tau \right] \cdot \xi \rho _1^\tau \,{\mathrm{d}}x\,{\mathrm{d}}t\\&\quad +\int _0^T\int _{\Omega }\left[ \nabla K_\varepsilon \star \rho _1^\tau +\nabla \Phi _2+v_2^\tau +\nabla p^\tau \right] \cdot \xi \rho _2^\tau \,{\mathrm{d}}x\,{\mathrm{d}}t=0, \end{aligned}$$

of rearranging the terms yields

$$\begin{aligned}&\int _0^T\int _{\Omega }\xi \cdot \left[ (\nabla K_\varepsilon \star \rho _2^\tau )\rho _1^\tau +(\nabla K_\varepsilon \star \rho _1^\tau )\rho _2^\tau \right] +(\rho _1^\tau (v_1^\tau +\nabla \Phi _1)+\rho _2^\tau (v_2^\tau +\nabla \Phi _1))\cdot \xi \nonumber \\&\quad +\nabla p^\tau \cdot \xi \,{\mathrm{d}}x\,{\mathrm{d}}t=0. \end{aligned}$$
(31)

Our aim is now to pass to the limit in this last expression (31) as \(\varepsilon _\tau \downarrow 0\) to recover (27).

Theorem 3.2

Let \((\rho _i^\tau ,v_i^\tau ,p^\tau )_{\tau >0}\) be the piecewise constant interpolations constructed in (21)–(22). Then there exists \(\rho _i\in L^1([0,T];BV(\Omega ;\{0,1\}))\cap {{\mathrm{AC}}}^2([0,T];{{\mathscr {P}}}(\Omega ))\), \(v_i\in L^2([0,T];L^2_{\rho _i}(\Omega ;{\mathbb {R}}^d))\) and \(p\in {L^2([0,T]; (C^{0,\alpha }(\Omega ))^*)}\) such that, up to passing to a subsequence that we do not relabel, we have the following:

  1. (i)

    \(\rho _i^\tau \rightarrow \rho _i\), as \(\varepsilon _\tau \downarrow 0\), strongly in \(L^1([0,T]\times \Omega )\);

  2. (ii)

    \(v_i^\tau \rho _i^\tau {\mathop {\rightharpoonup }\limits ^{\star }}v_i\rho _i\), as \(\varepsilon _\tau \downarrow 0\), weakly-\(\star \) in \({{\mathscr {M}}}^d([0,T]\times \Omega )\);

  3. (iii)

    \(\nabla p^\tau {\mathop {\rightharpoonup }\limits ^{\star }}\nabla p\), as \(\varepsilon _\tau \downarrow 0\), weakly-\(\star \) in \(L^2([0,T];(C^1(\Omega ))^\star );\)

Proof

  1. (i)

    Is a consequence of Proposition 3.2 via the Aubin–Lions type argument.

  2. (ii)

    Let us notice that the estimates in Lemma 2.5(i–iii) are independent of \(\varepsilon >0\), therefore there exists \(E_i\in {{\mathscr {M}}}^d([0,T]\times \Omega )\), \(i=1,2\) such that \(E_i^\tau {\mathop {\rightharpoonup }\limits ^{\star }}E_i\) and \({{\tilde{E}}}_i^\tau {\mathop {\rightharpoonup }\limits ^{\star }}E_i\) as \(\varepsilon _\tau \downarrow 0\). Moreover, we have that \((\rho _i,E_i)\) solves \(\partial _t+\nabla \cdot (E_i)=0\) in the sense of distributions and \(\rho _i\in {{\mathrm{AC}}}^2([0,T];({{\mathscr {P}}}(\Omega ),W_2)).\) Therefore, there exists \(v_i\in L^2([0,T];L^2_{\rho _i}(\Omega ;{\mathbb {R}}^d))\) such that \(\partial _t\rho _i+\nabla \cdot (\rho _i v_i)=0\) in the sense of distributions.

  3. (iii)

    Let us notice first that from Lemma 2.5(vii) we have that the sequence \((\nabla p^\tau )_\tau \) is uniformly bounded in \(L^2([0,T];(C^1(\Omega ))^*)\), independently of \(\varepsilon \). Then, the Banach–Bourbaki–Alaouglu theorem yields that it is sequentially pre-compact in that space, so we have that there exists \(\zeta \in L^2([0,T];(C^1(\Omega ))^*)\) such that up to passing to a subsequence \(\nabla p^\tau {\mathop {\rightharpoonup }\limits ^{\star }}\zeta \), as \(\varepsilon _\tau \downarrow 0\).

Thus, it only remains to show that \(\zeta \) is a gradient. Let us notice that by construction of \(p^\tau \),

$$\begin{aligned} \displaystyle \int _0^T\int _{\Omega }\nabla p^\tau \cdot \xi \,{\mathrm{d}}x\,{\mathrm{d}}t=0, \end{aligned}$$

for any incompressible smooth field \(\xi \). Therefore, we also have that \(\langle \zeta ,\xi \rangle =0\) for any incompressible smooth field \(\xi \), so we would be done if one would have a Helmholtz decomposition in this corresponding space. This result is well known (see for instance [39, Lemma 2.2.1]), if \(\zeta (t,\cdot )\in W^{-1,q}(\Omega )^d\), for some \(q\in (1,+\infty )\). However, our limit object has slightly worse regularity.

To overcome this issue, let us argue using the following claim. This is a consequence of classical result in the theory of elliptic equations and Schauder estimates (see for instance [15, Theorems 5.23–5.24])

Claim

Let \(\varphi \in C^{0,\alpha }(\Omega )\). Then the problem \(-\Delta u=\varphi \) (with homogeneous zero Neumann boundary condition, if \(\int _\Omega \varphi \,{\mathrm{d}}x=0\)) has a unique (modulo constants) solution \(u\in C^{2,\alpha }(\Omega )\) such that \(\Vert u\Vert _{C^{2,\alpha }}\le \Vert \varphi \Vert _{C^{0,\alpha }}\).

Now, let \(\varphi \in C^{0,\alpha }(\Omega )\) and \(u\in C^{2,\alpha }(\Omega )\) as in the claim. Let \(t\in [0,T]\). Then we have

$$\begin{aligned} \Bigg |\int _{\Omega }p^\tau (t,\cdot )\varphi \,{\mathrm{d}}x\Bigg |&=\Bigg |\int _{\Omega }p^\tau (t,\cdot )\Delta u\,{\mathrm{d}}x\Bigg |\\&=\Bigg |\int _{\Omega }\nabla p^\tau (t,\cdot )\cdot \nabla u\,{\mathrm{d}}x\Bigg |\le C\Vert \nabla u\Vert _{C^1}\le C\Vert u\Vert _{C^{2,\alpha }}\le C\Vert \varphi \Vert _{C^{0,\alpha }}, \end{aligned}$$

where in the first inequality we have used the last uniform estimate from the proof of Lemma 2.5(vii). This implies that \((p^\tau (t,\cdot ))_{\tau }\) is uniformly bounded in \((C^{0,\alpha }(\Omega ))^*\).

To conclude the thesis of the lemma, we observe that (by possibly subtracting the mean) \((p^\tau )_\tau \) is also converging weakly-\(\star \) to some p, therefore we will have that \(\zeta =\nabla p\) in \({\mathscr {D}}'((0,T)\times \Omega )\). \(\square \)

To complete the proof of of Theorem 1.1, it remains to show that limit of equation (31) converges to the weak formulation of the Muskat problem. This result is proven in Proposition A.1, which in turn uses the results of Lemmas A.2 and A.3. These are direct consequences of the corresponding results from [22]. However, for completeness and to facilitate the reading, we collected them in the Appendix A below. These results allow to simply conclude this section with the proof of Theorem 3.1.

Proof of Theorem 3.1

This proof is a direct consequence of Theorem 3.2 and Proposition A.1. Indeed, these two result, allow us to pass to the limit in the equation (31) to obtain (27). \(\square \)

4 Numerics and Equilibrium Shapes

In this section we present several examples of the performance of our numerical scheme and a discussion of equilibrium shapes.

4.1 Numerical Implementation

The Muskat problem evolution can be simulated by discretizing the minimizing movements scheme (9) onto a regular grid. At first glance, numerically solving the discretized variational problem is not so simple. Indeed, Problem (9) is not convex with respect to \(\rho \) and the Wasserstein distance is challenging to work with numerically. However, as noted in the introduction, the scheme can be substantially simplified by applying the heat content linearization trick used in [13]. To that end, note that the convexity of the heat content gives us the upper bound

$$\begin{aligned} {{\mathrm{HC}}}_{\varepsilon }(\varvec{\rho })\le {{\mathrm{HC}}}_{\varepsilon }(\varvec{\rho }^n)+(\delta {{\mathrm{HC}}}_{\varepsilon }(\varvec{\rho }^n), \varvec{\rho }-\varvec{\rho }^n), \end{aligned}$$
(32)

where the second term is the linearization of the heat content about the previous iterate \(\varvec{\rho }^n\). Thus, if we replace (9) by the linearized scheme,

$$\begin{aligned} \varvec{\rho }^{n+1}= & {} \mathop {{\mathrm{arg\,min}}}\limits _{\varvec{\rho }} \, {{\mathrm{HC}}}_{\varepsilon }(\varvec{\rho }^n)+(\delta {{\mathrm{HC}}}{E}_{\varepsilon }(\varvec{\rho }^n), \varvec{\rho }-\varvec{\rho }^n)+{\mathcal {E}}_p(\varvec{\rho })+ \Phi (\varvec{\rho })\nonumber \\&+\sum _{i=1}^2\frac{b_i}{2\tau }W_2^2(\rho _i, \rho ^n_i). \end{aligned}$$
(33)

we obtain a convex variational problem, and inequality (32) ensures that the scheme still posses the energy dissipation property

$$\begin{aligned} E_{\varepsilon }(\varvec{\rho }^{n+1})+\sum _{i=1}^2\frac{b_i}{2\tau }W_2^2(\rho _i^{n+1}, \rho _i^n)\le E_{\varepsilon }(\varvec{\rho }^{n}). \end{aligned}$$

A nearly identical argument to the proof of Proposition 2.3 shows that the set of minimizers of (33) scheme always contains a configuration where \(\rho _1\) and \(\rho _2\) are characteristic functions.

To solve problem (33) we introduce the pressure as a Lagrange multiplier for the incompressibility constraint, and instead work with the corresponding dual problem. Up to a constant, the dual problem has the form

$$\begin{aligned} p_{n+1}=\mathop {{\mathrm{arg\,max}}}\limits _{p} \int _{\Omega } (p+\psi _1^n)^{c_1}(x)\rho _1^n+(p+\psi _2^n)^{c_2}(x)\rho _2^n(x)-p(x)\, dx, \end{aligned}$$
(34)

where

$$\begin{aligned} \psi _i^n(x)=\big (\delta {{\mathrm{HC}}}_{\varepsilon }(\rho _i^n)\big )(x)+\Phi _i(x), \end{aligned}$$

and \(c_1\), \(c_2\) denote the quadratic c-transform

$$\begin{aligned} (p+\psi _i^n)^{c_i}(x)=\inf _{y\in \Omega } \; p(y)+\psi _i^n(y)+\frac{b_i}{2\tau }|y-x|^2 \end{aligned}$$

(note c-transforms play an essential role in optimal transport see for instance [37]). The dual problem is concave with respect to p, and can be solved using the recently introduced back-and-forth method [18], which efficiently solves optimal transport problems in dual form.

Due to the two phase nature of the problem, the optimal densities \(\rho _i^{n+1}\) are not a simple function of the optimal pressure \(p^{n+1}\) (this is in contrast to one-phase incompressible fluid flow where the occupied region is the support of the pressure). On the other hand, once we have solved for the optimal pressure in (34), we can recover the velocities \(v_i\) for each phase (c.f. equation (31)). Thus, in principle, one can recover the densities \(\rho ^{n+1}_i\) from \(v_i\) and \(\rho _i^n\) by solving the continuity equation for time \(\tau \). However, solving the continuity equation accurately is challenging due to the discontinuity of the densities at the phase boundary. Luckily, since we know that the densities remain as characteristic functions, we can instead compute \(\rho _i^{n+1}\) using the level set method [33]. If we let \(\varphi \) be the signed distance function to the interface between \(\rho _1^n\) and \(\rho _2^n\), then by solving the transport equation

$$\begin{aligned} \partial _t \varphi +\nabla \varphi \cdot (v_i\rho _i)=0 \end{aligned}$$

for time \(\tau \), we can recover \(\rho _i^{n+1}\) through the sign of \(\varphi \). The advantage of this approach is that the transport equation with Lipschitz initial data can be solved much more accurately than the continuity equation with discontinuous initial data.

4.2 Numerical Experiments

We demonstrate the performance of the numerical scheme on 3 different examples in 2 dimensions. In each experiment, we take our computational domain to be the unit square \([0,1]^2\) and set the surface tension constant to be \(\sigma =0.15\).

In the first two examples, shown in Figs. 1 and 2, we and choose potentials \(\Phi _i(x,y)=-w_iy\) where \(w_1=5\) and \(w_2=1\). In Fig. 1, the starting configuration for phase 1 is a small square and in Fig. 2, the starting configuration for phase 1 is a large square. In both cases, the square becomes round and falls to the bottom of the computational domain. However, due to the difference in mass between examples 1 and 2, the equilibrium configurations are different. In Fig. 1, the equilibrium configuration is a half disc sitting at the bottom of the domain, while in Fig. 2 the equilibrium configuration is a flat strip.

In the last example, shown in Fig. 3, we choose a different potential that leads to a topological change. We set

$$\begin{aligned} \Phi _1(x,y)={\left\{ \begin{array}{ll} \frac{1}{2}-|y-\frac{1}{2}| &{} \text {if} \; \; y>\frac{1}{2}\\ \frac{5}{4}\big (\frac{1}{2}-|y-\frac{1}{2}|\big ) &{} \text {if} \; \; y\le \frac{1}{2} \end{array}\right. } \end{aligned}$$

and \(\Phi _2(x,y)=0\). The potential encourages phase 1 to migrate to the top and the bottom of the computational domain, with a stronger force attracting the drop to the bottom. Because the potential pulls the drop in opposite directions, ultimately the initial drop is ripped apart into two separate droplets. Thanks to the scheme’s implicit representation of the interface \(\Gamma \), there is no difficulty in simulating topological changes.

Fig. 1
figure 1

Numerical simulation of the Muskat problem evolution with a gravity potential. The yellow phase is heavier than the black phase. Under the influence of gravity and surface tension, the yellow phase becomes round and falls. The images show snapshops of the evolution at several interesting times. The final configuration is the stationary state

Fig. 2
figure 2

Numerical simulation of the Muskat problem evolution with a gravity potential. The setup is the same as in Fig. 1 except that the yellow region has a larger mass. The images show snapshops of the evolution at several interesting times. The final configuration is the stationary state, which is qualitatively different from the final configuration in Fig. 1 due to the larger mass

Fig. 3
figure 3

Numerical simulation of the Muskat problem evolution with a vertical potential that attracts the yellow phase to the top and the bottom of the computational domain. The potential is asymmetric and attracts the yellow phase more strongly to the bottom of the domain. The yellow phase is pulled apart and undergoes a topological change

4.3 Discussions on the Structure of Equilibrium Shapes

In this subsection we discuss global equilibrium of the energy with gravity potentials

$$\begin{aligned} {{\tilde{{\mathcal {E}}}}}_\varepsilon (\rho _1,\rho _2) := {{\mathrm{HC}}}_\varepsilon (\rho _1, \rho _2) + \Phi (\rho _1,\rho _2). \end{aligned}$$

where \(\Phi _i(x)= c_ix_d\), with \(0<c_1<c_2\). The order of the constants denote that \(\rho _1\) is the lighter fluid, where the vector \(-e_d\) denotes the direction of gravity. The coordinate here is \(x = (x', x_d)\), and for simplicity, we consider a cylindrical domain,

$$\begin{aligned} \Sigma := \{|x'|\le 1\} \times [0,h]=B^{d-1}_1(0)\times [0,h]. \end{aligned}$$

Here the convolution is taken with the extension of the density functions as zero outside of the domain, with the density constraint \(\rho _1+ \rho _2 = 1\) in \(\Omega \) and \(\int _{{\mathbb {R}}^d} \,{\mathrm{d}}\rho _i = M_i\). Since the densities are extended by zero outside \(\Omega \), this will produce a Neumann boundary condition for the interface \(\Gamma \). In particular, we expect that \(\Gamma \) intersects \(\partial \Omega \) orthogonally.

Let us mention that away from the global equilibrium, there are diverse possibilities of stationary states for \({{\tilde{{\mathcal {E}}}}}_\varepsilon \) even with zero potentials. For instance any choice of characteristic functions \(\rho _1\) and \(\rho _2\) generating the interface as a disjoint union of spheres, \(\{|x-a_i| = r_i\}\), is a stationary solution of the limit energy.

In the limit \(\varepsilon \rightarrow 0\), the \(\Gamma \)-convergence properties indicate that the global equilibrium of the \(\varepsilon \)-energy converges to the limiting density pair \((\rho _1,\rho _2) = (\chi _{A}, \chi _{A^c})\), which is the global minimizer of the limit energy

$$\begin{aligned} E_{\infty }(A):= {{\mathrm{Per}}}(A) + \int _A \Psi (x) \,{\mathrm{d}}x, \hbox { where } \Psi (x):= (\Phi _1 - \Phi _2)(x) \end{aligned}$$

under the volume constraint \({\mathscr {L}}^d(A) = M\). Away from the domain boundary, the classical minimal surface theory yields the \(C^{2,\alpha }\) regularity of \(\partial A\) with the Euler–Lagrange equation

$$\begin{aligned} -\kappa -\Psi (x) = \lambda \hbox { on } \partial A, \end{aligned}$$

where \(\lambda \) is the Lagrange multiplier associated to the volume constraint and \(\kappa \) stands for the curvature.

When \(\partial A\) is away from the lateral and bottom portion of the cylinder, this corresponds to the classical pendant liquid droplet problem where the minimizer is known to be rotationally symmetric and convex (see [16]). When the droplet boundary touches the cylinder boundary, various shapes of drops are possible (see Fig. 4) and the complete description of possibly non-smooth global minimizers appear to be open. In general, when \(\Sigma \) has \(C^{1,\alpha }\) boundary, it is shown in [41] that \(\partial A\) is \(C^{1,\alpha }\) up to the boundary and meets \(\partial \Sigma \) orthogonally. For further discussion of available results we refer to [25]. An ongoing work on numerical simulations for our flow suggest that the equilibrium states even for the \(\varepsilon \)-energy can be categorized as the ones on Fig. 4. In dimension two, we could observe an additional equilibrium shape, as in Fig. 5.

Fig. 4
figure 4

Equilibrium shapes in any dimensions, depending on the proportion of the lighter versus the heavier fluid: orange stands for the lighter fluid, while gray stands for the heavier fluid

Fig. 5
figure 5

An additional equilibrium shape that we predict only in dimension two