Partial derivative with respect to the measure and its application to general controlled mean-field systems☆
Introduction
Let $T>0$ be a fixed time horizon and $(\Omega,\mathcal F,P)$ a given complete probability space on which a $d$-dimensional Brownian motion $B$ is defined. In this paper we are interested in studying the partial derivative of a function $f(\mu)$ with respect to the law $\mu$ conditioned to its second marginal $\mu_2$, where $\mu$ runs over the space of probability measures over a product space whose first marginal has a finite second-order moment. In the second part, as an interesting application, we study Peng's maximum principle for a class of general optimal control problems with McKean–Vlasov dynamics of type (1.1), where $P_{(X_t,u_t)}$ denotes the joint law of the state and the control under $P$, the filtration is the natural one generated by $B$, and the coefficients are measurable functions of appropriate dimensions (the precise assumptions on them will be given in Section 4). The control process $u$ takes its values in an arbitrary measurable space $U$. Here, we will not assume any differentiability of the coefficients with respect to the control $u$, nor with respect to the second marginal of the law of $(X_t,u_t)$.
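For orientation, the dynamics (1.1) can be sketched in the following standard controlled McKean–Vlasov form; the coefficient names $b$, $\sigma$ and the exact argument list (which in this paper also involves a conditional expectation of the state, reflecting the partial information discussed below) are assumptions pending the precise formulation in Section 4:

```latex
\begin{cases}
dX_t = b\big(t, X_t, P_{(X_t,u_t)}, u_t\big)\,dt
     + \sigma\big(t, X_t, P_{(X_t,u_t)}, u_t\big)\,dB_t, & t\in[0,T],\\[2pt]
X_0 = x_0 \in \mathbb{R}^d,
\end{cases}
```

where $P_{(X_t,u_t)}$ is the joint law of the state and the control under $P$.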
Stochastic control problems with controlled McKean–Vlasov dynamics have been used to describe the (Nash) equilibrium state of symmetric games; see, for example, Chassagneux et al. [9]. In [8], with the help of a tailor-made stochastic maximum principle, the authors proved the existence of a mean-field game strategy. It is worth noting that their stochastic maximum principle relies on the assumption that the Hamiltonian is strictly convex with respect to the control, which plays an important role in their discussion.
The main difficulty in dealing with the stochastic maximum principle (SMP) without convexity of the control state space or regularity assumptions on the coefficients with respect to the control variable is to find the variational equations and the adjoint equations. Peng [17] was the first to get rid of the convexity assumption on the control state space by using the second-order term in the Taylor expansion, and he proved the necessary condition of optimality for a control in the case where the diffusion coefficient depends on the control. On the other hand, since the pioneering works by J.M. Lasry and P.L. Lions [13] and by Huang, Malhamé and Caines [11], research on mean-field problems has attracted considerable attention, and mean-field optimal stochastic control problems have been studied by many authors; we refer to [1], [2], [3], [14] and the references cited therein. Acciaio et al. [1] studied mean-field stochastic control problems where the cost functional and the state dynamics depend on the joint distribution of the controlled state and the control process. They proved the Pontryagin stochastic maximum principle under differentiability assumptions on the coefficients with respect to the control and its law. Buckdahn, Li and Ma [3] were the first to study the optimal control problem for a class of general mean-field stochastic differential equations with non-convex control domains, and they obtained the related Peng stochastic maximum principle. We emphasize that in [3] the coefficients do not depend on the law of the control; that is, the cost functional and the state dynamics depend only on the law of the controlled state.
Strongly inspired by Buckdahn, Li and Ma [3], [4], our work investigates a generalized mean-field stochastic maximum principle for an optimal control problem in which the coefficients not only depend on the joint law of the state and the control but also involve partial information. As in the works [2], [3], [17], the second-order variational equation and the second-order adjoint equation are obtained without any convexity assumption on the control state space. In our setting the coefficients depend on the joint law of the state and the control without assuming differentiability with respect to the law of the control, which extends the existing works. As in the pioneering work of Peng [17], we do not require differentiability of the coefficients in the control variable, and, in order to be coherent, we do not suppose it with respect to the second marginal of the law either. In the existing literature the partial derivative of a function $f$ over $\mathcal P_2(\mathbb R^d)$ (the space of Borel probability measures with finite second-order moment over $\mathbb R^d$) has been introduced as the first component of the derivative of the function over the product space. However, when, for instance, $\mu$ is a probability measure over $\mathbb R^d\times V$, where $V$ is an arbitrary measurable space, or simply when $V$ is a control state space, such a global differentiability property is rather restrictive and shall be avoided. For this reason we study in Section 3 the partial differentiability with respect to the law conditioned to its second marginal, without any assumption of regularity with respect to that marginal. Our results cover the particular cases where $f$ is globally differentiable, and an example of the partial differentiability of $f$ is discussed in which $f$ does not have any regularity with respect to the second marginal law.
Returning to our generalized mean-field stochastic maximum principle: in addition to the dependence of the coefficients of Eq. (1.1) on the joint law of the state and the control, they also depend on partial information about the controlled state process, i.e., in our case, on a conditional expectation of the state, which immediately leads to subtle difficulties in establishing some appropriate estimates. A special case of our setting was studied in [2], where the dependence of the coefficients on the conditional expectation reduces to that on the expectation. It is very natural to study optimal control problems under partial information, since controllers can only access partial information in most cases; see, for example, Huang, Wang and Xiong [12], who obtained the stochastic maximum principle for control problems under partial information, assuming differentiability of the coefficients with respect to the control.
The optimal control problem studied in this paper can be illustrated by the following motivating example:
Example 1.1 Let there be given a sequence of independent 1-dimensional Brownian motions defined over a probability space. We consider the model of a financial market with a highly risky equity fund and a riskless asset with a given risk-free rate. The price of a share of the equity fund evolves with given volatility and return rate. Each of the $N$ investors holds a portfolio (the portfolio of the $i$th investor) which he tries to optimize. The value at time $t$ of the $i$th investor's portfolio has a sell price perturbed by an independent Brownian motion with its own volatility. The portfolios are supposed to be self-financing, which determines the dynamics of the portfolio values. The $i$th investor optimizes his portfolio by choosing an optimal process which is adapted with respect to the filtration generated by the Brownian motions he observes. His gain functional depends on his own choice but also on the investment processes of the other investors. It involves a utility function satisfying suitable assumptions, indicating the utility of the sell value for the $i$th investor, which he also measures in comparison with the average sell value observed at time $t$; the empirical cumulative distribution function of the sell values and its left-inverse at level $\lambda$ enter the functional, the latter being interpreted as the empirical value at risk at level $\lambda$. This empirical value at risk describes the risk exposure of the $i$th investor by comparing his investment process with those of the other investors,
and can be regarded as the expected empirical shortfall at level $\lambda$ (expected empirical CVaR). Thus, the gain functional is the difference between the expected utility of the sell value at a finite time horizon and an insurance premium for the risk run by the investor with his investment strategy. Each of the $N$ investors optimizes his gain functional, and because of the symmetry of the control problem with respect to the investors, one can assume that there is a measurable, non-anticipating functional giving the optimal investment strategy of the $i$th investor. Moreover, when the number of investors tends to infinity, one obtains, with the help of an adequate version of the law of large numbers, the weak convergence, almost surely, of the regular conditional law of the sell value knowing the commonly observed noise, for a limit investment strategy given by some non-anticipating, measurable functional, and for the conditional cumulative distribution function of the sell value knowing that noise. Then, observing that this conditional distribution function is continuous and bounded, supposing, for simplicity, that the sell value is bounded, and using the conditional independence of the individual noises knowing the common one, we see that, as $N$ tends to infinity, the empirical expected shortfall converges to the expected shortfall at level $\lambda$ with respect to the regular conditional probability knowing the common noise. Hence, taking into account the convergence of the optimal controls, this leads to the limit control problem for a typical investor, when the number of investors is very large, endowed with the corresponding limit gain functional. We observe here that the expected shortfall is a non-differentiable function of the conditional law of the sell value. On the other hand, we see the particular role played here by the conditional law. Remarking that this conditional law is taken with respect to the $\sigma$-field generated by the common noise, we see that the above example generalizes in a direct way to control problems of the type (1.1) endowed with a gain or cost functional of the form (4.2).
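The empirical value at risk and expected shortfall appearing in the gain functional can be computed directly from samples. The following Python sketch is ours for illustration (the function names and the CVaR convention of averaging the empirical quantile function over $(0,\lambda]$ are assumptions, not the paper's definitions):

```python
import math

def empirical_var(samples, lam):
    """Empirical value at risk at level lam: the left-inverse of the
    empirical CDF, i.e. inf{x : F_n(x) >= lam}."""
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(lam * n)          # smallest k with k/n >= lam
    return xs[max(k, 1) - 1]

def empirical_cvar(samples, lam):
    """Expected empirical shortfall at level lam: the average of the
    empirical quantile function over (0, lam] (one common CVaR convention)."""
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(lam * n)
    # The empirical quantile function is piecewise constant:
    # it equals xs[i-1] on the interval ((i-1)/n, i/n].
    total = sum(xs[i] for i in range(k - 1)) / n    # complete intervals
    total += xs[k - 1] * (lam - (k - 1) / n)        # partial last interval
    return total / lam
```

For instance, for the samples [1, 2, 3, 4] at level 0.5, the empirical value at risk is 2 and the expected shortfall is 1.5, the average of the worst half of the outcomes.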
Our main result, the SMP for the control problem with dynamics (1.1), can be stated roughly as follows. We assume that $\bar u$ is an optimal control and $\bar X$ is the corresponding optimal controlled process. Then there exist two pairs of stochastic processes, the solutions of the first-order and the second-order adjoint equations, respectively, such that the necessary maximum condition holds for all $t\in[0,T]$, $dt\,dP$-almost everywhere. Moreover, on the classical Wiener space with the coordinate process as Brownian motion, if the drift, diffusion and cost coefficients are continuous with respect to the measure argument in the 2-Wasserstein metric, for all $t$, with a continuous modulus, then the pointwise maximum condition holds for all control values and all $t\in[0,T]$, $dt\,dP$-almost everywhere, where the Hamiltonian takes the standard form involving the adjoint processes.
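In the classical (non-mean-field) second-order SMP the Hamiltonian mentioned above has the following shape; the sign convention for the running cost $f$ and the precise argument list (which in this paper also contains the joint law and the partial information) are assumptions for orientation:

```latex
H(t,x,\mu,u,p,q) \;=\; \langle p,\, b(t,x,\mu,u)\rangle
  \;+\; \mathrm{tr}\!\big(q^{\top}\sigma(t,x,\mu,u)\big)
  \;+\; f(t,x,\mu,u),
```

where $(p,q)$ is the solution of the first-order adjoint equation.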
The key point for the proof of the stochastic maximum principle is to show the second-order expansion of the controlled state process: for all $t\in[0,T]$, the state process corresponding to the spike variation $u^\varepsilon$ of the optimal control $\bar u$ (where $u^\varepsilon$ equals a fixed control value on a Borel subset $E_\varepsilon\subset[0,T]$ of Borel measure $\varepsilon$ and coincides with $\bar u$ outside $E_\varepsilon$) decomposes into the optimal state process, the solutions of the first- and the second-order variational equations, respectively, and a remainder which tends to $0$ quicker than $\varepsilon$.
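In Peng's framework this expansion (1.3) typically takes the following shape (a sketch; the exact spaces and orders of the estimates are those established in Section 5):

```latex
X^{u^{\varepsilon}}_t \;=\; \bar X_t + Y_t + Z_t + R^{\varepsilon}_t,
\qquad t\in[0,T],
```

where $Y$ and $Z$ solve the first- and second-order variational equations and the remainder $R^{\varepsilon}$ tends to $0$ quicker than $\varepsilon$ in the appropriate $L^p$-sense.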
Since the coefficients depend not only on the state process but also on the joint law of the state and the control, there are technical difficulties in proving the estimate (1.3). It is worth emphasizing that the rather technical results of Proposition 5.2 and Lemma 5.2, with their subtle proofs, play a crucial role in the study of the estimates in the Taylor expansion. In particular, in the proof of Proposition 5.2, which shows the specificity of stochastic controlled systems with mean-field dependence, we develop an operator argument which is entirely new and differs from the classical case; see [3].
The paper is organized as follows. In Section 2 the notion of differentiability with respect to a probability measure is recalled. In Section 3 the partial derivative with respect to the law conditioned to its second marginal is studied, and the obtained results are illustrated by two examples. Section 4 is devoted to the formulation of the control problem and the statement of the SMP. The (first- and second-order) variational equations and some crucial estimates are established in Section 5. In Section 6 we prove our main result, the stochastic maximum principle.
Preliminaries
Let $(\Omega,\mathcal F,P)$ be a complete probability space and $\mathbb F=(\mathcal F_t)_{t\in[0,T]}$ a filtration satisfying the usual assumptions. For any sub-$\sigma$-field $\mathcal G\subset\mathcal F$, we denote:
$L^2(\mathcal G;\mathbb R^d)$ is the set of $\mathbb R^d$-valued, $\mathcal G$-measurable random variables $\xi$ with $E[|\xi|^2]<\infty$, which is a Hilbert space with inner product $\langle\xi,\eta\rangle=E[\xi\cdot\eta]$, $\xi,\eta\in L^2(\mathcal G;\mathbb R^d)$.
$L^2_{\mathbb F}(0,T;\mathbb R^d)$ is the set of $\mathbb R^d$-valued, $\mathbb F$-adapted processes $\varphi=(\varphi_t)_{t\in[0,T]}$ on $[0,T]$ such that $E\big[\int_0^T|\varphi_t|^2\,dt\big]<\infty$.
$\mathcal P_2(\mathbb R^d)$ is the collection of all probability measures with finite second moment over $(\mathbb R^d,\mathcal B(\mathbb R^d))$.
Partial derivative with respect to the law conditioned to its second marginal
Let $(V,\mathcal V)$ be an arbitrary measurable space. By $\mathcal P(\mathbb R^d\times V)$ we denote the set of all probability measures over $(\mathbb R^d\times V,\mathcal B(\mathbb R^d)\otimes\mathcal V)$, and we restrict ourselves to those whose first marginal has a finite second moment. For such a measure $\mu$ we denote by $\mu_1$ and $\mu_2$ the first and the second marginal of $\mu$, respectively. This space can be endowed with a metric of 2-Wasserstein type, defined through the collection of couplings of the measures involved.
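The metric alluded to here is, on $\mathcal P_2(\mathbb R^d)$, the classical 2-Wasserstein distance; the following standard probabilistic form is a sketch, and the exact adaptation to measures over $\mathbb R^d\times V$ with a fixed second marginal is as specified in the paper:

```latex
W_2(\mu,\nu) \;=\; \inf\Big\{\big(E\big[\,|\xi-\eta|^2\,\big]\big)^{1/2}
  \;:\; \mathcal L(\xi)=\mu,\ \mathcal L(\eta)=\nu\Big\},
\qquad \mu,\nu\in\mathcal P_2(\mathbb R^d).
```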
Problem formulation
Let $U$ be an arbitrary measurable space and $(\Omega,\mathcal F,P)$ a complete probability space such that $\Omega$, equipped with its Borel $\sigma$-field $\mathcal B(\Omega)$, is a Radon space, $\mathcal F$ is the completion of $\mathcal B(\Omega)$ with respect to $P$, and suppose that on $(\Omega,\mathcal F,P)$ a $d$-dimensional Brownian motion $B$ is defined. Given an arbitrary but fixed finite time horizon $T>0$, we denote the completed natural filtrations generated by the Brownian motion and by the underlying data, respectively. Let … denote the completed
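A controlled mean-field dynamics of this kind can be approximated numerically by an interacting particle system, replacing the law by the empirical measure of $N$ particles. The following minimal Python sketch uses illustrative mean-reverting dynamics of our own choosing (the drift, volatility, and all names are assumptions, not the paper's coefficients):

```python
import math
import random

def simulate_mkv_particles(n_particles, n_steps, horizon, x0, seed=0):
    """Euler-Maruyama scheme for an interacting particle approximation
    of a McKean-Vlasov SDE.  Illustrative dynamics (our assumption):
        dX_t = (E[X_t] - X_t) dt + 0.2 dB_t,
    with E[X_t] replaced by the empirical mean of the particle system."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    xs = [float(x0)] * n_particles
    for _ in range(n_steps):
        mean = sum(xs) / n_particles       # empirical approximation of E[X_t]
        xs = [x + (mean - x) * dt + 0.2 * math.sqrt(dt) * rng.gauss(0.0, 1.0)
              for x in xs]
    return xs
```

As the number of particles grows, the empirical mean of the system concentrates around the mean of the limiting McKean–Vlasov dynamics, which here stays close to the common starting point.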
Variational equations
Now we introduce the first-order and the second-order variational equations. Since the control set $U$ is not necessarily convex, we shall use the spike variation method. More precisely, let $\varepsilon>0$, and choose a Borel subset $E_\varepsilon\subset[0,T]$ with Borel measure $|E_\varepsilon|=\varepsilon$. For an arbitrarily chosen but fixed $v\in U$, we define
$$u^\varepsilon_t := v\,\mathbf 1_{E_\varepsilon}(t) + \bar u_t\,\mathbf 1_{[0,T]\setminus E_\varepsilon}(t),\qquad t\in[0,T],$$
which is called a spike variation of the optimal control $\bar u$.
The key point to prove the stochastic maximum principle stated in Theorem 4.1 is to show for the controlled
Proof of Theorem 4.1
From the definition of the cost functional, the optimality of the control, and the definitions in (5.31) and in Corollary 5.1, we obtain
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (17)
- R. Buckdahn, J. Li, S. Peng, Mean-field backward stochastic differential equations and related partial differential equations, Stochastic Process. Appl. (2009)
- J. Li, Stochastic maximum principle in the mean-field controls, Automatica (2012)
- J. Li, Mean-field forward and backward SDEs with jumps and associated nonlocal quasi-linear integral-PDEs, Stochastic Process. Appl. (2018)
- B. Acciaio, J. Backhoff-Veraguas, R. Carmona, Extended mean field control problems: stochastic maximum principle and transport perspective, SIAM J. Control Optim. (2018)
- R. Buckdahn, B. Djehiche, J. Li, A general stochastic maximum principle for SDEs of mean-field type, Appl. Math. Optim. (2011)
- R. Buckdahn, J. Li, J. Ma, A stochastic maximum principle for general mean-field systems, Appl. Math. Optim. (2016)
- R. Buckdahn, J. Li, J. Ma, A mean-field stochastic control problem with partial observations, Ann. Appl. Probab. (2017)
- R. Buckdahn, J. Li, S. Peng, C. Rainer, Mean-field stochastic differential equations and associated PDEs, Ann. Probab. (2017)
☆ The work has been supported by the NSF of P.R. China (No. 12031009, 11871037), the National Key R&D Program of China (No. 2018YFA0703900), NSFC-RS (No. 11661130148; NA150344), the “FMJH Program Gaspard Monge in optimization and operation research”, and the ANR (Agence Nationale de la Recherche), France, project ANR-16-CE40-0015-01.