Partial derivative with respect to the measure and its application to general controlled mean-field systems

https://doi.org/10.1016/j.spa.2021.01.003

Abstract

Let $(E,\mathcal{E})$ be an arbitrary measurable space. The paper first studies the partial derivative of a function $f:\mathcal{P}_{2,0}(\mathbb{R}^d\times E)\to\mathbb{R}$ defined on the space of probability measures $\mu$ over $(\mathbb{R}^d\times E,\mathcal{B}(\mathbb{R}^d)\otimes\mathcal{E})$ whose first marginal $\mu_1:=\mu(\cdot\times E)$ has a finite second-order moment. This partial derivative is taken with respect to $q(dx,z)$, where $\mu$ has the disintegration $\mu(dx\,dz)=q(dx,z)\mu_2(dz)$ with respect to its second marginal $\mu_2(\cdot):=\mu(\mathbb{R}^d\times\cdot)$. Simplifying the language, we speak of the derivative with respect to the law $\mu$ conditioned on its second marginal. Our results extend those of P.L. Lions on the derivative of a function $g:\mathcal{P}_2(\mathbb{R}^d)\to\mathbb{R}$ over the space of probability measures with finite second-order moment (see Lions (2013)), but they also cover as a particular case recent approaches which take $E=\mathbb{R}^k$ and suppose the differentiability of $f$ over $\mathcal{P}_2(\mathbb{R}^d\times\mathbb{R}^k)$, in order to use the derivative $\partial_\mu f$ to define the partial derivative $(\partial_\mu f)_1$. The second part of the paper investigates a stochastic maximum principle in which the controlled state process is driven by a general mean-field stochastic differential equation with partial information. The control set is only supposed to be a measurable space, and the coefficients of the controlled system, i.e., those of the dynamics as well as of the cost functional, depend on the controlled state process $X$, the control $v$, a partial information on $X$, and the joint law of $(X,v)$. Through a new second-order variational equation, the corresponding second-order adjoint equation, and a completely new method for proving the estimate for the solution of the first-order variational equation, the maximum principle is obtained by a spike variation of an optimal control, with the help of a tailor-made second-order expansion. We emphasize that our assumptions do not require any regularity of the coefficients, neither in the control variable nor with respect to the law of the control process.

Introduction

Let $T>0$ be a fixed time horizon and $(\Omega,\mathcal{F},P)$ a given complete probability space on which a $(k+r)$-dimensional Brownian motion $W=(W^1,W^2)$ is defined. In this paper we are interested in the partial derivative of a function $f:\mathcal{P}_{2,0}(\mathbb{R}^d\times E)\to\mathbb{R}$ with respect to the law $\mu$ conditioned on its second marginal $\mu_2=\mu(\mathbb{R}^d\times\cdot)$, where $\mu$ belongs to the space $\mathcal{P}_{2,0}(\mathbb{R}^d\times E)$ of probability measures over $(\mathbb{R}^d\times E,\mathcal{B}(\mathbb{R}^d)\otimes\mathcal{E})$ whose first marginal $\mu_1:=\mu(\cdot\times E)$ has a finite second-order moment. In the second part, as an interesting application, we study Peng's maximum principle for a class of general optimal control problems with McKean–Vlasov dynamics:
$$
\begin{cases}
dX_t^v=b\big(t,P_{(X_t^v,v_t)},X_t^v,E[X_t^v\,|\,\mathcal{F}_t^{W^1}],v_t\big)dt+\sigma\big(t,P_{(X_t^v,v_t)},X_t^v,E[X_t^v\,|\,\mathcal{F}_t^{W^1}],v_t\big)dW_t,\quad t\in[0,T],\\
X_0^v=x_0,
\end{cases}
\tag{1.1}
$$
where $P_\xi:=P\circ\xi^{-1}$ denotes the law of the random variable $\xi$ under $P$, $(\mathcal{F}_t^{W^1})_{0\le t\le T}$ is the natural filtration generated by $W^1$, and the coefficients $b,\sigma$ are measurable functions of appropriate dimensions (the precise assumptions on them are given in Section 4). The control process $v$ takes its values in an arbitrary measurable space $(U,\mathcal{G})$. Here we assume no differentiability of the coefficients with respect to the control $v$, nor with respect to the second marginal of the law of $(X,v)$.

Stochastic control problems with controlled McKean–Vlasov dynamics have been used to describe the (Nash) equilibrium state of symmetric games; see, for example, Chassagneux et al. [9]. In [8], with the help of a tailor-made stochastic maximum principle, the authors proved the existence of a mean-field game strategy. It is worth noting that their stochastic maximum principle rests on the assumption that the Hamiltonian is strictly convex with respect to the control $v$, which plays an important role in their analysis.

The main difficulty in establishing a stochastic maximum principle (SMP) without convexity of the control state space $U$ and without regularity assumptions on the coefficients with respect to the control variable is to find the variational and adjoint equations. Peng [17] was the first to dispense with the convexity assumption on the control state space by using the second-order term in the Taylor expansion, and he proved the necessary condition of optimality for a control in the case where the diffusion coefficient $\sigma$ depends on the control. On the other hand, since the pioneering works by J.M. Lasry and P.L. Lions [13] and Huang, Malhamé and Caines [11], research on mean-field problems has attracted many researchers, and mean-field optimal stochastic control problems have accordingly been studied by many authors; we refer to [1], [2], [3], [14] and the references cited therein. Acciaio et al. [1] studied mean-field stochastic control problems where the cost functional and the state dynamics depend on the joint distribution of the controlled state and the control process. They proved the Pontryagin stochastic maximum principle under differentiability assumptions on the coefficients with respect to the control and its law. Buckdahn, Li and Ma [3] were the first to study the optimal control problem for a class of general mean-field stochastic differential equations with non-convex control domains, and they obtained the corresponding Peng-type stochastic maximum principle. We emphasize that in [3] the coefficients do not depend on the law of the control; that is, the cost functional and the state dynamics depend only on the law of the controlled state.

Strongly inspired by Buckdahn, Li and Ma [3], [4], our work investigates a generalized mean-field stochastic maximum principle for an optimal control problem whose coefficients not only depend on the law of $(X,v)$ but also involve partial information. As in [2], [3], [17], the second-order variational equation and the second-order adjoint equation are obtained without a convexity assumption on the control state space. In our setting the coefficients depend on the joint law of $(X,v)$ without assuming differentiability with respect to the law of the control, which extends the existing works. As in the pioneering work of Peng [17], we do not require differentiability of the coefficients in the control variable $v$, and, to be coherent, we do not suppose it with respect to the second marginal of the law of $(X,v)$ either. In the existing literature the partial derivative of a function $f(\mu)$, $\mu\in\mathcal{P}_2(\mathbb{R}^d\times\mathbb{R}^k)$ (the space of Borel probability measures with finite second-order moment over $\mathbb{R}^d\times\mathbb{R}^k$), has been introduced as the first component $(\partial_\mu f)_1(\mu,y,z)$ of the derivative $(\partial_\mu f)(\mu,y,z)=\big((\partial_\mu f)_1(\mu,y,z),(\partial_\mu f)_2(\mu,y,z)\big)\in\mathbb{R}^d\times\mathbb{R}^k$. However, when, for instance, $\mu$ is a probability measure over $\mathbb{R}^d\times E$, where $(E,\mathcal{E})$ is an arbitrary measurable space, or simply when $E=U$ is a control state space, such a global differentiability property is rather restrictive and should be avoided. For this reason we study in Section 3 the partial differentiability with respect to the law $\mu$ conditioned on its second marginal $\mu_2=\mu(\mathbb{R}^d\times\cdot)$, without any regularity assumption with respect to $\mu_2$. Our results cover the particular cases in which $f$ is differentiable over $\mathcal{P}_2(\mathbb{R}^d)$ and $\mathcal{P}_2(\mathbb{R}^d\times\mathbb{R}^k)$, respectively, and we discuss an example of the partial differentiability of $f:\mathcal{P}_{2,0}(\mathbb{R}^d\times E)\to\mathbb{R}$ in which $f$ has no regularity with respect to the second marginal law.
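For orientation, the Lions derivative that these results extend can be recalled informally through the standard lifting construction (the lift $G$ and the perturbation $\eta$ below are our notation, not the paper's):

```latex
% Lift of g : \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R} to the Hilbert space L^2:
\[
  G(\xi) := g(P_\xi), \qquad \xi \in L^2(\Omega,\mathcal{F},P;\mathbb{R}^d).
\]
% g is called differentiable at \mu if G is Fréchet differentiable at some
% (and then at every) \xi with P_\xi = \mu:
\[
  G(\xi + \eta) = G(\xi) + E\big[ DG(\xi)\cdot \eta \big] + o\big(\|\eta\|_{L^2}\big).
\]
% The Fréchet gradient admits a representation through a Borel function
% \partial_\mu g(\mu)(\cdot) : \mathbb{R}^d \to \mathbb{R}^d:
\[
  DG(\xi) = \partial_\mu g(P_\xi)(\xi) \quad P\text{-a.s.}
\]
```

The partial derivative studied in Section 3 plays the analogous role when only the first component of $\mu$ over $\mathbb{R}^d\times E$ is differentiated, with the second marginal held fixed.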

Returning to our generalized mean-field stochastic maximum principle: in addition to depending on the joint law of $(X,v)$, the coefficients of Eq. (1.1) also depend on partial information about the controlled state process, namely on the conditional expectation $E[X_t|\mathcal{F}_t^{W^1}]$, which leads immediately to subtle difficulties in deriving some of the necessary estimates. A special case of our setting was studied in [2], where the dependence of the coefficients on $P_{(X_t,v_t)}$ reduces to a dependence on $E[X_t]$. It is natural to study optimal control problems under partial information, since in most cases controllers have access only to partial information; see, for example, Huang, Wang and Xiong [12], who obtained a stochastic maximum principle for control problems under partial information assuming differentiability of the coefficients with respect to the control $v$.

The optimal control problem we study in this paper can be illustrated by the following motivating example:

Example 1.1

Let $W^j$, $j\ge 0$, be a sequence of independent 1-dimensional Brownian motions defined over a probability space $(\Omega,\mathcal{F},P)$. We consider a financial market with a highly risky equity fund and a riskless asset with risk-free rate $r$. The price $S_t$ of a share of the equity fund at time $t$ is given by $S_t=S_0\exp\{\sigma_0 W_t^0+(\mu-\frac{\sigma_0^2}{2})t\}$, where $\sigma_0>0$ is the volatility and $\mu\in\mathbb{R}$ the return rate. Each of the $N\,(\gg 1)$ investors holds a portfolio $\pi_t^i=(u_t^i,\rho_t^i)$, $t\ge 0$ (the portfolio of the $i$th investor), which he tries to optimize. The value at time $t$ of investor $i$ is $V_t^i=u_t^i S_t+\rho_t^i e^{rt}$, whose sell price $X_t^i$ is perturbed by an independent Brownian motion $\sigma W^i$ with volatility $\sigma>0$: $X_t^i=V_t^i+\sigma W_t^i$, $t\ge 0$. The portfolios are supposed to be self-financing, which implies that the dynamics of $X^i$ are given by
$$
dX_t^i=u_t^i\,dS_t+r\rho_t^i e^{rt}\,dt+\sigma\,dW_t^i=u_t^i\,dS_t+r\big(X_t^i-u_t^i S_t-\sigma W_t^i\big)dt+\sigma\,dW_t^i,\quad t\ge 0,\qquad X_0^i=x\in\mathbb{R}.
$$
The $i$th investor optimizes his portfolio by choosing the optimal process $u^i$, adapted with respect to the filtration $\mathbb{F}^i$ generated by $(W^0,W^i)$. His gain functional $J^i(u^{(N)})$, $u^{(N)}=(u^1,\dots,u^N)$, depends on his choice $u^i$ but also on the investment processes $u^j$, $j\neq i$, of the other investors, and is given by
$$
J^i(u^{(N)})=E\Big[\psi\Big(X_T^i,\frac1N\sum_{j=1}^N X_T^j\Big)\Big]-cE\int_0^T\!\!\int_0^1 u_t^i\,I_{\{u_t^i\ge (F_{u_t^{(N)}}^{(N,i)})^{-1}(1-\alpha)\}}\,\rho(\alpha)\,d\alpha\,dt,
$$
where $\psi:\mathbb{R}^2\to\mathbb{R}$ is a utility function satisfying suitable assumptions, indicating the utility of $X_T^i$ for the $i$th investor, which he also measures in comparison with the average sell value $\frac1N\sum_{j=1}^N X_T^j$ observed at time $T$; $\rho(\cdot)\in L^1([0,1],\mathbb{R}_+)$;
$$
F_{u_t^{(N)}}^{(N,i)}(s)=\frac{1}{N-1}\sum_{1\le j\le N,\,j\neq i} I_{\{u_t^j\le s\}},\quad s\in\mathbb{R},
$$
denotes the empirical cumulative distribution function for $u_t^{(N)}$, and
$$
(F_{u_t^{(N)}}^{(N,i)})^{-1}(1-\alpha)=\inf\{s\in\mathbb{R}:F_{u_t^{(N)}}^{(N,i)}(s)\ge 1-\alpha\}
$$
is the left-inverse empirical cumulative distribution function at level $1-\alpha$; $(F_{u_t^{(N)}}^{(N,i)})^{-1}(1-\alpha)$ can be interpreted as the empirical value at risk at level $\alpha>0$: $\mathrm{VaR}_\alpha(u^{(N,i)}):=(F_{u_t^{(N)}}^{(N,i)})^{-1}(1-\alpha)$ (with $u^{(N,i)}=(u^j)_{j\neq i}$).
This empirical value at risk allows one to describe the risk exposure of the $i$th investor by comparing his investment process $u_t^i$ with that of the other $N-1$ investors $u_t^{(N,i)}$:
$$
R(u_t^i\mid u_t^{(N,i)})=E\Big[\int_0^1 u_t^i\,I_{\{u_t^i\ge (F_{u_t^{(N)}}^{(N,i)})^{-1}(1-\alpha)\}}\,\rho(\alpha)\,d\alpha\Big]=\int_0^1 \mathrm{ES}_\alpha[u_t^i\mid u_t^{(N,i)}]\,\rho(\alpha)\,d\alpha,
$$
and
$$
\mathrm{ES}_\alpha[u_t^i\mid u_t^{(N,i)}]:=E\big[u_t^i\,I_{\{u_t^i\ge (F_{u_t^{(N)}}^{(N,i)})^{-1}(1-\alpha)\}}\big]
$$
can be regarded as the expected empirical shortfall at level $\alpha$ (expected empirical CVaR). Thus the gain functional is the difference between the expected utility of the sell value at the finite time horizon $T$ and the insurance premium for the risk run by the investor with his investment strategy: $J^i(u^{(N)})=E\psi(X_T^i)-c\int_0^T R(u_t^i\mid u_t^{(N,i)})\,dt$. Each of the $N$ investors optimizes his gain functional, and because of the symmetry of the control problem with respect to the investors, one can assume that there is a measurable, non-anticipating functional $\bar u^N:[0,T]\times C([0,T];\mathbb{R}^2)\to\mathbb{R}$ such that, for the optimal investment strategy $u^i$ of the $i$th investor, $u_t^i=\bar u_t^N(W^0,W^i)$, $t\in[0,T]$, $1\le i\le N$. Moreover, when the number of investors $N$ tends to $+\infty$, one gets, with the help of an adequate version of the law of large numbers, the weak convergence, $P$-a.s., for the regular conditional law $P^{W^0}=P\{\cdot\,|\,W^0\}$ knowing $W^0$,
$$
P^{W^0}\circ\big[(u_t^i,F_{u_t^{(N)}}^{(N,i)}(1-\alpha))\big]^{-1}\longrightarrow P^{W^0}\circ\big[(u_t,F_{u_t}^{W^0}(1-\alpha))\big]^{-1},
$$
for some limit investment strategy $u$ of the form $u_t=\bar u_t(W^0,W^i)$, $t\in[0,T]$, for some non-anticipating, measurable functional $\bar u:[0,T]\times C([0,T];\mathbb{R}^2)\to\mathbb{R}$, and for $F_{u_t}^{W^0}$ denoting the conditional cumulative distribution function of $u_t$ knowing $W^0$, $F_{u_t}^{W^0}(1-\alpha)=P\{u_t\le 1-\alpha\,|\,W^0\}$.
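For a finite sample, the empirical value at risk and expected shortfall above reduce to simple order-statistics computations. The following is a minimal numerical sketch; the function names and the sample data are ours, not from the paper:

```python
import numpy as np

def empirical_var(samples, alpha):
    """Left-inverse of the empirical CDF at level 1 - alpha:
    inf{s : F_N(s) >= 1 - alpha}, the empirical value at risk."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = len(s)
    # smallest index k with (k + 1)/n >= 1 - alpha
    k = int(np.ceil((1.0 - alpha) * n)) - 1
    return s[max(k, 0)]

def empirical_es(samples, alpha):
    """Expected empirical shortfall E[u I_{u >= VaR_alpha}],
    as in the risk term of the gain functional."""
    v = empirical_var(samples, alpha)
    u = np.asarray(samples, dtype=float)
    return float(np.mean(u * (u >= v)))

u = np.array([0.1, 0.2, 0.3, 0.4, 1.0])
print(empirical_var(u, 0.2))  # 0.4: smallest s with F_N(s) >= 0.8
print(empirical_es(u, 0.2))   # 0.28 = (0.4 + 1.0)/5
```

Note that, matching the paper's definition, the shortfall is the unnormalized mean of $u\,I_{\{u\ge \mathrm{VaR}_\alpha\}}$ rather than a conditional tail mean.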

Then, observing that $\phi(x):=\int_0^1 I_{\{x\ge 1-\alpha\}}\rho(\alpha)\,d\alpha=\int_0^x\rho(1-\alpha)\,d\alpha$, $x\in[0,1]$, is continuous and bounded, and supposing, for simplicity, that $0\le u_t^i\le C$, $t\in[0,T]$, $i\ge 1$, is bounded, using the conditional independence of $u_t^i$ and $F_{u_t^{(N)}}^{(N,i)}(\cdot)$ knowing $W^0$, we have
$$
R(u_t^i\mid u_t^{(N,i)})=E\Big[\int_0^1 E^{W^0}\big[u_t^i\,I_{\{F_{u_t^{(N)}}^{(N,i)}(u_t^i)\ge 1-\alpha\}}\big]\,\rho(\alpha)\,d\alpha\Big]
=E\Big[\int_0^1\!\!\int_0^C v\,P^{W^0}\big\{F_{u_t^{(N)}}^{(N,i)}(v)\ge 1-\alpha\big\}\,P_{u_t^i}^{W^0}(dv)\,\rho(\alpha)\,d\alpha\Big]
=E\Big[\int_0^C v\,E^{W^0}\big[\phi\big(F_{u_t^{(N)}}^{(N,i)}(v)\big)\big]\,P_{u_t^i}^{W^0}(dv)\Big].
$$
But, as $N$ tends to infinity, the latter expression converges to $E\big[\int_0^C v\,\phi(F_{u_t}^{W^0}(v))\,P_{u_t}^{W^0}(dv)\big]$, i.e., as $N\to+\infty$,
$$
R(u_t^i\mid u_t^{(N,i)})\longrightarrow E\Big[\int_0^C v\,\phi\big(F_{u_t}^{W^0}(v)\big)\,P_{u_t}^{W^0}(dv)\Big]
=E\Big[\int_0^1 E^{W^0}\big[u_t\,I_{\{u_t\ge (F_{u_t}^{W^0})^{-1}(1-\alpha)\}}\big]\,\rho(\alpha)\,d\alpha\Big]
=E\Big[\int_0^1 \mathrm{ES}_\alpha^{W^0}[u_t]\,\rho(\alpha)\,d\alpha\Big],
$$
where $\mathrm{ES}_\alpha^{W^0}[u_t]=E^{W^0}\big[u_t\,I_{\{u_t\ge (F_{u_t}^{W^0})^{-1}(1-\alpha)\}}\big]$ is the expected shortfall at level $\alpha$ with respect to the regular conditional probability $P^{W^0}$ knowing $W^0$. Hence, taking into account that, for the optimal controls, $\frac1N\sum_{j=1}^N X_T^j\to E^{W^0}[X_T^1]$, $P^{W^0}$-a.s., this leads to the limit control problem for a typical investor $i$, when the number of investors is very large,
$$
dX_t^i=u_t^i\,dS_t+r\big(X_t^i-u_t^i S_t-\sigma W_t^i\big)dt+\sigma\,dW_t^i,\quad t\ge 0,\qquad X_0^i=x\in\mathbb{R},
$$
endowed with the gain functional
$$
J(u)=E\big[\psi\big(X_T^i,E^{W^0}[X_T^i]\big)\big]-cE\Big[\int_0^T\!\!\int_0^1 \mathrm{ES}_\alpha^{W^0}[u_t]\,\rho(\alpha)\,d\alpha\,dt\Big].
$$
We observe here that $\mathrm{ES}_\alpha^{W^0}[u_t]$ is a non-differentiable function of the law $P_{u_t}$ of $u_t$. On the other hand, we see the particular role played here by the conditional law $P^{W^0}$. Remarking that $E^{W^0}[X_T^i]=E[X_T^i\,|\,\mathcal{F}_T^{W^0}]$, where $\mathcal{F}_T^{W^0}$ is the $\sigma$-field generated by $\{W_t^0,\,t\in[0,T]\}$, we see that the above example generalizes in a direct way to control problems of type (1.1) endowed with a gain or cost functional of the form (4.2).

Our main result, the SMP for the control problem with dynamics (1.1), can be stated roughly as follows. We assume that $u(\cdot)$ is an optimal control and $X$ is the corresponding optimal controlled process. Then there exist two pairs of stochastic processes $(p,q)$ and $(P,Q)$, the solutions of the first-order and the second-order adjoint equations, respectively, such that, for all $v\in U_{ad}$, $dt\,dP$-a.e.,
$$
H\big(t,P_{(X_t,v_t)},X_t,E[X_t|\mathcal{F}_t^{W^1}],v_t,p_t,q_t\big)-H\big(t,P_{(X_t,u_t)},X_t,E[X_t|\mathcal{F}_t^{W^1}],u_t,p_t,q_t\big)
+\frac12 P_t\big|\sigma\big(t,P_{(X_t,v_t)},X_t,E[X_t|\mathcal{F}_t^{W^1}],v_t\big)-\sigma\big(t,P_{(X_t,u_t)},X_t,E[X_t|\mathcal{F}_t^{W^1}],u_t\big)\big|^2\le 0.
$$
Moreover, on the classical Wiener space $(\Omega=C([0,T];\mathbb{R}^2),\ \mathcal{F}=\mathcal{B}(\Omega)\vee\mathcal{N}_P,\ P)$ with $W=(W^1,W^2)$ as coordinate process, if the coefficients $b(t,\mu,x,y,v)$, $\sigma(t,\mu,x,y,v)$ and $f(t,\mu,x,y,v)$ are continuous with respect to $\mu\in\mathcal{P}_{2,0}(\mathbb{R}^d\times E)$ in the $W_{2,TV}(\cdot,\cdot)$-metric, for all $(t,\omega,x,y,v)$, with a continuity modulus $\rho$ ($\rho:\mathbb{R}_+\to\mathbb{R}_+$ increasing, continuous, $\rho(0)=0$), then, for all $v\in U_{ad}$ and $\bar v\in U$, $dt\,dP$-a.e.,
$$
H\big(t,P_{(X_t,v_t)},X_t,E[X_t|\mathcal{F}_t^{W^1}],\bar v,p_t,q_t\big)-H\big(t,P_{(X_t,u_t)},X_t,E[X_t|\mathcal{F}_t^{W^1}],u_t,p_t,q_t\big)
+\frac12 P_t\big|\sigma\big(t,P_{(X_t,v_t)},X_t,E[X_t|\mathcal{F}_t^{W^1}],\bar v\big)-\sigma\big(t,P_{(X_t,u_t)},X_t,E[X_t|\mathcal{F}_t^{W^1}],u_t\big)\big|^2\le 0,
$$
where the Hamiltonian $H$ has the form
$$
H(t,\mu,x,y,v,p,q):=p\,b(t,\mu,x,y,v)+q\,\sigma(t,\mu,x,y,v)-f(t,\mu,x,y,v),
$$
$(t,\mu,x,y,v,p,q)\in[0,T]\times\mathcal{P}_{2,0}(\mathbb{R}^d\times U)\times\mathbb{R}^d\times\mathbb{R}^d\times U\times\mathbb{R}^d\times\mathbb{R}^{k+r}$.

The key point in the proof of the stochastic maximum principle is to show the second-order expansion of $X^\varepsilon$: for all $t\in[0,T]$,
$$
X^\varepsilon(t)=X(t)+X^1(t)+X^2(t)+o(\varepsilon),
$$
where $X^\varepsilon:=X^{u^\varepsilon}$ denotes the state process corresponding to the spike variation of the optimal control $u$:
$$
u^\varepsilon(t):=v(t)I_{E_\varepsilon}(t)+u(t)I_{E_\varepsilon^c}(t),\quad t\in[0,T],
$$
and $E_\varepsilon\subset[0,T]$ is a Borel subset of Lebesgue measure $|E_\varepsilon|=\varepsilon$; $X^1,X^2$ are the solutions of the first- and second-order variational equations, respectively, and $o(\varepsilon)$ stands for a remainder which tends to 0 in $L^2$ faster than $\varepsilon$.

Since the coefficients $b$ and $\sigma$ depend not only on $E[X_t|\mathcal{F}_t^{W^1}]$ but also on the joint law $P_{(X,v)}$, proving the estimate (1.3) presents some technical difficulties. It is worth emphasizing that the rather technical results of Proposition 5.2 and Lemma 5.2, with their subtle proofs, play a crucial role in establishing the estimates for the Taylor expansion. In particular, in the proof of Proposition 5.2, which exhibits the specificity of stochastic controlled systems with mean-field dependence, we develop an operator argument that is entirely new and differs from the classical case; see [3].

The paper is organized as follows. In Section 2, the notion of differentiability with respect to a probability measure is recalled. In Section 3 the partial derivative with respect to the law conditioned to its second marginal is studied, and the obtained results are illustrated by two examples. Section 4 is devoted to the formulation of the control problem and the SMP. The (first and second order) variational equations and some crucial estimates are introduced in Section 5. In Section 6 we prove our main result for the stochastic maximum principle.


Preliminaries

Let $(\Omega,\mathcal{F},P)$ be a complete probability space and $\mathbb{F}$ a filtration satisfying the usual conditions. For any sub-$\sigma$-field $\mathcal{G}\subset\mathcal{F}$, we use the following notation:

$L^2(\mathcal{G};\mathbb{R}^d)$ is the set of $\mathbb{R}^d$-valued, $\mathcal{G}$-measurable random variables $\xi$ with $|\xi|_{L^2}:=\big(E[|\xi|^2]\big)^{1/2}<\infty$; it is a Hilbert space with inner product $\langle\xi,\eta\rangle=E[\xi\cdot\eta]$, $\xi,\eta\in L^2(\mathcal{G};\mathbb{R}^d)$.

$L^2_{\mathbb{F}}([0,T];\mathbb{R}^d)$ is the set of $\mathbb{R}^d$-valued, $\mathbb{F}$-adapted processes $\psi$ on $[0,T]$ such that $|\psi|_T:=\big(E\big[\int_0^T|\psi_t|^2dt\big]\big)^{1/2}<+\infty$.

$\mathcal{P}_2(\mathbb{R}^d)$ is the collection of all probability measures with finite second moment over $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$.
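For empirical measures on the real line, the 2-Wasserstein distance on $\mathcal{P}_2(\mathbb{R})$ can be computed explicitly, since in one dimension the optimal coupling pairs sorted samples (the comonotone coupling). A small illustrative sketch, with data of our own choosing:

```python
import numpy as np

def w2_empirical_1d(xs, ys):
    """2-Wasserstein distance between two empirical measures on R with the
    same number of atoms; in 1-d the optimal coupling pairs sorted samples."""
    xs, ys = np.sort(np.asarray(xs, float)), np.sort(np.asarray(ys, float))
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

a = np.array([0.0, 1.0, 2.0])
b = np.array([2.0, 0.0, 1.0])        # same atoms, shuffled
print(w2_empirical_1d(a, b))         # 0.0: identical empirical measures
print(w2_empirical_1d(a, a + 1.0))   # 1.0: a translation shifts W2 by 1
```

In higher dimensions, or for the metric $W_{2,TV}$ of Section 3, no such closed form is available and the infimum over couplings must be handled abstractly.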

Partial derivative with respect to the law conditioned to its second marginal

Let $(E,\mathcal{E})$ be an arbitrary measurable space. By $\mathcal{P}(\mathbb{R}^d\times E)$ we denote the set of all probability measures over $(\mathbb{R}^d\times E,\mathcal{B}(\mathbb{R}^d)\otimes\mathcal{E})$, and we put $\mathcal{P}_{2,0}(\mathbb{R}^d\times E):=\{\mu\in\mathcal{P}(\mathbb{R}^d\times E):\int_{\mathbb{R}^d\times E}|x|^2\,\mu(dx\,dz)<+\infty\}$. For $\mu\in\mathcal{P}_{2,0}(\mathbb{R}^d\times E)$ we denote by $\mu_1(A):=\mu(A\times E)$, $A\in\mathcal{B}(\mathbb{R}^d)$, and $\mu_2(B):=\mu(\mathbb{R}^d\times B)$, $B\in\mathcal{E}$, the marginals of $\mu$; thus $\mu\in\mathcal{P}_{2,0}(\mathbb{R}^d\times E)$ means that $\mu_1\in\mathcal{P}_2(\mathbb{R}^d)$. This space can be endowed with the metric
$$
W_{2,TV}(\mu,\mu')=\inf\Big\{\Big(\int_{(\mathbb{R}^d\times E)^2}|x-x'|^2\,\rho(dx\,dz,dx'\,dz')\Big)^{1/2}+\int_{(\mathbb{R}^d\times E)^2} I_{\{z\neq z'\}}\,\rho(dx\,dz,dx'\,dz'):\ \rho\in\Pi(\mu,\mu')\Big\},
$$
$\mu,\mu'\in\mathcal{P}_{2,0}(\mathbb{R}^d\times E)$, where $\Pi(\mu,\mu')$ denotes the collection of couplings of $\mu$ and $\mu'$, i.e., of probability measures on $(\mathbb{R}^d\times E)^2$ with marginals $\mu$ and $\mu'$.

Problem formulation

Let $(U,\mathcal{G})$ be an arbitrary measurable space and $(\Omega,\mathcal{F},P)$ a complete probability space such that $\Omega$ equipped with its Borel $\sigma$-field $\mathcal{B}(\Omega)$ is a Radon space, $\mathcal{F}$ is the completion of $\mathcal{B}(\Omega)$ with respect to $P$, and suppose that on $(\Omega,\mathcal{F},P)$ a $(k+r)$-dimensional Brownian motion $W=(W^1,W^2)$, $k,r\ge 1$, is defined. Given an arbitrary but fixed finite time horizon $T>0$, we denote by $(\mathcal{F}_t^{W^1})_{0\le t\le T}$ and $(\mathcal{F}_t^{W^2})_{0\le t\le T}$ the completed natural filtrations generated by $W^1$ and $W^2$, respectively. Let $\mathbb{F}=(\mathcal{F}_t)_{t\in[0,T]}$ denote the completed

Variational equations

We now introduce the first- and second-order variational equations. Since the control set $U$ is not necessarily convex, we use the spike variation method. More precisely, let $\varepsilon>0$, and choose a Borel subset $E_\varepsilon\subset[0,T]$ with $|E_\varepsilon|=\varepsilon$. For an arbitrarily chosen but fixed $v\in U_{ad}$, we define
$$
u^\varepsilon(t):=v(t)I_{E_\varepsilon}(t)+u(t)I_{E_\varepsilon^c}(t),\quad t\in[0,T],
$$
which is called a spike variation of the optimal control $u$.
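On a discretized time grid the spike variation can be realized as follows; this is a schematic sketch with invented control paths u and v (the paper works in continuous time):

```python
import numpy as np

T, n = 1.0, 1000
t = np.linspace(0.0, T, n, endpoint=False)
dt = T / n

u = np.sin(t)            # stand-in for the optimal control path
v = np.full(n, 0.5)      # stand-in for an arbitrary admissible control

eps = 0.05
E_eps = (t >= 0.3) & (t < 0.3 + eps)   # Borel set with |E_eps| = eps

# spike variation: u_eps equals v on E_eps and u on its complement
u_eps = np.where(E_eps, v, u)

print(np.sum(E_eps) * dt)                  # Lebesgue measure of E_eps, ~ eps
print(bool(np.all(u_eps[~E_eps] == u[~E_eps])))  # True: unchanged off E_eps
```

Because the perturbation lives on a set of measure $\varepsilon$ only, its effect on the state enters at first order through $X^1$ and at second order through $X^2$, which is exactly what the expansion (1.3) quantifies.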

The key point to prove the stochastic maximum principle stated in Theorem 4.1 is to show for the controlled

Proof of Theorem 4.1

From the definition of the cost functional $J(\cdot)$, the optimality of $u\in U_{ad}$, and the definition of $\alpha^\varepsilon(\varphi)$ in (5.31) for $\varphi=f$ as well as $\alpha_T^\varepsilon(\phi)$ in Corollary 5.1, we obtain
$$
\begin{aligned}
0\le J(u^\varepsilon)-J(u)
=\ &E\Big[\int_0^T\Big(f_x(t)\big(X_t^1+X_t^2\big)+f_y(t)\,E\big[X_t^1+X_t^2\,\big|\,\mathcal{F}_t^{W^1}\big]+\hat E\big[\hat f_\mu(t)\big(\hat X_t^1+\hat X_t^2\big)\big]\Big)dt\Big]\\
&+E\Big[\phi_x(T)\big(X_T^1+X_T^2\big)+\phi_y(T)\,E\big[X_T^1+X_T^2\,\big|\,\mathcal{F}_T^{W^1}\big]\Big]+E\Big[\hat E\big[\hat\phi_\mu(T)\big(\hat X_T^1+\hat X_T^2\big)\big]\Big]\\
&+E\Big[\int_0^T\Big(\delta f(t)I_{E_\varepsilon}(t)+L_{xx}\big(t,f,X_t^1\big)+2L_{xy}\big(t,f,X_t^1,E[X_t^1|\mathcal{F}_t^{W^1}]\big)+L_{yy}\big(t,f,E[X_t^1|\mathcal{F}_t^{W^1}]\big)+\hat E\big[L_{z\mu}\big(t,\hat f,\hat X_t^1\big)\big]\Big)dt\Big]\\
&+E\Big[L_{xx}\big(T,\phi,X_T^1\big)+2L_{xy}\big(T,\phi,X_T^1,E[X_T^1|\mathcal{F}_T^{W^1}]\big)+L_{yy}\big(T,\phi,E[X_T^1|\mathcal{F}_T^{W^1}]\big)\Big]+E\Big[\hat E\big[L_{z\mu}\big(T,\hat\phi,\hat X_T^1\big)\big]\Big]+K_1(\varepsilon
\end{aligned}
$$

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.



The work has been supported by the NSF of P.R. China (No. 12031009, 11871037), National Key R and D Program of China (NO. 2018YFA0703900), NSFC-RS (No. 11661130148; NA150344), and also supported by the “FMJH Program Gaspard Monge in optimization and operation research”, and the ANR (Agence Nationale de la Recherche), France project ANR-16-CE40-0015-01.
