Article

A Constrained Markovian Diffusion Model for Controlling the Pollution Accumulation

by Beatris Adriana Escobedo-Trujillo 1,†, José Daniel López-Barrientos 2,*,† and Javier Garrido-Meléndez 1
1 Facultad de Ingeniería, Universidad Veracruzana, Xalapa de Enriquez 91090, Mexico
2 Facultad de Ciencias Actuariales, Universidad Anáhuac México, Naucalpan de Juárez 52786, Mexico
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2021, 9(13), 1466; https://doi.org/10.3390/math9131466
Submission received: 19 May 2021 / Revised: 16 June 2021 / Accepted: 16 June 2021 / Published: 22 June 2021

Abstract: This work presents a study of a finite-time horizon stochastic control problem with restrictions on both the reward and the cost functions. To this end, it uses standard dynamic programming techniques and an extension of the classic Lagrange multipliers approach. The coefficients considered here are allowed to be unbounded, and the strategies obtained are of non-stationary closed-loop type. The driving thread of the paper is a sequence of examples on a pollution accumulation model, which is used to present three algorithms that replicate the results. There, the reader can also find a result on the interchangeability of limits in a Dirichlet problem.

1. Introduction

The aim of pollution accumulation models is to study the management of goods consumed by a society. It is generally accepted that such consumption generates two byproducts: a social utility, and pollution. The difference between the utility and the disutility associated with the pollution is known as social welfare. The theory developed in this work enables the decision maker to find a consumption policy that maximizes the expected social welfare of the society, subject to a constraint that may represent, for example, the requirement that the costs of cleaning the environment not exceed a given quantity over time.
This paper deals with the problem of finding optimal controllers and values for a class of diffusions with unbounded coefficients on a finite-time horizon, under the total payoff criterion subject to restrictions. It uses standard dynamic programming tools, the Lagrange multipliers approach, and a result on the interchangeability of limits in a Bellman equation. The driving thread of the paper is a sequence of examples on a pollution accumulation model, which is used to show how to replicate the theoretical results of the work.
The origin of the use of optimal control theory in the context of stochastic diffusions on a finite-time horizon can be traced back to the works of Howard (see [1]), Fleming (see, for instance, [2,3,4]), Kogan (see [5]), and Puterman (cf. [6]). However, the stochastic optimization problem with constraints was attacked only in the late 1990s and early 2000s, when some financial applications demanded the consideration of these models, under the hypothesis that the coefficients of the diffusion itself, the reward function, and the restrictions are all bounded (see, for example, [7,8,9,10]). Constrained optimal control under the discounted and ergodic criteria was studied in the seminal paper of Borkar and Ghosh (see [11]), the work of Mendoza-Pérez, Jasso-Fuentes, Prieto-Rumeau and Hernández-Lerma (see [12,13]), and the paper by Jasso-Fuentes, Escobedo-Trujillo and Mendoza-Pérez [14]. In fact, these works serve as an inspiration to pursue an extension of their research to the realm of non-stationary strategies.
Although this is not the first time that the problem of pollution accumulation has been studied from the point of view of dynamic optimization (for example, [15] uses an LQ model to describe this phenomenon, [16] deals with the average payoff in a deterministic framework, [17,18] extend the approach of the former to a stochastic context, and [19] uses a stochastic differential game against nature to characterize the situation), this paper contributes to the state of the art by adding constraints to the reward function, and by taking into consideration a finite-time horizon. Moreover, this work takes advantage of the finite horizon by proposing a simulation scheme to test its analytic results. However, it would not be possible to find a suitable Lagrange multiplier for such simulations without the results presented in Example 3 and Theorem 2 below.
The relevance of this work lies in the applicability of its analytic results in a finite-time interval. Unlike the models under infinite-time criteria (i.e., discounted and average payoffs; and the refinements of the latter), which focus on finding optimal controllers in the set of (Markovian) stationary strategies, the criterion at hand considers as well the more general set of (Markovian) non-stationary strategies. This fact implies that the functional form of the Bellman equation includes a time-dependent term, and that the feedback controllers will depend explicitly on the time argument. Since the coefficients of the diffusions involved in this study are assumed to be unbounded, all of the points in R n will be attainable, and a verification result will be needed to ensure the existence of a solution to the Bellman equation that remains valid for all ( t , x ) in [ 0 ; T ] × R n , where T will be the horizon.
Significance and contributions.
  • This paper presents an application of two classic tools: the Lagrange multipliers approach, and Bellman optimization in a finite horizon for diffusions with possibly unbounded coefficients. This fact represents a major technical contribution with respect to the existing literature.
  • This study illustrates its results by means of the full development and implementation of an example on control of pollution accumulation. It also gives actual algorithms which can be used for the replication of the results presented along its pages.
  • This work lies within the framework of dynamic optimization. However, it considers a broader class of coefficients than, for instance, [15]. As is the case of [16], it presents a pollution accumulation model. However, it focuses on a stochastic context (as in [17,18]), with the difference that the present project does so in a finite-time horizon, and with restrictions on both the reward and the cost functions.
The rest of the paper is divided as follows. The next section gives the generalities of the model under consideration, i.e., the diffusion that drives the control problem, the total payoff criterion, the restrictions on the cost, and the control policies at hand. Example 1 introduces the pollution model. Section 3 deals with the actual (analytic and simulated) solution of the problem. Examples 2, 3, 4, Lemma 2, Theorem 2 and Example 5 illustrate the analytic technique and serve the purpose of comparing it with some numeric simulations. Finally, Section 4 is devoted to the concluding remarks.
This section concludes by introducing some notation for spaces of real-valued functions on an open set of $\mathbb{R}^n$. The space $W^{\ell,p}(\mathbb{R}^n)$ stands for the Sobolev space consisting of all real-valued measurable functions $h$ on $\mathbb{R}^n$ such that $D^\alpha h$ exists in the weak sense for all $|\alpha|\le\ell$ and belongs to $L^p(\mathbb{R}^n)$, where
$$ D^\alpha h := \frac{\partial^{|\alpha|}h}{\partial x_1^{\alpha_1}\cdots\partial x_n^{\alpha_n}}, \quad \text{with } \alpha = (\alpha_1,\ldots,\alpha_n), \text{ and } |\alpha| := \sum_{i=1}^n\alpha_i. $$
Moreover, $C^\kappa(\mathbb{R}^n)$ is the space of all real-valued continuous functions on $\mathbb{R}^n$ with continuous $\ell$-th partial derivatives in $x_i\in\mathbb{R}$, for $i=1,\ldots,n$, $\ell = 0,1,\ldots,\kappa$. In particular, when $\kappa=0$, $C^0(\mathbb{R}^n)$ stands for the space of real-valued continuous functions on $\mathbb{R}^n$. Now, $C^{\kappa,\eta}(\mathbb{R}^n)$ is the subspace of $C^\kappa(\mathbb{R}^n)$ consisting of all those functions $h$ such that $D^\alpha h$ satisfies a Hölder condition with exponent $\eta\in\,]0;1]$ for all $|\alpha|\le\kappa$; that is, there exists a constant $K_0$ such that
$$ |D^\alpha h(x)-D^\alpha h(y)| \le K_0|x-y|^\eta. $$
Define
$$ W^{1,\ell;p}([0;T]\times\mathbb{R}^n) := \left\{h:[0;T]\times\mathbb{R}^n\to\mathbb{R},\ h(t,\cdot)\in W^{\ell;p}(\mathbb{R}^n) \text{ and } h(\cdot,x)\in C^1([0;T])\right\}. $$
The space $W^{1,\ell;p}([0;T]\times\mathbb{R}^n)$ is assumed to be endowed with the topology of $W^{\ell;p}([0;T]\times\mathbb{R}^n)$. Similarly, $p\in[1;\infty[$ in $C^{1;\kappa}(\mathbb{R}^n)$ and $L^p([0;T]\times\mathbb{R}^n)$.
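For instance, for $n=2$ and the multi-index $\alpha=(1,2)$, the notation above reads
$$ D^{(1,2)}h = \frac{\partial^3h}{\partial x_1\,\partial x_2^2}, \qquad |\alpha| = 3. $$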

2. Preliminaries

This work studies a finite-horizon optimal control problem with restrictions. Concretely, let $(\Omega,\mathcal{F},\{\mathcal{F}_t:t\ge0\})$ be a filtered measurable space. Let there also be an $\mathcal{F}_t$-adapted stochastic differential system of the form
$$ dx(t) = b(x(t),u(t))\,dt+\sigma(x(t))\,dW(t), \quad x(0)=x,\ t\ge0, \quad (1) $$
where $b:\mathbb{R}^n\times U\to\mathbb{R}^n$ and $\sigma:\mathbb{R}^n\to\mathbb{R}^{n\times d}$ are the drift and diffusion coefficients, respectively, and $W(\cdot)$ is a d-dimensional standard Brownian motion. Here, the set $U\subseteq\mathbb{R}^m$ is a Borel set called the action (or control) set. Moreover, let $u(\cdot)$ be a U-valued stochastic process representing the controller's action at each time $t\ge0$.
Now, the profit that an agent can obtain from its activity in the system is measured with the performance index:
$$ J_T(t,x,u,r) := E_x^u\left[\int_t^T r(s,x(s),u(s))\,ds+r_1(T,x(T))\right], \quad (2) $$
where $r$ and $r_1$ are the running and terminal rewards, respectively, and the symbol $E_x^u[\cdot]$ stands for the conditional expectation of its argument given that $x(t)=x$ and that the agent uses the sequence of controllers $u$.
The goal is to maximize (2) subject to a finite-horizon cost index of the operation:
$$ J_T(t,x,u,c) := E_x^u\left[\int_t^T c(s,x(s),u(s))\,ds+c_1(T,x(T))\right] \le E_x^u\left[\int_t^T\theta(s,x(s))\,ds+\theta_1(T,x(T))\right], \quad (3) $$
where $c$ is a running-cost rate, $c_1$ is a terminal cost rate function, $\theta$ is a running constraint-rate function, and $\theta_1$ is a terminal constraint-rate function. Observe that, while the running reward-rate function $r$ depends on the action of the controller, the running constraint-rate $\theta$ is independent of such variable.
The following is an assumption on the coefficients of the differential system (1).
Hypothesis (H1a).
The control set U is compact.
Hypothesis (H1b).
The drift coefficient $b(x,u)$ is continuous on $\mathbb{R}^n\times U$, and $x\mapsto b(x,u)$ satisfies a local Lipschitz condition on $\mathbb{R}^n$, uniformly on $U$; that is, for each $R>0$, there exists a constant $K_1(R)>0$ such that for all $|x|,|y|\le R$,
$$ \sup_{u\in U}|b(x,u)-b(y,u)| \le K_1(R)|x-y|. $$
Hypothesis (H1c).
The diffusion coefficient $\sigma$ satisfies a local Lipschitz condition on $\mathbb{R}^n$; that is, for each $R>0$, there exists a constant $K_2(R)>0$ such that for all $|x|,|y|\le R$,
$$ |\sigma(x)-\sigma(y)| \le K_2(R)|x-y|. $$
Hypothesis (H1d).
The matrix $a(x) := \sigma(x)\sigma(x)'$ satisfies a uniform ellipticity condition, i.e., for some constant $K_3>0$,
$$ y'a(x)y \ge K_3|y|^2 \quad \text{for all } x,y\in\mathbb{R}^n. $$
Remark 1.
The local Lipschitz conditions on the drift and diffusion coefficients referred to in Hypotheses 1b–1c, along with the compactness of the control set $U$ stated in Hypothesis 1a, yield that for each $R>0$, there exists a number $K_4(R)$ (depending on $K_1(R)$ and $K_2(R)$) such that
$$ \sup_{u\in U}|b(x,u)|+|\sigma(x)| \le K_4(R)\left(1+|x|\right) $$
for all $|x|\le R$.
For $u\in U$, and $h(t,\cdot)\in W^{2,p}(\mathbb{R}^n)$ for all $t\ge0$, define:
$$ L^uh(t,x) := \left\langle\nabla h(t,x),b(x,u)\right\rangle+\tfrac12\operatorname{Tr}\left[[Hh(t,x)]\,a(x)\right] = \sum_{i=1}^n b^i(x,u)\,\partial_{x_i}h(t,x)+\tfrac12\sum_{i,j=1}^n a^{ij}(x)\,\partial^2_{x_ix_j}h(t,x), \quad (4) $$
with $a(\cdot)$ as in Hypothesis 1d, and $\nabla h$, $Hh$ representing the gradient and the Hessian matrix of $h$ with respect to the state variable $x$, respectively.
The main application of this work is the pollution accumulation model. Although it will be possible to solve this particular problem within the realm of pure feedback strategies, this is not always the case; as a consequence, the set of actions needs to be widened.
Control Policies. Let $M$ be the family of measurable functions $f:[0;T]\times\mathbb{R}^n\to U$. A strategy $u(t) := f(t,x(t))$, for some $f\in M$, is called a Markov policy.
Definition 1.
Let $(U,\mathcal{B}(U))$ be a measurable space, and $P(U)$ be the family of probability measures supported on $U$. A randomized policy is a family $\pi := (\pi_t:t\ge0)$ of stochastic kernels on $\mathcal{B}(U)\times\mathbb{R}^n$ satisfying:
(a) for each $t\ge0$ and $x\in\mathbb{R}^n$, $\pi_t(\cdot|x)\in P(U)$ is such that $\pi_t(U|x)=1$, and for each $D\in\mathcal{B}(U)$, $\pi_t(D|\cdot)$ is a Borel function on $\mathbb{R}^n$; and
(b) for each $D\in\mathcal{B}(U)$ and $x\in\mathbb{R}^n$, the function $t\mapsto\pi_t(D|x)$ is Borel-measurable in $t\ge0$.
The set of randomized policies is denoted by $\Pi$.
Observe that every $f\in M$ can be identified with a strategy in $\Pi$ by means of the $P(U)$-valued trajectory $\delta_f$, where $\delta_f$ represents the Dirac measure concentrated at $f$. When the controller uses a policy $\pi=(\pi_t:t\ge0)\in\Pi$, both the drift coefficient $b$ and the operator $L^u$, defined in (1) and (4), respectively, are written as
$$ b(x,\pi_t) := \int_U b(x,u)\,\pi_t(du|x), \qquad L^{\pi_t}\nu(t,x) := \int_U L^u\nu(t,x)\,\pi_t(du|x). $$
Under Hypotheses (H1a)–(H1d) and Remark 1, for each policy $\pi\in\Pi$ there exists an almost surely unique strong solution $x^\pi(\cdot)$ of (1), which is a Markov-Feller process. Furthermore, for each policy $\pi=(\pi_t:t\ge0)\in\Pi$, the operator $\partial_t\nu(t,x)+L^{\pi_t}\nu(t,x)$ becomes the infinitesimal generator of the dynamics (1) (for more details, see the arguments in [20] (Theorem 2.2.7)). Moreover, by the same reasoning as in Theorem 4.3 of [20], for each $\pi\in\Pi$, the associated probability measure $P^\pi(t,x,\cdot)$ of $x^\pi(\cdot)$ is absolutely continuous with respect to Lebesgue's measure for every $t\ge0$ and $x\in\mathbb{R}^n$. Hence, there exists a transition density function $p^\pi(t,x,y)\ge0$ such that
$$ P^\pi(t,x,B) = \int_B p^\pi(t,x,y)\,dy, $$
for every Borel set $B\subseteq\mathbb{R}^n$.
Topology of relaxed controls. The set Π is topologized as in [21]. Such a topology renders Π a compact metric space, and it is determined by the following convergence criterion (see [20,21,22]).
Definition 2
(Convergence criterion). It will be said that the sequence $(\pi^m:m=1,2,\ldots)$ in $\Pi$ converges to $\pi\in\Pi$, and such convergence is denoted as $\pi^m\xrightarrow{W}\pi$, if and only if
$$ \int_{\mathbb{R}^n}\int_0^T\int_U g(t,x)\,h(t,x,u)\,\pi^m_t(du|x)\,dt\,dx \;\longrightarrow\; \int_{\mathbb{R}^n}\int_0^T g(t,x)\int_U h(t,x,u)\,\pi_t(du|x)\,dt\,dx \quad (5) $$
for all $g\in L^1([0;T]\times\mathbb{R}^n)$ and $h\in C_b([0;T]\times\mathbb{R}^n\times U)$, i.e., in the set of continuous and bounded functions on $[0;T]\times\mathbb{R}^n\times U$. Denoting $h(t,x,\pi_t) := \int_U h(t,x,u)\,\pi_t(du|x)$ for each $\pi=(\pi_t:t\ge0)\in\Pi$, the convergence referred to in (5) reduces to
$$ \int_{\mathbb{R}^n}\int_0^T g(t,x)\,h(t,x,\pi^m_t)\,dt\,dx \;\longrightarrow\; \int_{\mathbb{R}^n}\int_0^T g(t,x)\,h(t,x,\pi_t)\,dt\,dx. $$
Throughout this work, the convergence in Π is understood in the sense of the convergence criterion introduced in Definition 2.
The following Definition is this work’s version of the polynomial growth condition quoted in, for instance [18].
Definition 3.
Given a polynomial function of the form $w(x) = 1+|x|^k$ (with $k\ge2$) and $x\in\mathbb{R}^n$, let the normed linear space $B_w([0;T]\times\mathbb{R}^n)$ be that which consists of all real-valued measurable functions $\nu$ on $[0;T]\times\mathbb{R}^n$ with finite $w$-norm, given by
$$ \|\nu\|_w := \sup_{(t,x)\in[0;T]\times\mathbb{R}^n}\frac{|\nu(t,x)|}{w(x)}. $$
Remark 2.
(a) 
Observe that for any function $\nu\in B_w([0;T]\times\mathbb{R}^n)$:
$$ |\nu(t,x)| \le \|\nu\|_w\,w(x) = \|\nu\|_w\left(1+|x|^k\right). $$
This last inequality implies that any function ν B w ( [ 0 ; T ] × R n ) satisfies the polynomial growth condition.
(b) 
Assuming that the initial datum $x(s)=x$ has finite absolute moments of every order (i.e., $E|x(s)|^k<\infty$ for each $k=1,2,\ldots$), [23] [Theorem 4.2] gives that
$$ E|x(t)|^k \le C_k\left(1+E|x(s)|^k\right), \quad s\le t\le T, $$
where the constant $C_k$ depends on $k$, $T-s$, and the constant $K_1$ of Hypothesis (H1b).
(c) 
In the application developed throughout this paper, a constant initial datum $x(s)=x$ is considered. Then $x(t)$ also has finite absolute moments of every order (see Proposition 10.2.2 in [18]). Therefore, $E|x(t)|^k \le C_k(1+|x|^k)$.
Now, hypotheses on the reward, cost and constraint rates from (2) and (3) are stated. These are very standard, and represent an extension of the ones used in classic works, such as p. 157 in [23] (Chapter VI.3) and p. 130 in [24] (Chapter 3).
Hypothesis (H2a).
The functions $r,c:[0;T]\times\mathbb{R}^n\times U\to\mathbb{R}$ are continuous, and locally Lipschitz on $\mathbb{R}^n$, uniformly on $U$; that is, for each $R>0$, there exists a constant $K_5(R)>0$ such that for all $|x|,|y|\le R$,
$$ \sup_{(t,u)\in[0;T]\times U}|r(t,x,u)-r(t,y,u)|+\sup_{(t,u)\in[0;T]\times U}|c(t,x,u)-c(t,y,u)| \le K_5(R)|x-y|. $$
Hypothesis (H2b).
$r(\cdot,\cdot,u)$ and $c(\cdot,\cdot,u)$ are in $B_w([0;T]\times\mathbb{R}^n)$ uniformly on $U$; in other words, there exists $M>0$ such that for all $(t,x)\in[0;T]\times\mathbb{R}^n$,
$$ \sup_{(t,u)\in[0;T]\times U}|r(t,x,u)|+\sup_{(t,u)\in[0;T]\times U}|c(t,x,u)| \le M\,w(x). $$
Hypothesis (H2c).
The terminal reward and cost rates $r_1(\cdot,\cdot),c_1(\cdot,\cdot)\in B_w([0;T]\times\mathbb{R}^n)$, and the running and terminal constraint rates $\theta(\cdot,\cdot),\theta_1(\cdot,\cdot)\in B_w([0;T]\times\mathbb{R}^n)$, are non-negative measurable functions which are locally Lipschitz on $[0;T]\times\mathbb{R}^n$; i.e., for each $R>0$, there exists a constant $\tilde K_5(R)>0$ such that for all $|x|,|y|\le R$,
$$ \sup_{t\ge0}\left[\left|r_1(t,x)-r_1(t,y)\right|+\left|c_1(t,x)-c_1(t,y)\right|\right]+\sup_{t\ge0}\left[\left|\theta(t,x)-\theta(t,y)\right|+\left|\theta_1(t,x)-\theta_1(t,y)\right|\right] \le \tilde K_5(R)|x-y|. $$
For $\pi=(\pi_t:t\ge0)\in\Pi$, the reward and cost rates are written as
$$ r(t,x,\pi_t) := \int_U r(t,x,u)\,\pi_t(du|x), \qquad c(t,x,\pi_t) := \int_U c(t,x,u)\,\pi_t(du|x). \quad (6) $$
To complete this section, the main application of this work is introduced. It consists of a pollution accumulation model. This application is inspired by the one presented in [17,18], and satisfies Hypotheses 1a–1d and 2a–2c.
Example 1.
Fix the probability space $(\Omega,\mathcal{F},\{\mathcal{F}_t:t\ge0\},P)$, and let $T>0$ be a given time horizon. Consider the pollution process defined by the controlled diffusion
$$ dx(s) = \left[u(s)-\eta x(s)\right]ds+\sigma\,dW(s), \quad x(t)=x>0, \quad (7) $$
for $s\in[t;T]$, where $0\le u(s)\le\gamma<\eta^2$. Here, $u(s)$ represents the consumption flow at time $s\ge t$, and $\gamma$ is a certain consumption restriction imposed by, for instance, worldwide protocols. Additionally, the number $\eta\in\,]0;1]$ is the rate of pollution decay.
It is easy to see that the coefficients of (7) meet Hypotheses (H1a)–(H1c). A simple calculation yields that (H1d) holds with $K_3 = \sigma^2-c$ for any $c\in\,]0;\sigma^2[$.
Now, a simulation of the trajectories of the Itô diffusion (1) is presented. To this end, the extension of Euler's method for solving first-order differential equations known as the Euler-Maruyama method (see, for instance, [25] and Chapter 1 in [26]) is used. This technique is suitable for diffusions that meet Hypotheses (H1a)–(H1d). The focus is on the comparison between Vasicek's model for interest rates in finance (see, for instance, Chapter 5 in [27]):
$$ dx(s) = \left[\mu-\eta x(s)\right]ds+\sigma\,dW(s), \quad x(t)=x>0, \quad (8) $$
with $s\in[t;T]$, and Kawaguchi–Morimoto's model (7).
Let $z^N:\{0,1,\ldots,N\}\times\Omega\to\mathbb{R}^n$, $N\in\mathbb{N}$, be the Euler-Maruyama approximations for the stochastic differential Equation (1), recursively defined by $z^N_0 := x$ and
$$ z^N_{n+1} := z^N_n+b\big(z^N_n,u_n\big)\frac{T}{N}+\sigma\big(z^N_n\big)\left[W\!\left(\frac{(n+1)T}{N}\right)-W\!\left(\frac{nT}{N}\right)\right] $$
for all $n\in\{0,1,\ldots,N-1\}$.
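The following is a minimal Python sketch of this scheme for the two one-dimensional models above. The parameter values are the illustrative ones used later in Example 3, and the constant policies $u\equiv\gamma$ (for (7)) and $u\equiv\mu$ (for (8)) are assumptions made here for illustration only; they are not the optimal controllers.
```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, N, rng):
    """Euler-Maruyama approximation z_0, ..., z_N of dx = b(t, x) dt + sigma(x) dW."""
    dt = T / N
    z = np.empty(N + 1)
    z[0] = x0
    for n in range(N):
        dW = rng.normal(0.0, np.sqrt(dt))  # W((n+1)T/N) - W(nT/N) ~ Normal(0, dt)
        z[n + 1] = z[n] + b(n * dt, z[n]) * dt + sigma(z[n]) * dW
    return z

# Illustrative parameters (assumed; see Example 3 below).
eta, sig, mu, gamma = 1.0, 0.5, 5.0, 0.4
rng = np.random.default_rng(0)

# Kawaguchi-Morimoto pollution model (7) under the constant policy u = gamma.
km = euler_maruyama(lambda t, x: gamma - eta * x, lambda x: sig, 5.0, 1.0, 100, rng)
# Vasicek model (8): the drift "level" mu is constant, hence mean reversion.
vas = euler_maruyama(lambda t, x: mu - eta * x, lambda x: sig, 5.0, 1.0, 100, rng)
```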
In Figure 1 and Figure 2, observe that Kawaguchi-Morimoto's process allows one to choose a deterministic (implicit) function of $t$, whereas Vasicek's series features what is known in the literature as mean reversion. The latter fact is clear from the choice of a constant parameter $\mu$.
Let $h\in W^{1,2;p}([0;T]\times\mathbb{R})$. After (4), the infinitesimal generator of (7) is given by
$$ h_t(t,x)+L^uh(t,x) = h_t(t,x)+(u-\eta x)h_x(t,x)+\tfrac12\sigma^2h_{xx}(t,x). $$
The polynomial function w ( x ) = x 2 + x + 1 satisfies Definition 3. Please note that this function does not depend on the time argument t.
The reward-rate function used in further developments represents the social welfare; it is given by $r:[0;T]\times\mathbb{R}\times U\to\mathbb{R}$, defined as
$$ r(t,x,u) := F(u)-a\cdot x, \quad (9) $$
where $F\in C^2(\mathbb{R})$ stands for the social utility of the consumption $u$, and $a\cdot x$ stands for the social disutility (so to speak) of the pollution stock $x$, for $a>0$ fixed. It is assumed that
$$ F'\ge0, \quad F''\le0, \quad F'(\infty) = F(0) = 0, \quad F'(0+) = F(\infty) = \infty. \quad (10) $$
The cost rate function will be given by
$$ c(t,x,u) := c_1x+c_2u \quad \text{for all } (t,x,u)\in[0;T]\times\mathbb{R}\times U, \quad (11) $$
with $c_1>0$, and $c_2\in\mathbb{R}$ satisfying
$$ c_1+\eta c_2 > 0. \quad (12) $$
Since the pollution stock $x$ depends on the time variable $t\ge0$, the functions defined in (9) and (11) also depend on this variable.
The running constraint-rate function has the form
$$ \theta(t,x) := \frac{c_1x}{\eta}+q \quad \text{for all } (t,x)\in[0;T]\times\mathbb{R}, \quad (13) $$
where $q$ is a positive constant. (Here, as with the reward and cost functions, it is assumed that $x$ implicitly depends on $t$.) The terminal constraint, cost and reward rates will be fixed at a level of zero. It is not difficult to see that if $F$ meets Hypotheses (H2a)–(H2c), then so do the social welfare, the cost rate and the running constraint functions.
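For concreteness, the following Python sketch encodes these three rates for the utility $F(u)=\sqrt{u}$ used later in Example 3; the parameter values are assumptions taken from that example, and the check on condition (12) is included.
```python
import numpy as np

# Assumed parameters; Example 3 below uses a = 1.25, c1 = 0.1, c2 = 0.05, q = 0.0195.
a, c1, c2, q, eta = 1.25, 0.1, 0.05, 0.0195, 1.0
assert c1 + eta * c2 > 0, "condition (12) on the cost coefficients"

F = np.sqrt                   # social utility of consumption, F(u) = sqrt(u)

def r(t, x, u):               # social welfare (9): utility minus disutility a*x
    return F(u) - a * x

def c(t, x, u):               # cost rate (11)
    return c1 * x + c2 * u

def theta(t, x):              # running constraint rate (13); independent of u
    return c1 * x / eta + q
```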

3. A Finite-Horizon Control Problem with Constraints

This section is devoted to the study of the finite-horizon problem with constraints.
Definition 4.
For each $\pi\in\Pi$ and $T\ge t$, the total expected reward, cost and constraint rates over the time interval $[t;T]$, given that $x(t)=x$, are, respectively,
$$ J_T(t,x,\pi,r) := E_x^\pi\left[\int_t^T r(s,x(s),\pi_s)\,ds+r_1(T,x(T))\right], $$
$$ J_T(t,x,\pi,c) := E_x^\pi\left[\int_t^T c(s,x(s),\pi_s)\,ds+c_1(T,x(T))\right], $$
$$ \bar\theta_T(t,x,\pi) := E_x^\pi\left[\int_t^T\theta(s,x(s))\,ds+\theta_1(T,x(T))\right], $$
with $r(s,x(s),\pi_s)$ and $c(s,x(s),\pi_s)$ as in (6).
The proof of the next result is an extension of [28] [Proposition 3.6].
Lemma 1.
Hypotheses (H2a)–(H2c) imply that the total expected reward $J_T(t,x,\pi,r)$, the total expected cost $J_T(t,x,\pi,c)$, and the constraint rate $\bar\theta_T(t,x,\pi)$ belong to the space $B_w([0;T]\times\mathbb{R}^n)$. In fact, for every $(t,x)\in[0;T]\times\mathbb{R}^n$,
$$ \sup_{\pi\in\Pi,\,t\in[0;T]}J_T(t,x,\pi,r) \le M_2(T,t)\,w(x), \quad (14) $$
$$ \sup_{\pi\in\Pi,\,t\in[0;T]}J_T(t,x,\pi,c) \le M_2(T,t)\,w(x), \quad (15) $$
$$ \sup_{\pi\in\Pi,\,t\in[0;T]}\bar\theta_T(t,x,\pi) \le M_2(T,t)\,w(x), \quad (16) $$
where $M_2(T,t) := M\left[C_k(T-t)+(T-t)+C_k\right]$.
Proof of Lemma 1.
The proof is presented only for $J_T(t,x,\pi,r)$, for the line of reasoning is the same for $J_T(t,x,\pi,c)$ and $\bar\theta_T(t,x,\pi)$. By Hypothesis (H2b), it is known that for every $(t,x)\in[0;T]\times\mathbb{R}^n$,
$$ J_T(t,x,\pi,r) = E_x^\pi\left[\int_t^T r(s,x(s),\pi_s)\,ds+r_1(T,x(T))\right] \le M\,E_x^\pi\left[\int_t^T w(x(s))\,ds+w(x(T))\right]. $$
Now, Remark 2(b)–(c) gives that
$$ J_T(t,x,\pi,r) \le M\left[C_k\left(|x|^k+1\right)(T-t)+(T-t)+C_k\left(|x|^k+1\right)\right]. $$
Letting $M_2(T,t) := M\left[C_k(T-t)+(T-t)+C_k\right]$ yields the result. □
For each T > 0 , and x R n , assume that the (running and terminal) constraint functions θ ( · , · ) and θ 1 ( · , · ) are given, and that they satisfy Hypothesis 2c. In this way, let
$$ F_\theta^T(t,x) := \left\{\pi\in\Pi : J_T(t,x,\pi,c) \le \bar\theta_T(t,x,\pi)\right\}. $$
To avoid trivial situations, it is assumed that this set is not empty (see Remark 3.8 in [14]). To formally introduce what is meant when talking about the maximization of (2) subject to (3), the finite-horizon problem with constraints is defined.
Definition 5.
A policy $\pi^*\in\Pi$ is said to be optimal for the finite-horizon problem with constraints (FHPC) with initial state $x\in\mathbb{R}^n$ if $\pi^*\in F_\theta^T(t,x)$ and, in addition,
$$ J_T(t,x,\pi^*,r) = \sup_{\pi\in F_\theta^T(t,x)}J_T(t,x,\pi,r). $$
In this case, $J_T^*(t,x,r) := J_T(t,x,\pi^*,r)$ is called the T-optimal reward for the FHPC.
Example 2
(Example 1 continued). One intends to find a strategy $\pi^*\in\Pi$ that maximizes the total expected reward
$$ J_T(t,x,\pi,r) = E_x^\pi\left[\int_t^T\left(F(\pi_s)-a\,x(s)\right)ds\right] \quad (17) $$
subject to
$$ J_T(t,x,\pi,c) = E_x^\pi\left[\int_t^T\left(c_1x(s)+c_2\pi_s\right)ds\right] \le E_x^\pi\left[\int_t^T\left(\frac{c_1x(s)}{\eta}+q\right)ds\right] =: \bar\theta_T(t,x,\pi). \quad (18) $$
That is, find $\pi^*\in\Pi$ such that $J_T(t,x,\pi^*,r) := \sup_{\pi\in F_\theta^T(t,x)}J_T(t,x,\pi,r)$.

3.1. Lagrange Multipliers

To solve the FHPC, the Lagrange multipliers approach and the dynamic programming technique are used to transform the original FHPC into an unconstrained finite-horizon problem, parametrized by the so-called Lagrange multipliers. To do this, take $\lambda\le0$ and consider the new (running and terminal) reward rates
$$ r^\lambda(t,x,u) := r(t,x,u)+\lambda\left[c(t,x,u)-\theta(t,x)\right], \qquad r_1^\lambda(x(T)) := r_1(T,x(T))+\lambda\left[c_1(T,x(T))-\theta_1(x(T))\right]. \quad (19) $$
Using the same notation from (6), write
$$ r^\lambda(t,x,\pi_t) := r(t,x,\pi_t)+\lambda\left(c(t,x,\pi_t)-\theta(t,x)\right), \quad \pi=(\pi_t:t\ge0)\in\Pi. $$
Observe also that for each $\lambda<0$, $r^\lambda(\cdot,\cdot,\pi_t)\in B_w([0;T]\times\mathbb{R}^n)$ uniformly in $\Pi$, and $\|r_1^\lambda\|_w<\infty$. Indeed,
$$ |r^\lambda(t,x,\pi_t)| \le |r(t,x,\pi_t)|+|\lambda|\,|c(t,x,\pi_t)|+|\lambda|\,|\theta(t,x)| \le Mw(x)+M|\lambda|w(x)+|\lambda|\,|\theta(t,x)| \le \left(M+M|\lambda|+|\lambda|\cdot\|\theta\|_w\right)w(x) = N^\lambda w(x), $$
$$ |r_1^\lambda(x(T))| \le \left(M+M|\lambda|+|\lambda|\,\|\theta_1\|_w\right)w(x) = N_1^\lambda w(x), $$
where $N^\lambda := M+M|\lambda|+|\lambda|\cdot\|\theta\|_w$, $N_1^\lambda := M+M|\lambda|+|\lambda|\cdot\|\theta_1\|_w$, and $M$ is as in Hypothesis (H2b).
It is natural to let, for all ( t , x ) [ 0 ; T ] × R n ,
$$ J_T(t,x,\pi,r^\lambda) := E_x^\pi\left[\int_t^T r^\lambda(s,x(s),\pi_s)\,ds+r_1^\lambda(T,x(T))\right]. $$
Notice that
$$ J_T(t,x,\pi,r^\lambda) = J_T(t,x,\pi,r)+\lambda\left[J_T(t,x,\pi,c)-\bar\theta_T(t,x,\pi)\right]. $$
Example 3
(Examples 1 and 2 continued). The performance index for the FHUP is given by
$$ J_T(t,x,\pi,r^\lambda) = E_x^\pi\left[\int_t^T F(\pi_s)-a\,x(s)+\lambda\left(c_1x(s)+c_2\pi_s-\frac{c_1x(s)}{\eta}-q\right)ds\right]. \quad (20) $$
Return now to Example 1, where a single trajectory of each of the processes (7) and (8) was simulated for certain parameters, using the policy $u(t)=x(t)$ for (7), and $u(t)=\mu$ for (8). One's aim is to compute (20) for a fixed value of $\lambda<0$, by means of Monte Carlo simulation, when the utility function derived from the consumption is given by $F(u)=\sqrt{u}$. To this end, the following pseudocodes are presented.
Walkthrough of Algorithm 1. This pseudocode's goal is to compute the integral inside (20).
  • Line 1 initializes the process.
  • Line 2 emphasizes the fact that λ < 0 is supposed to be given.
  • In lines 3–11, the algorithm decides if it will work with (7), or with (8).
  • Line 12 sets $F=\sqrt{u}$ and $D=a\cdot x$, and computes initial values for r, c and θ according to (9), (11) and (13), respectively.
  • Line 13 computes the integrand in (20) for the initial step.
  • The while loop in lines 15–30 does the following:
    - For each step, lines 16–24 decide between (7) and (8).
    - Lines 25–26 implement Euler–Maruyama's method.
    - Line 27 updates the values of F, D, r, c and θ.
    - Line 28 updates the value of the integrand.
  • Line 31 computes the integral in (20).
Walkthrough of Algorithm 2. The purpose of this pseudocode is to compute a 95%-confidence interval for the expectation of the result of Algorithm 1 according to Monte Carlo's method.
  • Line 1 calls Algorithm 1 N times.
  • Line 2 computes the sample mean of the iterations just performed.
  • Line 3 computes the sample standard deviation of the iterations.
  • The Algorithm uses the results of lines 2–3 to return the desired interval.
Algorithm 1: Integral algorithm
Algorithm 1 receives the initial value $x_0$, the step size $dt$, the time horizon $T$, and the parameters of the diffusion (7) (resp. (8)) to calculate the (Itô) integral inside the expectation operator in (20) when the process (7) (resp. (8)) is used; then, Algorithm 2 iterates this process and returns the average of such iterations, thus approximating the value of (20). These algorithms require a negative and constant value of the Lagrange multiplier. Later, in Example 5, a modification of Algorithm 1 that solves this situation will be proposed. For the sake of illustration, take the parameter values from Example 1 (that is, $x_0=5$, $\eta=1$, $\sigma(x)\equiv0.5$, $\mu=5$, $T=1$, and $N=100$), and use Algorithms 1 and 2 to compute an approximation to the value of (20) when one considers the diffusion (8) (that is, the diffusion (7) with $u(t)\equiv\mu$ for all $t\ge0$). Additionally, take
$$ \gamma = 0.4, \quad c_1 = 0.1, \quad c_2 = 0.05, \quad q = 0.0195, \quad \text{and} \quad a = 1.25. $$
Algorithm 2: 95%-confidence interval for the expectation of an Itô’s integral using Monte Carlo’s method.
In this case, an arbitrary value of $\lambda_0=-40$ is used. Taking 10,000 simulations, these values yield averages around
$$ J_T(0,5,u,r^{\lambda_0}) \approx 6.6549064 \quad \text{and} \quad J_T(0,5,\mu,r^{\lambda_0}) \approx 13.235737, $$
for (7) and (8), respectively.
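A compact Python sketch of Algorithms 1 and 2 follows. It integrates the reward rate $r^\lambda$ along Euler–Maruyama paths of (7) under a constant policy, and returns the Monte Carlo confidence interval. The constant policy $u\equiv\mu=5$ and the multiplier $\lambda_0=-40$ reproduce the second experiment above under the stated assumptions ($F(u)=\sqrt{u}$, left-point quadrature); exact figures will differ with the random seed and step size.
```python
import numpy as np

def payoff_one_path(lam, x0=5.0, T=1.0, N=100, u=5.0, eta=1.0, sig=0.5,
                    a=1.25, c1=0.1, c2=0.05, q=0.0195, rng=None):
    """Algorithm 1 (sketch): one realization of the integral inside (20), constant policy u."""
    rng = rng or np.random.default_rng()
    dt, x, total = T / N, x0, 0.0
    for _ in range(N):
        r_lam = (np.sqrt(u) - a * x) + lam * (c1 * x + c2 * u - c1 * x / eta - q)
        total += r_lam * dt                              # left-point rule in time
        x += (u - eta * x) * dt + sig * rng.normal(0.0, np.sqrt(dt))
    return total

def monte_carlo_ci(lam, n_sims=10_000, seed=0):
    """Algorithm 2 (sketch): 95% confidence interval for the expectation in (20)."""
    rng = np.random.default_rng(seed)
    samples = np.array([payoff_one_path(lam, rng=rng) for _ in range(n_sims)])
    mean = samples.mean()
    half = 1.96 * samples.std(ddof=1) / np.sqrt(n_sims)
    return mean - half, mean + half

print(monte_carlo_ci(lam=-40.0))
```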
Let $\tau$ be any stopping time valued in $[t;T]$, and $\varphi\in W^{1,2;p}([0;T]\times\mathbb{R}^n)\cap B_w([0;T]\times\mathbb{R}^n)$. Should $p>n$, an application of Itô's Lemma to $\varphi(T\wedge\tau,x(T\wedge\tau))$ yields the following result.
Proposition 1.
Suppose that Hypotheses 1 and 2 are met. Fix $\pi\in\Pi$ and $\lambda\le0$; assume that there is a function $\varphi\in W^{1,2;p}([0;T]\times\mathbb{R}^n)\cap B_w([0;T]\times\mathbb{R}^n)$ satisfying:
$$ r^\lambda(t,x,\pi_t)+\partial_t\varphi(t,x)+L^{\pi_t}\varphi(t,x) = 0, \quad \text{for all } (t,x)\in[0;T]\times\mathbb{R}^n, \quad (21) $$
with boundary condition $\varphi(T,x(T)) = r_1^\lambda(x(T))$. Then
$$ \varphi(t,x) = J_T(t,x,\pi_t,r^\lambda). \quad (22) $$
Moreover, if the equality in (21) is replaced by “≤” or “≥”, then (22) holds with the corresponding inequality.
Notice that Proposition 1 does not assert the existence of a function that satisfies (21) (this is the purpose of Proposition 2 below). It rather motivates the definition of the finite-horizon unconstrained problem.
Definition 6.
A policy π * Π for which
$$ J_T(t,x,\pi^*,r^\lambda) = \sup_{\pi\in\Pi}J_T(t,x,\pi,r^\lambda) =: J_T^*(t,x,r^\lambda) \quad \text{for all } (t,x)\in[0;T]\times\mathbb{R}^n, \quad (23) $$
is called finite-horizon optimal for the finite-horizon unconstrained problem (FHUP), and $J_T^*(\cdot,\cdot,r^\lambda)$ is referred to as the finite-horizon optimal reward for the FHUP.
The first part of the following result is an extension of Proposition 1 and the verification result Theorem 3.5.2(i) in [29] to the realm of Sobolev spaces. The proof of the second part mimics that of Theorem 3.5.2(ii) in [29].
Proposition 2.
Suppose that Hypotheses 1 and 2 are met. Then:
(i) 
For each fixed $\lambda\le0$ and all $t\in[0;T]$, the finite-horizon optimal reward $J_T^*(\cdot,\cdot,r^\lambda)$ defined in (23) belongs to $W^{1,2;p}([0;T]\times\mathbb{R}^n)\cap B_w([0;T]\times\mathbb{R}^n)$, and verifies the total reward Hamilton-Jacobi-Bellman (HJB) equation; that is,
$$ 0 = \sup_{\pi\in\Pi}\left[r^\lambda(t,x,\pi)+\partial_tJ_T^*(t,x,r^\lambda)+L^\pi J_T^*(t,x,r^\lambda)\right] \quad \text{for all } (t,x)\in[0;T]\times\mathbb{R}^n, \quad (24) $$
with boundary condition $J_T^*(T,x(T),r^\lambda) = r_1^\lambda(x(T))$. Conversely, if some function $\varphi\in W^{1,2;p}([0;T]\times\mathbb{R}^n)\cap B_w([0;T]\times\mathbb{R}^n)$ verifies (24) with boundary condition $\varphi(T,x(T)) = r_1^\lambda(x(T))$, then $\varphi(t,x) = J_T^*(t,x,r^\lambda)$ for all $(t,x)\in[0;T]\times\mathbb{R}^n$.
(ii) 
If there exists a Markovian policy $f^*\in M$ (depending on $\lambda$) that maximizes the right-hand side of (24), i.e.,
$$ 0 = r^\lambda(t,x,f^*)+\partial_tJ_T^*(t,x,r^\lambda)+L^{f^*}J_T^*(t,x,r^\lambda), \quad \text{for all } (t,x)\in[0;T]\times\mathbb{R}^n; $$
and this policy is such that the boundary condition J T * ( T , x ( T ) , r λ ) = r 1 λ ( T , x ( T ) ) is met as well, then this policy is a finite-horizon optimal policy for the FHUP.
Use the former result to introduce the HJB equation for the FHUP for the examples presented along the paper.
Example 4
(Examples 1–3 continued). The HJB equation for the FHUP is given by:
$$ h_t(t,x)+\sup_{y\in[0;\gamma]}\left\{F(y)-ax+\lambda\left(c_1x+c_2y-\frac{c_1x}{\eta}-q\right)+L^yh(t,x)\right\} = 0, \quad \text{for } t<T; \qquad h(T,x)=0, \quad (25) $$
where $h\in C^{1,2}([0;T]\times\mathbb{R})$, and
$$ L^yh(t,x) = (y-\eta x)h_x(t,x)+\tfrac12\sigma^2h_{xx}(t,x). $$
According to Proposition 2, a solution of the HJB equation (25) yields the finite-horizon optimal reward J T * ( t , x , r λ ) and the optimal policy π * for the FHUP over the interval [ t ; T ] .
Now use Definition 6 and Propositions 1 and 2 to set expressions for the optimal performance index, policies, and constraint rates from the examples presented along this work.
Lemma 2
(Examples 1–4 continued). Let $\Lambda$ and $\mathbb{I}$ be Lebesgue's measure and the indicator function, respectively. Consider the planning horizon $[t;T]$ and assume that the conditions in (7) and (9)–(13) hold. Then,
(i) 
For every $x>0$ and $\lambda\le0$, the value function $J_T^*(t,x,r^\lambda)$ in (23) becomes
$$ J_T^*(t,x,r^\lambda) = m_1\left(1-e^{-\eta(T-t)}\right)x+m_2(t), \quad (26) $$
where
$$ m_1 := -\frac{a}{\eta}-\frac{\lambda c_1}{\eta^2}+\frac{\lambda c_1}{\eta}, \quad (27) $$
$$ m_2(t) := -\lambda q(T-t)+\left[F(\gamma)+\lambda\gamma c_2\right]\Lambda\{y\in[t;T]:F'(\gamma)\ge a_\lambda(y)\}+\int_{\{y\in[t;T]:F'(\gamma)<a_\lambda(y)\}}\left[F(I(a_\lambda(y)))+\lambda c_2I(a_\lambda(y))+m_1\left(1-e^{-\eta(T-y)}\right)I(a_\lambda(y))\right]dy+m_1\gamma\int_{\{y\in[t;T]:F'(\gamma)\ge a_\lambda(y)\}}\left(1-e^{-\eta(T-y)}\right)dy, \quad (28) $$
and $a_\lambda(t) := -\lambda c_2-m_1\left(1-e^{-\eta(T-t)}\right)$, where $I(\cdot)$ is the inverse of $F'(\cdot)$. Moreover, this policy turns out to be optimal for the FHUP; i.e., it is such that (24) holds.
(ii) 
Define
$$ f_\lambda(t) := \begin{cases} I(a_\lambda(t)) & \text{if } F'(\gamma)<a_\lambda(t),\\ \gamma & \text{if } F'(\gamma)\ge a_\lambda(t). \end{cases} \quad (29) $$
For every $x>0$ and $\lambda\le0$, the total expected reward, cost and constraint, respectively $J_T(t,x,f_\lambda(t),r)$, $J_T(t,x,f_\lambda(t),c)$, and $\bar\theta_T(t,x,f_\lambda(t))$, defined in Example 2, take the form
$$ J_T(t,x,f_\lambda(t),r) = \int_{\{y\in[t;T]:F'(\gamma)<a_\lambda(y)\}}\left[F(I(a_\lambda(y)))-\frac{a\,I(a_\lambda(y))}{\eta}-a\left(x-\frac{I(a_\lambda(y))}{\eta}\right)e^{-\eta(y-t)}\right]dy+\left[F(\gamma)-\frac{a\gamma}{\eta}\right]\Lambda\{y\in[t;T]:F'(\gamma)\ge a_\lambda(y)\}+\frac{a}{\eta}\left(x-\frac{\gamma}{\eta}\right)\left(e^{-\eta(T-t)}-1\right)\mathbb{I}\{t:F'(\gamma)\ge a_\lambda(t)\}, \quad (30) $$
$$ J_T(t,x,f_\lambda(t),c) = \int_{\{y\in[t;T]:F'(\gamma)<a_\lambda(y)\}}\left[c_1\left(\frac{I(a_\lambda(y))}{\eta}+\left(x-\frac{I(a_\lambda(y))}{\eta}\right)e^{-\eta(y-t)}\right)+c_2I(a_\lambda(y))\right]dy+\left[\frac{c_1\gamma}{\eta}+c_2\gamma\right]\Lambda\{y\in[t;T]:F'(\gamma)\ge a_\lambda(y)\}-\frac{c_1}{\eta}\left(x-\frac{\gamma}{\eta}\right)\left(e^{-\eta(T-t)}-1\right)\mathbb{I}\{t:F'(\gamma)\ge a_\lambda(t)\}, \quad (31) $$
$$ \bar\theta_T(t,x,f_\lambda(t)) = \int_{\{y\in[t;T]:F'(\gamma)<a_\lambda(y)\}}\left[\frac{c_1I(a_\lambda(y))}{\eta^2}+\frac{c_1}{\eta}\left(x-\frac{I(a_\lambda(y))}{\eta}\right)e^{-\eta(y-t)}\right]dy+\frac{c_1\gamma}{\eta^2}\Lambda\{y\in[t;T]:F'(\gamma)\ge a_\lambda(y)\}-\frac{c_1}{\eta^2}\left(x-\frac{\gamma}{\eta}\right)\left(e^{-\eta(T-t)}-1\right)\mathbb{I}\{t:F'(\gamma)\ge a_\lambda(t)\}+q(T-t). \quad (32) $$
Proof of Lemma 2.
(i)
Start by making an informed guess of the solution of (25). Namely,
$$ h(t,x) := p(t)x+m_2(t). \quad (33) $$
Observe that $h_t(t,x) = p'(t)x+m_2'(t)$, $h_x(t,x) = p(t)$, and $h_{xx}(t,x) = 0$. The substitution of these expressions in (25) yields
$$ x\left[-a+\lambda c_1-\frac{\lambda c_1}{\eta}-\eta p(t)+p'(t)\right]+\sup_{0\le u\le\gamma}\left\{F(u)+\lambda c_2u+u\,p(t)\right\}-\lambda q+m_2'(t) = 0. $$
This means that
$$ -a+\lambda c_1-\frac{\lambda c_1}{\eta}-\eta p(t)+p'(t) = 0, \quad (34) $$
$$ \sup_{0\le u\le\gamma}\left\{F(u)+\lambda c_2u+u\,p(t)\right\}-\lambda q+m_2'(t) = 0. \quad (35) $$
Impose the terminal condition $p(T)=0$ on (34) to obtain
$$ p(t) = m_1\left(1-e^{-\eta(T-t)}\right), $$
where $m_1$ is as in (27).
Now, from (35), write
$$ m_2'(t) = \lambda q-\sup_{0\le u\le\gamma}\left\{F(u)+\lambda c_2u+u\,p(t)\right\} = \lambda q-\sup_{0\le u\le\gamma}\left\{F(u)+\lambda c_2u+u\,m_1\left(1-e^{-\eta(T-t)}\right)\right\}. \quad (36) $$
To find the supremum of the expression inside the braces, use a standard calculus argument to see that at a critical point $u$:
$$ F'(u)+\lambda c_2+m_1\left(1-e^{-\eta(T-t)}\right) = 0. \quad (37) $$
Next, since by (10), $F'(u)\ge0$, it turns out that
$$ \lambda c_2+m_1\left(1-e^{-\eta(T-t)}\right) \le 0. $$
Then, from (37):
$$ F'(u) = -\lambda c_2-m_1\left(1-e^{-\eta(T-t)}\right) =: a_\lambda(t) \ge 0, $$
and
$$ f_\lambda(t) := \begin{cases} I(a_\lambda(t)) & \text{if } F'(\gamma)<a_\lambda(t),\\ \gamma & \text{if } F'(\gamma)\ge a_\lambda(t). \end{cases} $$
With this in mind, (36) turns into
$$ m_2'(t) = \lambda q-\left[F(f_\lambda(t))+\lambda c_2f_\lambda(t)+m_1\left(1-e^{-\eta(T-t)}\right)f_\lambda(t)\right]. $$
Finally, since $m_2(T)=0$, $m_2(t) = -\lambda q(T-t)+\int_t^T\left[F(f_\lambda(y))+\lambda c_2f_\lambda(y)+m_1\left(1-e^{-\eta(T-y)}\right)f_\lambda(y)\right]dy$. Splitting the integral over the regions $\{y\in[t;T]:F'(\gamma)<a_\lambda(y)\}$, where $f_\lambda(y)=I(a_\lambda(y))$, and $\{y\in[t;T]:F'(\gamma)\ge a_\lambda(y)\}$, where $f_\lambda(y)=\gamma$, yields the expression for $m_2$ in (28), with $\Lambda(\cdot)$ standing for Lebesgue's measure. Therefore, from (33), obtain
$$ h(t,x) := p(t)x+m_2(t) = J_T^*(t,x,r^\lambda) = J_T(t,x,f_\lambda(t),r^\lambda). $$
This proves (26)–(28). The optimality of (29) for the FHUP (20) follows from Proposition 2(ii).
(ii)
To see that (30) holds, use (17) to write
$$ J_T(t,x,f_\lambda(t),r) = E_x^{f_\lambda}\left[\int_t^T\left(F(f_\lambda(y))-a\,x(y)\right)dy\right] = \int_t^T\left(F(f_\lambda(y))-a\,E_x^{f_\lambda}[x(y)]\right)dy. $$
Here, the interchange of expectation and integral is possible due to the finiteness of the interval $[t;T]$ and Fubini's rule. Now, the solution of the controlled diffusion process (7), with a constant control $f_\lambda$ and initial condition $x(t_0)=x$, is given by
$$ x(t) = e^{-\eta(t-t_0)}x+\frac{f_\lambda}{\eta}\left(1-e^{-\eta(t-t_0)}\right)+\sigma\int_{t_0}^te^{-\eta(t-s)}\,dW(s), $$
and its expected value is
$$ E_x^{f_\lambda}[x(t)] = \frac{f_\lambda}{\eta}+\left(x-\frac{f_\lambda}{\eta}\right)e^{-\eta(t-t_0)}. $$
Now, by (29), splitting the time integral over the regions where $f_\lambda(y)=I(a_\lambda(y))$ and where $f_\lambda(y)=\gamma$, and substituting the expected value just computed, the former equals:
$$ J_T(t,x,f_\lambda(t),r) = \int_{\{y\in[t;T]:F'(\gamma)<a_\lambda(y)\}}\left[F(I(a_\lambda(y)))-\frac{a\,I(a_\lambda(y))}{\eta}-a\left(x-\frac{I(a_\lambda(y))}{\eta}\right)e^{-\eta(y-t)}\right]dy+\left[F(\gamma)-\frac{a\gamma}{\eta}\right]\Lambda\{y\in[t;T]:F'(\gamma)\ge a_\lambda(y)\}+\frac{a}{\eta}\left(x-\frac{\gamma}{\eta}\right)\left(e^{-\eta(T-t)}-1\right)\mathbb{I}\{t:F'(\gamma)\ge a_\lambda(t)\}. $$
To prove (31), use the two leftmost members in (18), and proceed as above to put:
$$ J_T(t,x,f_\lambda(t),c) = \int_t^T\left(c_1E_x^{f_\lambda}[x(y)]+c_2f_\lambda(y)\right)dy = \int_{\{y\in[t;T]:F'(\gamma)<a_\lambda(y)\}}\left[c_1\left(\frac{I(a_\lambda(y))}{\eta}+\left(x-\frac{I(a_\lambda(y))}{\eta}\right)e^{-\eta(y-t)}\right)+c_2I(a_\lambda(y))\right]dy+\left[\frac{c_1\gamma}{\eta}+c_2\gamma\right]\Lambda\{y\in[t;T]:F'(\gamma)\ge a_\lambda(y)\}-\frac{c_1}{\eta}\left(x-\frac{\gamma}{\eta}\right)\left(e^{-\eta(T-t)}-1\right)\mathbb{I}\{t:F'(\gamma)\ge a_\lambda(t)\}. $$
Finally, by the two rightmost members of (18), write
$$ \bar\theta_T(t,x,f_\lambda(t)) = E_x^{f_\lambda}\left[\int_t^T\frac{c_1}{\eta}x(s)\,ds\right]+q(T-t) = \int_t^T\frac{c_1}{\eta}E_x^{f_\lambda}[x(s)]\,ds+q(T-t), $$
which, after the same splitting and substitution as above, equals the right-hand side of (32).
This proves (32).
The proof is now complete. □
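As a quick sanity check on the expected value used above, one can compare a Monte Carlo average of Euler–Maruyama paths of (7) with the closed form. A short Python sketch under a constant control $f$ (with assumed parameter values) is:
```python
import numpy as np

eta, sig, f, x0, t0, t1 = 1.0, 0.5, 0.4, 5.0, 0.0, 1.0
rng, N, n_paths = np.random.default_rng(1), 200, 20_000
dt = (t1 - t0) / N

x = np.full(n_paths, x0)
for _ in range(N):                      # Euler-Maruyama under the constant control f
    x += (f - eta * x) * dt + sig * rng.normal(0.0, np.sqrt(dt), n_paths)

closed_form = f / eta + (x0 - f / eta) * np.exp(-eta * (t1 - t0))
print(x.mean(), closed_form)            # the two values should agree closely
```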
Remark 3.
The equality
$$ h(t,x) = J_T^*(t,x,r^\lambda) = J_T(t,x,f_\lambda(t),r)+\lambda\left[J_T(t,x,f_\lambda(t),c)-\bar\theta_T(t,x,f_\lambda(t))\right] $$
follows from (30)–(32).
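To make the policy (29) concrete, the following Python sketch evaluates $f_\lambda$ for the utility $F(u)=\sqrt{u}$ of Example 3, where $F'(u)=1/(2\sqrt{u})$ and hence $I(y)=(F')^{-1}(y)=1/(4y^2)$. The parameter values and the signs of $m_1$ in (27), as reconstructed above, are assumptions of this sketch.
```python
import numpy as np

eta, a, c1, c2, gamma, T = 1.0, 1.25, 0.1, 0.05, 0.4, 1.0

def m1(lam):
    # m1 from (27): -a/eta - lam*c1/eta**2 + lam*c1/eta
    return -a / eta - lam * c1 / eta**2 + lam * c1 / eta

def a_lam(t, lam):
    # a_lambda(t) = -lam*c2 - m1(lam)*(1 - exp(-eta*(T - t)))
    return -lam * c2 - m1(lam) * (1.0 - np.exp(-eta * (T - t)))

def f_lam(t, lam):
    """The closed-loop policy (29) for F(u) = sqrt(u), where I(y) = 1/(4 y**2)."""
    Fp_gamma = 1.0 / (2.0 * np.sqrt(gamma))   # F'(gamma) = 1/(2 sqrt(gamma))
    al = a_lam(t, lam)
    return 1.0 / (4.0 * al**2) if Fp_gamma < al else gamma

print(f_lam(0.0, -40.0))
```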

3.2. From an Unconstrained Problem, to a Problem with Restrictions

This section starts with an important observation on the set of strategies which will be used.
Remark 4.
For each $\lambda\le0$, define $\Pi_\lambda := \{\pi=(\pi_t:t\ge0)\in\Pi : 0 = r^\lambda(t,x,\pi_t)+\partial_tJ_T^*(t,x,r^\lambda)+L^{\pi_t}J_T^*(t,x,r^\lambda)$ for all $(t,x)\in[0;T]\times\mathbb{R}^n$, and $J_T^*(T,x(T),r^\lambda) = r_1^\lambda(x(T))\}$.
Since M can be thought of as a subset of Π, Proposition 2(ii) ensures that the set Π λ is nonempty.
Lemma 3.
Let $(\lambda_m)$ be a sequence in $]-\infty;0]$ converging to some $\lambda^*\le0$, and assume that there exists a sequence $\pi^{\lambda_m}\in\Pi_{\lambda_m}$, for each $m\ge1$, that converges to a policy $\pi\in\Pi$. Then $\pi\in\Pi_{\lambda^*}$; that is, $\pi$ satisfies
$$ 0 = r^{\lambda^*}(t,x,\pi_t)+\partial_tJ_T^*(t,x,r^{\lambda^*})+L^{\pi_t}J_T^*(t,x,r^{\lambda^*}) \quad \text{for all } (t,x)\in[0;T]\times\mathbb{R}^n. \quad (38) $$
Proof of Lemma 3.
Recall Definition 2. Take an arbitrary sequence $\pi^m\in\Pi_{\lambda_m}$ such that $\pi^m\xrightarrow{W}\pi$. Observe that Proposition 2 ensures that for each $m\ge1$, $J_T^*(t,x,r^{\lambda_m})$ satisfies:
$$ 0 = r^{\lambda_m}(t,x,\pi^m_t)+\partial_tJ_T^*(t,x,r^{\lambda_m})+L^{\pi^m_t}J_T^*(t,x,r^{\lambda_m}) \quad \text{for all } (t,x)\in[0;T]\times\mathbb{R}^n. \quad (39) $$
In terms of the operator $\hat L^{\pi^m_t}_{\lambda_m}$, defined in (A4), the former relation reduces to
$$ 0 = \hat L^{\pi^m_t}_{\lambda_m}J_T^*(t,x,r^{\lambda_m}) \quad \text{for all } (t,x)\in[0;T]\times\mathbb{R}^n, $$
for the special case $v_1\equiv r$, $v_3\equiv c$, $\rho(t,x,u)\equiv\theta(t,x)$, $\pi^m_t\equiv\pi^{\lambda_m}_t$, $h_m(t,x)\equiv J_T^*(t,x,r^{\lambda_m})$, and $\lambda_m$ constant. A verification that the hypotheses of Theorem A1 in Appendix A hold follows. Specifically, part (a) trivially follows from (39). Then, the focus will be on checking that part (b) of Theorem A1 is met. To do that, for some $R>0$, take the ball $B_R := \{x\in\mathbb{R}^n:|x|<R\}$. By [30] [Theorem 9.11], there exists a constant $C_0$ (depending on $R$) such that, for a fixed $p>n$:
$$ \left\|J_T^*(\cdot,\cdot,r^{\lambda_m})\right\|_{W^{1,2;p}([0;T]\times B_R)} \le C_0\left[\left\|J_T^*(\cdot,\cdot,r^{\lambda_m})\right\|_{L^p([0;T]\times B_{2R})}+\left\|r^{\lambda_m}(\cdot,\cdot,\pi^m_\cdot)\right\|_{L^p([0;T]\times B_{2R})}\right] \le C_0\left[M_2(T,t)\|w\|_{L^p([0;T]\times B_{2R})}+M\|w\|_{L^p([0;T]\times B_{2R})}\right] \le C_0\left[M_2(T,t)+M\right]T\,|\bar B_{2R}|^{1/p}\max_{x\in\bar B_{2R}}w(x) < \infty, $$
where $|\bar B_{2R}|$ represents the volume of the closed ball with radius $2R$, and $M$ and $M_2(T,t)$ are the constants in Hypothesis (H2b) and in (14), respectively.
Notice that conditions (c) to (f) of Theorem A1 trivially hold, and that condition (g) is given as a part of the hypotheses just presented. Then, one can claim the existence of a function $h^{\lambda^*}\in W^{1,2;p}([0;T]\times B_R)$ together with a subsequence $(m_k)$ such that $J_T^*(\cdot,\cdot,r^{\lambda_{m_k}}) = J_T(\cdot,\cdot,\pi^{m_k}_\cdot,r^{\lambda_{m_k}}) \to h^{\lambda^*}(\cdot,\cdot)$ uniformly in $[0;T]\times B_R$ and pointwise on $[0;T]\times\mathbb{R}^n$ as $k\to\infty$ and $\pi^m\xrightarrow{W}\pi$. Furthermore, $h^{\lambda^*}$ satisfies:
$$ 0 = r^{\lambda^*}(t,x,\pi_t)+\partial_th^{\lambda^*}(t,x)+L^{\pi_t}h^{\lambda^*}(t,x), \quad \text{for } (t,x)\in[0;T]\times B_R. $$
Since the radius R > 0 was arbitrary, one can extend the analysis to all of x R n . Thus, Proposition 1 asserts that h λ * ( t , x ) coincides with J T * ( t , x , r λ * ) . This proves the result. □
Lemma 3 gives, in particular, the continuity of the mapping $\pi\mapsto J_T(t,x,\pi_t,r^\lambda)$.
Lemma 4.
Assume the hypotheses of Proposition 1. Then:
(a) 
For each fixed $(t,x)\in[0;T]\times\mathbb{R}^n$, $\lambda\le0$, and $\eta\in\mathbb{R}$ under which $\lambda+\eta\le0$:
$$ \eta\left[J_T(t,x,\pi^\lambda_t,c)-\bar\theta_T(t,x,\pi^\lambda_t)\right] \le J_T^*(t,x,r^{\lambda+\eta})-J_T^*(t,x,r^\lambda) \le \eta\left[J_T(t,x,\pi^{\lambda+\eta}_t,c)-\bar\theta_T(t,x,\pi^{\lambda+\eta}_t)\right]. \quad (40) $$
(b) 
The mapping $\lambda\mapsto J_T^*(t,x,r^\lambda)$ is differentiable on $]-\infty;0[$, for any $(t,x)\in[0;T]\times\mathbb{R}^n$; in fact, for each $\lambda<0$,
$$ \frac{\partial J_T^*(t,x,r^\lambda)}{\partial\lambda} = J_T(t,x,\pi^\lambda_t,c)-\bar\theta_T(t,x,\pi^\lambda_t). \quad (41) $$
Proof of Lemma 4.
(a)
Observe that from (19), (23), and the definition of $r^{\lambda+\eta}$, one can assert that
$$ J_T^*(t,x,r^{\lambda+\eta}) \ge J_T(t,x,\pi^\lambda_t,r^{\lambda+\eta}) = J_T(t,x,\pi^\lambda_t,r)+(\lambda+\eta)\left[J_T(t,x,\pi^\lambda_t,c)-\bar\theta_T(t,x,\pi^\lambda_t)\right]. \quad (42) $$
On the other hand, Proposition 2(ii) and the definition of π λ Π λ yield the equality
$$ J_T^*(t,x,r^\lambda) = J_T(t,x,\pi^\lambda_t,r^\lambda) = J_T(t,x,\pi^\lambda_t,r)+\lambda\left[J_T(t,x,\pi^\lambda_t,c)-\bar\theta_T(t,x,\pi^\lambda_t)\right]. \quad (43) $$
Subtracting (43) from (42) yields
$$ J_T^*(t,x,r^{\lambda+\eta})-J_T^*(t,x,r^\lambda) \ge \eta\left[J_T(t,x,\pi^\lambda_t,c)-\bar\theta_T(t,x,\pi^\lambda_t)\right]. \quad (44) $$
Applying analogous arguments to those given in the above procedure, but taking J T * ( t , x , r λ ) and the policy π λ + η , it is possible to obtain
$$ J_T^*(t,x,r^{\lambda+\eta})-J_T^*(t,x,r^\lambda) \le \eta\left[J_T(t,x,\pi^{\lambda+\eta}_t,c)-\bar\theta_T(t,x,\pi^{\lambda+\eta}_t)\right]. \quad (45) $$
Hence (a) follows by combining (44) and (45).
(b)
By (15) and (16):
$$ \left|J_T(t,x,\pi^{\lambda+\eta}_t,c)-\bar\theta_T(t,x,\pi^{\lambda+\eta}_t)\right| \le 2M_2(T,t)\,w(x). $$
Therefore, the continuity of $\lambda\mapsto J_T^*(\cdot,\cdot,r^\lambda)$ follows by letting $\eta\to0$ in all of the terms of (40). Now let $(t,x)\in[0;\infty[\times\mathbb{R}^n$ and $\lambda<0$ be fixed, and consider a sequence of negative numbers $\eta_m$ such that $\eta_m\to0$, together with its associated sequence of policies $\pi^{\lambda+\eta_m}$, where $\pi^{\lambda+\eta_m}\in\Pi_{\lambda+\eta_m}$ for each $m$. From the compactness of the metric space $\Pi$, there exist a subsequence $\pi^{\lambda+\eta_{m_k}}$ and $\pi\in\Pi$ such that $\pi^{\lambda+\eta_{m_k}}\xrightarrow{W}\pi$ as $k\to\infty$. From Lemma 3, $\pi$ belongs to $\Pi_\lambda$; so, denote it by $\pi^\lambda := \pi$. By Lemma 3, the mapping $\pi\mapsto J_T(t,x,\pi_t,v)$ is also continuous on $\Pi$, with $v(t,x,u) = c(t,x,u)-\theta(t,x)$. Please note that $J_T(t,x,\pi_t,v) = J_T(t,x,\pi_t,c)-\bar\theta_T(t,x,\pi_t)$, which gives
$$ J_T\big(t,x,\pi^{\lambda+\eta_{m_k}}_t,c\big)-\bar\theta_T\big(t,x,\pi^{\lambda+\eta_{m_k}}_t\big) \to J_T(t,x,\pi^\lambda_t,c)-\bar\theta_T(t,x,\pi^\lambda_t), \quad \text{for } (t,x)\in[0;\infty[\times\mathbb{R}^n, \text{ as } k\to\infty. $$
Therefore, from part (a) of this result, it turns out that the limit
$$ \lim_{k\to\infty}\frac{J_T^*(t,x,r^{\lambda+\eta_{m_k}})-J_T^*(t,x,r^\lambda)}{\eta_{m_k}} = J_T(t,x,\pi^\lambda_t,c)-\bar\theta_T(t,x,\pi^\lambda_t), \quad (46) $$
for $(t,x)\in[0;\infty[\times\mathbb{R}^n$. Similarly, if one considers a sequence of positive real numbers $\eta_m$ such that $\lambda+\eta_m\le0$, it is possible to prove that there exists a subsequence $\lambda+\eta_{m_k}$ such that (46) holds. This proves that $\lambda\mapsto J_T^*(t,x,r^\lambda)$ is differentiable on $]-\infty;0[$, with derivative given by (41). □
The following is the main result of this section. It shows how to compute optimal policies for the FHPC.
Theorem 1.
Let Hypotheses 1 and 2 hold, and consider a fixed point $(t,x)\in[0;T]\times\mathbb{R}^n$. Then:
(a) 
If $\lambda^*_{t,x}<0$ is a critical point of $J_T^*(t,x,r^\lambda)$, that is, if the derivative in (41) equals zero at $\lambda=\lambda^*_{t,x}$, then every $\pi^{\lambda^*} = \big(\pi^{\lambda^*_{t,x}}_t:t\ge0\big)\in\Pi_{\lambda^*}$ is optimal for the FHPC, and
$$ J_T\big(t,x,\pi^{\lambda^*_{t,x}},c\big) = \bar\theta_T\big(t,x,\pi^{\lambda^*_{t,x}}\big). $$
Moreover, $J_T^*\big(t,x,r^{\lambda^*_{t,x}}\big)$ is the optimal value for the FHPC, which in turn coincides with $J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,r\big)$. In addition,
$$ J_T^*\big(t,x,r^{\lambda^*_{t,x}}\big) = \inf_{\lambda<0}J_T^*(t,x,r^\lambda). \quad (47) $$
(b) 
Case $\lambda^*_{t,x}=0$: If $\pi^0=(\pi^0_t:t\ge0)\in\Pi_0$ satisfies $J_T(t,x,\pi^0_t,c)\le\bar\theta_T(t,x,\pi^0_t)$, i.e., $\pi^0\in F_\theta^T(t,x)$, then this policy is optimal for the FHPC. Moreover, $J_T^*(t,x,r^0) = J_T^*(t,x,r)$ becomes the optimal value for the FHPC and it coincides with $J_T(t,x,\pi^0,r)$. Furthermore,
$$ J_T^*(t,x,r^0) = \min_{\lambda\le0}J_T^*(t,x,r^\lambda). \quad (48) $$
Proof of Theorem 1.
(a)
Since $\lambda^*_{t,x}<0$ is a critical point of $J_T^*(t,x,r^\lambda)$, the relation (41) yields:
$$ \left.\frac{\partial J_T^*(t,x,r^\lambda)}{\partial\lambda}\right|_{\lambda=\lambda^*_{t,x}} = J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,c\big)-\bar\theta_T\big(t,x,\pi^{\lambda^*_{t,x}}_t\big) = 0 \quad \text{for every } \pi^{\lambda^*}\in\Pi_{\lambda^*}. \quad (49) $$
Thus, using (19) and (49), it can be said that:
$$ J_T\big(t,x,\pi^{\lambda^*_{t,x}},r^{\lambda^*_{t,x}}\big) = J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,r\big)+\lambda^*_{t,x}\left[J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,c\big)-\bar\theta_T\big(t,x,\pi^{\lambda^*_{t,x}}_t\big)\right] = J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,r\big). \quad (50) $$
Moreover, given that $\pi^{\lambda^*}$ is in $\Pi_{\lambda^*}$, Proposition 2(ii) and Remark 4 yield
$$ J_T^*\big(t,x,r^{\lambda^*_{t,x}}\big) := \sup_{\pi\in\Pi}J_T\big(t,x,\pi_t,r^{\lambda^*_{t,x}}\big) = J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,r^{\lambda^*_{t,x}}\big). \quad (51) $$
On the other hand, observe that for all $\pi\in F_\theta^T(t,x)$, $J_T(t,x,\pi_t,c)-\bar\theta_T(t,x,\pi_t)\le0$, implying that $\lambda^*_{t,x}\left[J_T(t,x,\pi_t,c)-\bar\theta_T(t,x,\pi_t)\right]\ge0$. This last inequality, together with (19), (23), (50) and (51), leads to
$$ J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,r\big) = J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,r^{\lambda^*_{t,x}}\big) = J_T^*\big(t,x,r^{\lambda^*_{t,x}}\big) \ge J_T\big(t,x,\pi_t,r^{\lambda^*_{t,x}}\big) = J_T(t,x,\pi_t,r)+\lambda^*_{t,x}\left[J_T(t,x,\pi_t,c)-\bar\theta_T(t,x,\pi_t)\right] \ge J_T(t,x,\pi_t,r) \quad \text{for all } \pi\in F_\theta^T(t,x). \quad (52) $$
Therefore,
$$ J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,r\big) \ge \sup_{\pi\in F_\theta^T(t,x)}J_T(t,x,\pi_t,r). \quad (53) $$
Finally, by (49):
$$ J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,c\big) = \bar\theta_T\big(t,x,\pi^{\lambda^*_{t,x}}_t\big), $$
yielding that $\pi^{\lambda^*}$ is in $F_\theta^T(t,x)$. This fact, along with (52) and (53), gives that
$$ J_T^*\big(t,x,r^{\lambda^*_{t,x}}\big) = J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,r\big) = \sup_{\pi\in F_\theta^T(t,x)}J_T(t,x,\pi_t,r); $$
that is, π λ * is optimal for the FHPC, and J T * ( t , x , r λ t , x * ) coincides with the optimal reward for the FHPC.
To prove (47), observe that for each $\lambda<0$ and for all $\pi^\lambda\in\Pi_\lambda$, Proposition 1 gives
$$ J_T^*(t,x,r^\lambda) \ge J_T(t,x,\pi_t,r^\lambda) = J_T(t,x,\pi_t,r)+\lambda\left[J_T(t,x,\pi_t,c)-\bar\theta_T(t,x,\pi_t)\right] $$
for all $\pi\in\Pi$, $(t,x)\in[0;\infty[\times\mathbb{R}^n$; in particular, taking $\pi:=\pi^{\lambda^*_{t,x}}$ in the latter expression, and observing that the second term is zero (see (49)), yields
$$ J_T^*(t,x,r^\lambda) \ge J_T\big(t,x,\pi^{\lambda^*}_t,r\big)+\lambda\left[J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,c\big)-\bar\theta_T\big(t,x,\pi^{\lambda^*_{t,x}}_t\big)\right] = J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,r\big)+\lambda^*_{t,x}\left[J_T\big(t,x,\pi^{\lambda^*_{t,x}}_t,c\big)-\bar\theta_T\big(t,x,\pi^{\lambda^*_{t,x}}_t\big)\right] = J_T^*\big(t,x,r^{\lambda^*_{t,x}}\big). $$
Since $\lambda<0$ was an arbitrary negative constant, (47) holds.
(b)
It is clear that $\lambda^*_{t,x}=0$ implies $r(t,x,\pi_t) = r^0(t,x,\pi_t)$, for all $(t,x)\in[0;\infty[\times\mathbb{R}^n$ and $\pi\in\Pi$. Since $\Pi_0$ is nonempty (see Remark 4), Proposition 2(ii) ensures that $\pi^0\in\Pi_0$ is optimal for the FHUP ($\lambda=0$). Given that $\pi^0\in F_\theta^T(t,x)$, then $\pi^0$ is optimal for the FHPC. Therefore,
$$ J_T^*(t,x,r^0) = J_T(t,x,\pi^0_t,r) = \sup_{\pi\in F_\theta^T(t,x)}J_T(t,x,\pi_t,r). $$
Moreover, since $J_T(t,x,\pi^0_t,c)\le\bar\theta_T(t,x,\pi^0_t)$, one can take $\eta<0$. From (40):
$$ 0 \le \eta\left[J_T(t,x,\pi^0_t,c)-\bar\theta_T(t,x,\pi^0_t)\right] \le J_T^*(t,x,r^\eta)-J_T^*(t,x,r^0). $$
This yields $J_T^*(t,x,r^0)\le J_T^*(t,x,r^\eta)$, for all $\eta<0$. Therefore, (48) follows trivially. □
Theorem 2
(Examples 1–4, and Lemma 2 continued). Assume that $K>0$ and let $z>0$ be fixed such that, for all $t\in[0;T]$,
$$ \left(e^{-\eta(T-t)}-1\right)\left(-\frac{c_1K}{\eta^2}-\frac{c_1z}{\eta}-\frac{c_2K}{\eta}+\frac{c_1K}{\eta^3}+\frac{c_1z}{\eta^2}\right)+e^{-\eta(T-t)}(T-t)\left(\frac{c_1K}{\eta^2}-\frac{c_1K}{\eta}\right)-q(T-t) = 0, \quad (54) $$
and
$$ 0 < Ke^{-\eta(T-t)} < \gamma. \quad (55) $$
(a) 
If $F'\!\left(Ke^{-\eta(T-t)}\right) > -m_1\left(1-e^{-\eta(T-t)}\right)$, then the mapping $\lambda\mapsto J_T^*(t,z,r^\lambda)$ admits a critical point $\lambda^*_t\equiv\lambda^*_t(z)<0$ satisfying
$$ a_{\lambda^*_t}(t) = -\lambda^*_tc_2-m_1\left(1-e^{-\eta(T-t)}\right) = F'\!\left(Ke^{-\eta(T-t)}\right), \quad (56) $$
where $m_1$ is as in (27). Hence, every $\pi^{\lambda^*_t}\in\Pi_{\lambda^*_t}$ is optimal for the constrained control problem and $J_T(t,z,\pi^{\lambda^*_t},c) = \bar\theta_T(t,z,\pi^{\lambda^*_t})$; in particular, the corresponding $f_{\lambda^*_t}\in M\cap\Pi_{\lambda^*}$ defined in (29) becomes the policy
$$ f(t) := Ke^{-\eta(T-t)}, \quad (57) $$
and the optimal value for the FHPC is given by
$$ J_T\big(t,z,r^{\lambda^*_t}\big) = J_T\big(t,z,f_{\lambda^*_t},r\big) = \int_t^TF\!\left(Ke^{-\eta(T-y)}\right)dy+\frac{aK}{\eta^2}\left(e^{-\eta(T-t)}-1\right)+\frac{az}{\eta}\left(e^{-\eta(T-t)}-1\right)+\frac{aK}{\eta}\,e^{-\eta(T-t)}(T-t). \quad (58) $$
Moreover,
$$ J_T\big(t,z,f_{\lambda^*_t}(t),c\big)-\bar\theta_T\big(t,z,f_{\lambda^*_t}(t)\big) = \left(e^{-\eta(T-t)}-1\right)\left(-\frac{c_1K}{\eta^2}-\frac{c_1z}{\eta}-\frac{c_2K}{\eta}+\frac{c_1K}{\eta^3}+\frac{c_1z}{\eta^2}\right)+e^{-\eta(T-t)}(T-t)\left(\frac{c_1K}{\eta^2}-\frac{c_1K}{\eta}\right)-q(T-t) = 0. $$
(b) 
If $F'\!\left(Ke^{-\eta(T-t)}\right) \le -m_1\left(1-e^{-\eta(T-t)}\right)$, then
$$ f_0(t) = I\!\left(-m_1\left(1-e^{-\eta(T-t)}\right)\right)\in[0;\gamma] $$
defines a policy which belongs to $\Pi_0$ and $J_T(t,z,f_0(t),c)\le\bar\theta_T(t,z,f_0(t))$; that is, $f_0\in M\cap\Pi_0$. Moreover, $f_0$ is an optimal policy for the FHPC with optimal value
$$ J_T^*(t,z,r^0) = J_T^*(t,z,r) = J_T\big(t,z,f_0(t),r\big) = \int_{\{y\in[t;T]:F'(\gamma)<a_0(y)\}}\left[F(I(a_0(y)))-\frac{a\,I(a_0(y))}{\eta}-a\left(z-\frac{I(a_0(y))}{\eta}\right)e^{-\eta(y-t)}\right]dy. \quad (59) $$
Proof of Theorem 2.
(a)
Consider $\lambda^*_t\in\mathbb{R}$ from (56). Then it satisfies the following inequality too:
$$ \lambda^*_t := -\frac{F'\!\left(Ke^{-\eta(T-t)}\right)+m_1\left(1-e^{-\eta(T-t)}\right)}{c_2} < 0. \quad (60) $$
From (55):
$$ 0 < Ke^{-\eta(T-t)} < \gamma. $$
Since $F'(\cdot)$ is a strictly decreasing function, then
$$ F'(\gamma) < F'\!\left(Ke^{-\eta(T-t)}\right) = a_{\lambda^*_t}(t). $$
Hence, from (29), $f_{\lambda^*_t}(t) = I(a_{\lambda^*_t}(t))\in\Pi_{\lambda^*}$ takes the form (57). On the other hand, from Lemma 4(b), the mapping $\lambda\mapsto J_T^*(t,z,r^\lambda)$ is differentiable at $\lambda^*_t<0$, with
$$ \left.\frac{\partial J_T^*(t,z,r^\lambda)}{\partial\lambda}\right|_{\lambda=\lambda^*_t} = J_T\big(t,z,\pi^{\lambda^*},c\big)-\bar\theta_T\big(t,z,\pi^{\lambda^*}\big) \quad \text{for all } \pi^{\lambda^*}\in\Pi_{\lambda^*_t}. $$
In particular, if one considers $\pi^{\lambda^*_t} := f_{\lambda^*_t}$ as given by (57), and then replaces it in (31) and (32), one obtains that $J_T(t,z,f_{\lambda^*_t},c) = \bar\theta_T(t,z,f_{\lambda^*_t})$ by condition (54); i.e., $\lambda^*_t$ is a critical point of the function $\lambda\mapsto J_T^*(t,z,r^\lambda)$. Thus, from Theorem 1(a), every $\pi^{\lambda^*_t}\in\Pi_{\lambda^*}$ is an optimal policy for the control problem with constraints, and $J_T(t,z,\pi^{\lambda^*_t},c) = \bar\theta_T(t,z,\pi^{\lambda^*_t})$, with optimal value $J_T^*(t,z,r^{\lambda^*_t}) = J_T(t,z,\pi^{\lambda^*_t},r) = J_T(t,z,f_{\lambda^*_t},r)$.
(b)
Observe that
$$ F'(\gamma) < F'\!\left(Ke^{-\eta(T-t)}\right) \le -m_1\left(1-e^{-\eta(T-t)}\right) = a_0(t), \quad (61) $$
which implies that
$$ I(a_0(t)) = I\!\left(-m_1\left(1-e^{-\eta(T-t)}\right)\right) \le Ke^{-\eta(T-t)} < \gamma. \quad (62) $$
From (29), it follows that $f_0(t) = I(a_0(t))\in M\cap\Pi_0$. Moreover, by (61)–(62),
$$ J_T(t,z,f_0,c)-\bar\theta_T(t,z,f_0) = \left(1-\frac1\eta\right)\int_{\{y\in[t;T]:F'(\gamma)<a_0(y)\}}c_1\left[\frac{I(a_0(y))}{\eta}+\left(z-\frac{I(a_0(y))}{\eta}\right)e^{-\eta(y-t)}\right]dy+c_2\int_{\{y\in[t;T]:F'(\gamma)<a_0(y)\}}I(a_0(y))\,dy-q(T-t) \le \left(1-\frac1\eta\right)\int_t^Tc_1\left[\frac{Ke^{-\eta(T-y)}}{\eta}+\left(z-\frac{Ke^{-\eta(T-y)}}{\eta}\right)e^{-\eta(y-t)}\right]dy+c_2\int_t^TKe^{-\eta(T-y)}\,dy-q(T-t) = \left(e^{-\eta(T-t)}-1\right)\left(-\frac{c_1K}{\eta^2}-\frac{c_1z}{\eta}-\frac{c_2K}{\eta}+\frac{c_1K}{\eta^3}+\frac{c_1z}{\eta^2}\right)+e^{-\eta(T-t)}(T-t)\left(\frac{c_1K}{\eta^2}-\frac{c_1K}{\eta}\right)-q(T-t) = 0, \quad (63) $$
that is, $J_T(t,z,f_0,c)-\bar\theta_T(t,z,f_0)\le0$. Hence, Theorem 1(b) ensures that $f_0$ is an optimal policy for the FHPC with optimal value $J_T^*(t,z,r) = J_T(t,z,f_0,r)$. Finally, replacing $f_0$ into (30), one easily deduces (59). □
Remark 5.
If the opposite condition in (54) occurs, then the existence of a critical point of the mapping $\lambda\mapsto J_T^*(t,z,r^\lambda)$ necessarily implies that
$$ F'(\gamma) \ge a_0(t) = -m_1\left(1-e^{-\eta(T-t)}\right) \quad \text{and} \quad f_\lambda(t) = \gamma \quad \text{for all } \lambda\le0. $$
In this case, every $\lambda\le0$ is a critical point of $\lambda\mapsto J_T^*(t,z,r^\lambda)$, and $f_\lambda(t)=\gamma$ is an optimal policy for the FHPC. To avoid this trivial situation, under the fact $F'(\infty)=0$, choose $\gamma$ large enough such that
$$ F'(\gamma) < -m_1\left(1-e^{-\eta(T-t)}\right). $$
Now use Theorem 2 to propose a modification of Algorithm 1 to compute the integral inside (20). Observe that it is no longer necessary to include the computation of the Vasicek process (8), because the optimal controllers $f_{\lambda^*}$ (given by (57)) and the Lagrange multipliers $\lambda^*_t$ (given by (60)) are non-stationary along time.
Example 5
(Examples 1–4, Lemma 2, and Theorem 2 continued). Algorithms 2 and 3 can be used to compare the Monte Carlo simulations for the integral inside the expectation operator in (20) with the results (formula (58)) of Theorem 2. To this end, recall from Example 1 the choice made for the parameters of (7) (that is: $x_0=5$, $\sigma(x)\equiv0.5$, $\eta=1$ and $T=1$). In addition, choose constants that meet (12): these are $a=1.25$, $\gamma=1$, $c_1=0.1$, $c_2=0.05$, and $q=0.0195$. With this configuration, condition (54) holds for all $t\in[0;1]$ with an error of at most 0.004 (see Figure 3). With all this in mind, formula (58) in Theorem 2 yields an optimal value for the FHPC of
$$ J_1^*\big(0,5,r^{\lambda^*_t}\big) = J_1^*\big(0,5,f_{\lambda^*_t},r\big) = 3.58813. \quad (64) $$
Algorithm 3: Integral algorithm
The use of Algorithms 2 and 3 (with 10,000 simulations) gives optimal values for the FHUP around
$$ J_1^*\big(0,5,r^{\lambda^*_t}\big) \approx 3.3231104. $$
The relative error between the latter numeric value and (64) is about 7.3%. The step size used, along with the error involved in hypothesis (54), explains this difference. Figure 4 shows the resulting pollution stock along time when the optimal strategy is implemented.
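For reference, a Python sketch in the spirit of Algorithm 3 follows: it simulates (7) under the non-stationary policy $f(t) = Ke^{-\eta(T-t)}$ from (57) and Monte Carlo-averages the reward $\int(F(f(y))-a\,x(y))\,dy$ with $F(u)=\sqrt{u}$. The value of $K$ must first be obtained numerically from condition (54); it is passed in as a parameter here, and the value 0.5 below is a placeholder, not the calibrated one.
```python
import numpy as np

def reward_under_policy(K, x0=5.0, T=1.0, N=100, eta=1.0, sig=0.5,
                        a=1.25, n_sims=10_000, seed=0):
    """Monte Carlo estimate of J_T(0, x0, f, r) for the policy f(t) = K exp(-eta (T - t))."""
    rng = np.random.default_rng(seed)
    dt = T / N
    total = np.zeros(n_sims)
    x = np.full(n_sims, x0)
    for n in range(N):
        f_t = K * np.exp(-eta * (T - n * dt))      # the non-stationary policy (57)
        total += (np.sqrt(f_t) - a * x) * dt       # reward rate F(u) - a x, with F = sqrt
        x += (f_t - eta * x) * dt + sig * rng.normal(0.0, np.sqrt(dt), n_sims)
    return total.mean()

# K assumed to have been solved for from (54) beforehand; 0.5 is a placeholder.
print(reward_under_policy(K=0.5))
```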

4. Concluding Remarks

This paper studies a stochastic system on a finite-time horizon under a total-performance criterion with restrictions, where the coefficients of the diffusion, the reward and the constraints are all allowed to be unbounded. The results have been illustrated by means of a sequence of examples, a Lemma and a Theorem. The approach is based on the use of some classic dynamic programming tools, and on the Lagrange multipliers technique for optimization with restrictions.
The results of this work represent a natural extension of the ones introduced in [12] to the non-stationary case. All of these can also be applied to the control of pollution accumulation as presented in [17,18]. An additional contribution of this presentation is given by the optimal controllers (and objective function) for a finite-time horizon under constraints. Moreover, this work used the tools presented in [25], and the Monte Carlo simulation technique, to test its analytic findings. This represents a major implication of this work concerning the current methodology for resource management and consumption when pollution plays an active role. Indeed, the model presented along this paper can be used for decision-making purposes when the social welfare, and the cost and reward constraints, are known and parametrized.
A plausible extension of this paper could be related to looking for optimal controllers on a random horizon with a constrained performance index, in the fashion of [31].

Author Contributions

Conceptualization: B.A.E.-T. and J.D.L.-B.; methodology and software: J.D.L.-B.; investigation, validation and resources: B.A.E.-T., J.D.L.-B. and J.G.-M.; formal analysis: B.A.E.-T. and J.D.L.-B.; writing—original draft preparation: B.A.E.-T. and J.G.-M.; writing—review and editing: B.A.E.-T. and J.D.L.-B.; funding acquisition: B.A.E.-T., J.D.L.-B. and J.G.-M. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by Universidad Veracruzana and Universidad Anáhuac México.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Technical Complements

This appendix introduces an extension of Theorem 5.1 of [32] to the non-stationary case, with a single controller, on a finite horizon.
For $x \in \mathbb{R}^n$, $t \in [0;T]$, $u \in U$, and $\lambda \geq 0$, assume the existence of functions $\lambda \in B_w([0;T] \times \mathbb{R}^n)$, $h \in W^{1,2;p}([0;T] \times \mathbb{R}^n)$, and $v_1, v_3, \rho : [0;T] \times \mathbb{R}^n \times U \to \mathbb{R}$. Define
$$\Psi(t,x,u,\lambda,h) := v_1(t,x,u) + \lambda(t,x)\left[ v_3(t,x,u) - \rho(t,x,u) \right] + \langle \nabla h(t,x), b(x,u) \rangle, \tag{A1}$$
$$\hat{\mathcal{L}}_\lambda^u h(t,x) := \Psi(t,x,u,\lambda,h) + \partial_t h(t,x) + \frac{1}{2} \operatorname{Tr}\left[ [H h(t,x)]\, a(x) \right]. \tag{A2}$$
Furthermore, for $\pi = (\pi_t : t \geq 0) \in \Pi$, define
$$\Psi(t,x,\pi_t,\lambda,h) := \int_U \Psi(t,x,u,\lambda,h)\, \pi_t(du \mid x), \tag{A3}$$
$$\hat{\mathcal{L}}_\lambda^{\pi_t} h(t,x) := \Psi(t,x,\pi_t,\lambda,h) + \partial_t h(t,x) + \frac{1}{2} \operatorname{Tr}\left[ [H h(t,x)]\, a(x) \right]. \tag{A4}$$
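In other words, (A2) gathers the Lagrangian running reward and the parabolic generator of the controlled diffusion: writing $L^u h := \langle \nabla h, b \rangle + \frac{1}{2}\operatorname{Tr}\left[ [Hh]\, a \right]$ for the generator, one may regroup (A1) and (A2) as
$$\hat{\mathcal{L}}_\lambda^u h(t,x) = \underbrace{v_1(t,x,u) + \lambda(t,x)\left[ v_3(t,x,u) - \rho(t,x,u) \right]}_{\text{Lagrangian running reward}} + \partial_t h(t,x) + L^u h(t,x),$$
so that $\hat{\mathcal{L}}_\lambda^{\pi} h = 0$ is the Hamilton–Jacobi–Bellman-type identity that the value function of the $\lambda$-penalized problem is expected to satisfy along a fixed strategy $\pi$.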
Definitions (A1)–(A4) will be used in the next couple of results.
Theorem A1.
Let $\mathcal{O} \subset \mathbb{R}^n$ be a $C^2$-class bounded domain and suppose that Hypotheses 1 and 2 hold. Moreover, assume the existence of sequences $h_m \in W^{1,2;p}([0;T] \times \mathbb{R}^n)$, $\varepsilon_m \in L^p([0;T] \times \mathbb{R}^n)$ with $p > n$, $\lambda_m \in B_w([0;T] \times \mathbb{R}^n)$, and $\pi^m \in \Pi$ satisfying:
(a) $\hat{\mathcal{L}}_{\lambda_m}^{\pi^m} h_m = \varepsilon_m$ on $[0;T] \times \mathbb{R}^n$, for $m = 1, 2, \ldots$;
(b) there exists a constant $\bar{M}_1$ such that $\| h_m \|_{W^{1,2;p}([0;T] \times \mathbb{R}^n)} \leq \bar{M}_1$ for $m = 1, 2, \ldots$;
(c) $\varepsilon_m$ converges in $L^p([0;T] \times \mathbb{R}^n)$ to some function $\varepsilon$;
(d) $\lambda_m$ converges uniformly to some function $\lambda$;
(e) $\pi^m \xrightarrow{\,w\,} \pi \in \Pi$ (that is, $\pi^m$ converges to $\pi$ in the topology of relaxed controls).
Then, there exist a function $h \in W^{1,2;p}([0;T] \times \mathbb{R}^n)$ and a subsequence $\{m_k\} \subset \{1, 2, \ldots\}$ such that, for fixed $t \in [0;T]$, $h_{m_k}(t,\cdot) \to h(t,\cdot)$ in the norm of $C^{1;\eta}(\mathbb{R}^n)$ with $\eta < 1 - n/p$ as $k \to \infty$; and, for fixed $x \in \mathbb{R}^n$, $h_{m_k}(\cdot,x) \to h(\cdot,x)$ in the norm of $C^1([0;T])$. Moreover,
$$\hat{\mathcal{L}}_\lambda^{\pi} h = \varepsilon \quad \textit{in } [0;T] \times \mathbb{R}^n.$$
Proof of Theorem A1.
The first step is to prove the existence of a function $h \in W^{1,2;p}([0;T] \times \mathbb{R}^n)$ and a subsequence $\{h_{m_k}\} \subset \{h_m\}$ such that, as $k \to \infty$, $h_{m_k} \to h$ weakly in $W^{1,2;p}([0;T] \times \mathbb{R}^n)$; for fixed $t \in [0;T]$, $h_{m_k}(t,\cdot) \to h(t,\cdot)$ in the norm of $C^{1;\eta}(\mathbb{R}^n)$ with $\eta < 1 - n/p$; and, for fixed $x \in \mathbb{R}^n$, $h_{m_k}(\cdot,x) \to h(\cdot,x)$ in the norm of $C^1([0;T])$.
Since $W^{2;p}(\mathbb{R}^n)$ is a reflexive space (see [33], Theorem 3.5), [33] (Theorem 1.17) implies that the ball
$$\mathcal{H}(t) := \left\{ h(t,\cdot) \in W^{2;p}(\mathbb{R}^n) : \| h \|_{W^{2;p}(\mathbb{R}^n)} \leq \bar{M} \right\} \tag{A5}$$
is weakly sequentially compact for each fixed $t \in [0;T]$. Since $p > n$, by [33] (Theorem 6.2, Part III) the embedding $W^{2;p}(\mathbb{R}^n) \hookrightarrow C^{1;\eta}(\mathbb{R}^n)$ with $\eta \leq 1 - n/p$ is compact (and continuous too), so the set $\mathcal{H}(t)$ in (A5) is relatively compact in $C^{1;\eta}(\mathbb{R}^n)$. This ensures the existence of a function $h(t,\cdot) \in W^{2;p}(\mathbb{R}^n)$ and a subsequence $\{h_{m_k}(t,\cdot)\} \subset \{h_m(t,\cdot)\} \subset \mathcal{H}(t)$ such that
$$h_{m_k}(t,\cdot) \to h(t,\cdot) \quad \textit{weakly in } W^{2;p}(\mathbb{R}^n), \textit{ and strongly in } C^{1;\eta}(\mathbb{R}^n)$$
for each $t \in [0;T]$. Now, since $[0;T]$ is compact, $h_{m_k} \to h$ weakly in $W^{1,2;p}([0;T] \times \mathbb{R}^n)$ as $k \to \infty$; for fixed $t \in [0;T]$, $h_{m_k}(t,\cdot) \to h(t,\cdot)$ in the norm of $C^{1;\eta}(\mathbb{R}^n)$; and, for fixed $x \in \mathbb{R}^n$, $h_{m_k}(\cdot,x) \to h(\cdot,x)$ in the norm of $C^1([0;T])$.
The next step is to prove that, for all $g \in L^1([0;T] \times \mathbb{R}^n)$,
$$\int_{\mathbb{R}^n} \int_0^T g(t,x)\, \Psi(t,x,\pi_t^m,\lambda_m,h_m)\, dt\, dx \;\xrightarrow[m \to \infty]{}\; \int_{\mathbb{R}^n} \int_0^T g(t,x)\, \Psi(t,x,\pi_t,\lambda,h)\, dt\, dx.$$
To this end, use (A1) and the triangle inequality to write
$$\begin{aligned} &\left| \int_{\mathbb{R}^n} \int_0^T g(t,x)\, \Psi(t,x,\pi_t^m,\lambda_m,h_m)\, dt\, dx - \int_{\mathbb{R}^n} \int_0^T g(t,x)\, \Psi(t,x,\pi_t,\lambda,h)\, dt\, dx \right| \\ &\quad \leq \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left[ v_1(t,x,\pi_t^m) - v_1(t,x,\pi_t) \right] dt\, dx \right| \\ &\qquad + \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left\{ \lambda_m(t,x) \left[ v_3(t,x,\pi_t^m) - \rho(t,x,\pi_t^m) \right] - \lambda(t,x) \left[ v_3(t,x,\pi_t) - \rho(t,x,\pi_t) \right] \right\} dt\, dx \right| \\ &\qquad + \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left[ \langle \nabla h_m(t,x), b(x,\pi_t^m) \rangle - \langle \nabla h(t,x), b(x,\pi_t) \rangle \right] dt\, dx \right|. \end{aligned}$$
Now work with the terms on the right-hand side separately:
$$\begin{aligned} &\left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left\{ \lambda_m(t,x) \left[ v_3(t,x,\pi_t^m) - \rho(t,x,\pi_t^m) \right] - \lambda(t,x) \left[ v_3(t,x,\pi_t) - \rho(t,x,\pi_t) \right] \right\} dt\, dx \right| \\ &\quad \leq \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left[ \lambda_m(t,x) - \lambda(t,x) \right] v_3(t,x,\pi_t)\, dt\, dx \right| + \left| \int_{\mathbb{R}^n} \int_0^T g(t,x)\, \lambda_m(t,x) \left[ v_3(t,x,\pi_t^m) - v_3(t,x,\pi_t) \right] dt\, dx \right| \\ &\qquad + \left| \int_{\mathbb{R}^n} \int_0^T g(t,x)\, \rho(t,x,\pi_t) \left[ \lambda_m(t,x) - \lambda(t,x) \right] dt\, dx \right| + \left| \int_{\mathbb{R}^n} \int_0^T g(t,x)\, \lambda_m(t,x) \left[ \rho(t,x,\pi_t^m) - \rho(t,x,\pi_t) \right] dt\, dx \right|, \end{aligned}$$
and
$$\begin{aligned} &\left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left[ \langle \nabla h_m(t,x), b(x,\pi_t^m) \rangle - \langle \nabla h(t,x), b(x,\pi_t) \rangle \right] dt\, dx \right| \\ &\quad \leq \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left\langle \nabla h_m(t,x), b(x,\pi_t^m) - b(x,\pi_t) \right\rangle dt\, dx \right| + \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left\langle \nabla h_m(t,x) - \nabla h(t,x), b(x,\pi_t) \right\rangle dt\, dx \right|. \end{aligned}$$
Combining the latter three estimates yields
$$\begin{aligned} &\left| \int_{\mathbb{R}^n} \int_0^T g(t,x)\, \Psi(t,x,\pi_t^m,\lambda_m,h_m)\, dt\, dx - \int_{\mathbb{R}^n} \int_0^T g(t,x)\, \Psi(t,x,\pi_t,\lambda,h)\, dt\, dx \right| \\ &\quad \leq \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left[ v_1(t,x,\pi_t^m) - v_1(t,x,\pi_t) \right] dt\, dx \right| + \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left[ \lambda_m(t,x) - \lambda(t,x) \right] v_3(t,x,\pi_t)\, dt\, dx \right| \\ &\qquad + \left| \int_{\mathbb{R}^n} \int_0^T g(t,x)\, \lambda_m(t,x) \left[ v_3(t,x,\pi_t^m) - v_3(t,x,\pi_t) \right] dt\, dx \right| + \left| \int_{\mathbb{R}^n} \int_0^T g(t,x)\, \rho(t,x,\pi_t) \left[ \lambda_m(t,x) - \lambda(t,x) \right] dt\, dx \right| \\ &\qquad + \left| \int_{\mathbb{R}^n} \int_0^T g(t,x)\, \lambda_m(t,x) \left[ \rho(t,x,\pi_t^m) - \rho(t,x,\pi_t) \right] dt\, dx \right| + \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left\langle \nabla h_m(t,x), b(x,\pi_t^m) - b(x,\pi_t) \right\rangle dt\, dx \right| \\ &\qquad + \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left\langle \nabla h_m(t,x) - \nabla h(t,x), b(x,\pi_t) \right\rangle dt\, dx \right|. \end{aligned}$$
Since the embedding $W^{2;p}(\mathbb{R}^n) \hookrightarrow C^{1;\eta}(\mathbb{R}^n)$ is continuous, Hypothesis 2b yields that, for each fixed $t \in [0;T]$,
$$\max\left( |h_m(t,\cdot)|,\ \max_{1 \leq i \leq n} |\partial_i h_m(t,\cdot)| \right) \leq \| h_m(t,\cdot) \|_{C^{1;\eta}(\mathbb{R}^n)} \leq \bar{M}\, \| h_m(t,\cdot) \|_{W^{2;p}(\mathbb{R}^n)} \leq \bar{M} \cdot \bar{M}_1.$$
Since $t \in [0;T]$ is arbitrary, the time argument can be removed from the latter expression simply by replacing the constants $\bar{M}$ and $\bar{M}_1$ with other constants; to keep the notation as straightforward as possible, this replacement is left implicit. Now, Hypothesis (H1b) gives the existence of a constant $K_1(\mathbb{R}^n)$ such that $|b(x,\pi)| \leq K_1(\mathbb{R}^n)$. Moreover, there also exists a positive constant $k([0;T] \times \mathbb{R}^n)$ such that
$$|v_1(t,x,\pi)| + |v_3(t,x,\pi)| \leq k([0;T] \times \mathbb{R}^n).$$
Gathering these facts, observe that
$$\begin{aligned} &\left| \int_{\mathbb{R}^n} \int_0^T g(t,x)\, \Psi(t,x,\pi_t^m,\lambda_m,h_m)\, dt\, dx - \int_{\mathbb{R}^n} \int_0^T g(t,x)\, \Psi(t,x,\pi_t,\lambda,h)\, dt\, dx \right| \\ &\quad \leq \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left[ v_1(t,x,\pi_t^m) - v_1(t,x,\pi_t) \right] dt\, dx \right| + k([0;T] \times \mathbb{R}^n) \cdot \| g \|_{L^1([0;T] \times \mathbb{R}^n)} \cdot \| \lambda_m - \lambda \|_{B_w([0;T] \times \mathbb{R}^n)} \\ &\qquad + \| \lambda_m \|_{B_w([0;T] \times \mathbb{R}^n)} \cdot \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left[ v_3(t,x,\pi_t^m) - v_3(t,x,\pi_t) \right] dt\, dx \right| + \| g \|_{L^1([0;T] \times \mathbb{R}^n)} \cdot \| \rho(\cdot,\cdot,\pi) \|_{B_w([0;T] \times \mathbb{R}^n)} \cdot \| \lambda_m - \lambda \|_{B_w([0;T] \times \mathbb{R}^n)} \\ &\qquad + \| \lambda_m \|_{B_w([0;T] \times \mathbb{R}^n)} \cdot \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left[ \rho(t,x,\pi_t^m) - \rho(t,x,\pi_t) \right] dt\, dx \right| + n \bar{M} \bar{M}_1 \left| \int_{\mathbb{R}^n} \int_0^T g(t,x) \left[ b(x,\pi_t^m) - b(x,\pi_t) \right] dt\, dx \right| \\ &\qquad + K_1(\mathbb{R}^n) \cdot \| g \|_{L^1([0;T] \times \mathbb{R}^n)} \cdot \sup_{t \in [0;T]} \| h_m(t,\cdot) - h(t,\cdot) \|_{C^{1;\eta}(\mathbb{R}^n)}. \end{aligned}$$
The boundedness of $v_1$ and $v_3$ on $[0;T] \times \mathbb{R}^n$, together with the convergence of $\pi^m$ in the topology of relaxed controls, the uniform convergence of $\lambda_m$ to $\lambda$, and the strong convergence $h_m(t,\cdot) \to h(t,\cdot)$ in $C^{1;\eta}(\mathbb{R}^n)$, yields that the right-hand side of the latter expression vanishes as $m \to \infty$. Finally, use Theorem 2.10 in [34] to see that
$$\hat{\mathcal{L}}_\lambda^{\pi} h = \varepsilon \quad \textit{in } [0;T] \times \mathbb{R}^n.$$
This proves the result. □

References

1. Howard, R. Dynamic Programming and Markov Processes; John Wiley and Sons: New York, NY, USA, 1960.
2. Fleming, W.H. Some Markovian Optimization Problems. J. Math. Mech. 1963, 12, 131–140.
3. Fleming, W.H. The Cauchy Problem for Degenerate Parabolic Equations. J. Math. Mech. 1964, 13, 987–1008.
4. Fleming, W.H. Optimal Continuous-Parameter Stochastic Control. SIAM Rev. 1969, 11, 470–509.
5. Kogan, Y. On Optimal Control of a Non-Terminating Diffusion Process with Reflection. Theory Probab. Appl. 1969, 14, 496–502.
6. Puterman, M.L. Optimal control of diffusion processes with reflection. J. Optim. Theory Appl. 1977, 22, 103–116.
7. Broadie, M.; Cvitanic, J.; Soner, H.M. Optimal replication of contingent claims under portfolio constraints. Rev. Financ. Stud. 1998, 11, 59–79.
8. Cvitanic, J.; Pham, H.; Touzi, N. A closed form solution for the super-replication problem under transaction costs. Financ. Stoch. 1999, 3, 35–54.
9. Cvitanic, J.; Pham, H.; Touzi, N. Superreplication in stochastic volatility models under portfolio constraints. J. Appl. Probab. 1999, 36, 523–545.
10. Soner, M.; Touzi, N. Super replication under gamma constraints. SIAM J. Control Optim. 2000, 39, 73–96.
11. Borkar, V.; Ghosh, M. Controlled diffusions with constraints. J. Math. Anal. Appl. 1990, 151, 88–108.
12. Mendoza-Pérez, A.; Jasso-Fuentes, H.; Hernández-Lerma, O. The Lagrange approach to ergodic control of diffusions with cost constraints. Optimization 2015, 64, 179–196.
13. Prieto-Rumeau, T.; Hernández-Lerma, O. The vanishing discount approach to constrained continuous-time controlled Markov chains. Syst. Control Lett. 2010, 59, 504–509.
14. Jasso-Fuentes, H.; Escobedo-Trujillo, B.A.; Mendoza-Pérez, A. The Lagrange and the vanishing discount techniques to controlled diffusion with cost constraints. J. Math. Anal. Appl. 2016, 437, 999–1035.
15. Jiang, K.; You, D.; Li, Z.; Shi, S. A differential game approach to dynamic optimal control strategies for watershed pollution across regional boundaries under eco-compensation criterion. Ecol. Indic. 2019, 105, 229–241.
16. Kawaguchi, K. Optimal Control of Pollution Accumulation with Long-Run Average Welfare. Environ. Resour. Econ. 2003, 26, 457–468.
17. Kawaguchi, K.; Morimoto, H. Long-run average welfare in a pollution accumulation model. J. Econ. Dyn. Control 2007, 31, 703–720.
18. Morimoto, H. Optimal Pollution Control with Long-Run Average Criteria. In Stochastic Control and Mathematical Modeling: Applications in Economics; Encyclopedia of Mathematics and Its Applications; Cambridge University Press: Cambridge, UK, 2010; pp. 237–251.
19. Jasso-Fuentes, H.; López-Barrientos, J.D. On the use of stochastic differential games against nature to ergodic control problems with unknown parameters. Int. J. Control 2015, 88, 897–909.
20. Arapostathis, A.; Ghosh, M.K.; Borkar, V.S. Ergodic Control of Diffusion Processes; Cambridge University Press: Cambridge, UK, 2011.
21. Borkar, V. A topology for Markov controls. Appl. Math. Optim. 1989, 20, 55–62.
22. Warga, J. Optimal Control of Differential and Functional Equations; Academic Press: New York, NY, USA, 1972.
23. Fleming, W.H.; Rishel, R.W. Deterministic and Stochastic Optimal Control; Springer: Berlin/Heidelberg, Germany, 1975.
24. Krylov, N.; Aries, A. Controlled Diffusion Processes; Springer: Berlin/Heidelberg, Germany, 2008.
25. Higham, D. An Algorithmic Introduction to Numerical Simulation of Stochastic Differential Equations. SIAM Rev. 2001, 43, 525–546.
26. Hutzenthaler, M.; Jentzen, A. Numerical Approximations of Stochastic Differential Equations with Non-Globally Lipschitz Continuous Coefficients; American Mathematical Society: Providence, RI, USA, 2015; Volume 236.
27. Øksendal, B. Stochastic Differential Equations: An Introduction with Applications; Springer: Berlin/Heidelberg, Germany, 2003.
28. Jasso-Fuentes, H.; Hernández-Lerma, O. Ergodic control, bias, and sensitive discount optimality for Markov diffusion processes. Stoch. Anal. Appl. 2009, 27, 363–385.
29. Pham, H. Continuous-Time Stochastic Control and Optimization with Financial Applications; Springer: Berlin/Heidelberg, Germany, 2009.
30. Gilbarg, D.; Trudinger, N.S. Elliptic Partial Differential Equations of Second Order; Springer: Berlin/Heidelberg, Germany, 1998.
31. López-Barrientos, J.D.; Gromova, E.V.; Miroshnichenko, E.S. Resource exploitation in a stochastic horizon under two parametric interpretations. Mathematics 2020, 8, 1081.
32. Alaffita-Hernández, F.A.; Escobedo-Trujillo, B.A.; López-Martínez, R. Constrained stochastic differential games with additive structure: Average and discount payoffs. J. Dyn. Games 2018, 5, 109–141.
33. Adams, R. Sobolev Spaces; Academic Press: New York, NY, USA, 1975.
34. Folland, G. Real Analysis: Modern Techniques and Their Applications; John Wiley and Sons: New York, NY, USA, 1999.
Figure 1. A realization of a trajectory of (7) with $x_0 = 5$, $\eta = 1$, $\sigma \equiv 0.5$, $u(t) = x(t)$, $T = 1$, and $N = 100$.
Figure 2. A realization of a trajectory of (8) with $x_0 = 5$, $\eta = 1$, $\sigma \equiv 0.5$, $\mu = 5$, $T = 1$, and $N = 100$.
Figure 3. Error in the approximation of (57).
Figure 4. A realization of a trajectory of (7) with $x_0 = 5$, $\eta = 1$, $\sigma(x) \equiv 0.5$, $u(t) = f_\lambda^*(t)$, $T = 1$, and $N = 100$.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
