
Finding the Strong Nash Equilibrium: Computation, Existence and Characterization for Markov Games

Journal of Optimization Theory and Applications

Abstract

This paper suggests a procedure to construct the Pareto frontier and to compute efficiently the strong Nash equilibrium for a class of time-discrete ergodic controllable Markov chain games. The procedure finds the strong Nash equilibrium using Newton's optimization method, which presents a potential advantage for ill-conditioned problems. We formulate the solution of the problem based on the Lagrange principle, adding a Tikhonov regularization parameter to ensure both the strict convexity of the Pareto frontier and the existence of a unique strong Nash equilibrium. Then, any welfare optimum arises as a strong Nash equilibrium of the game. We prove the existence and characterization of the strong Nash equilibrium, which is one of the main results of this paper. The method is validated theoretically and illustrated with an application example.



Author information


Corresponding author

Correspondence to Julio B. Clempner.

Additional information

Communicated by Kyriakos G. Vamvoudakis.


Appendices

Appendix A: Proof of Lemma 2.1

Proof

Indeed, any non-stationary bounded policy c(n), being defined on a compact set, necessarily contains (by the Weierstrass theorem) a convergent subsequence \(c(n_{k})\) realizing the relations

$$\begin{aligned}&\underset{n\rightarrow \infty }{\limsup }\frac{1}{n}\sum \limits _{t=1}^{n}V^{l}\left( c(t)\right) \le \underset{n\rightarrow \infty }{\limsup }\frac{1}{n}\sum \limits _{t=1}^{n}\underset{k=1,\ldots ,t}{ \max }V^{l}\left( c(k)\right) \\&\quad \le \underset{n\rightarrow \infty }{\limsup }\frac{1}{n}\sum \limits _{t=1}^{n} \underset{k\rightarrow \infty }{\limsup } \, V^{l}\left( c(n_{k})\right) =V^{l}\left( c^{**}\right) , \end{aligned}$$

where \(V^{l}\left( c\right) \) is assumed to be a monotonically increasing functional of each component \(c_{ik}^{l^{\prime }}\) when the other ones are fixed, and \( \underset{k\rightarrow \infty }{\limsup }V^{l}\left( c(n_{k})\right) {:}{=}V^{l}\left( c^{**}\right) \). This upper bound is attained by taking \(c(t)=c^{*}=c^{**}\), since

$$\begin{aligned} \underset{t\rightarrow \infty }{\limsup }\frac{1}{t}\sum \limits _{h=1}^{t}V^{l}\left( c^{*}\right) =V^{l}\left( c^{*}\right) =V^{l}\left( c^{**}\right) . \square \end{aligned}$$
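The averaging bound in this proof can be illustrated numerically: for any bounded sequence of values \(V^{l}(c(t))\), the time average never exceeds the value attained by the best constant policy. A minimal sketch (the uniform values below are purely hypothetical stand-ins for \(V^{l}(c(t))\)):

```python
import numpy as np

# Hypothetical bounded values V^l(c(t)) for a non-stationary policy.
rng = np.random.default_rng(1)
v = rng.uniform(0.0, 1.0, size=10000)

avg_nonstationary = v.mean()  # time average of the non-stationary policy
v_best_constant = v.max()     # value of the best constant policy c** 

# The Cesaro average is dominated by the constant-policy value.
assert avg_nonstationary <= v_best_constant
```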

Appendix B: Proof of Theorem 4.1

Proof

  1. (a)

First, let us prove that the Hessian matrix \(H{:}{=}\dfrac{\partial ^{2}}{\partial x\partial x^{\intercal }}{\mathcal {L}}_{\theta ,\delta }\left( x,\mu _{0},\mu _{1}\right) \) is strictly positive definite, \(H>0\), for all \(x\in {\mathbb {R}}^{n}\) and suitable positive \(\theta \) and \(\delta \). We have

$$\begin{aligned} \dfrac{\partial ^{2}}{\partial x^{2}}{\mathcal {L}}_{\theta ,\delta }\left( x,\mu _{0},\mu _{1}\right)&= \theta \dfrac{\partial ^{2}}{\partial x^{2}} V^{l}(x)+\delta I_{N\times N} \ge \delta \left( 1+\dfrac{\theta }{\delta }\zeta ^{-}\right) I_{N\times N}>0 \quad \forall \, \delta >\theta \left| \zeta ^{-}\right| ,\\ \zeta ^{-}&{:}{=}\underset{x\in X_{adm}}{\min }\,\zeta _{\min }\left( \dfrac{ \partial ^{2}}{\partial x^{2}}V^{l}(x)\right) , \end{aligned}$$

(\(\zeta _{\min }\) denotes the minimum eigenvalue), such that \(H>0\) if \(\delta >\theta \left| \zeta ^{-}\right| \). This means that the RLF (11) is strongly convex in \(x\) and has a unique minimal point, defined below as \(x^{*}\).
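As a quick numerical illustration of this argument (not part of the paper's method), the sketch below builds a hypothetical indefinite matrix standing in for \(\partial ^{2}V^{l}/\partial x^{2}\), computes \(\zeta ^{-}\) as its minimal eigenvalue, and checks that the regularized Hessian becomes strictly positive definite once \(\delta >\theta \left| \zeta ^{-}\right| \):

```python
import numpy as np

# Hedged sketch: H_V stands in for the Hessian of V^l, which may be
# indefinite; delta*I restores positive definiteness for delta > theta*|zeta^-|.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H_V = (A + A.T) / 2.0                       # symmetric, generally indefinite

theta = 0.5
zeta_minus = np.linalg.eigvalsh(H_V).min()  # minimal eigenvalue zeta^-
delta = theta * abs(zeta_minus) + 1e-3      # choose delta > theta*|zeta^-|

H = theta * H_V + delta * np.eye(4)         # regularized Hessian
assert np.linalg.eigvalsh(H).min() > 0      # strictly positive definite
```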

  2. (b)

    In view of the properties

    $$\begin{aligned} \begin{array}{cc} \left( \nabla V^{l}\left( x\right) ,\left( y-x\right) \right) \le V^{l}\left( y\right) -V^{l}\left( x\right)&\, \left( \nabla V^{l}\left( x\right) ,\left( x-y\right) \right) \ge V^{l}\left( x\right) -V^{l}\left( y\right) , \end{array} \end{aligned}$$

valid for any convex function \(V^{l}\left( x\right) \) and any \(x,y\); for the RLF at any admissible points \(x\), \(\mu _{0}\), \(\mu _{1}\), and \(x_{t}^{*}=x^{*}\left( \theta _{t},\delta _{t}\right) \), \(\mu _{0,t}^{*}=\mu _{0}^{*}\left( \theta _{t},\delta _{t}\right) \), \(\mu _{1,t}^{*}=\mu _{1}^{*}\left( \theta _{t},\delta _{t}\right) \), we have

$$\begin{aligned}&\left( x-x_{t}^{*},\dfrac{\partial }{\partial x}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x,\mu _{0},\mu _{1}\right) \right) - \left( \mu _{0}-\mu _{0,t}^{*},\dfrac{\partial }{\partial \mu _{0}}{\mathcal {L}} _{\theta _{t},\delta _{t}}\left( x,\mu _{0},\mu _{1}\right) \right) \nonumber \\&\quad -\left( \mu _{1}-\mu _{1,t}^{*},\dfrac{\partial }{\partial \mu _{1}} {\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x,\mu _{0},\mu _{1}\right) \right) = {\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x,\mu _{0,t}^{*},\mu _{1,t}^{*}\right) \nonumber \\&\quad -{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t}^{*},\mu _{0},\mu _{1}\right) + \, \dfrac{\delta _{t}}{2}\left( \left\| x-x_{t}^{*}\right\| ^{2}+\left\| \mu _{0}-\mu _{0,t}^{*}\right\| ^{2}+\left\| \mu _{1}-\mu _{1,t}^{*}\right\| ^{2}\right) , \nonumber \\ \end{aligned}$$
    (30)

    which by the saddle-point condition Eq. (14) implies

$$\begin{aligned}&\theta _{t}\left( x-x_{t}^{*}\right) ^{\intercal }\dfrac{\partial }{ \partial x}V^{l}\left( x\right) +\left( x-x_{t}^{*}\right) ^{\intercal } \left[ A_\mathrm{eq}^{\intercal }\mu _{0}+A_\mathrm{ineq}^{\intercal }\mu _{1}+\delta _{t}x \right] \nonumber \\&\quad +\left( \mu _{0}-\mu _{0,t}^{*}\right) ^{\intercal }\left( \delta _{t}\mu _{0}-A_\mathrm{eq}x+b_\mathrm{eq}\right) +\left( \mu _{1}-\mu _{1,t}^{*}\right) ^{\intercal }\left( \delta _{t}\mu _{1}-A_\mathrm{ineq}x+b_\mathrm{ineq}\right) \nonumber \\&\quad \ge \dfrac{\delta _{t}}{2}\left( \left\| x-x_{t}^{*}\right\| ^{2}+\left\| \mu _{0}-\mu _{0,t}^{*}\right\| ^{2}+\left\| \mu _{1}-\mu _{1,t}^{*}\right\| ^{2}\right) . \end{aligned}$$
    (31)
  3. (c)

Selecting in Eq. (31) \(x{:}{=}x^{*}\in X^{*}\), \(\mu _{0}=\mu _{0}^{*}\), \(\mu _{1}=\mu _{1}^{*}\), and using the complementary slackness conditions \( \left( \mu _{1}^{*}\right) _{i}\left( A_\mathrm{ineq}x^{*}-b_\mathrm{ineq}\right) _{i}=\left( \mu _{1,t}^{*}\right) _{i}\left( A_\mathrm{ineq}x_{t}^{*}-b_\mathrm{ineq}\right) _{i}=0 \), we obtain

$$\begin{aligned}&\theta _{t}\left( x^{*}-x_{t}^{*}\right) ^{\intercal }\dfrac{ \partial }{\partial x}V^{l}\left( x^{*}\right) +\left( x^{*}-x_{t}^{*}\right) ^{\intercal }\left[ A_\mathrm{eq}^{\intercal }\mu _{0}^{*}+A_\mathrm{ineq}^{\intercal }\mu _{1}^{*}+\delta _{t}x^{*}\right] \\&\qquad +\left( \mu _{0}^{*}-\mu _{0,t}^{*}\right) ^{\intercal }\left( \delta _{t}\mu _{0}^{*}-A_\mathrm{eq}x^{*}+b_\mathrm{eq}\right) \\&\qquad + \left( \mu _{1}^{*}-\mu _{1,t}^{*}\right) ^{\intercal }\left( \delta _{t}\mu _{1}^{*}-A_\mathrm{ineq}x^{*}+b_\mathrm{ineq}\right) \\&\quad \ge \dfrac{\delta _{t}}{2}\left( \left\| x^{*}-x_{t}^{*}\right\| ^{2}+\left\| \mu _{0}^{*}-\mu _{0,t}^{*}\right\| ^{2}+\left\| \mu _{1}^{*}-\mu _{1,t}^{*}\right\| ^{2}\right) \ge 0. \end{aligned}$$

    Simplifying the last inequality, we have

$$\begin{aligned} \theta _{t}\left( x^{*}-x_{t}^{*}\right) ^{\intercal }\dfrac{ \partial }{\partial x}V^{l}\left( x^{*}\right) +\delta _{t}\left( x^{*}-x_{t}^{*}\right) ^{\intercal }x^{*}+\delta _{t}\left( \mu _{0}^{*}-\mu _{0,t}^{*}\right) ^{\intercal }\mu _{0}^{*}+\delta _{t}\left( \mu _{1}^{*}-\mu _{1,t}^{*}\right) ^{\intercal }\mu _{1}^{*}\ge 0. \end{aligned}$$

    Dividing both sides of this inequality by \(\delta _{t}\) and taking \(\dfrac{ \theta _{t}}{\delta _{t}}\underset{t\rightarrow \infty }{\rightarrow }0\), we get

    $$\begin{aligned} 0\le \,\underset{t\rightarrow \infty }{\limsup }\left[ \left( x^{*}-x_{t}^{*}\right) ^{\intercal }x^{*}+\left( \mu _{0}^{*}-\mu _{0,t}^{*}\right) ^{\intercal }\mu _{0}^{*}+\left( \mu _{1}^{*}-\mu _{1,t}^{*}\right) ^{\intercal }\mu _{1}^{*}\right] . \end{aligned}$$
    (32)

Then, there exist subsequences \(\delta _{k}\) and \( \theta _{k}\) \(\left( k\rightarrow \infty \right) \) along which the limits exist

    $$\begin{aligned}&x_{k}^{*}=x^{*}\left( \theta _{k},\delta _{k}\right) \rightarrow {\tilde{x}}^{*}, \, \\&\mu _{0,k}^{*}=\mu _{0}^{*}\left( \theta _{k},\delta _{k}\right) \rightarrow {\tilde{\mu }}_{0}^{*}, \\&\mu _{1,k}^{*}=\mu _{1}^{*}\left( \theta _{k},\delta _{k}\right) \rightarrow {\tilde{\mu }}_{1}^{*}\text { {\ as} }k\rightarrow \infty . \end{aligned}$$

    Suppose that there exist two limit points for two different convergent subsequences, i.e., there exist the limits

$$\begin{aligned}&x_{k^{\prime }}^{*}=x^{*}\left( \theta _{k^{\prime }},\delta _{k^{\prime }}\right) \rightarrow \bar{x}^{*}, \, \\&\mu _{0,k^{\prime }}^{*}=\mu _{0}^{*}\left( \theta _{k^{\prime }},\delta _{k^{\prime }}\right) \rightarrow {\bar{\mu }}_{0}^{*}, \\&\mu _{1,k^{\prime }}^{*}=\mu _{1}^{*}\left( \theta _{k^{\prime }},\delta _{k^{\prime }}\right) \rightarrow {\bar{\mu }}_{1}^{*}\text { {\ as } }k^{\prime }\rightarrow \infty . \end{aligned}$$

    Then, on these subsequences one has

    $$\begin{aligned} \begin{array}{c} 0\le \left( x^{*}-{\tilde{x}}^{*}\right) ^{\intercal }x^{*}+\left( \mu _{0}^{*}-{\tilde{\mu }}_{0}^{*}\right) ^{\intercal }\mu _{0}^{*}+\left( \mu _{1}^{*}-{\tilde{\mu }}_{1}^{*}\right) ^{\intercal }\mu _{1}^{*}, \\ 0\le \left( x^{*}-\bar{x}^{*}\right) ^{\intercal }x^{*}+\left( \mu _{0}^{*}-{\bar{\mu }}_{0}^{*}\right) ^{\intercal }\mu _{0}^{*}+\left( \mu _{1}^{*}-{\bar{\mu }}_{1}^{*}\right) ^{\intercal }\mu _{1}^{*}. \end{array} \end{aligned}$$

It follows that the points \(\left( {\tilde{x}}^{*}, {\tilde{\mu }}_{0}^{*},{\tilde{\mu }}_{1}^{*}\right) \) and \(\left( \bar{x} ^{*},{\bar{\mu }}_{0}^{*},{\bar{\mu }}_{1}^{*}\right) \) both correspond to the minimum point of the function \( s\left( x^{*},\mu _{0}^{*},\mu _{1}^{*}\right) {:}{=}\dfrac{1}{2} \left( \left\| x^{*}\right\| ^{2}+\left\| \mu _{0}^{*}\right\| ^{2}+\left\| \mu _{1}^{*}\right\| ^{2}\right) \) defined on \(X^{*}\otimes \Lambda ^{*}\) over all possible saddle points of the non-regularized Lagrange function. But \( s\left( x^{*},\mu _{0}^{*},\mu _{1}^{*}\right) \) is strictly convex, so its minimum is unique, which gives \({\tilde{x}}^{*}=\bar{x}^{*}\), \({\tilde{\mu }}_{0}^{*}={\bar{\mu }}_{0}^{*}\), \({\tilde{\mu }}_{1}^{*}={\bar{\mu }}_{1}^{*}\). \(\square \)

Appendix C: Proof of Lemma 4.1

Proof

It follows from Eq. (30) applied at the points \(x_{t}^{*}=x^{*}\left( \theta _{t},\delta _{t}\right) \), \(\mu _{0,t}^{*}=\mu _{0}^{*}\left( \theta _{t},\delta _{t}\right) \), \(\mu _{1,t}^{*}=\mu _{1}^{*}\left( \theta _{t},\delta _{t}\right) \), which are the extremal points of the function \({\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x,\mu _{0},\mu _{1}\right) \). \(\square \)

Appendix D: Proof of Theorem 4.2

Proof

In view of Eq. (21), it follows that

$$\begin{aligned} F_{t+1}\le & {} F_{t}+\left\| x_{t}^{*}-x_{t+1}^{*}\right\| ^{2}+\left\| \mu _{0,t}^{*}-\mu _{0,t+1}^{*}\right\| ^{2}+\left\| \mu _{1,t}^{*}-\mu _{1,t+1}^{*}\right\| ^{2} \nonumber \\&+\gamma _{t}^{2}\left\| \dfrac{\partial }{\partial x}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \right\| ^{2}+\gamma _{t}^{2}\left\| \dfrac{\partial }{\partial \mu _{0}}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \right\| ^{2} \nonumber \\&+\gamma _{t}^{2}\left\| \dfrac{\partial }{\partial \mu _{1}}{\mathcal {L}} _{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \right\| ^{2}- 2\gamma _{t}\left( x_{t}-x_{t}^{*}\right) ^{\intercal } \dfrac{\partial }{\partial x}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \nonumber \\&+\, 2\gamma _{t}\left( \mu _{0,t}-\mu _{0,t}^{*}\right) ^{\intercal }\dfrac{ \partial }{\partial \mu _{0}}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) + 2\gamma _{t}\left( \mu _{1,t}-\mu _{1,t}^{*}\right) ^{\intercal }\dfrac{ \partial }{\partial \mu _{1}}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \nonumber \\&+\, 2\gamma _{t}\left( x_{t}^{*}-x_{t+1}^{*}\right) ^{\intercal }\dfrac{ \partial }{\partial x}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) + 2\gamma _{t}\left( \mu _{0,t}^{*}-\mu _{0,t+1}^{*}\right) ^{\intercal }\dfrac{\partial }{\partial \mu _{0}}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \nonumber \\&+\,2\gamma _{t}\left( \mu _{1,t}^{*}-\mu _{1,t+1}^{*}\right) ^{\intercal }\dfrac{\partial }{\partial \mu _{1}}{\mathcal {L}}_{\theta _{t},\delta _{t}} \left( x_{t},\mu _{0,t},\mu _{1,t}\right) \nonumber \\&+\,2\left( x_{t}-x_{t}^{*}\right) ^{\intercal }\left( x_{t}^{*}-x_{t+1}^{*}\right) +2\left( \mu _{0,t}-\mu _{0,t}^{*}\right) ^{\intercal }\left( \mu _{0,t}^{*}-\mu _{0,t+1}^{*}\right) \nonumber \\&+\,2\left( \mu _{1,t}-\mu _{1,t}^{*}\right) ^{\intercal }\left( \mu _{1,t}^{*}-\mu _{1,t+1}^{*}\right) . \end{aligned}$$
(33)

For strongly convex (concave) functions, the following inequalities hold

$$\begin{aligned} \begin{array}{c} \left( x_{t}-x_{t}^{*}\right) ^{\intercal }\dfrac{\partial }{\partial x} {\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \ge \delta _{t}\left( 1+\dfrac{\theta _{t}}{\delta _{t}}\zeta ^{-}\right) \left\| x_{t}-x_{t}^{*}\right\| ^{2} \ge \delta _{t}\left( 1-\epsilon \right) \left\| x_{t}-x_{t}^{*}\right\| ^{2}, \, \left| \dfrac{\theta _{t}}{\delta _{t}}\zeta ^{-}\right| \le \epsilon , \\ \left( \mu _{0,t}-\mu _{0,t}^{*}\right) ^{\intercal }\dfrac{\partial }{ \partial \mu _{0}}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \le -\delta _{t}\left\| \mu _{0,t}-\mu _{0,t}^{*}\right\| ^{2}, \\ \left( \mu _{1,t}-\mu _{1,t}^{*}\right) ^{\intercal }\dfrac{\partial }{ \partial \mu _{1}}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \le -\delta _{t}\left\| \mu _{1,t}-\mu _{1,t}^{*}\right\| ^{2}. \end{array} \end{aligned}$$

By the Lipschitz property for the gradients of \({\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \), we also have

$$\begin{aligned}&\left\| \dfrac{\partial }{\partial x}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \right\| ^{2}\le L_{x}\left\| x_{t}-x_{t}^{*}\right\| ^{2}, \\&\left\| \dfrac{\partial }{\partial \mu _{0}}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \right\| ^{2}\le L_{\mu _{0}}\left\| \mu _{0,t}-\mu _{0,t}^{*}\right\| ^{2}, \\&\left\| \dfrac{\partial }{\partial \mu _{1}}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \right\| ^{2}\le L_{\mu _{1}}\left\| \mu _{1,t}-\mu _{1,t}^{*}\right\| ^{2}. \end{aligned}$$

By the \(\Lambda \)-inequality \(2\left( a,b\right) \le \left( a,\Lambda a\right) +\left( b,\Lambda ^{-1}b\right) \), valid for any vectors \(a,b\) and any matrix \(\Lambda >0\), we get for \(\Lambda =I_{n\times n}\)

$$\begin{aligned} \begin{array}{c} 2\gamma _{t}\left( x_{t}^{*}-x_{t+1}^{*}\right) ^{\intercal }\dfrac{ \partial }{\partial x}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \le \gamma _{t}\left\| x_{t}^{*}-x_{t+1}^{*}\right\| ^{2}+L_{x}\gamma _{t}\left\| x_{t}-x_{t}^{*}\right\| ^{2}, \end{array} \end{aligned}$$

and for \(\Lambda =\varepsilon _{t}I\)

$$\begin{aligned}&\left| \left( x_{t}-x_{t}^{*}\right) ^{\intercal }\left( x_{t}^{*}-x_{t+1}^{*}\right) \right| \le \varepsilon _{t}\left\| x_{t}-x_{t}^{*}\right\| ^{2}+\varepsilon _{t}^{-1}\left\| x_{t}^{*}-x_{t+1}^{*}\right\| ^{2}, \\&\left| \left( \mu _{0,t}-\mu _{0,t}^{*}\right) ^{\intercal }\left( \mu _{0,t}^{*}-\mu _{0,t+1}^{*}\right) \right| \le \varepsilon _{t}\left\| \mu _{0,t}-\mu _{0,t}^{*}\right\| ^{2}+\varepsilon _{t}^{-1}\left\| \mu _{0,t}^{*}-\mu _{0,t+1}^{*}\right\| ^{2}, \\&\left| \left( \mu _{1,t}-\mu _{1,t}^{*}\right) ^{\intercal }\left( \mu _{1,t}^{*}-\mu _{1,t+1}^{*}\right) \right| \le \varepsilon _{t}\left\| \mu _{1,t}-\mu _{1,t}^{*}\right\| ^{2}+\varepsilon _{t}^{-1}\left\| \mu _{1,t}^{*}-\mu _{1,t+1}^{*}\right\| ^{2}, \end{aligned}$$

which lead to the following estimate

$$\begin{aligned} \begin{array}{c} 2\left( x_{t}-x_{t}^{*}\right) ^{\intercal }\left( x_{t}^{*}-x_{t+1}^{*}\right) +2\left( \mu _{0,t}-\mu _{0,t}^{*}\right) ^{\intercal }\left( \mu _{0,t}^{*}-\mu _{0,t+1}^{*}\right) +2\left( \mu _{1,t}-\mu _{1,t}^{*}\right) ^{\intercal }\left( \mu _{1,t}^{*}-\mu _{1,t+1}^{*}\right) \\ \le \varepsilon _{t}F_{t}+2\varepsilon _{t}^{-1}\left( C_{\theta }^{2}\left| \theta _{t}-\theta _{t+1}\right| ^{2}+C_{\delta }^{2}\left| \delta _{t}-\delta _{t+1}\right| ^{2}\right) . \end{array} \end{aligned}$$

Substituting these estimates into Eq. (33) implies (with \( L{:}{=}\max \left\{ L_{x},L_{\mu _{0}},L_{\mu _{1}}\right\} \), \(C{:}{=}4\max \left\{ C_{\theta }^{2},C_{\delta }^{2}\right\} \))

$$\begin{aligned}&F_{t+1}\le F_{t}\left[ 1-2\gamma _{t}\delta _{t}\left( 1-\epsilon \right) \left( 1- \dfrac{1}{1-\epsilon }\dfrac{\varepsilon _{t}}{\delta _{t}}-\dfrac{1}{ 2\left( 1-\epsilon \right) }\dfrac{\varepsilon _{t}}{\gamma _{t}\delta _{t}}- \dfrac{\left( 1+L\right) }{2\left( 1-\epsilon \right) }\dfrac{\gamma _{t}}{ \delta _{t}}\right) \right] \\&\quad + \, C\gamma _{t}\varepsilon _{t}^{-1}\left( \left| \theta _{t}-\theta _{t+1}\right| ^{2}+\left| \delta _{t}-\delta _{t+1}\right| ^{2}\right) . \end{aligned}$$

If a nonnegative sequence \(\left\{ u_{t}\right\} \) satisfies the recurrent inequality

$$\begin{aligned} \begin{array}{c} u_{t+1}\le u_{t}\left( 1-\alpha _{t}\right) +\beta _{t}, \, 0<\alpha _{t}\le 1, \, \sum \limits _{t=0}^{\infty }\alpha _{t}=\infty , \, \dfrac{\beta _{t}}{\alpha _{t}}\underset{t\rightarrow \infty }{\rightarrow }p, \end{array} \end{aligned}$$

then \(u_{t}\underset{t\rightarrow \infty }{\rightarrow }p\). Defining

$$\begin{aligned} \alpha _{t}&{:}{=}2\gamma _{t}\delta _{t}\left( 1-\epsilon \right) \left( 1- \dfrac{1}{1-\epsilon }\dfrac{\varepsilon _{t}}{\delta _{t}}-\dfrac{1}{ 2\left( 1-\epsilon \right) }\dfrac{\varepsilon _{t}}{\gamma _{t}\delta _{t}}- \dfrac{\left( 1+L\right) }{2\left( 1-\epsilon \right) }\dfrac{\gamma _{t}}{ \delta _{t}}\right) , \\ \beta _{t}&{:}{=}C\gamma _{t}\varepsilon _{t}^{-1}\left( \left| \theta _{t}-\theta _{t+1}\right| ^{2}+\left| \delta _{t}-\delta _{t+1}\right| ^{2}\right) , \end{aligned}$$

and applying Eq. (23) of this theorem for \(p=0\), we obtain the desired result. \(\square \)
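The convergence lemma invoked above can be checked numerically. The sketch below uses hypothetical harmonic sequences \(\alpha _{t}\), \(\beta _{t}\) (not the ones defined in the theorem) chosen so that \(\sum \alpha _{t}=\infty \) and \(\beta _{t}/\alpha _{t}=p=0.5\), and iterates the recurrence at equality:

```python
# Recurrence u_{t+1} = u_t (1 - alpha_t) + beta_t, taken at equality.
# alpha_t = 1/(t+2) sums to infinity; beta_t/alpha_t = 0.5 = p for all t.
alpha = lambda t: 1.0 / (t + 2)
beta = lambda t: 0.5 / (t + 2)

u = 10.0  # arbitrary nonnegative starting value
for t in range(20000):
    u = u * (1 - alpha(t)) + beta(t)
print(u)  # approaches p = 0.5
```

Here the error contracts exactly: \(u_{t}-p=(u_{0}-p)\prod _{k<t}(1-\alpha _{k})\), which vanishes since the product telescopes to \(1/(t+1)\).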

Appendix E: Description of the Newton Method

To find \(\lambda _{\delta }^{**}\), let us apply Newton's optimization method, based on the following procedure

$$\begin{aligned} \begin{array}{c} \lambda _{t+1}={\mathrm {Pr}}_{\Delta ^{n}}\left[ \lambda _{t}-\gamma _{t} \left[ \Phi _{\theta ,\delta }^{^{\prime \prime }}\left( \lambda _{t}\right) +\varepsilon \right] ^{-1}\Phi _{\theta ,\delta }^{^{\prime }}\left( \lambda _{t}\right) \right] , \, \\ {\lambda }_{0}=(1/n,\ldots ,1/n), \, t=0,1,\ldots , \, \gamma _{t}>0, \, \sum \limits _{t=0}^{\infty }\gamma _{t}=\infty , \end{array} \end{aligned}$$

where \({\mathrm {Pr}}_{\Delta ^{n}}\) is the projection operator onto the simplex. The derivative \(\Phi _{\theta ,\delta }^{^{\prime }}\left( \lambda _{t}\right) \) is given by

$$\begin{aligned} \begin{array}{c} \Phi _{\theta ,\delta }^{^{\prime }}\left( \lambda _{t}\right) =\frac{\hbox {d}}{ \hbox {d}\lambda }\Phi _{\theta ,\delta }\left( \lambda _{t}\right) =\delta \left( 2\lambda _{t}-1\right) +\theta \sum \limits _{l=1}^{n}\dfrac{\hbox {d}}{\hbox {d}\lambda } V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \cdot \\ \left( \left[ V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) -V^{l+} \right] _{+}^{1+\varepsilon }+\left[ V^{l-}-V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \right] _{+}^{1+\varepsilon }\right) , \end{array} \end{aligned}$$

where the terms \(\dfrac{\hbox {d}}{\hbox {d}\lambda }V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \) may be approximated by the Euler method as

$$\begin{aligned} \dfrac{\hbox {d}}{\hbox {d}\lambda }V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \simeq \varepsilon ^{-1}\left[ V^{l}\left( x^{*}\left( \lambda _{t}+\varepsilon \right) \right) -V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \right] {:}{=}\Psi \left( \lambda _{t},\varepsilon \right) , \, 0<\varepsilon \ll 1, \end{aligned}$$

and the second derivative \(\Phi _{\theta ,\delta }^{^{\prime \prime }}\left( \lambda _{t}\right) \) for two players is given by

$$\begin{aligned} \Phi _{\theta ,\delta }^{^{\prime \prime }}\left( \lambda _{t}\right)= & {} 2\delta +\theta \sum \limits _{l=1}^{n}\left[ \dfrac{\hbox {d}^{2}}{\hbox {d}\lambda ^{2}} V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \right] \left( \left[ V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) -V^{l+}\right] _{+}^{1+\varepsilon }+\left[ V^{l-}-V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \right] _{+}^{1+\varepsilon }\right) \\&+\,\theta \left( 1+\varepsilon \right) \sum \limits _{l=1}^{n}\left[ \dfrac{\hbox {d}}{ \hbox {d}\lambda }V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \right] ^{2} \left( \left[ V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) -V^{l+}\right] _{+}^{\varepsilon }-\left[ V^{l-}-V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \right] _{+}^{\varepsilon }\right) , \end{aligned}$$

where the terms \(\dfrac{\hbox {d}^{2}}{\hbox {d}\lambda ^{2}}V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \) may be approximated by the Euler method as

$$\begin{aligned} \dfrac{\hbox {d}^{2}}{\hbox {d}\lambda ^{2}}V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \simeq \frac{1}{\varepsilon ^{2}}\left[ V^{l}\left( x^{*}\left( \lambda _{t}+2\varepsilon \right) \right) -2V^{l}\left( x^{*}\left( \lambda _{t}+\varepsilon \right) \right) +V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \right] , \, 0<\varepsilon \ll 1. \end{aligned}$$
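The forward (Euler) differences used in these approximations can be sketched generically; in the snippet below, f is a hypothetical stand-in for \(\lambda \mapsto V^{l}\left( x^{*}\left( \lambda \right) \right) \), and a simple cubic is used only to check accuracy:

```python
def d1(f, x, eps=1e-5):
    # first derivative: f'(x) ~ (f(x + eps) - f(x)) / eps
    return (f(x + eps) - f(x)) / eps

def d2(f, x, eps=1e-4):
    # second derivative: f''(x) ~ (f(x + 2*eps) - 2*f(x + eps) + f(x)) / eps**2
    return (f(x + 2 * eps) - 2 * f(x + eps) + f(x)) / eps ** 2

f = lambda x: x ** 3
print(d1(f, 1.0))  # ~ 3.0 = f'(1), with O(eps) error
print(d2(f, 1.0))  # ~ 6.0 = f''(1), with O(eps) error
```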

Finally, the suggested numerical procedure with \(\gamma _{t}=\gamma \) for finding \(\lambda _{\delta }^{**}\) reads, for the first derivative,

$$\begin{aligned}&{\lambda }_{t+1}={\mathrm {Pr}}_{\Delta ^{n}}\left\{ {\lambda }_{t}-\gamma _{t}\nabla {\tilde{\Phi }}_{\theta ,\delta ,\varepsilon }\left( {\lambda }_{t}\right) \right\} , \, {\lambda }_{0}=(1/n,\ldots ,1/n), \, t=0,1,\ldots , \\&\nabla {\tilde{\Phi }}_{\theta ,\delta ,\varepsilon }\left( {\lambda } _{t}\right) {:}{=}\left( \frac{\partial }{\partial \lambda _{1}}{\tilde{\Phi }} _{\theta ,\delta ,\varepsilon }\left( {\lambda }_{t}\right) ,\frac{ \partial }{\partial \lambda _{2}}{\tilde{\Phi }}_{\theta ,\delta ,\varepsilon }\left( {\lambda }_{t}\right) ,\ldots ,\frac{\partial }{\partial \lambda _{N}}{\tilde{\Phi }}_{\theta ,\delta ,\varepsilon }\left( {\lambda } _{t}\right) \right) ^{\intercal },\\&\text {where }\frac{\partial }{\partial \lambda _{i}}{\tilde{\Phi }}_{\theta ,\delta ,\varepsilon }\left( {\lambda }_{t}\right) {:}{=}\varepsilon ^{-1}\sum \limits _{l=1}^{n}\left[ \nabla _{i}V^{l}\left( {\lambda } _{t},\varepsilon \right) \right] \left( \left[ V^{l}\left( x^{*}\left( {\lambda }_{t}\right) \right) -V^{l+}\right] _{+}^{1+\varepsilon }-\left[ V^{l-}-V^{l}\left( x^{*}\left( {\lambda }_{t}\right) \right) \right] _{+}^{1+\varepsilon }\right) +\frac{\delta }{2}\frac{\partial }{\partial \lambda _{i}}\left\| {\lambda }\right\| ^{2}, \end{aligned}$$

and for the second derivative

$$\begin{aligned}&{\lambda }_{t+1}={\mathrm {Pr}}_{\Delta ^{n}}\left\{ {\lambda } _{t}-\gamma _{t}\nabla ^{2}{\tilde{\Phi }}_{\theta ,\delta ,\varepsilon }\left( {\lambda }_{t}\right) \right\} , \, {\lambda } _{0}=(1/n,\ldots ,1/n), \, t=0,1,\ldots , \\&\nabla ^{2}{\tilde{\Phi }}_{\theta ,\delta ,\varepsilon }\left( {\lambda }_{t}\right) {:}{=}\left( \frac{\partial ^{2}}{\partial \lambda _{1}^{2}}{\tilde{\Phi }} _{\theta ,\delta ,\varepsilon }\left( {\lambda }_{t}\right) ,\frac{ \partial ^{2}}{\partial \lambda _{2}^{2}}{\tilde{\Phi }}_{\theta ,\delta ,\varepsilon }\left( {\lambda }_{t}\right) ,\ldots ,\frac{\partial ^{2}}{ \partial \lambda _{N}^{2}}{\tilde{\Phi }}_{\theta ,\delta ,\varepsilon }\left( {\lambda }_{t}\right) \right) ^{\intercal },\\&\text {where }\frac{\partial ^{2}}{\partial \lambda _{i}^{2}}{\tilde{\Phi }}_{\theta ,\delta ,\varepsilon }\left( {\lambda }_{t}\right) {:}{=}2\delta +\theta \sum \limits _{l=1}^{n}\left[ \nabla _{i}^{2}V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \right] \left( \left[ V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) -V^{l+} \right] _{+}^{1+\varepsilon }+\left[ V^{l-}-V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \right] _{+}^{1+\varepsilon }\right) \\&\qquad +\,\theta \left( 1+\varepsilon \right) \sum \limits _{l=1}^{n}\left[ \nabla _{i}V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \right] ^{2}\left( \left[ V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) -V^{l+}\right] _{+}^{\varepsilon }-\left[ V^{l-}-V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \right] _{+}^{\varepsilon }\right) .\square \end{aligned}$$
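A minimal sketch of the projected iteration \({\lambda }_{t+1}={\mathrm {Pr}}_{\Delta ^{n}}\{{\lambda }_{t}-\gamma _{t}\nabla {\tilde{\Phi }}({\lambda }_{t})\}\) with \({\lambda }_{0}=(1/n,\ldots ,1/n)\). The simplex projection is the standard sort-based Euclidean projection, and the quadratic objective below is a hypothetical strongly convex stand-in for \({\tilde{\Phi }}_{\theta ,\delta ,\varepsilon }\) (a first-order step, not the full Newton step):

```python
import numpy as np

def proj_simplex(v):
    # Euclidean projection onto the probability simplex (sort-based).
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    tau = css[rho] / (rho + 1.0)
    return np.maximum(v - tau, 0.0)

# Hypothetical gradient of a strongly convex objective minimized at a point
# inside the simplex (standing in for the gradient of Phi-tilde).
grad = lambda lam: 2.0 * (lam - np.array([0.7, 0.2, 0.1]))

lam = np.full(3, 1.0 / 3.0)              # lambda_0 = (1/n, ..., 1/n)
for t in range(500):
    lam = proj_simplex(lam - 0.1 * grad(lam))
print(lam)  # close to [0.7, 0.2, 0.1], staying on the simplex
```

Since each update is a convex combination of simplex points, the projection is inactive here and the iterate contracts geometrically toward the minimizer.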


About this article


Cite this article

Clempner, J.B., Poznyak, A.S. Finding the Strong Nash Equilibrium: Computation, Existence and Characterization for Markov Games. J Optim Theory Appl 186, 1029–1052 (2020). https://doi.org/10.1007/s10957-020-01729-3
