A differentiable path-following method to compute subgame perfect equilibria in stationary strategies in robust stochastic games and its applications

doi:10.1016/j.ejor.2021.06.059

European Journal of Operational Research

Volume 298, Issue 3, 1 May 2022, Pages 1032-1050

Highlights

• Compute subgame perfect equilibria in stationary strategies in stochastic games.
• Develop a logarithmic-barrier differentiable path-following method.
• Attain a convex-quadratic-penalty differentiable path-following method.
• Apply the proposed method to solve the problems on medical waste recycling.

Abstract

As an effective paradigm to address uncertainty in payoffs and transition probabilities, robust stochastic games have been formulated in the literature. This paper is concerned with the computation of subgame perfect equilibria in stationary strategies (SSPEs) in robust stochastic games. To tackle this problem, we develop a globally convergent differentiable path-following method by exploiting the structures of the games. Incorporating a logarithmic-barrier term into each player’s payoff function with an extra variable between zero and one, we construct a logarithmic-barrier robust stochastic game in which each player solves in each state a convex optimization problem. An application of the optimality conditions to the barrier game together with a fixed-point argument yields a polynomial equilibrium system for the barrier game. From this system, we establish the existence of a smooth path that starts from an arbitrary mixed strategy profile and ends at an SSPE as the extra variable descends from one to zero. As an alternative scheme, we construct a convex-quadratic-penalty robust stochastic game and attain a globally convergent convex-quadratic-penalty differentiable path-following method for SSPEs in robust stochastic games. Numerical comparisons show that the logarithmic-barrier path-following method significantly outperforms the convex-quadratic-penalty path-following method. To further demonstrate the value of the proposed methods, we apply the logarithmic-barrier path-following method to solve a supply chain configuration problem and a market entry problem from medical waste recycling.

Introduction

Stochastic games model situations where competing players take actions over time to achieve their individual objectives. At each stage, the players choose their actions simultaneously and independently of each other and receive instantaneous payoffs. The game then moves into a next state according to a transition probability distribution, and continues thereon. Stochastic games were first formulated by Shapley (1953) with only two players. The extensions to multi-players were carried out by Fink (1964) and Takahashi (1964). Stochastic games have been widely applied in economic analysis. Ericson & Pakes (1995) analyze the behavior of individual firms in an evolving market place. Flesch, Thuijsman, & Vrieze (2007) deal with stochastic games in which the transition probabilities for an action profile in the current state can be decomposed into player-dependent components. Fershtman & Pakes (2012) develop a framework for empirical work of stochastic games and employ a heuristic iterative procedure to compute equilibrium strategies of oligopolies. Parilina, Sedakov, & Zaccour (2017) consider a stochastic dynamic game to determine the price of anarchy. Mandel & Venel (2020) formulate the problem of dynamic competition over social networks as a stochastic game. Garrec & Scarsini (2020) apply stochastic games to searching for an immobile hider on a stochastic network. Solan & Vieille (2015) summarize the historical developments of stochastic games and emphasize the importance of the seminal work in Shapley (1953). To address uncertainty on payoffs and transition probabilities, Kardeş, Ordóñez, & Hall (2011) incorporate robust optimization into a finite discounted stochastic game and formulate a robust stochastic game, which can be regarded as an extension of robust normal-form games introduced by Aghassi & Bertsimas (2006). 
An application of robust games to a decision problem is presented in Caballero, Lunday, & Uber (2021), where players are uncertain of the opponents’ reasoning abilities. Liu, Xu, Yang, & Zhang (2018) develop several distributionally robust equilibrium models in which players lack complete information on the true probability distribution of uncertainty. Shapiro (2021) extends distributionally robust approaches to multistage stochastic programming. Other applications of robust games can be found in Jiang, Netessine, & Savin (2011) and Zhu, Zhang, & Ye (2013). In robust games, each player adopts a robust optimization approach towards the uncertainty and intends to optimize the worst-case performance, where the worst case is taken over the ambiguity sets for uncertain parameters.
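To make the worst-case criterion concrete, the following minimal sketch evaluates a player's worst-case expected payoff in a one-shot two-player game. It assumes, purely for illustration, that the ambiguity set is a finite list of candidate payoff matrices (for instance the vertices of a polytope); the function names and the numbers are hypothetical and are not taken from the cited papers.

```python
def expected_payoff(matrix, x_row, x_col):
    """Expected payoff of the row player under mixed actions x_row and x_col."""
    return sum(
        x_row[a] * x_col[b] * matrix[a][b]
        for a in range(len(x_row))
        for b in range(len(x_col))
    )


def worst_case_payoff(candidates, x_row, x_col):
    """Minimum expected payoff over the (assumed finite) ambiguity set."""
    return min(expected_payoff(m, x_row, x_col) for m in candidates)


# Hypothetical ambiguity set: the row player does not know which matrix is true.
CANDIDATES = [
    [[2.0, 2.0], [0.0, 0.0]],  # action 0 pays well, action 1 pays nothing
    [[0.0, 0.0], [2.0, 2.0]],  # the reverse
]
OPPONENT = [0.5, 0.5]  # a fixed mixed action of the column player

pure_value = worst_case_payoff(CANDIDATES, [1.0, 0.0], OPPONENT)
mixed_value = worst_case_payoff(CANDIDATES, [0.5, 0.5], OPPONENT)
print(pure_value, mixed_value)
```

In this toy instance the even mixture guarantees more than the pure action does, which illustrates that a robust best response need not be a pure action.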

This paper is concerned with the computation of subgame perfect equilibria in stationary strategies (SSPEs) in robust stochastic games. The concept of SSPE, characterized in Maskin & Tirole (2001), plays a significant role in stochastic games, as demonstrated by Adlakha, Johari, & Weintraub (2015). A subgame perfect equilibrium in stationary strategies depends only on the current state. As mentioned in Herings & Peeters (2004), a stationary strategy profile is an SSPE if each player learns to respond optimally in all states. There has been much interest in the computation of SSPEs in stochastic games. Filar, Schultz, Thuijsman, & Vrieze (1991) characterize SSPEs in finite discounted stochastic games as global optima of certain nonlinear programs. Herings & Peeters (2004) develop a stochastic tracing procedure to compute SSPEs in stochastic games, where one needs to apply an iterative method several times to find a starting point. To mitigate this deficiency, Li & Dang (2020) present an arbitrary starting stochastic tracing procedure. Borkovsky, Doraszelski, & Kryukov (2010) give a guide to computing SSPEs in stochastic games by a general differentiable homotopy method. Although these methods have significantly advanced the applications of stochastic games, they cannot be directly applied to compute SSPEs in robust stochastic games.
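The requirement that each player respond optimally in all states can be checked numerically on a small example. The sketch below is an illustration only: it uses an ordinary (non-robust) discounted stochastic game, a hypothetical two-state prisoner's dilemma, and the standard one-shot-deviation test, in which a stationary profile passes if no pure deviation in any single state improves on the player's value. All names and data are assumptions introduced here.

```python
def profile_prob(x_state, s):
    """Probability of the pure action profile s under mixed actions x_state."""
    prob = 1.0
    for i, j in enumerate(s):
        prob *= x_state[i][j]
    return prob


def q_value(payoff, trans, mu_i, delta, i, w, s):
    """Stage payoff of player i plus the discounted continuation value."""
    cont = sum(p * m for p, m in zip(trans[w][s], mu_i))
    return payoff[w][s][i] + delta * cont


def stationary_values(payoff, trans, x, delta, sweeps=2000):
    """Values mu[i][w] of the stationary profile x, by repeated substitution."""
    n_states, n_players = len(x), len(x[0])
    mu = [[0.0] * n_states for _ in range(n_players)]
    for _ in range(sweeps):
        for i in range(n_players):
            mu[i] = [
                sum(profile_prob(x[w], s)
                    * q_value(payoff, trans, mu[i], delta, i, w, s)
                    for s in payoff[w])
                for w in range(n_states)
            ]
    return mu


def max_deviation_gain(payoff, trans, x, delta):
    """Largest gain from a one-shot pure deviation by any player in any state."""
    mu = stationary_values(payoff, trans, x, delta)
    gain = 0.0
    for w in range(len(x)):
        for i in range(len(x[w])):
            for j in range(len(x[w][i])):
                # Player i plays pure action j in state w; everything else is unchanged.
                dev = [list(xi) for xi in x[w]]
                dev[i] = [1.0 if k == j else 0.0 for k in range(len(x[w][i]))]
                q = sum(profile_prob(dev, s)
                        * q_value(payoff, trans, mu[i], delta, i, w, s)
                        for s in payoff[w])
                gain = max(gain, q - mu[i][w])
    return gain


# Hypothetical game: the same prisoner's dilemma (0 = cooperate, 1 = defect) in both states.
PD = {(0, 0): (3.0, 3.0), (0, 1): (0.0, 5.0), (1, 0): (5.0, 0.0), (1, 1): (1.0, 1.0)}
PAYOFF = [PD, PD]
TRANS = [{s: [0.9, 0.1] if s == (0, 0) else [0.3, 0.7] for s in PD} for _ in range(2)]
DEFECT = [[[0.0, 1.0], [0.0, 1.0]] for _ in range(2)]
COOPERATE = [[[1.0, 0.0], [1.0, 0.0]] for _ in range(2)]

print(max_deviation_gain(PAYOFF, TRANS, DEFECT, 0.9))
print(max_deviation_gain(PAYOFF, TRANS, COOPERATE, 0.9))
```

A gain of (numerically) zero indicates that the stationary profile survives the test, whereas a positive gain identifies a profitable deviation. In the robust setting of this paper each value would additionally involve a worst case over the ambiguity sets, which this sketch omits.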

Since an SSPE is a Nash equilibrium in stationary strategies, the computation of SSPEs is closely related to the computation of Nash equilibria. As a class of the most effective mechanisms for computing Nash equilibria, globally convergent path-following methods have been developed in the literature. The first path-following method was described in the seminal work of Lemke & Howson (1964) to compute a Nash equilibrium of a two-person game. The existence of Nash equilibrium was established by Nash (1951) through an equivalent fixed point problem. To compute fixed points of continuous mappings, simplicial path-following methods were pioneered by Scarf (1967) and substantially developed in the literature such as Allgower & Georg (2003), Dang (1991), Eaves (1972), Eaves & Saigal (1972), Kojima & Yamamoto (1984), van der Laan & Talman (1979), Scarf (1973) and Todd (1976). To enjoy differentiability of a mapping, differentiable path-following methods were invented by Kellogg, Li, & Yorke (1976) through a constructive proof to Brouwer’s fixed-point theorem. Some further developments of differentiable path-following methods can be found in Garcia & Zangwill (1981), Kubler, Renner, & Schmedders (2014) and Watson (2001) and the references therein. Nash equilibria can also be reformulated as solutions to a variational inequality problem or a stationary point problem of a smooth mapping on a polytope. Simplicial path-following methods are expanded to computing a stationary point in Dai, van der Laan, Talman, & Yamamoto (1991). Differentiable path-following methods are developed for computing stationary points in the literature such as Ding (1993), Hale, Yin, & Zhang (2008), Shang, Xu, & Yu (2011) and Zhou & Yu (2014). Recently, Migot & Cojocaru (2020) propose a parameterized variational inequality scheme to track the solution set for a generalized Nash equilibrium problem. 
To take advantage of special structures of games, several path-following methods have been specifically devised for computing Nash equilibria in the literature. By subdividing the product of strategy spaces into simplices, simplicial path-following methods have been tailored for computing Nash equilibria in Doup & Talman (1987), van den Elzen & Talman (1999), Govindan & Wilson (2010) and von Stengel, van den Elzen, & Talman (2002). Integrating a linear term into the payoff of each player, Herings & Peeters (2001) derive a differentiable path-following method to select a Nash equilibrium. Utilizing a structure theorem for the Nash equilibrium correspondence in Kohlberg & Mertens (1986), Govindan & Wilson (2003) attain a piece-wise differentiable path-following method to compute Nash equilibria. By constructing a convex-quadratic-penalty game, a differentiable path-following method is proposed in Chen & Dang (2019) for a refinement of the Nash equilibrium, which significantly outperforms a simplicial path-following method especially when the problem is large. As a result of the special structure of a two-person inspection game, Deutsch (2021) recently develops a polynomial-time method to compute all Nash equilibria solutions.

To the best of our knowledge, there is no method specifically designed for computing SSPEs in robust stochastic games. Kardeş et al. (2011) employ the LOQO software package to compute SSPEs in robust stochastic games. However, the package is intended for solving continuously differentiable constrained optimization problems with interior-point methods and a sequence of quadratic approximations, and it may fail to converge to an SSPE. Since the ambiguity sets of payoffs and transition probabilities in robust stochastic games induce a rather large number of variables, one can expect that simplicial path-following methods would take much more time than differentiable path-following methods, especially when the problems are large. Inspired by this fact, this paper intends to develop a differentiable path-following method by capitalizing on special structures of robust stochastic games. Incorporating logarithmic-barrier terms into each player’s payoff function with an extra variable ranging between zero and one, we construct a logarithmic-barrier robust stochastic game in which each player solves in each state against a given mixed strategy profile a convex optimization problem. An exploitation of the optimality conditions to the optimization problem together with a fixed point argument gives us a polynomial equilibrium system of the barrier game. The set of solutions to the equilibrium system contains a smooth path, which starts from an arbitrary given point and ends at an SSPE of the original game as the extra variable descends from one to zero. As an alternative scheme, we also construct a convex-quadratic-penalty robust stochastic game and establish the existence of a smooth path to an SSPE. Numerical results show that the logarithmic-barrier differentiable path-following method significantly outperforms the convex-quadratic-penalty differentiable path-following method. To further demonstrate its value, we apply the logarithmic-barrier differentiable path-following method to solve two robust stochastic games arising from supply chain configuration and market entry in medical waste recycling.
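The role of the extra variable can be illustrated on a deliberately simplified problem. The sketch below is not the paper's barrier or penalty game: it takes a single decision maker with linear payoffs `U` over three pure actions and assumes the objectives $(1 - t) u^{\top} x + t \sum_{j} x_{j}^{0} \ln x_{j}$ (barrier) and $(1 - t) u^{\top} x - \frac{t}{2} \| x - x^{0} \|^{2}$ (penalty) on the simplex. Both forms are assumptions chosen so that the maximizer equals the starting point $x^{0}$ at $t = 1$; all names are hypothetical.

```python
def barrier_point(u, x0, t, iterations=200):
    """Maximize (1 - t) * u.x + t * sum_j x0[j] * log(x[j]) over the simplex."""
    if not 0.0 < t <= 1.0:
        raise ValueError("t must lie in (0, 1]")
    scaled = [(1.0 - t) * uj for uj in u]
    # Stationarity gives x[j] = t * x0[j] / (lam - scaled[j]) with a multiplier
    # lam > max(scaled); lam is pinned down by sum(x) = 1 and found by bisection.
    lo, hi = max(scaled), max(scaled) + t
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        total = sum(t * a / (mid - s) for a, s in zip(x0, scaled))
        if total > 1.0:
            lo = mid
        else:
            hi = mid
    return [t * a / (hi - s) for a, s in zip(x0, scaled)]


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    cumulative, theta = 0.0, 0.0
    for k, value in enumerate(sorted(v, reverse=True), start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        if value > candidate:
            theta = candidate
    return [max(vi - theta, 0.0) for vi in v]


def penalty_point(u, x0, t):
    """Maximize (1 - t) * u.x - (t / 2) * ||x - x0||^2 over the simplex."""
    if not 0.0 < t <= 1.0:
        raise ValueError("t must lie in (0, 1]")
    shift = (1.0 - t) / t
    return project_to_simplex([a + shift * uj for a, uj in zip(x0, u)])


U = [1.0, 3.0, 2.0]    # hypothetical payoffs of three pure actions
X0 = [0.5, 0.3, 0.2]   # arbitrary totally mixed starting point

for t in (1.0, 0.5, 0.1, 0.01, 0.001):
    print(t, barrier_point(U, X0, t), penalty_point(U, X0, t))
```

In this toy problem both schemes start at `X0` when `t` equals one and concentrate on the best action as `t` decreases; the barrier iterates stay strictly positive, while the penalty iterates can land on the boundary of the simplex.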

The remainder of this paper is organized as follows. In Section 2, we introduce some notation for robust stochastic games and formulate the equilibrium system. In Section 3, we develop a logarithmic-barrier differentiable path-following method to find SSPEs in robust stochastic games. As an alternative scheme, a convex-quadratic-penalty differentiable path-following method is proposed in Section 4. Numerical performance and applications are reported in Section 5. We conclude this study in Section 6.

Section snippets

SSPE in robust stochastic games

As in the literature of game theory, we need the following notations for our further developments. Let $Ω = \{ω_{1}, ω_{2}, \dots, ω_{d}\}$ be the state space and $N = \{1, 2, \dots, n\}$ the set of players. We denote by $\Delta(Ω)$ the family of probability distributions on $Ω$. The pure action set of player $i$ in state $ω$ equals $S_{ω}^{i} = \{s_{ω j}^{i} \mid j \in M_{ω}^{i}\}$ with $M_{ω}^{i} = \{1, 2, \dots, m_{ω}^{i}\}$. The set of pure action profiles in state $ω$ is $S_{ω} = \prod_{i = 1}^{n} S_{ω}^{i}$. We write an element of $S_{ω}$ as $s_{ω} = (s_{ω j_{1}}^{1}, s_{ω j_{2}}^{2}, \dots, s_{ω j_{n}}^{n})$. The set of mixed actions of player $i$ in state $ω$ equals $X_{ω}^{i} = \{$ …

A logarithmic-barrier robust stochastic game and a smooth path to an SSPE

We develop in this section a logarithmic-barrier differentiable path-following method to find an SSPE in $Γ$. Let $x_{ω}^{0 i} \in X_{ω}^{i}$ be a given totally mixed strategy and $π_{ω}^{0 i}$ a given vector such that $Q_{ω} π_{ω}^{0 i} = e_{ω}^{i}$ and $π_{ω}^{0 i}(\tilde{ω} \mid s_{ω}) > 0$ for all $\tilde{ω} \in Ω$ and $s_{ω} \in S_{ω}$. Let $y_{ω}^{0 i} \in R^{q_{ω}^{i}}$ and $z_{ω}^{0 i} \in R^{f_{ω}^{i}}$ be two given positive vectors and $ξ_{ω}^{0 i} \in R^{p_{ω}^{i}}$ a given vector. Let $b_{ω}^{0 i} = A_{ω}^{i} ξ_{ω}^{0 i} - e_{ω}^{i} - b_{ω}^{i} \in R^{q_{ω}^{i}}$ and $g_{ω}^{0 i} = F_{ω}^{i} π_{ω}^{0 i} - e_{ω}^{i} - g_{ω}^{i} \in R^{f_{ω}^{i}}$, where $e_{ω}^{i}$ is a vector of ones with appropriate dimension. Let $κ_{0}$ be a given positive number. Let ${\hat{μ}}^{i} = ({\hat{μ}}_{ω}^{i} : ω \in$ …

A convex-quadratic-penalty robust stochastic game and a smooth path to an SSPE

As an alternative scheme, we develop in this section a convex-quadratic-penalty path-following method. Let ${\hat{μ}}^{i} = ({\hat{μ}}_{ω}^{i} : ω \in Ω)$ be the unique solution to the linear system with a given tuple $({\hat{x}}_{ω j}^{i}, {\hat{w}}_{ω}^{i}, {\hat{π}}_{ω}^{i})$, $(1 - t) \big( u_{ω}^{i}({\hat{x}}_{ω}; {\hat{w}}_{ω}^{i}) + δ_{i} \sum_{\tilde{ω} \in Ω} {\hat{π}}_{ω}^{i}(\tilde{ω} \mid {\hat{x}}_{ω}) μ_{\tilde{ω}}^{i} \big) - t \sum_{j \in M_{ω}^{i}} {\hat{x}}_{ω j}^{i} ({\hat{x}}_{ω j}^{i} - x_{ω j}^{0 i}) - μ_{ω}^{i} = 0, \quad ω \in Ω.$ For $t \in [0, 1]$, we form with the problem (6) a convex-quadratic-penalty robust stochastic game $Γ_{P}(t)$ in which player $i$ solves in state $ω$ against a given tuple of $({\hat{x}}_{ω}, {\hat{w}}_{ω}, {\hat{π}}_{ω})$ the convex optimization problem, $\max_{x_{ω}^{i}, y_{ω}^{i},}$ …
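The linear system for ${\hat{μ}}^{i}$ above can be solved by repeated substitution, because the coefficient $(1 - t) δ_{i}$ multiplying the transition probabilities is below one whenever $δ_{i} < 1$. The following minimal sketch does this for a single player. It assumes the stage payoffs $u_{ω}^{i}({\hat{x}}_{ω}; {\hat{w}}_{ω}^{i})$ and the transition rows ${\hat{π}}_{ω}^{i}(\cdot \mid {\hat{x}}_{ω})$ have already been evaluated at the given tuple and are passed in as plain lists; the function and argument names and the numbers are hypothetical.

```python
def solve_mu(u_hat, pi_hat, x_hat, x0, delta, t, sweeps=5000):
    """Solve (1-t)*(u + delta * P mu) - t * sum_j x_j (x_j - x0_j) - mu = 0.

    u_hat[w]   : pre-evaluated stage payoff in state w (assumed input)
    pi_hat[w]  : pre-evaluated transition probabilities out of state w
    x_hat[w]   : the player's given mixed action in state w
    x0[w]      : the starting mixed action in state w
    """
    n = len(u_hat)
    const = [
        (1.0 - t) * u_hat[w]
        - t * sum(a * (a - b) for a, b in zip(x_hat[w], x0[w]))
        for w in range(n)
    ]
    factor = (1.0 - t) * delta  # contraction modulus, below one when delta < 1
    mu = [0.0] * n
    for _ in range(sweeps):
        mu = [
            const[w] + factor * sum(p * m for p, m in zip(pi_hat[w], mu))
            for w in range(n)
        ]
    return mu


def residual(mu, u_hat, pi_hat, x_hat, x0, delta, t):
    """Largest absolute violation of the linear system at mu."""
    worst = 0.0
    for w in range(len(mu)):
        cont = sum(p * m for p, m in zip(pi_hat[w], mu))
        penalty = sum(a * (a - b) for a, b in zip(x_hat[w], x0[w]))
        lhs = (1.0 - t) * (u_hat[w] + delta * cont) - t * penalty - mu[w]
        worst = max(worst, abs(lhs))
    return worst


# Hypothetical two-state data for one player.
U_HAT = [1.0, 2.0]
PI_HAT = [[0.7, 0.3], [0.4, 0.6]]
X_HAT = [[0.6, 0.4], [0.2, 0.8]]
X0 = [[0.5, 0.5], [0.5, 0.5]]

mu = solve_mu(U_HAT, PI_HAT, X_HAT, X0, delta=0.9, t=0.3)
print(mu, residual(mu, U_HAT, PI_HAT, X_HAT, X0, 0.9, 0.3))
```

Repeated substitution is used here only because it keeps the sketch dependency-free; a direct linear solve would serve equally well for the small systems that arise per player.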

Numerical performance and applications

This section makes numerical comparisons between the two differentiable path-following methods proposed in this paper. We denote by LBPM and CQPPM the logarithmic-barrier path-following method and the convex-quadratic-penalty path-following method, respectively. We have coded the two methods in MATLAB, and the computation is carried out on a Dell workstation running Windows Server 2018 with an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz (3.19 GHz) and 16.0 GB of RAM. The values for parameters of the …

Concluding remarks

We have developed in this paper a logarithmic-barrier differentiable path-following method to compute SSPEs in robust stochastic games. The method aims to fully exploit special structures of the games in the computation. By incorporating logarithmic-barrier terms into the players’ payoff functions with an extra variable $t \in [0, 1]$, we have constituted a logarithmic-barrier robust stochastic game in which each player solves in each state a convex optimization problem. As a result of the optimality …

References (68)

S. Adlakha et al. Equilibria of dynamic games with many players: Existence, approximation, and market structure. Journal of Economic Theory (2015)

W.N. Caballero et al. Identifying behaviorally robust strategies for normal form games under varying forms of uncertainty. European Journal of Operational Research (2021)

Y. Deutsch. A polynomial-time method to compute all Nash equilibria solutions of a general two-person inspection game. European Journal of Operational Research (2021)

J. Ding. A continuation algorithm for a class of linear complementarity problems using an extrapolation technique. Linear Algebra and its Applications (1993)

B.C. Eaves et al. General equilibrium models and homotopy methods. Journal of Economic Dynamics and Control (1999)

A. van den Elzen et al. An algorithmic approach toward the tracing procedure for bi-matrix games. Games and Economic Behavior (1999)

J. Flesch et al. Stochastic games with additive transitions. European Journal of Operational Research (2007)

T. Garrec et al. Search for an immobile hider on a stochastic network. European Journal of Operational Research (2020)

S. Govindan et al. A global Newton method to compute Nash equilibria. Journal of Economic Theory (2003)

E. Kardeş. On discounted stochastic games with incomplete information on payoffs and a security application. Operations Research Letters (2014)

F. Kubler et al. European Journal of Operational Research (2018)

A. Mandel et al. Dynamic competition over social networks. European Journal of Operational Research (2020)

E. Maskin et al. Markov perfect equilibrium: I. Observable actions. Journal of Economic Theory (2001)

T. Migot et al. A parametrized variational inequality approach to track the solution set of a generalized Nash equilibrium problem. European Journal of Operational Research (2020)

K. Palmer et al. Optimal policies for solid waste disposal taxes, subsidies, and standards. Journal of Public Economics (1997)

E. Parilina et al. Price of anarchy in a linear-state stochastic dynamic game. European Journal of Operational Research (2017)

L.S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences of the USA 39 (Chapter 1 in this volume) (1953)

M. Aghassi et al. Robust game theory. Mathematical Programming (2006)

Allgower, E. L., & Georg, K. (2003). Introduction to numerical continuation...

Z.M. Avsar et al. Inventory control under substitutable demand: A stochastic game application. Naval Research Logistics (NRL) (2002)

J. Beliën et al. Municipal solid waste collection and management problems: a literature review. Transportation Science (2014)

D. Besanko et al. Learning-by-doing, organizational forgetting, and industry dynamics. Econometrica (2010)

R.N. Borkovsky et al. A user’s guide to solving dynamic stochastic games using the homotopy method. Operations Research (2010)

Y. Chen et al. A differentiable homotopy method to compute perfect equilibria. Mathematical Programming (2019)

Y. Dai et al. A simplicial algorithm for the nonlinear stationary point problem on an unbounded polyhedron. SIAM Journal on Optimization (1991)

C. Dang. The $D_{1}$-triangulation of $R^{n}$ for simplicial algorithms for computing solutions of nonlinear equations. Mathematics of Operations Research (1991)

T. Doup et al. A new simplicial variable dimension algorithm to find equilibria on the product space of unit simplices. Mathematical Programming (1987)

B.C. Eaves. Homotopies for computation of fixed points. Mathematical Programming (1972)

B.C. Eaves et al. Homotopies for computation of fixed points on unbounded regions. Mathematical Programming (1972)

R. Ericson et al. Markov-perfect industry dynamics: A framework for empirical work. The Review of Economic Studies (1995)

C. Fershtman et al. Dynamic games with asymmetric information: A framework for empirical work. The Quarterly Journal of Economics (2012)

A.V. Fiacco. Introduction to sensitivity and stability analysis in nonlinear programming (1983)

J.A. Filar et al. Nonlinear programming and stationary equilibria in stochastic games. Mathematical Programming (1991)

Cited by (1)

A differentiable path-following method to compute Nash equilibria in robust normal-form games
2023, Optimization

^☆: We are very grateful to the editor and two anonymous reviewers for their valuable comments and suggestions, which have significantly enhanced the quality of this paper. This work was partially supported by GRF: CityU 11304620 of the Government of Hong Kong SAR.
