Innovative Applications of O.R.
A differentiable path-following method to compute subgame perfect equilibria in stationary strategies in robust stochastic games and its applications

https://doi.org/10.1016/j.ejor.2021.06.059

Highlights

  • Compute subgame perfect equilibria in stationary strategies in stochastic games.

  • Develop a logarithmic-barrier differentiable path-following method.

  • Attain a convex-quadratic-penalty differentiable path-following method.

  • Apply the proposed method to problems in medical waste recycling.

Abstract

Robust stochastic games have been formulated in the literature as an effective paradigm for addressing uncertainty in payoffs and transition probabilities. This paper is concerned with the computation of subgame perfect equilibria in stationary strategies (SSPEs) in robust stochastic games. To tackle this problem, we develop a globally convergent differentiable path-following method that exploits the structure of the games. Incorporating a logarithmic-barrier term into each player’s payoff function with an extra variable between zero and one, we construct a logarithmic-barrier robust stochastic game in which each player solves a convex optimization problem in each state. Applying the optimality conditions to the barrier game together with a fixed-point argument yields a polynomial equilibrium system for the barrier game. From this system, we establish the existence of a smooth path that starts from an arbitrary mixed strategy profile and ends at an SSPE as the extra variable descends from one to zero. As an alternative scheme, we construct a convex-quadratic-penalty robust stochastic game and obtain a globally convergent convex-quadratic-penalty differentiable path-following method for SSPEs in robust stochastic games. Numerical comparisons show that the logarithmic-barrier path-following method significantly outperforms the convex-quadratic-penalty path-following method. To further demonstrate the value of the proposed methods, we apply the logarithmic-barrier path-following method to a supply chain configuration problem and a market entry problem from medical waste recycling.

Introduction

Stochastic games model situations where competing players take actions over time to achieve their individual objectives. At each stage, the players choose their actions simultaneously and independently of each other and receive instantaneous payoffs. The game then moves to a new state according to a transition probability distribution and continues from there. Stochastic games were first formulated by Shapley (1953) with only two players. The extensions to multiple players were carried out by Fink (1964) and Takahashi (1964). Stochastic games have been widely applied in economic analysis. Ericson & Pakes (1995) analyze the behavior of individual firms in an evolving marketplace. Flesch, Thuijsman, & Vrieze (2007) deal with stochastic games in which the transition probabilities for an action profile in the current state can be decomposed into player-dependent components. Fershtman & Pakes (2012) develop a framework for empirical work on stochastic games and employ a heuristic iterative procedure to compute equilibrium strategies of oligopolies. Parilina, Sedakov, & Zaccour (2017) consider a stochastic dynamic game to determine the price of anarchy. Mandel & Venel (2020) formulate the problem of dynamic competition over social networks as a stochastic game. Garrec & Scarsini (2020) apply stochastic games to searching for an immobile hider on a stochastic network. Solan & Vieille (2015) summarize the historical developments of stochastic games and emphasize the importance of the seminal work in Shapley (1953). To address uncertainty in payoffs and transition probabilities, Kardeş, Ordóñez, & Hall (2011) incorporate robust optimization into a finite discounted stochastic game and formulate a robust stochastic game, which can be regarded as an extension of the robust normal-form games introduced by Aghassi & Bertsimas (2006). 
An application of robust games to a decision problem is presented in Caballero, Lunday, & Uber (2021), where players are uncertain of the opponents’ reasoning abilities. Liu, Xu, Yang, & Zhang (2018) develop several distributionally robust equilibrium models in which players lack complete information on the true probability distribution of uncertainty. Shapiro (2021) extends distributionally robust approaches to multistage stochastic programming. Other applications of robust games can be found in Jiang, Netessine, & Savin (2011) and Zhu, Zhang, & Ye (2013). In robust games, each player adopts a robust optimization approach to the uncertainty and seeks to optimize worst-case performance, where the worst case is taken over the ambiguity sets of the uncertain parameters.
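The worst-case criterion described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the ambiguity set is a finite list of candidate payoff matrices for a single player in a single state; the function names and the numbers are hypothetical and are not taken from the paper, whose ambiguity sets are more general.

```python
def expected_payoff(A, x, y):
    """Expected payoff x^T A y for row mixed action x and column mixed action y."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def worst_case_payoff(ambiguity_set, x, y):
    """Smallest expected payoff over all candidate payoff matrices."""
    return min(expected_payoff(A, x, y) for A in ambiguity_set)


# Hypothetical data: two candidate payoff matrices for the row player.
A_nominal = [[4.0, 0.0], [1.0, 1.0]]
A_adverse = [[0.0, 0.0], [1.0, 1.0]]
ambiguity_set = [A_nominal, A_adverse]
y = [0.5, 0.5]  # opponent's mixed action, held fixed

# Row 0 looks best under the nominal matrix alone, but row 1 has the
# better guaranteed payoff once the adverse matrix is taken into account.
nominal_values = [expected_payoff(A_nominal, e, y) for e in ([1, 0], [0, 1])]
robust_values = [worst_case_payoff(ambiguity_set, e, y) for e in ([1, 0], [0, 1])]
print("nominal:", nominal_values)
print("robust: ", robust_values)
```

In this toy instance the nominal and robust choices differ, which is the behavior the robust formulation is meant to capture.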

This paper is concerned with the computation of subgame perfect equilibria in stationary strategies (SSPEs) in robust stochastic games. The concept of SSPE, characterized in Maskin & Tirole (2001), plays a significant role in stochastic games, as demonstrated by Adlakha, Johari, & Weintraub (2015). In an SSPE, each player’s strategy depends only on the current state. As mentioned in Herings & Peeters (2004), a stationary strategy profile is an SSPE if each player responds optimally in all states. There has been much interest in the computation of SSPEs in stochastic games. Filar, Schultz, Thuijsman, & Vrieze (1991) characterize SSPEs in finite discounted stochastic games as global optima of certain nonlinear programs. Herings & Peeters (2004) develop a stochastic tracing procedure to compute SSPEs in stochastic games, where one needs to apply an iterative method several times to find a starting point. To mitigate this deficiency, Li & Dang (2020) present an arbitrary starting stochastic tracing procedure. Borkovsky, Doraszelski, & Kryukov (2010) give a guide to computing SSPEs in stochastic games by a general differentiable homotopy method. Although these methods have significantly advanced the applications of stochastic games, they cannot be directly applied to compute SSPEs in robust stochastic games.
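To make the notion of a stationary strategy concrete, the following sketch evaluates the discounted value of a fixed stationary strategy in a non-robust setting. It assumes a single decision maker (all other players held fixed), a known transition law, and a discount factor in [0, 1); the names `induce` and `stationary_values` and the toy data are hypothetical. It only evaluates a given strategy and does not compute an equilibrium.

```python
def induce(x, P, r):
    """Transition matrix and stage payoffs induced by a stationary strategy.

    x[w][a]    : probability of action a in state w (depends on w only)
    P[w][a][v] : probability of moving from state w to state v under action a
    r[w][a]    : stage payoff of action a in state w
    """
    n = len(x)
    P_x = [[sum(x[w][a] * P[w][a][v] for a in range(len(x[w])))
            for v in range(n)] for w in range(n)]
    r_x = [sum(x[w][a] * r[w][a] for a in range(len(x[w]))) for w in range(n)]
    return P_x, r_x


def stationary_values(P_x, r_x, delta, tol=1e-12, max_iter=10_000):
    """Solve mu = r_x + delta * P_x mu by fixed-point iteration.

    The map is a contraction for 0 <= delta < 1, so the iteration converges.
    """
    n = len(r_x)
    mu = [0.0] * n
    for _ in range(max_iter):
        new = [r_x[w] + delta * sum(P_x[w][v] * mu[v] for v in range(n))
               for w in range(n)]
        if max(abs(a - b) for a, b in zip(new, mu)) < tol:
            return new
        mu = new
    return mu


# Hypothetical two-state example: state 1 is absorbing with zero payoff.
x = [[0.5, 0.5], [1.0]]
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]]
r = [[2.0, 0.0], [0.0]]
P_x, r_x = induce(x, P, r)
mu = stationary_values(P_x, r_x, delta=0.5)
print(mu)
```

Because the strategy `x` is indexed by the state alone, the induced process is a Markov chain and its value solves a linear system, which is the structure the equilibrium systems in this setting build on.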

Since an SSPE is a Nash equilibrium in stationary strategies, the computation of SSPEs is closely related to the computation of Nash equilibria. As a class of the most effective mechanisms for computing Nash equilibria, globally convergent path-following methods have been developed in the literature. The first path-following method was described in the seminal work of Lemke & Howson (1964) to compute a Nash equilibrium of a two-person game. The existence of Nash equilibrium was established by Nash (1951) through an equivalent fixed-point problem. To compute fixed points of continuous mappings, simplicial path-following methods were pioneered by Scarf (1967) and substantially developed in works such as Allgower & Georg (2003), Dang (1991), Eaves (1972), Eaves & Saigal (1972), Kojima & Yamamoto (1984), van der Laan & Talman (1979), Scarf (1973) and Todd (1976). To exploit the differentiability of a mapping, differentiable path-following methods were introduced by Kellogg, Li, & Yorke (1976) through a constructive proof of Brouwer’s fixed-point theorem. Some further developments of differentiable path-following methods can be found in Garcia & Zangwill (1981), Kubler, Renner, & Schmedders (2014) and Watson (2001) and the references therein. Nash equilibria can also be reformulated as solutions to a variational inequality problem or a stationary point problem of a smooth mapping on a polytope. Simplicial path-following methods are extended to the computation of stationary points in Dai, van der Laan, Talman, & Yamamoto (1991). Differentiable path-following methods are developed for computing stationary points in works such as Ding (1993), Hale, Yin, & Zhang (2008), Shang, Xu, & Yu (2011) and Zhou & Yu (2014). Recently, Migot & Cojocaru (2020) propose a parameterized variational inequality scheme to track the solution set for a generalized Nash equilibrium problem. 
To take advantage of special structures of games, several path-following methods have been specifically devised for computing Nash equilibria in the literature. By subdividing the product of strategy spaces into simplices, simplicial path-following methods have been tailored for computing Nash equilibria in Doup & Talman (1987), van den Elzen & Talman (1999), Govindan & Wilson (2010) and von Stengel, van den Elzen, & Talman (2002). Integrating a linear term into the payoff of each player, Herings & Peeters (2001) derive a differentiable path-following method to select a Nash equilibrium. Utilizing a structure theorem for the Nash equilibrium correspondence in Kohlberg & Mertens (1986), Govindan & Wilson (2003) obtain a piecewise differentiable path-following method to compute Nash equilibria. By constructing a convex-quadratic-penalty game, a differentiable path-following method is proposed in Chen & Dang (2019) for a refinement of the Nash equilibrium, which significantly outperforms a simplicial path-following method, especially when the problem is large. Exploiting the special structure of a two-person inspection game, Deutsch (2021) recently develops a polynomial-time method to compute all Nash equilibria.

To the best of our knowledge, there is no method specifically designed for computing SSPEs in robust stochastic games. Kardeş et al. (2011) employ the LOQO software package to compute SSPEs in robust stochastic games. However, the package is designed for solving continuously differentiable constrained optimization problems with interior-point methods and a sequence of quadratic approximations, and it may fail to converge to an SSPE. Since the ambiguity sets of payoffs and transition probabilities in robust stochastic games induce a rather large number of variables, one can expect that simplicial path-following methods would take much more time than differentiable path-following methods, especially when the problems are large. Motivated by this observation, this paper develops a differentiable path-following method that capitalizes on the special structures of robust stochastic games. Incorporating logarithmic-barrier terms into each player’s payoff function with an extra variable ranging between zero and one, we construct a logarithmic-barrier robust stochastic game in which each player, in each state, solves a convex optimization problem against a given mixed strategy profile. Applying the optimality conditions of this optimization problem together with a fixed-point argument yields a polynomial equilibrium system of the barrier game. The set of solutions to the equilibrium system contains a smooth path, which starts from an arbitrary given point and ends at an SSPE of the original game as the extra variable descends from one to zero. As an alternative scheme, we also construct a convex-quadratic-penalty robust stochastic game and establish the existence of a smooth path to an SSPE. Numerical results show that the logarithmic-barrier differentiable path-following method significantly outperforms the convex-quadratic-penalty differentiable path-following method. 
To further demonstrate its value, we apply the logarithmic-barrier differentiable path-following method to solve two robust stochastic games arising from supply chain configuration and market entry in the area of medical waste recycling.
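The idea of a barrier path that starts at an arbitrary totally mixed strategy and ends at an optimal response can be sketched on a deliberately simplified problem. The sketch below assumes a single player with a fixed linear payoff vector `u` over a simplex and the barrier objective (1 - t) * sum_j u_j x_j + t * sum_j x0_j log x_j; this specific form, the function name `barrier_response`, and the data are illustrative assumptions rather than the paper's barrier game, which also involves the opponents, the states, and the ambiguity sets.

```python
def barrier_response(u, x0, t, iters=200):
    """Maximizer of (1-t)*u.x + t*sum_j x0_j*log(x_j) over the simplex, 0 < t <= 1.

    Stationarity gives x_j = t*x0_j / (s + (1-t)*(u_max - u_j)), where s > 0 is a
    shifted multiplier fixed by sum_j x_j = 1; s is found by bisection.
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("t must lie in (0, 1]")
    u_max = max(u)
    j_star = u.index(u_max)

    def point(s):
        return [t * x0j / (s + (1.0 - t) * (u_max - uj)) for uj, x0j in zip(u, x0)]

    lo, hi = t * x0[j_star], t  # sum(point(lo)) >= 1 >= sum(point(hi))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(point(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return point(0.5 * (lo + hi))


u = [1.0, 3.0, 2.0]   # hypothetical payoffs of three pure actions
x0 = [0.2, 0.3, 0.5]  # arbitrary totally mixed starting strategy

# Follow the path as t descends from one towards zero.
for t in (1.0, 0.5, 0.1, 0.01, 0.001):
    print(t, [round(v, 4) for v in barrier_response(u, x0, t)])
```

At t = 1 the maximizer coincides with the starting strategy, and as t decreases the mass moves towards the action with the largest payoff. In this toy problem each point on the path is solved independently, whereas a differentiable path-following method tracks the path numerically with a predictor-corrector scheme.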

The remainder of this paper is organized as follows. In Section 2, we introduce some notation for robust stochastic games and formulate the equilibrium system. In Section 3, we develop a logarithmic-barrier differentiable path-following method to find SSPEs in robust stochastic games. As an alternative scheme, a convex-quadratic-penalty differentiable path-following method is proposed in Section 4. Numerical performance and applications are reported in Section 5. We conclude this study in Section 6.


SSPE in robust stochastic games

As in the literature of game theory, we need the following notation for our further developments. Let $\Omega=\{\omega_1,\omega_2,\ldots,\omega_d\}$ be the state space and $N=\{1,2,\ldots,n\}$ the set of players. We denote by $\Delta(\Omega)$ the family of probability distributions on $\Omega$. The pure action set of player $i$ in state $\omega$ equals $S_\omega^i=\{s_{\omega j}^i \mid j\in M_\omega^i\}$ with $M_\omega^i=\{1,2,\ldots,m_\omega^i\}$. The set of pure action profiles in state $\omega$ is $S_\omega=\prod_{i=1}^{n}S_\omega^i$. We write an element of $S_\omega$ as $s_\omega=(s_{\omega j_1}^1,s_{\omega j_2}^2,\ldots,s_{\omega j_n}^n)$. The set of mixed actions of player $i$ in state $\omega$ equals $X_\omega^i=\{\ldots$

A logarithmic-barrier robust stochastic game and a smooth path to an SSPE

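We develop in this section a logarithmic-barrier differentiable path-following method to find an SSPE in $\Gamma$. Let $x_\omega^{0i}\in X_\omega^i$ be a given totally mixed strategy and $\pi_\omega^{0i}$ a given vector such that $Q_\omega\pi_\omega^{0i}=e_\omega^i$ and $\pi_\omega^{0i}(\tilde{\omega}|s_\omega)>0$ for all $\tilde{\omega}\in\Omega$ and $s_\omega\in S_\omega$. Let $y_\omega^{0i}\in\mathbb{R}^{q_\omega^i}$ and $z_\omega^{0i}\in\mathbb{R}^{f_\omega^i}$ be two given positive vectors and $\xi_\omega^{0i}\in\mathbb{R}^{p_\omega^i}$ a given vector. Let $b_\omega^{0i}=A_\omega^i\xi_\omega^{0i}-e_\omega^i-b_\omega^i\in\mathbb{R}^{q_\omega^i}$ and $g_\omega^{0i}=F_\omega^i\pi_\omega^{0i}-e_\omega^i-g_\omega^i\in\mathbb{R}^{f_\omega^i}$, where $e_\omega^i$ is a vector of ones with appropriate dimension. Let $\kappa_0$ be a given positive number. Let $\hat{\mu}^i=(\hat{\mu}_\omega^i:\omega\ldots$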

A convex-quadratic-penalty robust stochastic game and a smooth path to an SSPE

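As an alternative scheme, we develop in this section a convex-quadratic-penalty path-following method. Let $\hat{\mu}^i=(\hat{\mu}_\omega^i:\omega\in\Omega)$ be the unique solution to the linear system with a given tuple $(\hat{x}_{\omega j}^i,\hat{w}_\omega^i,\hat{\pi}_\omega^i)$,
$$(1-t)\Big(u_\omega^i(\hat{x}_\omega;\hat{w}_\omega^i)+\delta^i\sum_{\tilde{\omega}\in\Omega}\hat{\pi}_\omega^i(\tilde{\omega}|\hat{x}_\omega)\mu_{\tilde{\omega}}^i\Big)-t\sum_{j\in M_\omega^i}\hat{x}_{\omega j}^i\big(\hat{x}_{\omega j}^i-x_{\omega j}^{0i}\big)-\mu_\omega^i=0,\quad \omega\in\Omega.$$
For $t\in[0,1]$, we form with the problem (6) a convex-quadratic-penalty robust stochastic game $\Gamma_P(t)$ in which player $i$ solves in state $\omega$, against a given tuple $(\hat{x}_\omega,\hat{w}_\omega,\hat{\pi}_\omega)$, the convex optimization problem $\max_{x_\omega^i,\,y_\omega^i,\,\ldots}$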
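The claim that the linear system for $\hat{\mu}^i$ has a unique solution can be checked by writing the system in matrix form. The derivation below is a sketch under stated assumptions: the discount factor satisfies $0\le\delta^i<1$, each $\hat{\pi}_\omega^i(\cdot|\hat{x}_\omega)$ is a probability distribution on $\Omega$, and the shorthand symbols $P$, $r$ and $c$ are introduced here for illustration only.

```latex
% Shorthand (assumed, not from the source):
%   P_{\omega\tilde\omega} = \hat\pi^i_\omega(\tilde\omega \mid \hat x_\omega)   (row-stochastic d x d matrix)
%   r_\omega = u^i_\omega(\hat x_\omega; \hat w^i_\omega)
%   c_\omega = \sum_{j \in M^i_\omega} \hat x^i_{\omega j}\,(\hat x^i_{\omega j} - x^{0i}_{\omega j})
\begin{align*}
(1-t)\bigl(r + \delta^i P \mu^i\bigr) - t\,c - \mu^i &= 0 \\
\bigl(I - (1-t)\,\delta^i P\bigr)\,\mu^i &= (1-t)\,r - t\,c
\end{align*}
% Since P is row-stochastic, its spectral radius is 1, so the spectral radius
% of (1-t)\delta^i P is (1-t)\delta^i < 1 for every t in [0,1]. Hence
% I - (1-t)\delta^i P is nonsingular and
\begin{equation*}
\hat\mu^i = \bigl(I - (1-t)\,\delta^i P\bigr)^{-1}\bigl((1-t)\,r - t\,c\bigr).
\end{equation*}
```

At $t=1$ the system decouples across states and gives $\hat{\mu}_\omega^i=-c_\omega$, while at $t=0$ it reduces to the usual discounted value equation for the given tuple.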

Numerical performance and applications

This section makes numerical comparisons between the two differentiable path-following methods proposed in this paper. We denote by LBPM and CQPPM the logarithmic-barrier path-following method and the convex-quadratic-penalty path-following method, respectively. We have coded the two methods in MATLAB and the computation is carried out on a Dell workstation with Windows Server 2018, an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz and 16.0 GB of RAM. The values for parameters of the

Concluding remarks

We have developed in this paper a logarithmic-barrier differentiable path-following method to compute SSPEs in robust stochastic games. The method aims to fully exploit special structures of the games in the computation. By incorporating logarithmic-barrier terms with an extra variable $t\in[0,1]$ into the players’ payoff functions, we have constructed a logarithmic-barrier robust stochastic game in which each player solves in each state a convex optimization problem. As a result of the optimality

References (68)

  • F. Kubler et al.

    Computing all solutions to polynomial equations in economics

    Handbook of computational economics

    (2014)
  • Y. Liu et al.

    Distributionally robust equilibrium for continuous games: Nash and Stackelberg models

    European Journal of Operational Research

    (2018)
  • A. Mandel et al.

    Dynamic competition over social networks

    European Journal of Operational Research

    (2020)
  • E. Maskin et al.

    Markov perfect equilibrium: I. Observable actions

    Journal of Economic Theory

    (2001)
  • T. Migot et al.

    A parametrized variational inequality approach to track the solution set of a generalized Nash equilibrium problem

    European Journal of Operational Research

    (2020)
  • K. Palmer et al.

    Optimal policies for solid waste disposal taxes, subsidies, and standards

    Journal of Public Economics

    (1997)
  • E. Parilina et al.

    Price of anarchy in a linear-state stochastic dynamic game

    European Journal of Operational Research

    (2017)
  • L.S. Shapley

    Stochastic games

    Proceedings of the National Academy of Sciences of the USA 39 (Chapter 1 in this volume)

    (1953)
  • M. Aghassi et al.

    Robust game theory

    Mathematical Programming

    (2006)
  • Allgower, E. L., & Georg, K. (2003). Introduction to numerical continuation...
  • Z.M. Avsar et al.

    Inventory control under substitutable demand: A stochastic game application

    Naval Research Logistics (NRL)

    (2002)
  • J. Beliën et al.

    Municipal solid waste collection and management problems: a literature review

    Transportation Science

    (2014)
  • D. Besanko et al.

    Learning-by-doing, organizational forgetting, and industry dynamics

    Econometrica

    (2010)
  • R.N. Borkovsky et al.

    A user’s guide to solving dynamic stochastic games using the homotopy method

    Operations Research

    (2010)
  • Y. Chen et al.

    A differentiable homotopy method to compute perfect equilibria

    Mathematical Programming

    (2019)
  • Y. Dai et al.

    A simplicial algorithm for the nonlinear stationary point problem on an unbounded polyhedron

    SIAM Journal on Optimization

    (1991)
  • C. Dang

    The D1-triangulation of Rn for simplicial algorithms for computing solutions of nonlinear equations

    Mathematics of Operations Research

    (1991)
  • T. Doup et al.

    A new simplicial variable dimension algorithm to find equilibria on the product space of unit simplices

    Mathematical Programming

    (1987)
  • B.C. Eaves

    Homotopies for computation of fixed points

    Mathematical Programming

    (1972)
  • B.C. Eaves et al.

    Homotopies for computation of fixed points on unbounded regions

    Mathematical Programming

    (1972)
  • R. Ericson et al.

    Markov-perfect industry dynamics: A framework for empirical work

    The Review of Economic Studies

    (1995)
  • C. Fershtman et al.

    Dynamic games with asymmetric information: A framework for empirical work

    The Quarterly Journal of Economics

    (2012)
  • A.V. Fiacco

    Introduction to sensitivity and stability analysis in nonlinear programming

    (1983)
  • J.A. Filar et al.

    Nonlinear programming and stationary equilibria in stochastic games

    Mathematical Programming

    (1991)

We are very grateful to the editor and two anonymous reviewers for their valuable comments and suggestions, which have significantly enhanced the quality of this paper. This work was partially supported by GRF: CityU 11304620 of the Government of Hong Kong SAR.