Abstract
This paper presents a dynamic Bayesian–Stackelberg incentive-compatible mechanism for a class of controllable homogeneous Markov games, in which multiple agents observe private information and learn their behavior through a sequence of interactions in a repeated game. We assume that the leaders can commit ex ante to their disclosure strategy and mechanism, thereby affecting the followers’ actions. Throughout the paper, the leaders possess and benefit from commitment leadership, which is the distinctive feature of a Stackelberg game. In these dynamics, leaders and followers together play a Stackelberg game in which actions are taken sequentially across the two layers of the hierarchy, while within each layer the leaders and the followers each play non-cooperatively in a simultaneous-move (Nash) game. The game considers an ex ante incentive-compatible mechanism that, in equilibrium, maximizes the reward while the agents learn their actions over a countable number of periods. The problem is formulated as a Bayesian–Stackelberg equilibrium in the context of reinforcement learning. We propose an algorithm supported by the extraproximal method and show that it converges. Tikhonov’s regularization technique is employed to ensure the existence and uniqueness of the Bayesian–Stackelberg equilibrium, and we guarantee the convergence of the method to a single incentive-compatible mechanism. As one of the main results of this work, we derive analytical expressions for computing the mechanism in a Stackelberg game. We demonstrate the efficiency of the method on an experiment drawn from an electric power problem with an oligopolistic market structure dominated by a small number of large sellers (oligopolists).
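To make the abstract's algorithmic core concrete, the sketch below illustrates an extraproximal (two-step, prediction–correction) iteration with Tikhonov regularization on a toy static bilinear game over probability simplices. This is only a minimal sketch under assumed conditions: the bilinear payoff matrix `A`, the step size `gamma`, and the regularization parameter `delta` are illustrative choices, not the paper's Markov-game formulation or its parameter values.

```python
import numpy as np

def project_simplex(y):
    # Euclidean projection of y onto the probability simplex.
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(y) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(y - theta, 0.0)

def extraproximal(A, delta=0.05, gamma=0.1, iters=2000):
    """Extraproximal iteration for the Tikhonov-regularized saddle point
    max_x min_y  x^T A y - (delta/2)(||x||^2 - ||y||^2)  on simplices."""
    n, m = A.shape
    # Deliberately non-uniform starting point.
    x = np.arange(1, n + 1, dtype=float); x /= x.sum()
    y = np.arange(1, m + 1, dtype=float); y /= y.sum()
    for _ in range(iters):
        # Prediction half-step: gradient step from (x, y).
        xh = project_simplex(x + gamma * (A @ y - delta * x))
        yh = project_simplex(y - gamma * (A.T @ x - delta * y))
        # Correction half-step: step from (x, y) using the predicted point.
        x = project_simplex(x + gamma * (A @ yh - delta * xh))
        y = project_simplex(y - gamma * (A.T @ xh - delta * yh))
    return x, y
```

The two half-steps are what distinguish the extraproximal scheme from a plain gradient method: the correction reuses the gradient evaluated at the predicted point, which damps the rotational dynamics typical of saddle-point problems, while the `delta` term plays the role of the Tikhonov regularizer that makes the solution unique.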
References
Antipin AS (2005) An extraproximal method for solving equilibrium programming problems and games. Computational Mathematics and Mathematical Physics 45(11):1893–1914
Asiain E, Clempner JB, Poznyak AS (2019) Controller exploitation-exploration: A reinforcement learning architecture. Soft Comput 23(11):3591–3604
Athey S, Segal I (2013) An efficient dynamic mechanism. Econometrica 81(6):2463–2485
Baron D, Besanko D (1984) Regulation and information in a continuing relationship. Information Economics and Policy 1:447–470
Battaglini M (2005) Long-term contracting with Markovian consumers. Am Econ Rev 95(3):637–658
Bergemann D, Said M (2011) Wiley Encyclopedia Of Operations Research And Management Science. Wiley, Hoboken, pp 1511–1522
Bergemann D, Välimäki J (2008) Bandit problems. In: The New Palgrave Dictionary of Economics. Palgrave Macmillan, New York, pp 336–340
Bergemann D, Välimäki J (2010) The dynamic pivot mechanism. Econometrica 78(2):771–789
Berry CA, Hobbs BF, Meroney WA, O’Neill RP, Stewart WR Jr (1999) Understanding how market power can arise in network competition: a game theoretic approach. Util Policy 8:139–158
Besanko D (1985) Multiperiod contracts between principal and agent with adverse selection. Economics Letters 17:33–37
Board S (2007) Selling options. Journal of Economic Theory 136:324–340
Board S, Skrzypacz A (2016) Revenue management with forward-looking buyers. Journal of Political Economy 124(4):1046–1087
Clempner JB, Poznyak AS (2016a) Analyzing an optimistic attitude for the leader firm in duopoly models: a strong Stackelberg equilibrium based on a Lyapunov game theory approach. Economic Computation & Economic Cybernetics Studies & Research 50(4):41–60
Clempner JB, Poznyak AS (2016b) Convergence analysis for pure stationary strategies in repeated potential games: Nash, Lyapunov and correlated equilibria. Expert Syst Appl 46:474–484
Clempner JB, Poznyak AS (2018a) A Tikhonov regularization parameter approach for solving Lagrange constrained optimization problems. Eng Optim 50(11):1996–2012
Clempner JB, Poznyak AS (2018b) A Tikhonov regularized penalty function approach for solving polylinear programming problems. J Comput Appl Math 328:267–286
Clempner JB, Poznyak AS (2019a) Observer and control design in partially observable finite Markov chains. Automatica (To be published)
Clempner JB, Poznyak AS (2019b) Observer and control design in partially observable finite Markov chains. Automatica 110:110
Clempner JB, Poznyak AS (2020) A nucleus for Bayesian partially observable Markov games: joint observer and mechanism design. Engineering Applications of Artificial Intelligence 95:103876
Clempner JB, Poznyak AS (2021) Analytical method for mechanism design in partially observable Markov games. Mathematics (To be published)
Courty P, Li H (2000) Sequential screening. Review of Economic Studies 67:697–717
Esö P, Szentes B (2007) Optimal information disclosure in auctions and the handicap auction. Review of Economic Studies 74(3):705–731
Garg D, Narahari Y (2008) Mechanism design for single leader Stackelberg problems and application to procurement auction design. IEEE Transactions On Automation Science And Engineering 5(3):377–393
Gershkov A, Moldovanu B (2009) Dynamic revenue maximization with heterogenous objects: A mechanism design approach. American Economic Journal: Microeconomics 1(2):168–198
Golosov M, Skreta V, Tsyvinski A, Wilson A (2014) Dynamic strategic information transmission. Journal of Economic Theory 151:304–341
Hartline JD, Lucier B (2015) Non-optimal mechanism design. American Economic Review 105(10):3102–3124
Hobbs B, Metzler C, Pang J (2000) Strategic gaming analysis for electric power networks: an MPEC approach. IEEE Trans Power Syst 15:638–645
Hu M, Fukushima M (2011) Variational inequality formulation of a class of multi-leader-follower games. Journal of Optimization Theory and Applications 151:455–473
Hurwicz L (1960) Optimality and informational efficiency in resource allocation processes. In: Arrow KJ, Karlin S, Suppes P (eds) Mathematical methods in the social sciences. Stanford University Press, California, pp 27–46
Kakade S, Lobel I, Nazerzadeh H (2013) Optimal dynamic mechanism design and the virtual-pivot mechanism. Operations Research 61(4):837–854
Myerson RB (1983) Mechanism design by an informed principal. Econometrica 51(6):1767–1797
Myerson RB (1989) Allocation, information and markets, The New Palgrave. Palgrave Macmillan, London, pp 191–206
Pang J, Fukushima M (2005) Quasi-variational inequalities, generalized Nash equilibria, and multi-leader-follower games. Computational Management Science 2:21–56
Pavan A, Segal I, Toikka J (2014) Dynamic mechanism design: A myersonian approach. Econometrica 82(2):601–653
Saari DG (1988) On the types of information and mechanism design. Journal of Computational and Applied Mathematics 22(2–3):231–242
Solis C, Clempner JB, Poznyak AS (2019) Robust extremum seeking for a second order uncertain plant using a sliding mode controller. International Journal of Applied Mathematics and Computer Science 29(4):703–712
Trejo KK, Clempner JB, Poznyak AS (2015a) Computing the Stackelberg/Nash equilibria using the extraproximal method: convergence analysis and implementation details for Markov chains games. Int J Appl Math Comput Sci 25(2):337–351
Trejo KK, Clempner JB, Poznyak AS (2015b) A Stackelberg security game with random strategies based on the extraproximal theoretic approach. Eng Appl Artif Intell 37:145–153
Wang X, Chin KS, Yin H (2011) Design of optimal double auction mechanism with multi-objectives. Expert Systems with Applications 38:13749–13756
Communicated by Jorge Zubelli.
A Proof of Theorem 2
Following Antipin (2005) and Trejo et al. (2015a), let us consider \(\eta =\gamma \), \(z={\tilde{w}}\), \(x={\tilde{v}}_{n}\) and \(z^{*}=\hat{v}_{n}\); we obtain
Now, considering \(\eta =\gamma \), \(z={\tilde{w}}\), \(x={\tilde{v}}_{n}\) and \(z^{*}={\tilde{v}}_{n+1}\), we obtain
Choosing \({\tilde{w}}={\tilde{v}}_{n+1}\) in (34) and \({\tilde{w}}=\hat{v}_{n}\) in (35), we obtain
Adding (36) and (37) and considering \({\tilde{w}}+h={\tilde{v}}_{n+1}\), \({\tilde{w}}=\hat{v}_{n}\), \({\tilde{v}}+q={\tilde{v}}_{n}\), \({\tilde{v}}=\hat{v}_{n}\), \(h={\tilde{v}}_{n+1}-\hat{v}_{n}\) and \(q={\tilde{v}}_{n}-\hat{v}_{n}\), we have
which implies
Now, considering \({\tilde{w}}={\tilde{v}}_{n+1}\) in (34) and \({\tilde{w}}={\tilde{v}}_{\delta }^{*}\) in (35), we have
Adding the previous inequalities and multiplying by two we obtain
Adding and subtracting the term \({\mathcal {L}}_{\delta }(\hat{v}_{n}, \hat{v}_{n})\) we get
Considering \({\tilde{w}}+h={\tilde{v}}_{n+1}\), \({\tilde{w}}=\hat{v}_{n}\), \({\tilde{v}}+k= {\tilde{v}}_{n}\) and \({\tilde{v}}=\hat{v}_{n}\) we have \(h={\tilde{v}}_{n+1}- \hat{v}_{n}\) and \(k={\tilde{v}}_{n}-\hat{v}_{n}\), then the resulting equation is as follows
Using Eq. (38) in the last term on the left-hand side and given the strict convexity of \( {\mathcal {L}}_{\delta }\) where
we obtain
Applying the identity \(2\langle a-c,c-b\rangle = \Vert a-b\Vert ^{2}- \Vert a-c\Vert ^{2} -\Vert c-b\Vert ^{2}\) with \(a=\hat{v}_{n}\), \(b={\tilde{v}}_{\delta }^{*}\) and \(c={\tilde{v}}_{n},\) to the left-hand side of the last inequality we have
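The identity invoked here is the standard three-point (polarization) identity, which follows from expanding \(\Vert a-b\Vert ^{2}=\Vert (a-c)+(c-b)\Vert ^{2}\). As a quick sanity check, the snippet below verifies it numerically; the random vectors are purely illustrative.

```python
import numpy as np

# Three-point identity: 2<a-c, c-b> = ||a-b||^2 - ||a-c||^2 - ||c-b||^2,
# checked on arbitrary (illustrative) random vectors.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 4))
lhs = 2.0 * np.dot(a - c, c - b)
rhs = np.dot(a - b, a - b) - np.dot(a - c, a - c) - np.dot(c - b, c - b)
assert np.isclose(lhs, rhs)
```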
Let \(p=1+2\gamma \delta -2\gamma ^{2}L^{2}\); completing the square in the third and fourth terms yields
and
considering \(k=1-2\gamma \delta +\dfrac{(2\gamma \delta )^{2}}{p}\in \left( 0,1\right) \) we have that
Q.E.D.
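The contraction factor \(k\) above must lie in \((0,1)\) for the recursion to converge, which constrains the step size \(\gamma \), the regularization parameter \(\delta \) and the Lipschitz constant \(L\). The snippet below checks this for one illustrative parameter choice; the values of `gamma`, `delta` and `L` are assumptions for the check, not values from the paper.

```python
# Illustrative check that p > 0 and the contraction factor k is in (0, 1),
# with p = 1 + 2*gamma*delta - 2*gamma^2*L^2 and
# k = 1 - 2*gamma*delta + (2*gamma*delta)^2 / p.
gamma, delta, L = 0.1, 0.5, 1.0  # assumed example values
p = 1.0 + 2.0 * gamma * delta - 2.0 * gamma**2 * L**2
k = 1.0 - 2.0 * gamma * delta + (2.0 * gamma * delta) ** 2 / p
assert p > 0.0
assert 0.0 < k < 1.0
```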
Clempner, J.B. A Markovian Stackelberg game approach for computing an optimal dynamic mechanism. Comp. Appl. Math. 40, 186 (2021). https://doi.org/10.1007/s40314-021-01578-4
Keywords
- Dynamic mechanism design
- Incentive-compatible mechanisms
- Stackelberg games with private information
- Bayesian equilibrium
- Markov games