Regularized stochastic team problems

https://doi.org/10.1016/j.sysconle.2021.104876

Abstract

In this paper, we introduce regularized stochastic team problems. Under mild assumptions, we prove that there exists a unique fixed point of the best response operator, where this unique fixed point is the optimal regularized team decision rule. Then, we establish an asynchronous distributed algorithm to compute this optimal strategy. We also provide a bound that shows how the optimal regularized team decision rule performs in the original stochastic team problem.

Introduction

Team decision theory was introduced by Marschak [1] to study decisions of agents that act collectively, based on their private information, to optimize a common reward function. Radner [2] proved fundamental results for static teams and in particular established connections between Nash equilibrium and team-optimality. Witsenhausen’s seminal papers [3], [4], [5], [6], [7], [8] on the characterization and classification of information structures have been crucial to the progress of our understanding of teams. We refer the reader to [9] for a more comprehensive overview of team decision theory and a detailed literature review.

In teams, due to their decentralized nature, computing the optimal decision rule is an NP-hard problem [10]. Indeed, even establishing the existence and the structure of optimal policies is a challenging problem. Existence of optimal policies for static teams and a class of sequential dynamic teams has been shown recently in [11], [12], [13]. In the literature, there are mainly three approaches to compute optimal or sub-optimal team decision rules [14]: (i) the common information approach [15], [16], (ii) the designer’s approach [6], [13], [17], [18], and (iii) the person-by-person approach [2], [19]. In the common information approach, it is assumed that agents share some common information with each other (e.g., delayed observation sharing or periodic observation sharing). Therefore, one can partition the information of each agent into two parts: the common information and the private information. In other words, there is a coordinator that observes the common information and shares it with the other agents. The idea is then to formulate the problem as a centralized stochastic control problem from the viewpoint of this coordinator. With this viewpoint, one can use classical stochastic control techniques (such as dynamic programming) to compute the optimal team decision rule.

The designer’s approach is very similar to the common information approach. Namely, although the original problem has a decentralized information structure, it is a centralized decision problem from the viewpoint of a system designer that (centrally) chooses the policies of all the agents. Hence, one can obtain a dynamic programming recursion for this centralized decision problem by identifying an appropriate information state for the designer. However, in this approach, the action space of the designer is in general too large, and so computing the optimal policy is often infeasible.

The person-by-person approach is a technique adopted from game theory for computing Nash equilibria. It can be described as follows. Fix the policies of all agents except Agent i and consider the sub-problem of optimally choosing the policy of Agent i against the policies of the other agents. This is indeed a centralized control problem, as the policies of the other agents are fixed. Hence, one can use classical stochastic control techniques to arrive at the best response policy of Agent i. Iterating in this manner over the agents, the computed policies eventually converge to a Nash equilibrium. However, although the optimal team decision rule is a Nash equilibrium, team problems in general admit more than one Nash equilibrium (sometimes infinitely many), and so this procedure mostly converges to a sub-optimal team decision rule (see [20] and [9, Section 2.6] for conditions ensuring the existence of a unique Nash equilibrium). Note that if one introduces a regularization term into the reward function, it is possible to prove that there exists a unique Nash equilibrium. If the optimal team decision rule exists, this unique Nash equilibrium is also the unique optimal team decision rule, and therefore the person-by-person approach converges to it. This is indeed the approach adopted in this paper.
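The person-by-person iteration described above can be sketched on a toy two-agent static team. Everything here is an illustrative assumption rather than the paper's construction: the toy joint observation distribution `P`, the reward tensor `r`, the choice of entropy regularization, and the resulting softmax form of the regularized best response (with temperature `tau` large enough for the iteration to be contractive).

```python
import numpy as np

rng = np.random.default_rng(0)
nY, nU, tau = 2, 2, 2.0             # observation/action sizes, temperature

# Hypothetical joint observation distribution P(y1, y2) and
# common reward r(y1, y2, u1, u2).
P = rng.random((nY, nY)); P /= P.sum()
r = rng.random((nY, nY, nU, nU))

def softmax(z, tau):
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / tau)
    return e / e.sum(axis=-1, keepdims=True)

def best_response_1(pi2):
    # q[y1, u1]: expected reward of playing u1 after seeing y1,
    # averaging over y2 ~ P(. | y1) and u2 ~ pi2(. | y2).
    q = np.einsum('ab,bk,abjk->aj', P, pi2, r) / P.sum(axis=1, keepdims=True)
    return softmax(q, tau)          # entropy-regularized best response

def best_response_2(pi1):
    q = np.einsum('ab,aj,abjk->bk', P, pi1, r) / P.sum(axis=0)[:, None]
    return softmax(q, tau)

pi1 = np.full((nY, nU), 1.0 / nU)   # start from uniform policies
pi2 = np.full((nY, nU), 1.0 / nU)
for _ in range(500):                # person-by-person iteration
    pi1, pi2 = best_response_1(pi2), best_response_2(pi1)
```

After the loop, each policy is (numerically) a fixed point of the other agent's best response map, i.e., the pair is the unique Nash equilibrium of the regularized toy problem.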

Note that if there is a misspecification in decentralized control models, the above-mentioned algorithms often result in policies that are far from optimal or sub-optimal. This is due to the lack of continuity of the reward function and the optimal policy with respect to the components (e.g., observation channels) of the problem. Therefore, regularization provides a way to overcome this robustness problem. Most recent learning algorithms for control problems also use regularization to increase robustness, and this regularization is generally established via entropy or relative entropy. We refer the reader to [21] for an exhaustive review of the literature on regularized centralized stochastic control problems and to [22] for a general framework for entropy-regularized centralized stochastic control problems.

In this paper, we introduce regularized team problems with finite observation and action spaces, analogous to regularized centralized stochastic control problems. The regularization enters as an additive term in the reward function. We then define the best response operator on the set of policies as described above and prove that it has a unique fixed point, which is the unique optimal team decision rule. We also establish an asynchronous distributed algorithm to compute this optimal policy, and provide a bound on how the optimal regularized team decision rule performs in the original team problem. The solution of the regularized team problem thus provides an upper bound and a lower bound for the original team problem, which can be used to analyze the performance of certain numerical algorithms developed in the literature.

The paper is organized as follows. In Section 2, we formulate the classical stochastic dynamic team model and its static reduction. In Section 3, we introduce regularized stochastic teams. In Section 4, we define the best response operator for regularized teams and prove the existence of optimal regularized team decision rules by establishing uniqueness of the fixed point of the best response operator. In Section 5, we propose an asynchronous distributed algorithm to compute the optimal regularized team decision rule. In Section 6, we discuss the extension to continuous observation spaces. Section 7 concludes the paper.


Unregularized stochastic team model

To model unregularized stochastic teams, we use Witsenhausen’s intrinsic model [4]. In this model, we have the following components: $\{(\mathsf{X},\mathcal{X}),\, \mathbb{P},\, \mathsf{U}^i,\, \mathsf{Y}^i,\ i=1,\ldots,N\}$, where $(\mathsf{X},\mathcal{X})$ is a Borel space (a Borel subset of a complete and separable metric space) denoting the state space, the finite sets

Regularized stochastic team model

In this section, we introduce the regularized version of the stochastic team model in Section 2.2. Indeed, the only difference between that model and the regularized one is the reward function; the rest of the components and the definitions are the same. To define the regularized reward function, for each Agent $i$, let $\Omega^i:\Delta^i\to\mathbb{R}$ be a $\rho^i$-strongly convex function with respect to the $\ell_1$-norm $\|\cdot\|_1$. In the regularized stochastic team model, the reward function is given by $R_{\mathrm{reg}}(y,\delta) \coloneqq R(y,\delta) - \sum_{i=1}^{N}\Omega^i(\delta^i)$. Here, $\sum_{i=1}^{N}\Omega^i(\delta^i)$
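A minimal numerical sketch of the regularized reward at a fixed observation profile, assuming the regularizer is a scaled negative entropy (a standard choice of $\rho$-strongly convex function with respect to the $\ell_1$-norm, by Pinsker's inequality); the function names and toy data are ours, not the paper's:

```python
import numpy as np

def neg_entropy(p, tau=1.0, eps=1e-12):
    # Omega(p) = tau * sum_k p_k log p_k: tau-strongly convex with
    # respect to the l1-norm on the probability simplex (Pinsker).
    return tau * np.sum(p * np.log(p + eps))

def regularized_reward(r_y, policies, tau=1.0):
    # r_y[u1, ..., uN] is the reward at a fixed observation profile y;
    # policies[i] is agent i's action distribution delta^i at y.
    # Computes R_reg(y, delta) = R(y, delta) - sum_i Omega^i(delta^i).
    R = r_y
    for p in policies:              # average out u^1, then u^2, ...
        R = np.tensordot(R, p, axes=([0], [0]))
    return R - sum(neg_entropy(p, tau) for p in policies)

# Uniform policies on a 2x2 toy reward: R(y, delta) = 0.5 and
# each Omega^i(delta^i) = -tau * log 2.
val = regularized_reward(np.array([[0.0, 1.0], [1.0, 0.0]]),
                         [np.array([0.5, 0.5])] * 2, tau=1.0)
```

Since regularization is subtracted from the reward, maximally random (uniform) policies are the ones rewarded the most by the additive term.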

Best response operator

In this section, we introduce the best response operator and establish that it has a unique fixed point, which is proved to be the optimal regularized team decision rule.

For each $j=1,\ldots,N$, we define $\lambda^j(r) \coloneqq \sup_{y,\,u^{-j}}\bigl(\sup_{u^j} r(y,u) - \inf_{u^j} r(y,u)\bigr)$, where $u^{-j} \coloneqq (u^i)_{i\neq j}$. Here, $\lambda^j(r)$ gives the local oscillation of the function $r$ with respect to $u^j$.

Lemma 1

Given $y$, we have the following bound: $|R(y,\delta)-R(y,\xi)| \le \sum_{i=1}^{N}\frac{\lambda^i(r)}{2}\,\|\delta^i-\xi^i\|_1$.

Proof

The proof is given in Appendix A.1. 

Note that one can
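The local oscillations $\lambda^j(r)$ and the bound of Lemma 1 can be checked numerically on a toy example; the reward tensor and policies below are random illustrative data, with $R(y,\cdot)$ taken as the expectation of $r$ under the product policy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = (3, 4, 2)                       # action-set sizes of the N = 3 agents
r = rng.random(n)                   # reward r(y, u) at a fixed y

def lam(r, j):
    # lambda^j(r) = sup over u^{-j} of (sup_{u^j} r - inf_{u^j} r).
    return (r.max(axis=j) - r.min(axis=j)).max()

def R(r, deltas):
    # R(y, delta): expectation of r under the product policy delta.
    out = r
    for d in deltas:                # average out u^1, then u^2, ...
        out = np.tensordot(out, d, axes=([0], [0]))
    return out

def rand_policy(k):
    p = rng.random(k)
    return p / p.sum()

delta = [rand_policy(k) for k in n]
xi = [rand_policy(k) for k in n]

lhs = abs(R(r, delta) - R(r, xi))
rhs = sum(lam(r, j) / 2 * np.abs(d - x).sum()
          for j, (d, x) in enumerate(zip(delta, xi)))
assert lhs <= rhs + 1e-12           # the bound of Lemma 1
```

The factor $1/2$ comes from the fact that $\delta^i-\xi^i$ sums to zero, so only the oscillation of the averaged reward (not its magnitude) matters.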

Asynchronous iterative algorithm

In this section, we propose an asynchronous iterative algorithm for computing the optimal team decision rule $\gamma^*$ and prove its convergence. This algorithm was first introduced in [30] to find fixed points of vector-valued functions. A similar asynchronous iterative algorithm was introduced in [31] to compute Nash equilibria in games.

In this algorithm, at each iteration, Agent $j$ can be in one of three possible states: {compute, transmit, idle}. In the compute state, Agent $j$ computes a new policy $\gamma^j$
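A minimal sketch of this kind of totally asynchronous fixed-point iteration, in the spirit of [30], applied to an affine sup-norm contraction rather than the best response operator itself (the contraction `T(x) = A x + b`, the tick-based scheduling, and the `stale` copies are all our illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4                               # one scalar component per agent
A = 0.15 * rng.random((N, N))       # T(x) = A x + b is a sup-norm
b = rng.random(N)                   #   contraction (all row sums < 1)
x_star = np.linalg.solve(np.eye(N) - A, b)   # its unique fixed point

x = np.zeros(N)                     # shared iterate
stale = np.zeros((N, N))            # stale[j]: agent j's (possibly
                                    #   outdated) local copy of x
for t in range(5000):
    j = rng.integers(N)
    if rng.random() < 0.5:          # agent j in 'compute'; else 'idle'
        x[j] = A[j] @ stale[j] + b[j]
    k = rng.integers(N)             # a 'transmit' event: agent k
    stale[k] = x                    #   refreshes its local copy
```

The point of the sketch is that convergence to the unique fixed point survives both idleness and stale information, provided every component is updated, and every local copy refreshed, sufficiently often.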

Extension to continuous observation spaces

Since the main motivation of the paper is to compute the optimal regularized team decision rule, we assume that the observation spaces are finite. However, one can carry out the same analysis for stochastic teams with Borel observation spaces $\{\mathsf{Y}^i,\ i=1,\ldots,N\}$ under the following absolute continuity conditions on the observation channels:

  • (AC)

For each $i=1,\ldots,N$, there exists a probability measure $\pi^i$ on $\mathsf{Y}^i$ such that $W^i(dy^i \mid x, u^{(1:i-1)})$ is absolutely continuous with respect to $\pi^i$ for any $(x, u^{(1:i-1)})$.
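By the Radon–Nikodym theorem, condition (AC) gives each observation channel a density with respect to the reference measure; a sketch of the resulting rewriting (the density notation $w^i$ is ours, not the paper's):

```latex
W^i(dy^i \mid x, u^{(1:i-1)})
  = w^i(y^i, x, u^{(1:i-1)})\, \pi^i(dy^i),
\qquad
w^i \coloneqq \frac{dW^i(\,\cdot \mid x, u^{(1:i-1)})}{d\pi^i},
```

so that the finite sums over $\mathsf{Y}^i$ in the finite-space analysis become integrals against $\pi^i$ weighted by $w^i$.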

Conclusion

In this paper, we introduced regularized stochastic team problems. We established that the best response operator has a unique fixed point and this unique fixed point is the optimal regularized team decision rule. Then, we introduced an asynchronous iterative algorithm for the computation of this unique fixed point.

One interesting future direction is to study regularized stochastic team problems with abstract observation and action spaces. In this case, to obtain similar results, one needs to

CRediT authorship contribution statement

Naci Saldi: Conceptualization, Methodology.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The author is grateful to Professor Serdar Yüksel and Tamer Başar for their constructive comments.

References (33)

  • S. Li et al., Distributed algorithms for the computation of noncooperative equilibria, Automatica, 1987.
  • J. Marschak, Elements for a theory of teams, Manage. Sci., 1955.
  • R. Radner, Team decision problems, Ann. Math. Stat., 1962.
  • H.S. Witsenhausen, Separation of estimation and control for discrete time systems, Proc. IEEE, 1971.
  • H.S. Witsenhausen, The intrinsic model for discrete stochastic control: Some open problems, Lecture Notes in Econom. and Math. Systems, 1975.
  • H.S. Witsenhausen, Equivalent stochastic control problems, Math. Control Signals Syst., 1988.
  • H.S. Witsenhausen, A standard form for sequential stochastic control, Math. Syst. Theory, 1973.
  • H.S. Witsenhausen, On information structures, feedback and causality, SIAM J. Control, 1971.
  • H.S. Witsenhausen, A counterexample in stochastic optimum control, SIAM J. Control Optim., 1968.
  • S. Yüksel et al., Stochastic Networked Control Systems: Stabilization and Optimization under Information Constraints, 2013.
  • C.H. Papadimitriou et al., Intractable problems in control theory, SIAM J. Control Optim., 1986.
  • A. Gupta et al., On the existence of optimal policies for a class of static and sequential dynamic teams, SIAM J. Control Optim., 2015.
  • N. Saldi, A topology for team policies and existence of optimal team policies in stochastic team theory, IEEE Trans. Automat. Control, 2020.
  • S. Yüksel, A universal dynamic program and refined existence results for decentralized stochastic control, SIAM J. Control Optim., 2020.
  • A. Mahajan, N.C. Martins, M. Rotkowitz, S. Yüksel, Information structures in optimal decentralized control, in: IEEE...
  • A. Nayyar et al., The common-information approach to decentralized stochastic control.
