A Universal Dynamic Program and Refined Existence Results for Decentralized Stochastic Control
SIAM Journal on Control and Optimization (IF 2.2), Pub Date: 2020-09-08, DOI: 10.1137/18m1221382
Serdar Yüksel

SIAM Journal on Control and Optimization, Volume 58, Issue 5, Page 2711-2739, January 2020.
For sequential stochastic control problems with standard Borel measurement and control action spaces, we introduce a general (universally applicable) dynamic programming formulation, establish its well-posedness, and provide new existence results for optimal policies. Our dynamic program builds in part on Witsenhausen's standard form, but with a different formulation for the state, action, and transition dynamics. Using recent results on measurability properties of strategic measures in decentralized control, we obtain a standard Borel controlled Markov model. This allows for a well-defined dynamic programming recursion through universal measurability properties of the value functions for each time stage. In addition, new existence results are obtained for optimal policies in decentralized stochastic control. These state that for a static team with independent measurements, it suffices for the cost function to be continuous (only) in the actions for the existence of an optimal policy under mild compactness (or tightness) conditions. These also apply to dynamic teams which admit static reductions with independent measurements through a change of measure transformation. We show through a counterexample that weaker conditions may not lead to the existence of an optimal team policy. The paper's existence results generalize those previously reported in the literature. A summary of and comparison with previously reported results and some applications are presented.
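As a rough illustration of the static team setting with independent measurements mentioned in the abstract, the following minimal sketch enumerates all deterministic policies of a two-agent team and picks the pair with the lowest expected cost. It is not the paper's method: the measurement laws `P1` and `P2`, the action set, the quadratic cost, and all function names are hypothetical choices made for this example. In a finite toy like this, existence of an optimal policy is automatic because the policy set is finite; the paper's results concern the much harder case of standard Borel spaces.

```python
from itertools import product

# Hypothetical toy team: two decision makers (DMs), each observing
# its own binary measurement. All numbers below are assumptions.
MEASUREMENTS = (0, 1)
ACTIONS = (0, 1, 2)        # finite (hence compact) action set for each DM
P1 = {0: 0.3, 1: 0.7}      # assumed law of DM 1's measurement y1
P2 = {0: 0.6, 1: 0.4}      # assumed law of DM 2's measurement y2 (independent of y1)


def cost(y1, y2, u1, u2):
    """Assumed team cost: the joint action should match the sum of measurements."""
    return (u1 + u2 - (y1 + y2)) ** 2


def all_policies():
    """Every deterministic map from a DM's measurement to an action."""
    return [dict(zip(MEASUREMENTS, acts))
            for acts in product(ACTIONS, repeat=len(MEASUREMENTS))]


def expected_cost(g1, g2):
    """Expected cost of the policy pair (g1, g2) under the product measure P1 x P2."""
    return sum(P1[y1] * P2[y2] * cost(y1, y2, g1[y1], g2[y2])
               for y1 in MEASUREMENTS for y2 in MEASUREMENTS)


def best_team_policy():
    """Brute-force search over all policy pairs; returns (value, g1, g2)."""
    candidates = ((expected_cost(g1, g2), g1, g2)
                  for g1 in all_policies() for g2 in all_policies())
    return min(candidates, key=lambda t: t[0])


if __name__ == "__main__":
    value, g1, g2 = best_team_policy()
    print("optimal expected cost:", value)
    print("DM 1 policy:", g1)
    print("DM 2 policy:", g2)
```

Each DM acts only on its own measurement, which is what makes the problem decentralized. The brute-force search is feasible here only because there are 9 policies per agent.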


Updated: 2020-09-08