Forward and Backward Bellman Equations Improve the Efficiency of the EM Algorithm for DEC-POMDP
Entropy (IF 2.1), Pub Date: 2021-04-29, DOI: 10.3390/e23050551
Takehiro Tottori , Tetsuya J. Kobayashi

The decentralized partially observable Markov decision process (DEC-POMDP) models sequential decision-making problems solved by a team of agents. Since planning for a DEC-POMDP can be interpreted as maximum likelihood estimation for a latent variable model, DEC-POMDP can be solved by the EM algorithm. However, in EM for DEC-POMDP, the forward–backward algorithm must be computed up to the infinite horizon, which impairs computational efficiency. In this paper, we propose the Bellman EM algorithm (BEM) and the modified Bellman EM algorithm (MBEM) by introducing the forward and backward Bellman equations into EM. BEM can be more efficient than EM because it computes the forward and backward Bellman equations instead of running the forward–backward algorithm up to the infinite horizon. However, BEM is not always more efficient than EM when the problem is large, because it computes an inverse matrix. MBEM circumvents this shortcoming by computing the forward and backward Bellman equations without the inverse matrix. Our numerical experiments demonstrate that MBEM converges faster than EM.
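A minimal sketch of the computational trade-off the abstract describes, not the authors' implementation: a linear backward-Bellman-type fixed-point equation v = r + γPv can be solved either by a direct matrix inverse (the cost that limits BEM on large problems) or by iterative Bellman backups that avoid the inverse (the MBEM-style alternative). The names P, r, gamma and the problem size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100        # illustrative number of joint states
gamma = 0.95   # discount factor

# Random stochastic transition matrix and reward vector (toy data)
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n)

# Inverse-based solve (BEM-style): (I - gamma * P) v = r,
# an O(n^3) direct linear solve that becomes the bottleneck as n grows.
v_direct = np.linalg.solve(np.eye(n) - gamma * P, r)

# Iterative solve (MBEM-style): repeat v <- r + gamma * P v,
# which needs only O(n^2) matrix-vector products per sweep.
v_iter = np.zeros(n)
for _ in range(1000):
    v_new = r + gamma * P @ v_iter
    if np.max(np.abs(v_new - v_iter)) < 1e-10:
        v_iter = v_new
        break
    v_iter = v_new

print(np.max(np.abs(v_direct - v_iter)))  # the two solutions agree to tolerance
```

Both routes reach the same fixed point; the difference is purely in how the linear system is solved, which is the efficiency gap between BEM and MBEM that the paper's experiments examine.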
