Entropy Maximization for Partially Observable Markov Decision Processes, IEEE Transactions on Automatic Control


Entropy Maximization for Partially Observable Markov Decision Processes
IEEE Transactions on Automatic Control (IF 6.2), Pub Date: 2022-06-16, DOI: 10.1109/tac.2022.3183564
Yagiz Savas, Michael Hibbard, Bo Wu, Takashi Tanaka, Ufuk Topcu


We study the problem of synthesizing a controller that maximizes the entropy of a partially observable Markov decision process (POMDP) subject to a constraint on the expected total reward. Such a controller minimizes the predictability of an agent’s trajectories to an outside observer while guaranteeing the completion of a task expressed by a reward function. Focusing on finite-state controllers (FSCs) with deterministic memory transitions, we show that the maximum entropy of a POMDP is lower bounded by the maximum entropy of the parametric Markov chain (pMC) induced by such FSCs. This relationship allows us to recast the entropy maximization problem as a so-called parameter synthesis problem for the induced pMC. We then present an algorithm to synthesize an FSC that locally maximizes the entropy of a POMDP over FSCs with the same number of memory states. In a numerical example, we highlight the benefit of using an entropy-maximizing FSC compared with an FSC that simply finds a feasible policy for accomplishing a task.
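
To make the quantity being maximized concrete, the following is a minimal sketch of how the trajectory entropy of a fixed, fully specified Markov chain could be computed. It is not the paper's synthesis algorithm: it uses the standard identity that, for a chain whose non-absorbing states are all transient, the entropy of its trajectories equals the sum over transient states of (expected number of visits) times (entropy of that state's outgoing distribution). The function names, the list-of-lists transition matrix, and the rule "a state is absorbing if its self-loop probability is 1" are assumptions made for this illustration only.

```python
import math


def local_entropy(row):
    """Shannon entropy (in bits) of one row of transition probabilities."""
    return -sum(p * math.log2(p) for p in row if p > 0.0)


def expected_visits(P, init, transient, tol=1e-12, max_iter=100_000):
    """Expected number of visits to each transient state.

    Iterates xi <- init + xi * P restricted to the transient states. This
    converges when every transient state reaches an absorbing state with
    probability 1 (an assumption of this sketch).
    """
    xi = {s: init.get(s, 0.0) for s in transient}
    for _ in range(max_iter):
        new = {
            s: init.get(s, 0.0) + sum(xi[q] * P[q][s] for q in transient)
            for s in transient
        }
        if max(abs(new[s] - xi[s]) for s in transient) < tol:
            return new
        xi = new
    raise RuntimeError("expected visits did not converge")


def chain_entropy(P, init):
    """Trajectory entropy (in bits) of a chain with transition matrix P.

    `init` maps state index -> initial probability. States with a self-loop
    of probability 1 are treated as absorbing; all others are assumed to be
    transient.
    """
    transient = [s for s in range(len(P)) if P[s][s] < 1.0]
    xi = expected_visits(P, init, transient)
    return sum(xi[s] * local_entropy(P[s]) for s in transient)


# Hypothetical 3-state chain: state 0 branches uniformly to two absorbing states.
fair_branch = [
    [0.0, 0.5, 0.5],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical deterministic chain: state 0 always moves to state 1.
deterministic = [
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

print(chain_entropy(fair_branch, {0: 1.0}))
print(chain_entropy(deterministic, {0: 1.0}))
```

In this toy setting the uniformly branching chain has higher entropy than the deterministic one, which mirrors the abstract's point that a randomized controller makes trajectories harder for an outside observer to predict. In the paper's setting the transition probabilities of the induced chain depend on the controller's parameters, so the entropy becomes a function to be optimized rather than a single number.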


Updated: 2024-08-26
