Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement
Autonomous Agents and Multi-Agent Systems (IF 2.0), Pub Date: 2020-06-10, DOI: 10.1007/s10458-020-09467-6
Mikko Lauri, Joni Pajarinen, Jan Peters

Decentralized policies for information gathering are required when multiple autonomous agents are deployed to collect data about a phenomenon of interest and constant communication cannot be assumed. This is common in tasks where multiple independently operating sensor devices gather information over large physical distances, such as unmanned aerial vehicles, or in communication-limited environments, as in the case of autonomous underwater vehicles. In this paper, we frame the information gathering task as a general decentralized partially observable Markov decision process (Dec-POMDP). The Dec-POMDP is a principled model for cooperative decentralized multi-agent decision-making. An optimal solution of a Dec-POMDP is a set of local policies, one for each agent, that maximizes the expected sum of rewards over time. In contrast to most prior work on Dec-POMDPs, we set the reward as a non-linear function of the agents' state information, for example the negative Shannon entropy. We argue that such reward functions are well suited to decentralized information gathering problems. We prove that if the reward function is convex, then the finite-horizon value function of the Dec-POMDP is also convex. We propose the first heuristic anytime algorithm for information-gathering Dec-POMDPs, and empirically demonstrate its effectiveness by solving discrete problems an order of magnitude larger than the previous state of the art. We also propose an extension to continuous-state problems with finite action and observation spaces by employing particle filtering. The effectiveness of the proposed algorithms is verified in domains such as decentralized target tracking, scientific survey planning, and signal source localization.
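
To make the reward concrete, here is a hedged sketch in our own notation (the paper's exact definitions may differ). With b denoting the joint belief over the hidden state, the negative Shannon entropy reward mentioned in the abstract is

    \rho(b) = -H(b) = \sum_{s \in S} b(s) \log b(s),

which is convex on the belief simplex. The convexity result stated in the abstract then says that the finite-horizon value function inherits this property: for every horizon t and beliefs b_1, b_2,

    V_t(\lambda b_1 + (1 - \lambda) b_2) \le \lambda V_t(b_1) + (1 - \lambda) V_t(b_2), \quad \lambda \in [0, 1].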
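
The "policy graph improvement" in the title refers to local policies represented as finite-state controllers: each agent carries a graph whose nodes prescribe actions and whose edges are selected by observations. A minimal illustrative sketch follows; the encoding and names are ours, not the paper's implementation.

# A per-agent policy graph (finite-state controller): each node prescribes an
# action, and the observation received selects the outgoing edge to the next
# node. Executing all agents' graphs in parallel yields the joint behavior.
from dataclasses import dataclass

@dataclass
class PolicyGraph:
    actions: dict       # node -> action to execute in that node
    transitions: dict   # (node, observation) -> successor node
    start: int = 0

    def act(self, node):
        return self.actions[node]

    def step(self, node, observation):
        return self.transitions[(node, observation)]

# Usage: a two-node controller for a single sensing agent.
pg = PolicyGraph(
    actions={0: "sense", 1: "move"},
    transitions={(0, "hit"): 1, (0, "miss"): 0,
                 (1, "hit"): 1, (1, "miss"): 0},
)
node, executed = pg.start, []
for obs in ["miss", "hit", "hit"]:
    executed.append(pg.act(node))
    node = pg.step(node, obs)
print(executed)  # ['sense', 'sense', 'move']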
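
For the continuous-state extension, the abstract says beliefs are maintained by particle filtering. Below is a minimal bootstrap particle filter sketch with a histogram-based estimate of the -H(b) reward; the 1-D Gaussian dynamics and observation model are illustrative assumptions, not the paper's domains.

import numpy as np

rng = np.random.default_rng(0)

def transition(particles, action):
    # Hypothetical dynamics: the state drifts by the chosen action plus noise.
    return particles + action + rng.normal(0.0, 0.5, size=particles.shape)

def likelihood(particles, observation):
    # Hypothetical observation model: a noisy direct measurement of the state.
    return np.exp(-0.5 * (observation - particles) ** 2)

def pf_update(particles, action, observation):
    """One predict-weight-resample step of a bootstrap particle filter."""
    particles = transition(particles, action)
    weights = likelihood(particles, observation)
    weights /= weights.sum()
    # Multinomial resampling keeps the particle set unweighted.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

def entropy_reward(particles, bins=20):
    # Histogram estimate of -H(b) from the particle belief (one common
    # approximation; the paper may use a different estimator).
    hist, edges = np.histogram(particles, bins=bins, density=True)
    p = hist * np.diff(edges)   # probability mass per bin
    p = p[p > 0]
    return float(np.sum(p * np.log(p)))

# Usage: start from a diffuse belief and filter two steps.
belief = rng.normal(0.0, 5.0, size=1000)
belief = pf_update(belief, action=1.0, observation=1.2)
belief = pf_update(belief, action=-0.5, observation=0.4)
print("estimated reward -H(b):", entropy_reward(belief))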
