Intelligent Knowledge Distribution: Constrained-Action POMDPs for Resource-Aware Multiagent Communication
IEEE Transactions on Cybernetics (IF 9.4) Pub Date: 8-11-2020, DOI: 10.1109/tcyb.2020.3009016
Michael C. Fowler, T. Charles Clancy, Ryan K. Williams

This article addresses a fundamental question of multiagent knowledge distribution: what information should be sent to whom, and when, given the limited resources available to each agent? Communication requirements for multiagent systems can be rather high when an accurate picture of the environment and the state of other agents must be maintained. To reduce the impact of multiagent coordination on networked resources such as power and bandwidth, this article introduces two concepts for partially observable Markov decision processes (POMDPs): 1) action-based constraints that yield constrained-action POMDPs (CA-POMDPs) and 2) soft probabilistic constraint satisfaction for the resulting infinite-horizon controllers. To enable constraint analysis over an infinite horizon, an unconstrained policy is first represented as a finite-state controller (FSC) and optimized with policy iteration. The FSC representation then allows for a combination of Markov chain Monte Carlo and discrete optimization to improve the probabilistic constraint satisfaction of the controller while minimizing the impact on the value function. Within the CA-POMDP framework, we then propose intelligent knowledge distribution (IKD), which yields per-agent policies for distributing knowledge between agents subject to interaction constraints. Finally, the CA-POMDP and IKD concepts are validated using an asset tracking problem where multiple unmanned aerial vehicles (UAVs) with heterogeneous sensors collaborate to localize a ground asset to assist in avoiding unseen obstacles in a disaster area. The IKD model was able to maintain asset tracking through multiagent communications while violating soft power and bandwidth constraints only 3% of the time, whereas greedy and naive approaches violated constraints more than 60% of the time.
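The core analysis step described above — representing a policy as a finite-state controller and then estimating how often it violates a soft resource constraint via Monte Carlo sampling — can be sketched in miniature. The following toy is purely illustrative and is not the authors' model: the FSC nodes, actions, observation probabilities, costs, and budget are all hypothetical placeholders chosen to show the shape of the computation.

```python
import random

# Toy FSC: each node selects an action; observations drive node transitions.
# Nodes, actions, and observations here are hypothetical examples.
FSC = {
    "quiet": {"action": "hold",     "next": {"asset_seen": "share", "nothing": "quiet"}},
    "share": {"action": "transmit", "next": {"asset_seen": "share", "nothing": "quiet"}},
}

# Hypothetical per-step resource cost of each action (e.g., bandwidth units).
COST = {"hold": 0.0, "transmit": 1.0}


def rollout(steps, budget, p_seen, rng):
    """Run one rollout of the FSC; return True if the soft budget was exceeded."""
    node, used = "quiet", 0.0
    for _ in range(steps):
        used += COST[FSC[node]["action"]]
        obs = "asset_seen" if rng.random() < p_seen else "nothing"
        node = FSC[node]["next"][obs]
    return used > budget


def violation_rate(n_rollouts=10_000, steps=20, budget=10.0, p_seen=0.3, seed=0):
    """Monte Carlo estimate of P(resource constraint violated) for the FSC."""
    rng = random.Random(seed)
    hits = sum(rollout(steps, budget, p_seen, rng) for _ in range(n_rollouts))
    return hits / n_rollouts


if __name__ == "__main__":
    print(f"estimated violation probability: {violation_rate():.3f}")
```

In the paper's framework this estimate would feed a discrete optimization loop that perturbs the controller to push the violation probability below a target while limiting value-function loss; the sketch above covers only the sampling half of that loop.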

Updated: 2024-08-22