Reinforcement learning autonomously identifying the source of errors for agents in a group mission
arXiv - CS - Multiagent Systems Pub Date: 2021-07-20, DOI: arxiv-2107.09232
Keishu Utimula, Ken-taro Hayaschi, Kousuke Nakano, Kenta Hongo, Ryo Maezono

When a swarm of agents is deployed to carry out a mission, the command base sometimes observes a sudden failure of one of the agents. It is generally difficult to distinguish whether the failure is caused by an actuator (hypothesis $h_a$) or by a sensor (hypothesis $h_s$) solely from the communication between the command base and the agent concerned. By having another agent collide with the suspect one, we can distinguish which hypothesis is more likely: under $h_a$ we expect to detect a corresponding displacement, whereas under $h_s$ we do not. Such swarm strategies for grasping the situation are preferably generated autonomously by artificial intelligence (AI). Preferable actions for making the distinction ($e.g.$, the collision) are those that maximize, as a value function, the difference between the behaviors expected under each hypothesis. Such actions, however, occupy only a very sparse region of the space of possibilities, so a conventional gradient-based search is ineffective. Instead, we successfully applied reinforcement learning, achieving the maximization of this sparse value function. The machine learning indeed arrived autonomously at the colliding action that distinguishes the hypotheses. Once an agent with an actuator error is identified by this action, the remaining agents behave as if assisting the malfunctioning one to accomplish the given mission.
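The paper itself provides no code; the sketch below is only a minimal, hypothetical illustration of the idea described in the abstract. A "prober" agent moves on a 1-D track toward a stationary suspect, and the reward for a step is the difference between the suspect's reported position as predicted under $h_a$ (actuator fault: a push is sensed and reported) and under $h_s$ (sensor fault: the report stays frozen). That difference is nonzero only after a collision, giving the sparse value landscape the authors describe, which tabular Q-learning can still climb. All names and parameters (SUSPECT_POS, the reward model, the learning constants) are assumptions for this sketch, not the authors' setup.

```python
import numpy as np

# Hypothetical sketch: Q-learning on a sparse hypothesis-discrimination reward.
TRACK_LEN, SUSPECT_POS = 8, 5
ACTIONS = (-1, +1)                       # step left / step right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.2       # assumed learning constants
EPISODES, HORIZON = 2000, 20

def predicted_reports(collided):
    # Suspect's reported position under each hypothesis after this step.
    report_h_a = SUSPECT_POS + (1 if collided else 0)  # pushed; sensor intact
    report_h_s = SUSPECT_POS                           # sensor dead: frozen
    return report_h_a, report_h_s

Q = np.zeros((TRACK_LEN, len(ACTIONS)))
rng = np.random.default_rng(0)

for _ in range(EPISODES):
    s = 0                                # prober starts at the left end
    for _ in range(HORIZON):
        # Epsilon-greedy action selection.
        a = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(Q[s].argmax())
        s_next = int(np.clip(s + ACTIONS[a], 0, TRACK_LEN - 1))
        collided = (s_next == SUSPECT_POS)
        r_a, r_s = predicted_reports(collided)
        r = abs(r_a - r_s)               # sparse: nonzero only on collision
        Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # greedy policy heads right, toward the collision
```

Because the reward is zero almost everywhere, a gradient of the value function gives no usable search direction, while the temporal-difference backup above propagates the rare collision reward backward through the state space; this is the contrast the abstract draws between gradient-based search and reinforcement learning.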

Updated: 2021-07-21