Learning to Herd Agents Amongst Obstacles: Training Robust Shepherding Behaviors using Deep Reinforcement Learning
arXiv - CS - Robotics. Pub Date: 2020-05-19, DOI: arxiv-2005.09476
Jixuan Zhi and Jyh-Ming Lien

The robotic shepherding problem considers the control and navigation of a group of coherent agents (e.g., a flock of birds or a fleet of drones) through the motion of an external robot, called a shepherd. Machine-learning-based methods have successfully solved this problem in empty environments with no obstacles. Rule-based methods, on the other hand, can handle more complex scenarios in which environments are cluttered with obstacles, and allow multiple shepherds to work collaboratively. However, these rule-based methods are fragile due to the difficulty of defining a comprehensive set of rules that can handle all possible cases. To overcome these limitations, we propose the first known learning-based method that can herd agents amongst obstacles. By combining deep reinforcement learning techniques with probabilistic roadmaps, we train a shepherding model using noisy but controlled environmental and behavioral parameters. Our experimental results show that the proposed method is robust, namely, it is insensitive to the uncertainties originating from both the environmental and the behavioral models. Consequently, the proposed method achieves a higher success rate, shorter completion time, and shorter path length than rule-based behavioral methods. These advantages are particularly prominent in more challenging scenarios involving more difficult groups and strenuous passages.
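To make the method's two ingredients concrete, here is a minimal Python sketch (not the authors' code) of a probabilistic roadmap that supplies waypoints through an obstacle field, plus a rule-based flock driven by a shepherd. The disc obstacles, the behavioral parameters `repulse`, `cohere`, and `noise`, and the heuristic `shepherd_action` standing in for the trained policy are all illustrative assumptions.

```python
# Hypothetical sketch of PRM waypoints + a shepherd-driven flock.
# A trained DRL policy would replace the shepherd_action() heuristic.
import heapq
import numpy as np

rng = np.random.default_rng(0)
OBSTACLES = [(np.array([5.0, 5.0]), 1.5), (np.array([7.0, 3.0]), 1.0)]  # discs (center, radius)

def in_free_space(p):
    return all(np.linalg.norm(p - c) >= r for c, r in OBSTACLES)

def collision_free(p, q, steps=20):
    """Check the straight segment p->q against all disc obstacles."""
    return all(in_free_space((1 - t) * p + t * q) for t in np.linspace(0.0, 1.0, steps))

def build_prm(start, goal, n_samples=150, k=8):
    """Classic PRM: sample free points, link k-nearest neighbors with
    collision-free edges, then run Dijkstra from start (node 0) to goal
    (node 1). Assumes the roadmap connects them (true for this seed)."""
    nodes = [start, goal]
    while len(nodes) < n_samples:
        p = rng.uniform(0.0, 10.0, size=2)
        if in_free_space(p):
            nodes.append(p)
    nodes = np.array(nodes)
    edges = {i: [] for i in range(len(nodes))}
    for i, p in enumerate(nodes):
        d = np.linalg.norm(nodes - p, axis=1)
        for j in np.argsort(d)[1:k + 1]:
            if collision_free(p, nodes[j]):
                edges[i].append((int(j), float(d[j])))
    dist, prev, pq = {0: 0.0}, {}, [(0.0, 0)]
    while pq:
        du, u = heapq.heappop(pq)
        if u == 1:
            break
        for v, w in edges[u]:
            if du + w < dist.get(v, np.inf):
                dist[v], prev[v] = du + w, u
                heapq.heappush(pq, (du + w, v))
    path, u = [1], 1
    while u != 0:
        u = prev[u]
        path.append(u)
    return nodes[path[::-1]]

def flock_step(sheep, shepherd, repulse=2.0, cohere=0.05, noise=0.05):
    """Rule-based flock: agents flee the shepherd (inverse-distance
    repulsion), drift toward the flock centroid, and receive Gaussian
    motion noise. These parameters are the kind a robust training
    scheme would randomize per episode ("noisy but controlled")."""
    center = sheep.mean(axis=0)
    away = sheep - shepherd
    dist2 = (away ** 2).sum(axis=1, keepdims=True) + 1e-6
    step = repulse * away / dist2 + cohere * (center - sheep)
    step += noise * rng.standard_normal(sheep.shape)
    return sheep + np.clip(step, -0.3, 0.3)

def shepherd_action(sheep, shepherd, waypoint, speed=0.3):
    """Stand-in for the learned policy: move toward a driving point on
    the far side of the flock from the next waypoint."""
    center = sheep.mean(axis=0)
    behind = center + 1.5 * (center - waypoint) / (np.linalg.norm(center - waypoint) + 1e-6)
    move = behind - shepherd
    return shepherd + speed * move / (np.linalg.norm(move) + 1e-6)

# Roll out: drive the flock through the PRM waypoints one at a time.
waypoints = build_prm(np.array([1.0, 1.0]), np.array([9.0, 9.0]))
sheep = rng.uniform(0.5, 2.0, size=(6, 2))
shepherd = np.array([0.0, 0.0])
for wp in waypoints:
    for _ in range(200):
        shepherd = shepherd_action(sheep, shepherd, wp)
        sheep = flock_step(sheep, shepherd)
        if np.linalg.norm(sheep.mean(axis=0) - wp) < 0.5:
            break
print("final flock center:", np.round(sheep.mean(axis=0), 2))
```

In the paper's setup, the learned policy would replace `shepherd_action`, and robustness would come from randomizing the flock's behavioral parameters and the environment across training episodes.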

Updated: 2020-05-20