Reinforcement learning-based collision avoidance: impact of reward function and knowledge transfer
AI EDAM (IF 2.1) Pub Date: 2020-03-16, DOI: 10.1017/s0890060420000141
Xiongqing Liu , Yan Jin

Collision avoidance for robots and vehicles in unpredictable environments is a challenging task. Various control strategies have been developed for the agent (i.e., a robot or vehicle) to sense the environment, assess the situation, and select the optimal actions to avoid collision and accomplish its mission. In our research on autonomous ships, we take a machine learning approach to collision avoidance. The lack of available ship steering data from human ship masters has made it necessary to acquire collision avoidance knowledge through reinforcement learning (RL). Given that the learned neural network tends to be a black box, it is desirable to have a method for designing an agent's behavior so that the desired knowledge can be captured. Furthermore, RL on complex tasks can be time consuming or even infeasible. A multi-stage learning method is needed in which agents can learn from simple tasks and then transfer their learned knowledge to closely related but more complex tasks. In this paper, we explore ways of designing agent behaviors through tuning reward functions and devise a transfer RL method for multi-stage knowledge acquisition. The computer simulation-based agent training results show that it is important to understand the role of each component in a reward function and of the various design parameters in transfer RL. The settings of these parameters all depend on the complexity of the tasks and the similarities between them.
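As a hypothetical sketch of the two ideas discussed above (the reward terms, weights, network architecture, and task pairing below are illustrative assumptions, not details taken from the paper), a composite reward might combine a goal-progress term, a collision penalty, and a per-step time penalty, and a transfer stage might initialize the policy for a harder task from the weights learned on a simpler one:

import torch.nn as nn

# Hypothetical composite reward for collision avoidance. The three terms
# and their weights are illustrative design parameters, not values from
# the paper; tuning them shapes the agent's learned behavior.
def reward(dist_to_goal, prev_dist_to_goal, dist_to_obstacle,
           safe_radius=50.0, w_goal=1.0, w_collision=100.0, w_time=0.1):
    r = w_goal * (prev_dist_to_goal - dist_to_goal)  # reward progress toward the goal
    if dist_to_obstacle < safe_radius:               # penalize entering the unsafe zone
        r -= w_collision
    r -= w_time                                      # per-step penalty discourages stalling
    return r

def make_policy(obs_dim=8, n_actions=3):
    # Small feedforward policy network; the architecture is illustrative.
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

# Multi-stage transfer RL: train on a simple task first (e.g., one static
# obstacle), then copy the learned weights into the policy for a closely
# related but harder task (e.g., several moving obstacles). This sketch
# assumes both tasks share the same observation and action spaces.
simple_policy = make_policy()
# ... train simple_policy with an RL algorithm of choice ...
complex_policy = make_policy()
complex_policy.load_state_dict(simple_policy.state_dict())  # knowledge transfer
# ... continue training complex_policy on the harder task ...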

Updated: 2020-03-16