Quadrotor Path Following and Reactive Obstacle Avoidance with Deep Reinforcement Learning, Journal of Intelligent & Robotic Systems

Journal of Intelligent & Robotic Systems (IF 3.1), Pub Date: 2021-11-10, DOI: 10.1007/s10846-021-01491-2
Bartomeu Rubí, Bernardo Morcego, Ramon Pérez

A deep reinforcement learning approach for solving the quadrotor path following and obstacle avoidance problem is proposed in this paper. The problem is solved with two agents: one for the path following task and another for the obstacle avoidance task. A novel structure is proposed in which the action computed by the obstacle avoidance agent becomes the state of the path following agent. Compared to traditional deep reinforcement learning approaches, the proposed method makes it possible to interpret the outcomes of the training process, is faster, and can be safely trained on the real quadrotor. Both agents implement the Deep Deterministic Policy Gradient (DDPG) algorithm. The path following agent was developed in a previous work. The obstacle avoidance agent uses the information provided by a low-cost LIDAR to detect obstacles around the vehicle. Since the LIDAR has a narrow field of view, an approach is developed that provides the agent with a memory of previously seen obstacles. A detailed description is given of how the state vector, the reward function and the action of this agent are defined. The agents are programmed in Python/TensorFlow and are trained and tested on the RotorS/Gazebo platform. Simulation results prove the validity of the proposed approach.
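
The abstract does not say how the obstacle memory or the link between the two agents is implemented. The following is a minimal Python sketch of both ideas, written under stated assumptions. It is not the authors' method.

The sketch works as follows:

- LIDAR hits are stored in the world frame. Obstacles the narrow-field-of-view sensor no longer sees can then still be summarized as per-sector distances around the vehicle.
- The obstacle avoidance action is assumed to be a 2-D offset of the reference point. That offset enters the path following agent's state as part of its position error.

All names (`ObstacleMemory`, `add_scan`, `sector_distances`, `path_following_state`), the 5 m forgetting radius and the sector encoding are hypothetical. So is the offset interpretation of the avoidance action. The DDPG agents themselves are not shown.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ObstacleMemory:
    """Hypothetical memory of LIDAR hits, stored in the world frame.

    A narrow field-of-view LIDAR only sees what is in front of the vehicle;
    keeping past hits in world coordinates lets the agent still account for
    obstacles it has turned away from.
    """
    max_range: float = 5.0  # assumed forgetting radius in metres
    points: list = field(default_factory=list)  # remembered (x, y) hits, world frame

    def add_scan(self, pose, ranges, angles):
        """Store valid hits. pose = (x, y, yaw); angles are body-frame bearings in rad."""
        x, y, yaw = pose
        for r, a in zip(ranges, angles):
            # Skip "no return" readings (inf/NaN) and hits beyond the memory radius.
            if math.isfinite(r) and r < self.max_range:
                self.points.append((x + r * math.cos(yaw + a),
                                    y + r * math.sin(yaw + a)))
        # Forget points that are now too far from the vehicle.
        self.points = [(px, py) for px, py in self.points
                       if math.hypot(px - x, py - y) < self.max_range]

    def sector_distances(self, pose, n_sectors=8):
        """Distance to the closest remembered hit in each body-frame sector.

        Sector 0 starts at the vehicle heading; sectors advance counter-clockwise.
        Empty sectors report max_range.
        """
        x, y, yaw = pose
        width = 2 * math.pi / n_sectors
        dists = [self.max_range] * n_sectors
        for px, py in self.points:
            d = math.hypot(px - x, py - y)
            # Bearing relative to the heading, wrapped to [0, 2*pi).
            bearing = (math.atan2(py - y, px - x) - yaw) % (2 * math.pi)
            k = int(bearing // width) % n_sectors
            dists[k] = min(dists[k], d)
        return dists


def path_following_state(position, reference, oa_action):
    """Hypothetical cascade: the avoidance agent's action (a 2-D offset) shifts
    the reference point, so it enters the path-following agent's state as an error."""
    return (reference[0] + oa_action[0] - position[0],
            reference[1] + oa_action[1] - position[1])


mem = ObstacleMemory()
mem.add_scan((0.0, 0.0, 0.0), ranges=[2.0, math.inf], angles=[0.0, 0.1])
# After yawing about 172 degrees the obstacle is behind the vehicle, outside the
# LIDAR field of view, but the memory still reports it in the rear-right sector.
print(mem.sector_distances((0.0, 0.0, 3.0), n_sectors=4))  # [5.0, 5.0, 2.0, 5.0]
```

The sector encoding is one plausible design choice. It turns a variable number of remembered points into a fixed-length vector, which suits a DDPG actor network with a fixed input size.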

Updated: 2021-11-11