Motion control of unmanned underwater vehicles via deep imitation reinforcement learning algorithm
IET Intelligent Transport Systems ( IF 2.3 ) Pub Date : 2020-06-26 , DOI: 10.1049/iet-its.2019.0273
Zhenzhong Chu 1 , Bo Sun 1 , Daqi Zhu 1 , Mingjun Zhang 2 , Chaomin Luo 3

In this study, a motion control algorithm based on deep imitation reinforcement learning is proposed for unmanned underwater vehicles (UUVs). The algorithm, called imitation learning twin-delayed deep deterministic policy gradient (IL-TD3), combines imitation learning (IL) with TD3, the twin-delayed variant of the deep deterministic policy gradient (DDPG) algorithm. To accelerate the reinforcement learning training process, supervised learning is used in the IL stage for behaviour cloning from closed-loop control data. The deep reinforcement learning stage employs an actor–critic architecture: the actor executes the control strategy and the critic evaluates the current control strategy. The training efficiency of IL-TD3 is compared with that of DDPG and TD3. Simulation results show that IL-TD3 converges faster and trains more stably than both baselines; its convergence rate during training is about double that of DDPG and TD3. The control performance of IL-TD3 is superior to PID in UUV motion control tasks, and its average tracking error is lower than that of PID control. The average tracking error under thruster fault is almost the same as under normal conditions.
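The behaviour-cloning step mentioned in the abstract — supervised learning of a policy from closed-loop control data — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the PID-like "expert" law, its gains, and the linear policy class are all assumptions introduced here for clarity; the paper uses a deep network trained on UUV control data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical closed-loop "expert" data: a PID-like law u = -Kp*e - Kd*de
# (gains Kp, Kd are illustrative, not taken from the paper).
Kp, Kd = 2.0, 0.5
states = rng.uniform(-1.0, 1.0, size=(500, 2))      # columns: error e, error rate de
actions = -(Kp * states[:, 0] + Kd * states[:, 1])  # expert thruster command

# Behaviour cloning as supervised regression: fit a linear policy
# pi(s) = s @ w to the expert's state-action pairs by least squares.
w, *_ = np.linalg.lstsq(states, actions, rcond=None)

# The cloned policy reproduces the expert on the training distribution;
# in IL-TD3 such a pre-trained actor would then be refined by TD3 updates.
mse = float(np.mean((states @ w - actions) ** 2))
print(w, mse)
```

In the full IL-TD3 scheme, this cloned policy serves only as a warm start for the actor network, which is why training converges faster than starting TD3 from a randomly initialised policy.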

Updated: 2020-06-30