Elsevier

Neurocomputing

Volume 411, 21 October 2020, Pages 375-392

A composite learning method for multi-ship collision avoidance based on reinforcement learning and inverse control

https://doi.org/10.1016/j.neucom.2020.05.089

Abstract

Model-free reinforcement learning methods have potential for ship collision avoidance in unknown environments. To overcome the low learning efficiency of model-free reinforcement learning, a composite learning method is proposed based on an asynchronous advantage actor-critic (A3C) algorithm, a long short-term memory (LSTM) neural network and Q-learning. The proposed method uses Q-learning to decide adaptively between an LSTM inverse model-based controller and the model-free A3C policy. Multi-ship collision avoidance simulations are conducted to verify the effectiveness of the model-free A3C method, the proposed inverse model-based method and the composite learning method. The simulation results indicate that the proposed composite learning based ship collision avoidance method outperforms both the A3C learning method and a traditional optimization-based method.

Introduction

The automation of vehicles is an effective approach to reducing the risk caused by dangerous driving behaviors [20], [71]. Because of the safety requirements of autonomous surface vessels (ASVs), collision avoidance systems are becoming increasingly important [4]. With the recent development of artificial intelligence, deep reinforcement learning (DRL) methods [34], [44] have had a profound effect on complex decision-making tasks such as multi-ship collision avoidance, for which DRL provides an alternative to the traditional methods. As a branch of DRL, the policy gradient-based methods (e.g., deep deterministic policy gradient (DDPG) [25], asynchronous advantage actor-critic (A3C) [33], proximal policy optimization (PPO) [41], etc.) can solve continuous action problems with a policy parameterized by neural networks. At present, DRL has made breakthroughs in a variety of domains, such as control [38], path planning [29] and collision avoidance [10]. In this section, the existing ship collision avoidance methods are introduced, including the traditional methods and the DRL-based methods.

Traditional ship collision avoidance methods can be broadly divided into two categories: path generation methods and intelligent optimization methods.

Path generation methods mainly include global grid-based methods (e.g., A* [47], [26]) and local path generation methods (e.g., artificial potential field (APF) [62], [63]). As a heuristic search algorithm, A* considers both the origin and the destination, which gives it global optimality. Hierarchical planning [53], [8] and additional constraints [11], [48] are commonly used in A* to improve the search efficiency and path smoothness. Compared with A*, APF [31], [24] requires less computation and produces smoother paths by modelling the navigation environment with artificial attractive and repulsive fields.
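As a rough illustration of the APF idea described above, the following sketch sums an attractive force towards the goal and repulsive forces away from nearby obstacles. It uses the classic textbook potential-field form rather than the formulation of any cited work; the function name `apf_force`, the gains `k_att` and `k_rep`, and the influence radius `d0` are hypothetical choices made for this example.

```python
import math

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=5.0):
    """Net (fx, fy) force at `pos` from a goal and point obstacles.

    All parameter values are illustrative assumptions, not tuned values.
    """
    # Attractive force: negative gradient of 0.5 * k_att * |pos - goal|^2.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        # Obstacles farther than d0 (or exactly at pos) exert no force.
        if 0.0 < d < d0:
            # Negative gradient of 0.5 * k_rep * (1/d - 1/d0)^2,
            # directed away from the obstacle.
            mag = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy
```

A vessel following this force moves towards the goal in open water and is pushed away once an obstacle comes within `d0`, which is why no global search is needed.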

With the development of intelligent optimization algorithms, optimization-based methods, e.g., fuzzy mathematics, neural networks and swarm intelligence, have attracted increasing attention in ship collision avoidance [49]. Fuzzy mathematics has long been applied to the fuzzy classification [13], [12] and reasoning [39], [40] of ship collision risk. Generally, the output of fuzzy mathematics relies on membership functions set in advance, which requires more prior knowledge. Neural networks [17] are another powerful approach to modelling the uncertain factors in reasoning about ship collision risk. Neural networks are commonly combined with fuzzy mathematics [2] and expert systems (ES) [46] to realize reliable collision avoidance.

Recently, swarm intelligence and evolution optimization methods have become hot topics in ship collision avoidance. Ant colony optimization (ACO) [22], [23], [51] and particle swarm optimization (PSO) [32], [35], [7], [28] are the most commonly used algorithms in ship collision avoidance, which can obtain good results with an appropriate fitness function.

At present, several DRL methods have been applied in ship collision avoidance. The key issues in the DRL methods are the definitions of the state, action and reward function.

The deep Q-network was first applied to ship collision avoidance with a low-dimensional state-action space. In Ref. [42], discrete ship heading changes are defined as the actions of the deep Q-network, and a set of distances measured along fixed-interval detection lines around the ship is defined as the state. The proposed deep Q-network is verified in both simulations [42] and model ship experiments [43]. In Ref. [65], a so-called constrained deep Q-network is proposed to reduce the complexity of the state-action space by adding constraints based on the International Regulations for Preventing Collisions at Sea (COLREGs), which also obtains good collision avoidance results.

In addition, using highly complex deep neural networks is an effective approach to training the value function in a high-dimensional state-action space. In Ref. [9], a convolutional neural network (CNN) is used to obtain a more reliable policy. This CNN model uses the perception information, the motion state and the actions of the ship over a certain horizon to produce a chain lumped state matrix for reinforcement learning.

Although deep Q-network based methods have achieved good results, a large memory space is still required for the discrete actions. With the development of policy gradient-based methods (e.g., DDPG, A3C, etc.), a policy mapping ship states to continuous actions can be obtained with less memory space. In Ref. [19], the DDPG algorithm is applied in a simplified state-action space, in which the vertical distances away from the target course are defined as the continuous actions. In Xu et al. [59], the DDPG algorithm is also applied to ship collision avoidance with the same state-action space defined in Cheng and Zhang [9]. The simulation results indicate the effectiveness and advantages of the DDPG algorithm in continuous decisions for ship collision avoidance.

Although existing DRL methods have achieved much in ship collision avoidance, their relatively low efficiency is the main barrier to application. At present, the asynchronous computing framework (e.g., A3C) and model-based for model-free (MB-MF) learning can both improve the learning efficiency. To reduce the exploration time and memory costs, the on-policy A3C method [33] updates the obtained gradients through asynchronous parallel computing, which has greater potential than the experience replay technique [37] used in DDPG. For MB-MF learning methods, the main idea is to establish a short-term model-based optimizer and combine it with the long-term learner. In Ref. [5], a linear-quadratic regulator (LQR) optimizer is adopted for data-efficient learning in both simulations and real-world experiments. This LQR optimizer uses a time-varying linear-Gaussian (TVLG) model for optimization, which is established by fitting the samples. In Ref. [36], the widely used model predictive control (MPC) approach [70], [67], [68], [69] is applied for model-based learning, which can initialize the networks to accelerate the model-free learning. The model used in MPC is a dynamic model that predicts the state changes over the time step duration.

A survey of ship collision avoidance research shows that DRL methods obtain an optimal policy by maximizing future rewards through interactions with the environment, and therefore have greater potential than the traditional methods in uncertain environments. Nevertheless, the main drawback of existing DRL-based collision avoidance methods is their low learning efficiency, especially for off-policy methods such as DDPG.

As the related DRL works indicate, asynchronous computing and MB-MF learning can be used to improve the learning efficiency for ship collision avoidance. Motivated by Refs. [33], [5], [36], the main work of this paper is as follows:

  • (1) To reduce the learning time and memory costs, the A3C method [33] is applied to ship collision avoidance in this study.

  • (2) MB-MF learning is considered to further improve the learning efficiency. Traditional model-based methods [5], [36] need to establish an accurate model before designing the controller, whereas the inverse control method [55], [3], [14] directly uses the inverse model between the desired outputs and the inputs as the controller, which gives a more concise strategy and has potential in MB-MF learning.

To improve the learning efficiency for ship collision avoidance, a composite learning method is proposed in this study. A simple framework of the proposed composite learning method is shown in Fig. 1. Instead of the simple initialization in Ref. [36], the main idea of this method is to use Q-learning to decide adaptively between the actions of a model-free A3C policy and a model-based inverse controller throughout the entire learning process. The contributions of this study are:

  • (1) Compared with traditional ship collision avoidance methods, the proposed method learns efficiently by integrating A3C reinforcement learning, and performs better with limited perception and states.

  • (2) Compared with the pure model-free A3C method, the proposed composite method uses a feed-forward LSTM controller and Q-learning to generate supervised trajectories for A3C, which gives it a higher learning efficiency.

Therefore, in addition to A3C applications, the originality of the proposed composite learning method is reflected in two aspects: the inverse controller for ship collision avoidance and decisions based on Q-learning.

The remainder of this article is organized as follows. In Section 2, the ship hydrodynamic model and collision risk model are described. In Section 3, the A3C learning based ship collision avoidance method is described. In Section 4, the composite learning method is proposed. In Section 5, simulation experiments under multi-ship encounters are carried out to assess the effectiveness of the proposed methods. In Section 6, conclusions and future research are presented.

Ship hydrodynamic model and collision risk model

In consideration of the nonlinear characteristics of ship motion and different collision states (e.g., distance of the closest point of approach (DCPA) and time to the closest point of approach (TCPA)), the three degree-of-freedom (3-DOF) ship hydrodynamic model and the collision risk model are used to calculate the collision risk index (CRI).
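For readers unfamiliar with DCPA and TCPA, the sketch below computes both from the positions and velocities of two ships under the usual constant-velocity assumption. It shows only the standard closest-point-of-approach geometry; the paper's CRI model built on these quantities is not shown in this excerpt and is not reproduced here. The function name `cpa` and the planar (x, y) tuples are assumptions made for the example.

```python
import math

def cpa(own_pos, own_vel, tgt_pos, tgt_vel, eps=1e-9):
    """Return (dcpa, tcpa) for two ships moving at constant velocity.

    Positions and velocities are (x, y) tuples in consistent units.
    A negative tcpa means the closest approach is already in the past.
    """
    # Position and velocity of the target relative to the own ship.
    rx, ry = tgt_pos[0] - own_pos[0], tgt_pos[1] - own_pos[1]
    vx, vy = tgt_vel[0] - own_vel[0], tgt_vel[1] - own_vel[1]
    v2 = vx * vx + vy * vy
    if v2 < eps:
        # No relative motion: the separation never changes.
        return math.hypot(rx, ry), 0.0
    # Time that minimizes |r + v * t|.
    tcpa = -(rx * vx + ry * vy) / v2
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return dcpa, tcpa
```

For example, `cpa((0, 0), (0, 1), (3, 10), (0, -1))` describes a near head-on encounter with a lateral offset of 3 and a closing speed of 2, so the ships pass 3 units apart after 5 time units.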

A3C learning based ship collision avoidance method

In this section, the basic A3C algorithm is applied to ship collision avoidance by regarding the collision avoidance process as a typical Markov decision process (MDP). The definitions of the state and the reward function are the key issues in the application of A3C.
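To make the role of the reward function concrete, the sketch below shows how a generic A3C worker turns the rewards of a short rollout into n-step returns and advantages, which then weight the policy gradient. This follows the general A3C scheme and is not the paper's specific state or reward design; the function name `n_step_advantages` and the default discount `gamma` are assumptions.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Compute n-step returns and advantages for one rollout.

    rewards[i] and values[i] are the reward received and the critic's
    estimate V(s_i) at step i; bootstrap_value is V of the state after
    the last step (0.0 if the episode terminated there).
    """
    returns = []
    running = bootstrap_value
    # Accumulate discounted returns backwards from the end of the rollout.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage: how much better the observed return was than predicted.
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages
```

A positive advantage increases the probability of the action taken at that step, and a negative one decreases it, so a poorly shaped reward directly slows learning.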

The proposed composite learning method

In this section, a composite learning method is proposed by using Q-learning to make adaptive decisions between the A3C learning and an inverse model-based controller. The inverse controller and Q-learning decisions are described as follows.
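One way to picture the Q-learning decision layer is as a small tabular learner whose two actions are "use the A3C action" and "use the inverse controller action". The sketch below is a minimal illustration under that assumption only; the class name `ControllerSelector`, the discretized state key, the epsilon-greedy exploration and all hyperparameters are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict

class ControllerSelector:
    """Tabular Q-learning over two choices: 0 = A3C policy, 1 = inverse controller."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Q-values per (hashable) state, one entry per action source.
        self.q = defaultdict(lambda: [0.0, 0.0])
        self.rng = random.Random(seed)

    def choose(self, state):
        # Explore with probability epsilon, otherwise pick the larger Q-value.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(2)
        q0, q1 = self.q[state]
        return 0 if q0 >= q1 else 1

    def update(self, state, choice, reward, next_state):
        # Standard Q-learning update towards the one-step bootstrapped target.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][choice] += self.alpha * (target - self.q[state][choice])
```

In use, each step would call `choose` on a discretized encounter state, apply the action from the selected source, and then call `update` with the resulting reward, so that the selector gradually favors whichever source performs better in each situation.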

Case study

The widely used KVLCC2 ship model [27] is adopted for multi-ship collision avoidance in the following case studies to verify the effectiveness of the proposed composite learning method. For comprehensive verification, the case studies cover the following aspects: 1) ship collision avoidance based on the basic A3C learning; 2) comparisons between the proposed method and A3C learning; 3) comparisons between the proposed method and a traditional optimization method.

Conclusions and future research

To realize efficient learning of a multi-ship collision avoidance policy, a model-based for model-free (MB-MF) composite learning method is proposed in this study. The main originality of the proposed method is the use of the LSTM model-based controller to accelerate the model-free A3C learning through adaptive Q-learning decisions throughout the entire learning process. The following conclusions are drawn from the simulation experiments using the KVLCC2 ship model:

  • (1) In terms of the application of A3C learning

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Shuo Xie: Conceptualization, Methodology, Writing - original draft. Xiumin Chu: Funding acquisition, Supervision. Mao Zheng: Software, Validation. Chenguang Liu: Writing - review & editing.

Acknowledgements

This research is supported by the National Key Research and Development Program of China (No. 2018YFB1600400), the Fundamental Research Funds for the Central Universities (WUT:203144003), the National Natural Science Foundation of China (No. 51709220), the Open Project Program of Fujian University Engineering Research Center of Marine Intelligent Ship Equipment (No. 322031010602), the Project of Science and Technology Bureau of Fuzhou (No. 2018-G-92) and the Key Project of Science and

Shuo Xie is a PhD candidate of Wuhan University of Technology and National Engineering Research Center for Water Transport Safety. He received his master degree majoring in Transportation Engineering in the School of Energy and Power Engineering, Wuhan University of Technology in 2017. He has published more than 10 academic papers including 4 SCI journal papers. His research interests include ship model identification, ship control and collision avoidance.

References (71)

  • Y. Ma et al., Multi-objective path planning for unmanned surface vehicle with currents effects, ISA Transactions (2018)
  • W. Naeem et al., A reactive COLREGs-compliant navigation strategy for autonomous maritime navigation, IFAC-PapersOnLine (2016)
  • Y.P. Pane et al., Reinforcement learning based compensation methods for robot manipulators, Engineering Applications of Artificial Intelligence (2019)
  • H. Shen et al., Automatic collision avoidance of multiple ships based on deep Q-learning, Applied Ocean Research (2019)
  • U. Simsir et al., Decision support system for collision avoidance of vessels, Applied Soft Computing Journal (2014)
  • Y. Singh et al., A constrained A* approach towards optimal path planning for an unmanned surface vehicle in a maritime environment containing dynamic obstacles and ocean currents, Ocean Engineering (2018)
  • R. Song et al., Smoothed A* algorithm for practical unmanned surface vehicle path planning, Applied Ocean Research (2019)
  • R. Szlapczynski et al., Review of ship safety domains: models and applications, Ocean Engineering (2017)
  • Y. Tian et al., LSTM-based traffic flow prediction with missing data, Neurocomputing (2018)
  • J. Woo et al., Dynamic model identification of unmanned surface vehicles using deep learning network, Applied Ocean Research (2018)
  • Y. Xue et al., Automatic simulation of ship navigation, Ocean Engineering (2011)
  • H. Zheng et al., Predictive path following with arrival time awareness for waterborne AGVs, Transportation Research Part C: Emerging Technologies (2016)
  • H. Zheng et al., Closed-loop scheduling and control of waterborne AGVs for energy-efficient inter terminal transport, Transportation Research Part E: Logistics and Transportation Review (2017)
  • M.A. Abkowitz, Measurement of hydrodynamic characteristics from ship maneuvering trials by system identification, Maneuverability (1980)
  • T. Awad, M.A. elfatah Elgohary, T.E. Mohamed, Ship roll damping via direct inverse neural network control system,...
  • Y. Chebotar, K. Hausman, M. Zhang, G. Sukhatme, S. Schaal, S. Levine, Combining model-based and model-free updates for...
  • D. Chen et al., A research on AIS-based embedded system for ship collision avoidance
  • L. Chen et al., Ship collision avoidance path planning by PSO based on maneuvering equation
  • L. Cheng, C. Liu, B. Yan, Improved hierarchical A-star algorithm for optimal parking path planning of the large parking...
  • E. Fernandes et al., Towards an orientation enhanced A-star algorithm for robotic navigation, 2015 IEEE International Conference on Industrial Technology (ICIT) (2015)
  • K. Hara et al., A safe way of collision avoidance maneuver based on maneuvering standard using fuzzy reasoning model
  • K. Hasegawa et al., Ship auto-navigation fuzzy expert system (SAFES), Journal of the Society of Naval Architects of Japan (1989)
  • S. Hochreiter et al., Long short-term memory, Neural Computation (1997)
  • Y. Hu et al., An enhanced LSTM for trend following of time series, IEEE Access (2019)
  • M. Inaishi et al., Basic research on a collision avoidance system using neural networks, Journal of Navigation (1992)
    Xiumin Chu is a professor in the National Engineering Research Center for Water Transport Safety, Wuhan University of Technology, Wuhan, China. He received the PhD degree (2002) and M.S. degree (1998) majoring in Automobile Application Engineering in Jilin University. He has published 2 books and more than 70 papers. His research topics include waterway transportation intelligence, smart ship, and ship motion simulation.

    Mao Zheng is a senior engineer at National Engineering Research Center for Water Transportation Safety. He received his PhD degree majoring in Ship Engineering college, Harbin Engineering University in 2014. He has published more than 10 academic papers on ship hydrodynamics and machine learning. His current research interests include ship collision avoidance, ship maneuvering motions simulation and tests, etc.

    Chenguang Liu is an assistant professor in the National Engineering Research Center for Water Transport Safety, Wuhan University of Technology, Wuhan, China. He received his M.S. degree and Ph.D. degree in the School of Energy and Power Engineering, Wuhan University of Technology, China in 2014 and 2017, respectively. He finished his post-doctoral research in Wuhan University in 2019. He has published more than 10 academic papers. His current research interests include ship intelligence, ship motion control, and model predictive control.
