A composite learning method for multi-ship collision avoidance based on reinforcement learning and inverse control
Introduction
The automation of vehicles is an effective approach to reducing the risk caused by dangerous driving behaviors [20], [71]. Given the safety requirements of autonomous surface vessels (ASVs), collision avoidance systems are becoming increasingly important [4]. With the recent development of artificial intelligence, deep reinforcement learning (DRL) methods [34], [44] have had a profound effect on complex decision-making tasks such as multi-ship collision avoidance, for which DRL provides an alternative to traditional methods. As a branch of DRL, policy gradient-based methods (e.g., deep deterministic policy gradient (DDPG) [25], asynchronous advantage actor-critic (A3C) [33] and proximal policy optimization (PPO) [41]) can solve continuous-action problems with a policy parameterized by neural networks. DRL has already made breakthroughs in a variety of domains, such as control [38], path planning [29] and collision avoidance [10]. This section introduces the existing ship collision avoidance methods, including traditional methods and DRL-based methods.
Traditional ship collision avoidance methods can be broadly divided into two categories: path generation methods and intelligent optimization methods.
Path generation methods mainly include global grid-based methods (e.g., A* [47], [26]) and local path generation methods (e.g., artificial potential field (APF) [62], [63]). As a heuristic search algorithm, A* considers both the origin and the destination and can therefore find globally optimal paths. Hierarchical planning [53], [8] and additional constraints [11], [48] are commonly used in A* to improve search efficiency and path smoothness. Compared with A*, APF [31], [24] requires less computation and produces smoother paths by modeling the navigation environment with artificial attractive and repulsive fields.
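As a rough illustration of the APF idea, the sketch below computes the resultant force from one attractive field (toward the goal) and one repulsive field per obstacle, using the classic quadratic potential form. The function name `apf_force`, the gains `k_att` and `k_rep`, and the influence radius `d0` are assumptions chosen for illustration and are not taken from the cited works.

```python
import math


def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=50.0):
    """Resultant APF force at `pos` (hypothetical gains and influence radius).

    pos, goal: (x, y) tuples; obstacles: list of (x, y) tuples.
    """
    # Attractive force: negative gradient of 0.5 * k_att * |pos - goal|^2.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        # Obstacles farther away than d0 exert no force.
        if 0.0 < d < d0:
            # Negative gradient of 0.5 * k_rep * (1/d - 1/d0)^2,
            # directed away from the obstacle.
            mag = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy


# The ship at the origin is pulled toward the goal and pushed back by the obstacle.
print(apf_force((0.0, 0.0), (10.0, 0.0), [(5.0, 0.0)]))
```

A local planner would typically steer along this force vector at each step, which is why the method is cheap but can stall where the attractive and repulsive terms cancel.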
With the development of intelligent optimization algorithms, optimization-based methods, e.g., fuzzy mathematics, neural networks and swarm intelligence, have attracted increasing attention in ship collision avoidance [49]. Fuzzy mathematics has long been applied to the fuzzy classification [13], [12] and reasoning [39], [40] of ship collision risk. Generally, the output of fuzzy mathematics relies on membership functions set in advance, which requires more prior knowledge. Neural networks [17] are another powerful approach to modeling the uncertain factors in reasoning about ship collision risk. They are commonly combined with fuzzy mathematics [2] and expert systems (ES) [46] to realize reliable collision avoidance.
Recently, swarm intelligence and evolutionary optimization methods have become active topics in ship collision avoidance. Ant colony optimization (ACO) [22], [23], [51] and particle swarm optimization (PSO) [32], [35], [7], [28] are the most commonly used algorithms, and both can obtain good results given an appropriate fitness function.
At present, several DRL methods have been applied in ship collision avoidance. The key issues in the DRL methods are the definitions of the state, action and reward function.
The deep Q-network was first applied to ship collision avoidance with a low-dimensional state-action space. In Ref. [42], discrete ship heading changes are defined as the actions, and a set of distances measured along fixed-interval detection lines around the ship is defined as the state. The proposed deep Q-network is verified in both simulations [42] and model ship experiments [43]. In Ref. [65], a so-called constrained deep Q-network is proposed to reduce the complexity of the state-action space by adding constraints based on the International Regulations for Preventing Collisions at Sea (COLREGs), which also obtains good collision avoidance results.
In addition, using more complex deep neural networks is an effective way to train the value function in a high-dimensional state-action space. In Ref. [9], a convolutional neural network (CNN) is used to obtain a more reliable policy. This CNN model uses the perception information, the motion state and the actions of the ship over a certain horizon to produce a chain lumped state matrix for reinforcement learning.
Although deep Q-network based methods have achieved good results, a large memory space is still required for the discrete actions. With the development of policy gradient-based methods (e.g., DDPG and A3C), a policy mapping ship states to continuous actions can be obtained with less memory space. In Ref. [19], the DDPG algorithm is applied in a simplified state-action space, in which the vertical distances away from the target course are defined as the continuous actions. In Xu et al. [59], the DDPG algorithm is also applied to ship collision avoidance with the same state-action space as defined in Cheng and Zhang [9]. The simulation results indicate the effectiveness and advantages of the DDPG algorithm in continuous decisions for ship collision avoidance.
Although existing DRL methods have achieved a great deal in ship collision avoidance, their relatively low learning efficiency is the main barrier to application. At present, both the asynchronous computing framework (e.g., A3C) and model-based for model-free (MB-MF) learning can improve the learning efficiency. To reduce the exploration time and memory costs, the on-policy A3C method [33] updates the gradients through asynchronous parallel computing, which has greater potential than the experience replay technique [37] used in DDPG. For MB-MF learning methods, the main idea is to establish a short-term model-based optimizer and combine it with the long-term learner. In Ref. [5], a linear-quadratic regulator (LQR) optimizer is adopted for data-efficient learning in both simulations and real-world experiments. This LQR optimizer uses a time-varying linear-Gaussian (TVLG) model, which is established by fitting the samples. In Ref. [36], the widely used model predictive control (MPC) approach [70], [67], [68], [69] is applied for model-based learning, which can initialize the networks to accelerate the model-free learning. The model used in MPC is a dynamic model that predicts the state changes over the time step duration.
The survey above shows that DRL methods obtain an optimal policy by maximizing future rewards through interaction with the environment, which gives them more potential than traditional methods in uncertain environments. Nevertheless, the main drawback of existing DRL-based collision avoidance methods is low learning efficiency, especially for off-policy methods such as DDPG.
As the related DRL works indicate, asynchronous computing and MB-MF learning can both be used to improve the learning efficiency for ship collision avoidance. Motivated by Refs. [33], [5], [36], the main works of this paper are as follows:
- (1) To reduce the learning time and memory costs, the A3C method [33] is applied to ship collision avoidance in this study.
- (2) MB-MF learning is considered to further improve the learning efficiency. Traditional model-based methods [5], [36] need to establish an accurate model before the controller is designed, whereas the inverse control method [55], [3], [14] directly uses the inverse model between the desired outputs and the inputs as the controller, which gives a more concise strategy and more potential in MB-MF learning.
To improve the learning efficiency for ship collision avoidance, a composite learning method is proposed in this study. Fig. 1 shows a simplified framework of the proposed composite learning method. Instead of the simple initialization used in Ref. [36], the main idea is to use Q-learning to make adaptive decisions between the actions of a model-free A3C learner and a model-based inverse controller throughout the entire learning process. The contributions of this study are:
- (1) Compared with traditional ship collision avoidance methods, the proposed method learns efficiently by integrating A3C reinforcement learning and performs better with limited perception and states.
- (2) Compared with the pure model-free A3C method, the proposed composite method uses a feed-forward LSTM controller and Q-learning to generate supervised trajectories for A3C, which gives higher learning efficiency.
Therefore, in addition to A3C applications, the originality of the proposed composite learning method is reflected in two aspects: the inverse controller for ship collision avoidance and decisions based on Q-learning.
The remainder of this article is organized as follows. In Section 2, the ship hydrodynamic model and collision risk model are described. In Section 3, the A3C learning based ship collision avoidance method is described. In Section 4, the composite learning method is proposed. In Section 5, simulation experiments under multi-ship encounters are carried out to assess the effectiveness of the proposed methods. In Section 6, conclusions and future research are presented.
Ship hydrodynamic model and collision risk model
In consideration of the nonlinear characteristics of ship motion and different collision states (e.g., distance of the closest point of approach (DCPA) and time to the closest point of approach (TCPA)), the three degree-of-freedom (3-DOF) ship hydrodynamic model and the collision risk model are used to calculate the collision risk index (CRI).
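For reference, DCPA and TCPA follow from the relative position and velocity of two ships under a constant-velocity assumption. The sketch below shows only this standard geometry. It is not the paper's CRI model, which is not given in this excerpt, and the function name `cpa` and the coordinate conventions are assumptions.

```python
import math


def cpa(own_pos, own_vel, tgt_pos, tgt_vel):
    """Return (dcpa, tcpa) for two ships moving at constant velocity.

    Positions are (x, y) in metres, velocities are (vx, vy) in m/s.
    A negative tcpa means the closest approach is already in the past.
    """
    # Position and velocity of the target relative to the own ship.
    rx, ry = tgt_pos[0] - own_pos[0], tgt_pos[1] - own_pos[1]
    vx, vy = tgt_vel[0] - own_vel[0], tgt_vel[1] - own_vel[1]
    v2 = vx * vx + vy * vy
    if v2 < 1e-12:
        # No relative motion: the range never changes.
        return math.hypot(rx, ry), 0.0
    # Time at which the range between the ships is minimal.
    tcpa = -(rx * vx + ry * vy) / v2
    # Range at that time.
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return dcpa, tcpa


# Head-on encounter with a 500 m lateral offset.
print(cpa((0.0, 0.0), (0.0, 5.0), (500.0, 1000.0), (0.0, -5.0)))
```

A risk index would usually combine these two quantities with others (for example range and relative bearing), with smaller DCPA and smaller positive TCPA indicating higher risk.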
A3C learning based ship collision avoidance method
In this section, the basic A3C algorithm is applied to ship collision avoidance by regarding the collision avoidance process as a typical Markov decision process (MDP). The definitions of the state and reward function are the key issues in the application of A3C.
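To make the role of the reward function concrete, the sketch below shows how an A3C worker typically turns one rollout of rewards and critic estimates into n-step returns and advantages, following the general A3C formulation [33]. The paper's own state and reward definitions are not part of this excerpt, so the function name `n_step_advantages`, the discount factor and the sample numbers are placeholders.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Discounted n-step returns and advantages for one rollout.

    rewards[t] is the reward after step t, values[t] is the critic's V(s_t),
    and bootstrap_value is V of the state after the last step
    (use 0.0 if the episode terminated).
    """
    returns = []
    running = bootstrap_value
    # Accumulate returns backwards: R_t = r_t + gamma * R_{t+1}.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage: how much better the rollout was than the critic expected.
    advantages = [g - v for g, v in zip(returns, values)]
    return returns, advantages


rets, advs = n_step_advantages([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 0.0, gamma=0.5)
print(rets, advs)
```

The advantages weight the policy-gradient term of the actor, while the returns serve as regression targets for the critic.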
The proposed composite learning method
In this section, a composite learning method is proposed by using Q-learning to make adaptive decisions between the A3C learning and an inverse model-based controller. The inverse controller and Q-learning decisions are described as follows.
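One way to picture the decision layer is as a small tabular Q-learning agent whose two "actions" are the two action sources. The sketch below is a minimal illustration under assumptions and is not the authors' implementation: the class name `SourceSelector`, the discretized state index, the reward signal and all hyperparameters are hypothetical.

```python
import random

A3C, INVERSE = 0, 1  # indices of the two action sources (hypothetical labels)


class SourceSelector:
    """Tabular Q-learning over which controller's action to execute."""

    def __init__(self, n_states, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        # One Q-value per (discretized state, action source) pair.
        self.q = [[0.0, 0.0] for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state):
        # Epsilon-greedy choice between the A3C and inverse-controller actions.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(2)
        row = self.q[state]
        return A3C if row[A3C] >= row[INVERSE] else INVERSE

    def update(self, state, source, reward, next_state, done):
        # Standard one-step Q-learning target.
        if done:
            target = reward
        else:
            target = reward + self.gamma * max(self.q[next_state])
        self.q[state][source] += self.alpha * (target - self.q[state][source])


selector = SourceSelector(n_states=4)
source = selector.choose(0)
selector.update(0, source, reward=1.0, next_state=1, done=False)
```

In each step, the executed action would come from whichever source `choose` returns, and the resulting transition could then also be used to train the A3C networks.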
Case study
The widely used KVLCC2 ship model [27] is adopted for multi-ship collision avoidance in the following case studies to verify the effectiveness of the proposed composite learning method. The case studies cover three aspects for comprehensive verification: 1) ship collision avoidance based on basic A3C learning; 2) comparisons between the proposed method and A3C learning; 3) comparisons between the proposed method and a traditional optimization method.
Conclusions and future research
To realize efficient learning of a multi-ship collision avoidance policy, a model-based for model-free (MB-MF) composite learning method is proposed in this study. The main originality of the proposed method is the use of an LSTM model-based controller to accelerate model-free A3C learning through adaptive Q-learning decisions over the entire learning process. The following conclusions are drawn from the simulation experiments using the KVLCC2 ship model:
- (1) In terms of the application of A3C learning
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Shuo Xie: Conceptualization, Methodology, Writing - original draft. Xiumin Chu: Funding acquisition, Supervision. Mao Zheng: Software, Validation. Chenguang Liu: Writing - review & editing.
Acknowledgements
This research is supported by the National Key Research and Development Program of China (No. 2018YFB1600400), the Fundamental Research Funds for the Central Universities (WUT:203144003), the National Natural Science Foundation of China (No. 51709220), the Open Project Program of Fujian University Engineering Research Center of Marine Intelligent Ship Equipment (No. 322031010602), the Project of Science and Technology Bureau of Fuzhou (No. 2018-G-92) and the Key Project of Science and
References (71)
- et al.
A study on the collision avoidance of a ship using neural networks and fuzzy logic
Applied Ocean Research
(2012)
- et al.
A review on improving the autonomy of unmanned surface vehicles through intelligent collision avoidance manoeuvres
Annual Reviews in Control
(2012)
- et al.
Concise deep reinforcement learning obstacle avoidance for underactuated unmanned marine vessels
Neurocomputing
(2018)
- et al.
Neural networks based reinforcement learning for mobile robots obstacle avoidance
Expert Systems with Applications
(2016)
- et al.
Neural inverse optimal control for discrete-time impulsive systems
Neurocomputing
(2018)
Analysis of causes of collision caused by human error of captain and oow in ship collision accidents
Journal of the Ergonomics Society of Korea
(2018)
- et al.
A study on path optimization method of an unmanned surface vehicle under environmental loads using genetic algorithm
Ocean Engineering
(2017)
- et al.
Impacts of the rudder profile on manoeuvring performance of ships
Ocean Engineering
(2016)
- et al.
Solving the optimal path planning of a mobile robot using improved Q-learning
Robotics and Autonomous Systems
(2019)
- et al.
Measures to diminish the parameter drift in the modeling of ship manoeuvring using system identification
Applied Ocean Research
(2017)
Multi-objective path planning for unmanned surface vehicle with currents effects
ISA Transactions
A reactive COLREGs-compliant navigation strategy for autonomous maritime navigation
IFAC-PapersOnLine
Reinforcement learning based compensation methods for robot manipulators
Engineering Applications of Artificial Intelligence
Automatic collision avoidance of multiple ships based on deep Q-learning
Applied Ocean Research
Decision support system for collision avoidance of vessels
Applied Soft Computing Journal
A constrained A* approach towards optimal path planning for an unmanned surface vehicle in a maritime environment containing dynamic obstacles and ocean currents
Ocean Engineering
Smoothed A* algorithm for practical unmanned surface vehicle path planning
Applied Ocean Research
Review of ship safety domains: models and applications
Ocean Engineering
LSTM-based traffic flow prediction with missing data
Neurocomputing
Dynamic model identification of unmanned surface vehicles using deep learning network
Applied Ocean Research
Automatic simulation of ship navigation
Ocean Engineering
Predictive path following with arrival time awareness for waterborne agvs
Transportation Research Part C: Emerging Technologies
Closed-loop scheduling and control of waterborne agvs for energy-efficient inter terminal transport
Transportation Research Part E: Logistics and Transportation Review
Measurement of hydrodynamic characteristics from ship maneuvering trials by system identification
Maneuverability
A research on AIS-based embedded system for ship collision avoidance
Ship collision avoidance path planning by pso based on maneuvering equation
Towards an orientation enhanced A-star algorithm for robotic navigation
2015 IEEE International Conference on Industrial Technology (ICIT)
A safe way of collision avoidance maneuver based on maneuvering standard using fuzzy reasoning model
Ship auto-navigation fuzzy expert system (safes)
Journal of the Society of Naval Architects of Japan
Long short-term memory
Neural Computation
An enhanced lstm for trend following of time series
IEEE Access
Basic research on a collision avoidance system using neural networks
Journal of Navigation
Shuo Xie is a PhD candidate at Wuhan University of Technology and the National Engineering Research Center for Water Transport Safety. He received his master's degree in Transportation Engineering from the School of Energy and Power Engineering, Wuhan University of Technology, in 2017. He has published more than 10 academic papers, including 4 SCI journal papers. His research interests include ship model identification, ship control and collision avoidance.
Xiumin Chu is a professor in the National Engineering Research Center for Water Transport Safety, Wuhan University of Technology, Wuhan, China. He received his PhD degree (2002) and M.S. degree (1998) in Automobile Application Engineering from Jilin University. He has published 2 books and more than 70 papers. His research topics include waterway transportation intelligence, smart ships, and ship motion simulation.
Mao Zheng is a senior engineer at the National Engineering Research Center for Water Transport Safety. He received his PhD degree from the Ship Engineering college, Harbin Engineering University, in 2014. He has published more than 10 academic papers on ship hydrodynamics and machine learning. His current research interests include ship collision avoidance and ship maneuvering motion simulation and tests.
Chenguang Liu is an assistant professor in the National Engineering Research Center for Water Transport Safety, Wuhan University of Technology, Wuhan, China. He received his M.S. and Ph.D. degrees from the School of Energy and Power Engineering, Wuhan University of Technology, China, in 2014 and 2017, respectively. He finished his post-doctoral research at Wuhan University in 2019. He has published more than 10 academic papers. His current research interests include ship intelligence, ship motion control, and model predictive control.