Target tracking strategy using deep deterministic policy gradient
Introduction
The capability of the autonomous decision-making system of an unmanned combat air vehicle (UCAV) to perform agile maneuvers in complex dynamic environments has been favored by industry and has attracted much attention in control engineering over the past decades [1]. The purpose of unmanned aerial vehicle (UAV) technologies is to create a completely independent and intelligent decision support system, in which autonomy, navigation algorithms, target detection, and energy and information management are particularly emphasized [2], [3]. Moreover, when a UCAV is used for military purposes, it always operates in dense and threatening environments that require active trajectory generation and navigation control [4], [5].
Advances in the field of navigation, guidance and control systems have made significant contributions to the development of autonomous vehicles [6]. Traditionally, a guidance system is designed separately from a control system, and optimizing the path offline is the most common strategy [7]. Naeem et al. [8] reviewed some important guidance laws for UAVs, including Lyapunov-based guidance, proportional navigation guidance (PNG) and line-of-sight (LOS) guidance. Under a sensible guidance strategy, the UCAV moves toward the target by splitting the path into multiple waypoints visited in a certain order. The waypoints are fixed points in space whose placement can be optimized to find the best path guiding the UAV to the target area. Typical criteria for assessing the optimal path for UAVs relate to travel time, safety conditions, and energy consumption [9]. In a known scenario, path planning first models the mission environment and then optimizes the trajectory under a certain objective function using classical algorithms, including the artificial potential field (APF), the A* algorithm, rapidly-exploring random trees (RRT), heuristic evolutionary computation and so forth [10], [11], [12], [13]. These methods introduce different considerations such as trajectory smoothness, waypoint accessibility, map gridding and action discretization. However, when a UAV operates in dynamic spaces, online reactive path planning based on multi-sensor fusion consumes significant computational time for environmental assessment, so it is hard to meet real-time requirements.
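For concreteness, the vector-based idea behind the APF method can be summarized in a few lines. The following is a minimal sketch only; the gains, influence radius and step size are illustrative choices, not values taken from the cited works:

```python
import numpy as np

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=100.0, rho0=5.0, step=0.1):
    """One gradient-descent step on a classical artificial potential field.

    pos, goal: (3,) arrays; obstacles: list of (3,) arrays.
    k_att / k_rep: attractive/repulsive gains; rho0: obstacle influence radius.
    """
    # The attractive force pulls the vehicle straight toward the goal.
    force = k_att * (goal - pos)
    for obs in obstacles:
        diff = pos - obs
        rho = np.linalg.norm(diff)
        # The repulsive force acts only inside the obstacle's influence radius.
        if 0.0 < rho < rho0:
            force += k_rep * (1.0 / rho - 1.0 / rho0) / rho**3 * diff
    return pos + step * force / (np.linalg.norm(force) + 1e-8)
```

The well-known weakness of this scheme, local minima where attractive and repulsive forces cancel, is one reason the online reactive planning discussed above becomes expensive in cluttered, dynamic spaces.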
Target tracking, as a key part of the UCAV navigation task in cognitive electronic warfare (CEW), is the process linking target searching and striking. Precise target tracking tends to use motion planning methods for end-to-end decision making, while treating the physical constraints of maneuvering vehicles as optimization or analytical terms [14]. To better accommodate the low-level controllers used for navigation, there are two effective motion planning strategies for UCAVs: one is to record distinctive maneuvering patterns individually in a pattern library and then to schedule where and when these patterns are used [15], [16]; the other is to directly design a complete optimal/suboptimal control scheme under defined dynamic models, including fuzzy control [17], [18], neural network control [19], nonlinear dynamic inversion control [20], etc. In contrast, the former is low in complexity but struggles to cope with dynamic environmental changes, while the latter is highly real-time but often limited in performance due to the strong non-linearity of UCAV dynamic models.
Fortunately, the emergence of deep reinforcement learning (DRL) techniques makes it possible for UCAVs to play an effective role in CEW with little prior knowledge, while eliminating the derivation of non-linear dynamic models, thus promising to overcome the bottlenecks of traditional planning and control methods. Recently, various intelligent agents derived from DRL have achieved remarkable success in Atari 2600 games, Go, navigation support and other applications [21], [22], and we pay particular attention to how DRL solves different planning tasks in the field of navigation [23], [24], [25]. Considering discrete actions, Zhao et al. let a UCAV master high-winning-rate air-combat decisions using the deep Q-learning network (DQN) algorithm [24]. DRL-based path planning algorithms such as the value iteration network (VIN) and the three-dimensional path planning network (TDPP-Net) performed well in 2D and 3D space, respectively [26], [27]. For continuous actions, Zhang et al. enabled UAVs to navigate from arbitrary departure places to destinations using only local sensory information and GPS signals based on the deep deterministic policy gradient (DDPG) [28], but only simple expressions of state and action were designed. The authors in [29] designed and validated a Gazebo-based multi-functional reinforcement learning framework, which aims to solve the UAV landing task on a moving platform using a new DDPG algorithm. Unfortunately, the environments considered in these studies are either purely static or contain only a single moving threat, so neither the generalizability of the DRL network nor the robustness of the navigation control is guaranteed, especially for target tracking.
To bring the UCAV's motion closer to reality, DRL algorithms are commonly tested on simulation systems that use video images as input. Yu et al. implemented a DQN-based agent to avoid obstacles by learning the steering motion of a simulated car from raw video images [30]. However, there are two thorny issues when using images as the state input: (1) the performance of visual/infrared sensors is easily disturbed by climate and distance, making it difficult to generalize to unseen target states using convolutional neural networks; (2) to fully exploit the maneuverability of the UCAV for evasion and attack within certain physical constraints, the performance of the DRL algorithm and the fidelity of the simulator are extremely demanding.
Since DRL can enhance different control formulations without requiring a complete model of the UCAV's kinetics, it can greatly improve the adaptability of autonomous systems [25], [31]. For example, in conjunction with the UCAV's target searching, a linear control model is first established to express the state update process of the UCAV, then a complete or partial non-linear control model is formed by a DRL network, and finally the controller design is inferred from the generated strategy [32]. Furthermore, with interactive experience, model-free DRL algorithms are able to directly determine optimal control strategies while learning unknown dynamic models, making them easy to apply to the field of unmanned driving [33], [34], [35], including autonomous underwater vehicles' navigation control and mobile robots' simultaneous localization and mapping [32], [36]; yet none of these unmanned vehicles consider coupled motion in the vertical direction during training. Overall, guiding a UCAV to generate autonomous longitudinal maneuvers in 3D space is not easy, and evaluating the learned behavioral strategies of UCAVs in a unified coordinate system is harder still, both of which limit the development of DRL in specific combat scenarios.
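One schematic reading of this hybrid formulation, a known linear state update augmented by a learned non-linear correction, might look as follows; the matrices and the policy here are placeholders, not the model from [32]:

```python
import numpy as np

# Nominal linear state-update model x_{k+1} = A x_k + B u_k (placeholder
# dynamics: 3D position/velocity state, acceleration-like input).
A = np.eye(6)
B = np.vstack([np.zeros((3, 3)), 0.1 * np.eye(3)])

def hybrid_step(x, u, residual_policy):
    """Hybrid update: known linear part plus a learned non-linear correction.

    residual_policy(x, u) -> (6,) array stands in for the trained DRL network
    that captures the unmodelled non-linear dynamics.
    """
    return A @ x + B @ u + residual_policy(x, u)
```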
The above literature review identifies a large amount of DRL-related prior work on autonomous navigation and provides a solid research base for the UCAV target tracking task. To overcome the shortcoming of DRL's instability in continuous control models and to implement a UCAV learning system based on observe-orient-decide-act (OODA) loop theory in CEW [37], the contributions of this study include the following three points:
- (1) The effects of error-coupling between relative velocity and displacement on the reward shaping of DRL are discussed based on the objective function of vector-based navigation in the APF method;
- (2) By configuring the target tracking framework of the UCAV with DDPG [33], we develop an advanced underactuated controller from radar sensing to vectorial motion;
- (3) We establish a new target tracking simulator and a variety of threat scenarios to evaluate the tracking performance and the strategic essence of deep agents.
The structure of this paper is outlined as follows. Section 2 briefly introduces the technical background of DRL and the DDPG algorithm. In Section 3, we establish a new task model based on the versatile CEW (vCEW) framework developed previously and update the DRL framework for controlling the UCAV. Section 4 analyzes the error-coupling phenomenon inherent in the objective function of traditional target tracking, and the matching process between the DDPG algorithm and the Tracker environment is revealed in Section 5. Then, Section 6 provides a comprehensive discussion of the experimental arrangements and their results. Section 7 reports concluding remarks.
Section snippets
Deep reinforcement learning
One of the primary goals in the field of artificial intelligence is to solve complex tasks from unprocessed, high-dimensional sensory input, and the combination of reinforcement learning (RL) and deep learning has made important progress toward this goal [21].
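As background for the algorithm used later, a condensed sketch of one DDPG update step is given below. This is PyTorch-style pseudocode of the standard algorithm, not code from the paper; `actor`, `critic` and their target copies are assumed to be modules with the usual signatures:

```python
import torch

def ddpg_update(batch, actor, critic, actor_t, critic_t, opt_a, opt_c,
                gamma=0.99, tau=0.005):
    """One DDPG update on a sampled minibatch (s, a, r, s', done)."""
    s, a, r, s2, done = batch
    # Critic: regress Q(s, a) toward the bootstrapped Bellman target, with the
    # next action supplied by the target actor (deterministic policy).
    with torch.no_grad():
        y = r + gamma * (1.0 - done) * critic_t(s2, actor_t(s2))
    q_loss = torch.nn.functional.mse_loss(critic(s, a), y)
    opt_c.zero_grad(); q_loss.backward(); opt_c.step()
    # Actor: ascend the deterministic policy gradient, i.e. maximize Q(s, mu(s)).
    a_loss = -critic(s, actor(s)).mean()
    opt_a.zero_grad(); a_loss.backward(); opt_a.step()
    # Polyak-average the target networks for stable bootstrapping.
    with torch.no_grad():
        for net, net_t in ((actor, actor_t), (critic, critic_t)):
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.mul_(1.0 - tau).add_(tau * p)
```

The soft target updates and the replay-based minibatch are precisely what mitigate the instability of continuous-action DRL mentioned above.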
Tracker environment
The target tracking environment, Tracker, proposed in this work is based on our previous vCEW framework [31]. In the Tracker, a UCAV is used as a maneuverable combat agent. Its mission is to keep tracking the target in a given 3D space while avoiding all threats safely. Note that threats in the environment include both dynamic and static obstacles. For each timestep of the mission, the UCAV relies on radar sensors to sense and process threat state information, then takes a desired action under
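To make the interface concrete, a gym-style skeleton of such an environment could look like the sketch below. The dynamics, radar model, reward and termination rule here are simplified placeholders, not the paper's actual Tracker:

```python
import numpy as np

class TrackerSketch:
    """Gym-style skeleton of a target tracking environment (interface only)."""

    def reset(self):
        self.ucav = np.zeros(6)                       # UCAV position + velocity
        self.target = np.random.uniform(-1.0, 1.0, 6) # target position + velocity
        return self._observe()

    def _observe(self):
        # Stand-in for radar-sensed relative state of the target and threats.
        return self.target - self.ucav

    def step(self, action, dt=0.1):
        # `action` is a continuous vectorial command within physical limits.
        self.ucav[3:] += np.clip(action, -1.0, 1.0) * dt
        self.ucav[:3] += self.ucav[3:] * dt
        self.target[:3] += self.target[3:] * dt
        dist = np.linalg.norm(self.target[:3] - self.ucav[:3])
        reward = -dist            # closer tracking -> higher reward
        done = dist > 50.0        # e.g. target lost beyond radar range
        return self._observe(), reward, done, {}
```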
Error-coupling analysis
For the problem of moving target tracking, the vector-based navigation method under APF thinking can shape arbitrary vectorial actions and is therefore suitable for inspiring autonomous maneuvers of the UCAV [41], [42]. However, since the observational errors produced in Tracker can be lethal to both velocity and displacement estimation, the extent to which the objective function of target tracking is affected has to be assessed to ensure the effectiveness of vector-based navigation in DRL.
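The flavor of this coupling is easy to reproduce numerically: when relative velocity is estimated by differencing noisy position measurements, its error is amplified by roughly a factor of sqrt(2)/dt relative to the displacement error, so any objective mixing both terms inherits scale-mismatched noise. The following is a synthetic illustration of that effect only, not the paper's analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, sigma = 0.1, 0.5                                  # timestep, radar noise std

p_true = np.cumsum(np.full((100, 3), 0.2), axis=0)    # true target positions
p_meas = p_true + rng.normal(0.0, sigma, p_true.shape)  # noisy observations

# Velocity estimated by finite differences of noisy positions.
v_true = np.diff(p_true, axis=0) / dt
v_est = np.diff(p_meas, axis=0) / dt

print("displacement RMSE:", np.sqrt(((p_meas - p_true) ** 2).mean()))  # ~ sigma
print("velocity RMSE:   ", np.sqrt(((v_est - v_true) ** 2).mean()))   # ~ sqrt(2)*sigma/dt
```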
Adaptive matching between DRL and tracker
This section focuses on how to match the execution agent in a Tracker task to the input and output ports of the DDPG algorithm.
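A plausible sketch of such port matching, with assumed observation bounds and actuation limits (both hypothetical here), is:

```python
import numpy as np

def match_io(radar_obs, obs_low, obs_high, actor, act_limit):
    """Illustrative port matching between a Tracker agent and DDPG.

    Observations are normalized to [-1, 1] before entering the actor, and the
    actor's tanh-bounded output is rescaled to the UCAV's physical limits.
    """
    s = 2.0 * (radar_obs - obs_low) / (obs_high - obs_low) - 1.0
    a = actor(s)                        # assumed tanh output in [-1, 1]
    return act_limit * np.asarray(a)    # command within physical constraints
```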
Simulation experiments and numerical results
The reliability of motion planning in real environments is related to the environmental complexity: in obstacle-sparse spaces, the intelligence of deep agents trained by DRL cannot be highlighted, while in obstacle-dense environments, the performance of DRL algorithms is not sufficient to allow the agent to complete the tracking task. Thus, in this section, we arrange a variety of task scenarios to test the adaptability of DDPG. In addition, to analyze the performance of the DRL
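One simple way to parameterize such scenarios is by the counts of static and dynamic threats; the generator below is illustrative only and does not reflect the paper's actual test configurations:

```python
import numpy as np

def make_scenario(n_static, n_dynamic, space=100.0, seed=None):
    """Generate one test scenario of a given obstacle density."""
    rng = np.random.default_rng(seed)
    static = rng.uniform(0.0, space, (n_static, 3))       # fixed threat positions
    dynamic = np.concatenate([rng.uniform(0.0, space, (n_dynamic, 3)),
                              rng.uniform(-1.0, 1.0, (n_dynamic, 3))], axis=1)
    return static, dynamic                                # dynamic: pos + vel

# From obstacle-sparse to obstacle-dense test cases.
scenarios = [make_scenario(n, n // 2, seed=i) for i, n in enumerate((2, 8, 32))]
```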
Conclusion
This paper discussed the control method of combining DDPG with the UCAV's constrained motion model, which helps UCAVs realize autonomous agile maneuvers in CEW. By analyzing the objective function of traditional target tracking, we introduced the classical ideas of vector-based navigation into the reward shaping process of DRL and theoretically explored the role of error coupling. Then a DRL-based framework for UCAVs' continuous-action decision-making inspired by the stage
CRediT authorship contribution statement
Shixun You: Conceptualization, Methodology, Software. Ming Diao: Data curation, Investigation, Formal analysis. Lipeng Gao: Writing - original draft. Fulong Zhang: Visualization, Software. Huan Wang: Supervision, Validation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (42)
- et al., Modelling of UAV formation flight using 3D potential field, Simul. Model. Pract. Theory (2008).
- et al., TDPP-Net: Achieving three-dimensional path planning via a deep neural network architecture, Neurocomputing (2019).
- et al., Adaptive low-level control of autonomous underwater vehicles using deep reinforcement learning, Robot. Auton. Syst. (2018).
- et al., A 3D collision avoidance strategy for UAV with physical constraints, Measurement (2016).
- et al., Defense Science Board Study on Unmanned Aerial Vehicles and Uninhabited Combat Aerial Vehicles (2004).
- et al., Safety, security, and rescue missions with an unmanned aerial vehicle (UAV), J. Intell. Robot. Syst. (2011).
- et al., Efficient visual odometry and mapping for unmanned aerial vehicle using ARM-based stereo vision pre-processing system.
- et al., Trends in electronic warfare, IETE Tech. Rev. (2003).
- et al., Optimal path planning for unmanned combat aerial vehicles to defeat radar tracking, J. Guid. Control Dyn. (2006).
- et al., A review of intelligent systems software for autonomous vehicles.
- Evolutionary algorithm based offline/online path planner for UAV navigation, IEEE Trans. Syst. Man Cybern. B.
- A review of guidance laws applicable to unmanned underwater vehicles, J. Inst. Navig.
- Path planning with multiple objectives, IEEE Robot. Autom. Mag.
- An obstacle-based rapidly-exploring random tree.
- Bidirectional A* search for time-dependent fast paths, J. Am. Chem. Soc.
- Classic and heuristic approaches in robot motion planning - a chronological review, World Acad. Sci. Eng. Technol.
- Motion planning and obstacle avoidance.
- Autonomous control of unmanned combat air vehicles: Design of a multimodal control and flight planning framework for agile maneuvering, IEEE Control Syst. Mag.
- Control-oriented physical input modelling for a helicopter UAV, J. Intell. Robot. Syst.
- Adaptive fuzzy control for non-triangular structural stochastic switched nonlinear systems with full state constraints, IEEE Trans. Fuzzy Syst.
- An expert 2DOF fractional order fuzzy PID controller for nonlinear systems, Neural Comput. Appl.
Shixun You received his bachelor’s degree from the Taiyuan University of Technology, China, in 2015, and he is pursuing his master’s degree and Ph.D. degree in Information and Communication Engineering from Harbin Engineering University, China. His research interests include artificial intelligence, evolutionary computation, cognitive jamming, and cooperative jamming.
Ming Diao received his bachelor’s, master’s, and Ph.D degrees from Harbin Engineering University, China. He is currently a Professor with the College of Information and Communication, Harbin Engineering University. He won four Ministerial and Provincial-Level Science and Technology Awards. He is a member of the China Society of Image and Graphics (CHN) and a Fellow of China Institute of Communications (CHN). His research interests include wideband signal processing, pattern recognition, machine learning, and telecommunication.
Lipeng Gao received his bachelor’s, master’s, and Ph.D degrees from Harbin Engineering University, China. He is currently a Professor with the College of Information and Communication, Harbin Engineering University. He won two Ministerial and Provincial-Level Science and Technology Awards. His research interests include wideband signal processing, information fusion, artificial intelligence, and cooperative jamming.
Fulong Zhang received his bachelor’s degree from Harbin Engineering University of China in 2003. He is currently a researcher and expert in the First Academy of China Aerospace Science and Industry Corporation. As the academic and technical leader in the field of electronic countermeasures in No.8511 Research Institute of CASIC, he is responsible for the overall design, testing and verification of multi-type electronic countermeasure equipment and comprehensive electromagnetic protection equipment. He has won one second prize of National Defense Science and Technology Progress Award, two third prizes, six national defense scientific and technological achievements, and eight authorized national defense patents.
Huan Wang received his bachelor’s and master’s degree from Harbin Engineering University of China in 2016 and 2019, respectively. He is currently an assistant engineer and assistant designer in No.8511 Research Institute, China Aerospace Science and Industry Corporation. His research interests include wideband signal processing, radar signal recognition, and cooperative jamming.