Deep reinforcement learning for pedestrian collision avoidance and human-machine cooperative driving
Introduction
Ensuring pedestrian safety is one of the most fundamental requirements for driving. However, according to the latest road safety report from the World Health Organization, more than 270,000 pedestrians lose their lives each year worldwide, accounting for 22% of all road traffic deaths [1]. Apart from pedestrian-related causes (e.g., disobeying traffic rules), most traffic deaths are caused by driver-related factors, such as unskilled driving, incorrect operating habits, distracted driving, or slow reaction to changing environments [2]. Improving the ability of intelligent vehicles to avoid collisions with pedestrians can effectively reduce traffic accidents and save lives.
Currently, the mainstream approach to pedestrian collision avoidance is to apply advanced driver assistance systems (ADAS), which include the forward collision warning system (FCW), the automated emergency braking system (AEBS), and the pedestrian protection system (PPS). FCW typically uses environmental sensors to detect an imminent crash and then alerts the driver to brake by sound or light [3]. AEBS measures the relative distance and velocity between the vehicle and the pedestrian during driving; when the driver brakes too late or with insufficient force, the system helps avoid or mitigate the collision by emergency braking [4]. PPS redesigns the bumper, bonnet, and windshield of the vehicle so that, in a car-pedestrian crash, the bonnet can deploy airbags to cushion the pedestrian [5]. These ADAS systems protect pedestrians to a certain extent. However, in some environments they cannot flexibly choose a feasible lane-changing maneuver to avoid a potential collision in advance, which makes the resulting driving behavior rigid and limits their capacity for pedestrian collision avoidance [6]. A growing body of research therefore focuses on making intelligent vehicles more reliable and more adaptive to human drivers [7], [8], [9].
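The AEBS behavior described above (measure relative distance and velocity, then intervene when the driver brakes too late or too weakly) is commonly expressed through time-to-collision (TTC). The sketch below is a minimal, hypothetical trigger: the 1.5 s TTC threshold, the 2 m safety margin, and all names are illustrative assumptions, not parameters of any production AEBS or of the system in [4].

```python
from dataclasses import dataclass


@dataclass
class Gap:
    distance_m: float         # longitudinal gap between vehicle front and pedestrian
    closing_speed_mps: float  # positive when the gap is shrinking


def time_to_collision(gap: Gap) -> float:
    """Return TTC in seconds, or infinity if the gap is not closing."""
    if gap.closing_speed_mps <= 0.0:
        return float("inf")
    return gap.distance_m / gap.closing_speed_mps


def aeb_should_brake(gap: Gap, driver_decel_mps2: float,
                     ttc_threshold_s: float = 1.5,   # hypothetical threshold
                     required_margin_m: float = 2.0  # hypothetical margin
                     ) -> bool:
    """Hypothetical AEB trigger: intervene only if TTC is short AND the
    driver's current deceleration would not stop the car in time."""
    if time_to_collision(gap) > ttc_threshold_s:
        return False
    if driver_decel_mps2 <= 0.0:
        return True  # driver is not braking at all
    # Distance needed to cancel the closing speed at the driver's deceleration: v^2 / (2a)
    stopping_distance = gap.closing_speed_mps ** 2 / (2.0 * driver_decel_mps2)
    return stopping_distance + required_margin_m > gap.distance_m
```

For example, with a 20 m gap closing at 15 m/s (TTC about 1.33 s), the trigger fires if the driver brakes at 3 m/s² (37.5 m needed) but not at 9 m/s² (12.5 m needed).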
To improve pedestrian avoidance capability and driving safety in different environments, the human-machine cooperative driving scheme has become a new trend [10], [11], and several methods for pedestrian collision avoidance under this scheme have been proposed. In [12], a Stackelberg-based cooperative control scheme used a leader-follower structure to reduce human-machine conflicts. This method models the human driver's actions as a fixed control strategy (e.g., a PD controller), so it cannot handle the variety of driving habits found in reality. In [13], a fictive driver activity parameter was introduced, allowing steering assistance actions to be computed from the driver's real-time behavior with the Takagi-Sugeno fuzzy control approach. In [14], an emergency steering control system based on optimized trajectories was presented; the optimized trajectory, derived from a 5th-order polynomial, was selected by a designed performance function. In [15], the driver's status and maneuver decisions were considered in cooperative trajectory planning. In addition, traditional obstacle avoidance strategies from autonomous driving could also be applied, including grid-based methods [16], [17], potential field methods [18], and discrete optimization methods [19], [20]. However, these methods are not well suited to environments with dynamic pedestrians.
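The 5th-order polynomial trajectory mentioned for [14] can be illustrated with the standard rest-to-rest quintic lane-change profile. This sketch assumes zero lateral velocity and acceleration at both ends of the maneuver and uses a toy "performance function" (the shortest candidate duration whose peak lateral acceleration stays under a limit). The actual boundary conditions and performance function of [14] are not given here, and all names and numbers are hypothetical.

```python
import math


def quintic_lane_change(width_m: float, duration_s: float):
    """Rest-to-rest 5th-order polynomial lateral profile.

    Assumed boundary conditions: y(0) = 0, y(T) = width, and zero lateral
    velocity and acceleration at t = 0 and t = T. Solving these six
    conditions gives y(t) = width * (10 s^3 - 15 s^4 + 6 s^5), s = t / T.
    Returns (position function, lateral acceleration function).
    """
    def y(t: float) -> float:
        s = min(max(t / duration_s, 0.0), 1.0)
        return width_m * (10 * s**3 - 15 * s**4 + 6 * s**5)

    def lat_acc(t: float) -> float:
        s = min(max(t / duration_s, 0.0), 1.0)
        # Second derivative of y with respect to t
        return width_m / duration_s**2 * (60 * s - 180 * s**2 + 120 * s**3)

    return y, lat_acc


def peak_lateral_acc(width_m: float, duration_s: float) -> float:
    """Closed form: max |y''| = (10 / sqrt(3)) * width / T^2."""
    return 10.0 / math.sqrt(3.0) * width_m / duration_s**2


def shortest_feasible_duration(width_m, a_lat_max, candidates):
    """Toy performance function: quickest candidate T within the lateral-acceleration limit."""
    feasible = [T for T in candidates if peak_lateral_acc(width_m, T) <= a_lat_max]
    return min(feasible) if feasible else None
```

For a 3.5 m lateral offset and a 4 m/s² limit, the candidates 1.5 s and 2.0 s exceed the limit, so `shortest_feasible_duration(3.5, 4.0, [1.5, 2.0, 2.5, 3.0])` selects 2.5 s.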
Reinforcement learning (RL) has shown great potential in the field of intelligent vehicles [21]. For example, Riedmiller et al. [22] used RL to train an agent to drive a vehicle along a GPS trajectory without obstacles. Huang et al. [23] and Garcia et al. [24] improved the original Q-learning algorithm for robot navigation and reduced the probability of collision. Mao et al. [25] proposed a novel network architecture that jointly learns pedestrian detection and the given extra features. Although these RL-based obstacle avoidance methods can decouple obstacle avoidance from visual information, they require many manually tuned parameters to plan feasible paths and do not perform well when transferred to a new environment. With the development of deep learning (DL) and its excellent ability to process high-dimensional inputs, deep reinforcement learning (DRL) methods can effectively handle high-dimensional inputs and have therefore received considerable research interest [26], [27], [28], [29], [30], [31]. DRL has been applied to games [32], [33] and robotic manipulators [34]. Among recent DRL methods, the deep Q-network (DQN) [35] is simple and effective, and various DRL methods (e.g., [36], [37], [38], [39]) have been developed from the idea of DQN.
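For reference, the one-step tabular Q-learning update that methods such as [23], [24] build on, and that DQN [35] approximates with a neural network instead of a table, can be sketched in a few lines. The discrete action names (keep lane, change left, change right, brake) and the hyperparameters are hypothetical placeholders, not the action set used in this paper.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the greedy action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_learning_update(q, s, a, r, s_next, actions,
                      alpha=0.1, gamma=0.9, terminal=False):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    For a terminal transition the bootstrap term is dropped.
    """
    target = r if terminal else r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


# Usage: a table defaulting to 0.0 and a hypothetical action set
q_table = defaultdict(float)
ACTIONS = ["keep", "left", "right", "brake"]
```

With `alpha=0.5`, `gamma=0.9`, reward -1 and a best next-state value of 1.0, the target is -0.1 and an initially zero entry moves halfway to it, i.e. to -0.05.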
DRL can be a useful framework for learning a pedestrian collision avoidance policy in a human-machine cooperative (HMC) driving system, improving the intelligence and driving safety of vehicles. Because DRL does not rely on prior knowledge of the environment or on human-designed rules, it has the potential to solve more general problems (e.g., pedestrian collision avoidance) than traditional rule-based methods. The policy learned by the DRL method is more consistent with human driving habits, so the human driver's driving experience is improved. Moreover, DRL methods can learn end-to-end control policies without human-designed features. Given these advantages, some researchers have recently applied DRL in specific situations to improve driving safety. In [40], an autonomous braking system based on DRL was presented. Similarly, in [41], an improved DRL method was used to decide whether a vehicle should accelerate, decelerate, or maintain its speed. In [42], automated lane-change behavior on structured highways was learned by an improved DRL method. To our knowledge, apart from these studies, few works have exploited DRL to address pedestrian collision avoidance under the human-machine cooperative driving scheme.
In this paper, a novel learning-based human-machine cooperative driving scheme (L-HMC) with pedestrian collision avoidance capacity using deep reinforcement learning is proposed. In the scheme, the policy for pedestrian collision avoidance is learned offline by an improved deep Q-network (DQN) method. Then, the human-machine cooperative driving scheme assists human drivers online to avoid a potential collision with pedestrians using the learned policy. Note that the proposed L-HMC could also be combined with abnormal status recognition (e.g., drowsiness, distraction) [43], [44], [45] of drivers to further improve the safety of human driving. The effectiveness of the proposed scheme has been successfully verified on the human-machine cooperative driving platform built in PreScan.
The main contributions of our work can be summarized into three aspects:
(1) To learn a driving policy for pedestrian collision avoidance more efficiently, an improved deep reinforcement learning method (specifically, an improved DQN method) is proposed. In this method, a novel replay buffer is designed to store non-uniform samples, which accelerates convergence.
(2) A novel human-machine cooperative driving scheme using DQN is designed to help the human driver avoid a potential collision with a dynamic pedestrian. The results show that the proposed L-HMC scheme can effectively help drivers avoid the pedestrian in emergencies across different scenarios with flexible strategies.
(3) Simulations of human-machine cooperative driving are conducted. To obtain more accurate results, a simulation environment with a real vehicle dynamics model for human-machine cooperative driving is established.
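The replay-buffer idea in contribution (1) can be illustrated with a minimal two-buffer memory, one common way to keep rare but informative transitions from being overwritten by abundant ordinary ones. This is a hypothetical sketch, not the authors' design: it assumes that terminal transitions (e.g., a collision or a completed avoidance) are the ones stored separately and that mini-batches mix the two buffers at a fixed ratio. The class and parameter names are illustrative.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Hypothetical two-buffer replay memory with non-uniform sampling.

    Terminal transitions go to a small 'important' buffer; all others go to
    an 'ordinary' buffer. Each mini-batch draws a fixed fraction from the
    important buffer, so rare events are replayed more often than uniform
    sampling over a single buffer would allow.
    """

    def __init__(self, capacity=10000, important_capacity=1000,
                 important_ratio=0.25, seed=None):
        self.ordinary = deque(maxlen=capacity)          # oldest entries evicted first
        self.important = deque(maxlen=important_capacity)
        self.important_ratio = important_ratio
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        (self.important if done else self.ordinary).append(transition)

    def __len__(self):
        return len(self.ordinary) + len(self.important)

    def sample(self, batch_size):
        """Mixed mini-batch; may be smaller than batch_size early in training."""
        n_imp = min(int(round(batch_size * self.important_ratio)), len(self.important))
        n_ord = min(batch_size - n_imp, len(self.ordinary))
        batch = (self.rng.sample(list(self.important), n_imp)
                 + self.rng.sample(list(self.ordinary), n_ord))
        self.rng.shuffle(batch)
        return batch
```

With a ratio of 0.25 and a batch of 16, every mini-batch contains 4 terminal transitions (when at least 4 are stored), regardless of how few of them there are relative to ordinary steps.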
The rest of this paper is structured as follows. Section 2 presents the problem formulation of pedestrian collision avoidance and the models of the vehicle kinematics and dynamics. Section 3 describes the Markov Decision Process (MDP) model for the pedestrian collision avoidance problem and proposes an improved DQN algorithm to solve the MDP. Then, Section 4 presents the human-machine cooperative driving scheme with DQN-based pedestrian collision avoidance. Experimental results using the PreScan platform are discussed in Section 5. Finally, the concluding remarks and future work are given in Section 6.
Problem formulation and research background
In this section, we formulate the problem of pedestrian collision avoidance. Then, the MDP model, which is the foundation of the proposed DQN-based pedestrian collision avoidance approach, is introduced. Finally, the models of the vehicle's kinematics and dynamics, which are used in the simulation environment later, are briefly introduced.
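The vehicle kinematics model itself is not reproduced in this excerpt. As an illustration only, the widely used kinematic bicycle model (referenced to the center of gravity) can be stepped with forward Euler integration as below. The axle distances `lf` and `lr`, the choice of this particular model, and the names are assumptions; the simulations in this paper rely on a full vehicle dynamics model that this sketch does not attempt to reproduce.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    x: float    # position [m]
    y: float    # position [m]
    psi: float  # heading [rad]
    v: float    # speed [m/s]


def bicycle_step(s: State, accel: float, steer: float, dt: float,
                 lf: float = 1.2, lr: float = 1.6) -> State:
    """One forward-Euler step of the kinematic bicycle model.

    accel: longitudinal acceleration [m/s^2]; steer: front steering angle [rad];
    lf, lr: hypothetical distances from the center of gravity to the front/rear axle [m].
    """
    # Slip angle at the center of gravity
    beta = math.atan(lr / (lf + lr) * math.tan(steer))
    x = s.x + s.v * math.cos(s.psi + beta) * dt
    y = s.y + s.v * math.sin(s.psi + beta) * dt
    psi = s.psi + s.v / lr * math.sin(beta) * dt
    v = max(0.0, s.v + accel * dt)  # no reversing in this sketch
    return State(x, y, psi, v)
```

Driving straight at 10 m/s for ten 0.1 s steps advances the vehicle 10 m along x; a positive steering angle turns it to the left (positive y and heading).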
DQN-based pedestrian collision avoidance approach
In this section, the problem of pedestrian collision avoidance is formulated as an MDP. Then, we propose an improved DQN-based approach to solve the MDP and learn a near-optimal policy.
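The DQN objective underlying this approach can be sketched as follows: targets are computed with a separate target network and regressed by the online network with a squared error. This is the standard DQN target of [35], shown under the assumption that each Q-network is a callable returning one value per discrete action; the paper's specific improvements (notably its replay-buffer design) are not reflected here, and the function names are hypothetical.

```python
def dqn_targets(batch, q_target, gamma=0.99):
    """y_i = r_i                                  if s'_i is terminal
       y_i = r_i + gamma * max_a Q_target(s'_i, a) otherwise."""
    ys = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            ys.append(reward)
        else:
            ys.append(reward + gamma * max(q_target(next_state)))
    return ys


def dqn_loss(batch, q_online, q_target, gamma=0.99):
    """Mean squared TD error between targets and Q_online(s_i, a_i)."""
    ys = dqn_targets(batch, q_target, gamma)
    errors = [(y - q_online(s)[a]) ** 2 for y, (s, a, *_rest) in zip(ys, batch)]
    return sum(errors) / len(errors)
```

In a real implementation `q_online` and `q_target` would be neural networks, the loss would be minimized by gradient descent on the online network only, and the target network would be refreshed from the online weights periodically.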
Human-Machine cooperative driving scheme
The online cooperative control algorithm for pedestrian collision avoidance in the human-machine cooperative driving scheme is shown in Algorithm 2. The algorithm runs in real time and transfers control authority only when the detected situation is dangerous to the pedestrian.
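The ownership-transfer rule described for Algorithm 2 can be sketched as a small arbiter that passes the human command through by default and switches to the learned policy while the situation is flagged as dangerous. Algorithm 2 itself is not reproduced in this excerpt; the danger predicate (e.g., a TTC threshold), the hysteresis counter `release_steps` used before handing control back, and all names below are hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Tuple

Command = Tuple[float, float]  # (steering angle [rad], acceleration [m/s^2])


@dataclass
class ControlArbiter:
    machine_policy: Callable[[dict], Command]  # e.g. greedy action of a learned Q-network
    is_dangerous: Callable[[dict], bool]       # hypothetical danger detector
    release_steps: int = 5                     # hypothetical hysteresis before hand-back
    owner: str = "human"
    _safe_count: int = field(default=0, repr=False)

    def step(self, state: dict, human_cmd: Command) -> Tuple[str, Command]:
        """Return (current owner, command to apply) for one control cycle."""
        if self.is_dangerous(state):
            self.owner = "machine"
            self._safe_count = 0
        elif self.owner == "machine":
            # Hand control back only after several consecutive safe cycles,
            # to avoid rapid switching at the danger boundary.
            self._safe_count += 1
            if self._safe_count >= self.release_steps:
                self.owner = "human"
                self._safe_count = 0
        cmd = self.machine_policy(state) if self.owner == "machine" else human_cmd
        return self.owner, cmd
```

The hysteresis is a design choice of this sketch: without it, a danger signal that flickers around its threshold would toggle control ownership every cycle.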
For clarity, the proposed human-machine cooperative driving scheme for avoiding an unsafe crossing pedestrian is illustrated in Fig. 6. The trigger point P (whose calculation will be discussed
Simulation and performance evaluation
In this section, the performance of our method is evaluated. First, the simulation setup and parameters are introduced. Then, the simulation results of the DQN-based pedestrian collision avoidance approach are presented. Finally, the performance of our human-machine cooperative driving scheme L-HMC is evaluated and analyzed, and further research topics are discussed.
Conclusion and future works
In this paper, the deep reinforcement learning-based pedestrian collision avoidance method provides a feasible and effective technical solution for the human-machine cooperative driving scheme of intelligent vehicles. To accelerate the convergence of offline DQN training, two replay buffers were designed in the improved DQN-based pedestrian collision avoidance method. We proposed the online cooperative control algorithm to improve the ability of pedestrian collision avoidance in the case
CRediT authorship contribution statement
Junxiang Li: Writing - original draft, Software. Liang Yao: Data curation, Software. Xin Xu: Conceptualization, Formal analysis. Bang Cheng: Software. Junkai Ren: Writing - original draft, Software, Writing - review & editing.
Acknowledgement
This work is supported by the National Natural Science Foundation of China under Grants 61751311, U1564214, 61825305, and the National Key R&D Program of China under grant 2018YFB1305105.
References (47)
- et al. Multiple controller switching concept for human-machine shared control of lane keeping assist systems. 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2018.
- et al. Direction-dependent optimal path planning for autonomous vehicles. Rob. Auton. Syst., 2015.
- et al. Harmonic potential field path planning for high speed vehicles. 2008 American Control Conference, 2008.
- et al. Reinforcement learning algorithms with function approximation: recent advances and applications. Inf. Sci. (Ny), 2014.
- et al. Learning to drive a real car in 20 minutes. 2007 Frontiers in the Convergence of Bioscience and Information Technologies, 2007.
- et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.
- Global status report on road safety 2013: supporting a decade of action: summary. Technical Report, 2013.
- et al. The effects of age on crash risk associated with driver distraction. Int. J. Epidemiol., 2017.
- Mobileye: The future of driverless cars, 2014.
- et al. Technical feasibility of advanced driver assistance systems (ADAS) for road traffic safety. Transp. Plann. Technol., 2005.