Elsevier

Information Sciences

Volume 532, September 2020, Pages 110-124

Deep reinforcement learning for pedestrian collision avoidance and human-machine cooperative driving

https://doi.org/10.1016/j.ins.2020.03.105

Abstract

With the development of intelligent driving technology, human-machine cooperative driving plays an important role in improving driving safety in abnormal situations, such as driver distraction or incorrect operation. For human-machine cooperative driving, the capacity for pedestrian collision avoidance is fundamental. This paper proposes a novel learning-based human-machine cooperative driving scheme (L-HMC) with active collision avoidance capacity using deep reinforcement learning. First, an improved deep Q-network (DQN) method is designed to learn the optimal driving policy for pedestrian collision avoidance. In the improved DQN method, two replay buffers with nonuniform samples are designed to shorten the learning process of the optimal driving policy. Then, a human-machine cooperative driving scheme is proposed that uses the learned driving policy to assist human drivers in avoiding pedestrian collisions when their driving behavior is dangerous to the pedestrian. The effectiveness of the human-machine cooperative driving scheme is verified on the simulation platform PreScan using a real vehicle dynamic model. The results demonstrate that the deep reinforcement learning-based method can learn an effective driving policy for pedestrian collision avoidance with a fast convergence rate. Meanwhile, the proposed L-HMC scheme can avoid potential pedestrian collisions through flexible policies in typical scenarios, thereby improving driving safety.

Introduction

Ensuring pedestrian safety is one of the most fundamental requirements for driving. However, according to the latest report on road safety records from the World Health Organization, more than 270,000 pedestrians lose their lives each year worldwide, accounting for 22% of all road traffic deaths [1]. Apart from pedestrian-related causes (e.g., disobeying traffic rules), most traffic deaths are caused by driver-related factors, such as unskilled driving, incorrect operating habits, distraction while driving, or slow reaction to changing environments [2]. Improving the capacity of intelligent vehicles to avoid collisions with pedestrians can effectively reduce traffic accidents and save human lives.

Currently, the mainstream approach to pedestrian collision avoidance is to apply advanced driver assistance systems (ADAS), which include the forward collision warning system (FCW), the automated emergency braking system (AEBS), and the pedestrian protection system (PPS). FCW usually uses environmental sensors to detect an imminent crash and then alerts the driver to brake by sound or light [3]. AEBS measures the relative distance and velocity between the vehicle and the pedestrian during driving. When the driver brakes too late or the braking force is insufficient, the system assists the driver in avoiding or mitigating the collision by emergency braking [4]. PPS redesigns the bumper, bonnet, and windshield of the vehicle so that the bonnet can deploy airbags to cushion the pedestrian when the pedestrian is hit in a car-pedestrian crash [5]. These systems can protect pedestrians to a certain extent. However, in some environments they cannot flexibly choose feasible lane-changing maneuvers to avoid a potential collision in advance, which makes the driving behavior rigid and restricts the capacity for pedestrian collision avoidance [6]. A growing body of research focuses on making intelligent vehicles more reliable and more adaptive to human drivers [7], [8], [9].

To improve the ability to avoid pedestrians and the safety level of driving in different environments, human-machine cooperative driving has become a new trend [10], [11], and several cooperative methods for pedestrian collision avoidance have been proposed. A Stackelberg-based cooperative control scheme was proposed in [12], which used a leader-follower structure to reduce human-machine conflicts. This method needs to model the action of human drivers as a fixed control strategy (e.g., a PD controller), so it cannot deal with the variety of driving habits found in reality. In [13], a fictive driver activity parameter was introduced; the method computes steering assistance actions according to the driver’s real-time behavior using the Takagi-Sugeno fuzzy control approach. In [14], an emergency steering control system based on optimized trajectories was presented, in which the optimized trajectory, derived from a 5th-order polynomial, was selected by a designed performance function. In [15], the driver status and maneuver decisions were considered in cooperative trajectory planning. In addition, some traditional obstacle avoidance strategies from autonomous driving could also be used, including grid-based methods [16], [17], potential field methods [18], and discrete optimization methods [19], [20]. However, these methods are not suitable for environments with dynamic pedestrians.

Reinforcement learning (RL) has shown great potential in the field of intelligent vehicles [21]. For example, Riedmiller et al. [22] used RL to train an agent to drive a vehicle along a GPS trajectory without obstacles. Huang et al. [23] and Garcia et al. [24] improved the original Q-learning algorithm for robot navigation and reduced the probability of collision. Mao et al. [25] proposed a novel network architecture that jointly learns pedestrian detection together with the given extra feature. Although the above RL-based obstacle avoidance methods can decouple obstacle avoidance from visual information, they require numerous parameters to be manually tuned to plan available paths and do not perform well when transferred to a new environment. With the development of deep learning (DL) and its excellent ability to process high-dimensional inputs, deep reinforcement learning (DRL) methods can effectively overcome the problem of high-dimensional inputs and have thus attracted considerable research interest [26], [27], [28], [29], [30], [31]. DRL has been used in games [32], [33] and robotic manipulators [34]. Among recent DRL methods, the deep Q-network (DQN) method [35] is simple and effective, and various DRL methods (e.g., [36], [37], [38], [39]) have been developed based on the idea of DQN.

DRL can be a useful framework for learning a pedestrian collision avoidance policy in the human-machine cooperative (HMC) driving system, which improves the intelligence and driving safety of vehicles. As DRL does not rely on prior knowledge of the environment or on human-designed rules, it has the potential to solve more general problems (e.g., pedestrian collision avoidance) than traditional rule-based methods. The policy learned by the DRL method is more consistent with human driving habits, so the human driver’s driving experience will be improved. Moreover, DRL methods can learn end-to-end control policies without human-designed features. In view of these advantages, some researchers have very recently tried to exploit DRL methods in certain situations to improve driving safety. In [40], an autonomous braking system based on DRL was presented. Similarly, in [41] an improved DRL method was used to decide whether a vehicle should accelerate, decelerate, or maintain speed. In [42], automated lane-change behavior on structured highways was learned by an improved DRL method. To our knowledge, apart from these related studies, few works have exploited DRL to deal with the problem of pedestrian collision avoidance under the human-machine cooperative driving scheme.

In this paper, a novel learning-based human-machine cooperative driving scheme (L-HMC) with pedestrian collision avoidance capacity using deep reinforcement learning is proposed. In the scheme, the policy for pedestrian collision avoidance is learned offline by an improved deep Q-network (DQN) method. Then, the human-machine cooperative driving scheme assists human drivers online to avoid a potential collision with pedestrians using the learned policy. Note that the proposed L-HMC could also be combined with abnormal status recognition (e.g., drowsiness, distraction) [43], [44], [45] of drivers to further improve the safety of human driving. The effectiveness of the proposed scheme has been successfully verified on the human-machine cooperative driving platform built in PreScan.

The main contributions of our work can be summarized in three aspects:

  • (1) To learn a driving policy for pedestrian collision avoidance more efficiently, an improved deep reinforcement learning method (specifically, an improved DQN method) is proposed. In the method, a novel replay buffer is designed to store non-uniform samples, which accelerates convergence.

  • (2) A novel human-machine cooperative driving scheme using DQN is designed to help the human driver avoid a potential collision with a dynamic pedestrian. The results show that the proposed L-HMC scheme can effectively help drivers avoid the pedestrian in emergencies in different scenarios with flexible strategies.

  • (3) Simulations of human-machine cooperative driving are conducted. To obtain more accurate results, a simulation environment with a real vehicle dynamic model for human-machine cooperative driving is established.
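As a rough illustration of the replay-buffer idea in contribution (1), the sketch below keeps two buffers and fills each minibatch from them at a fixed, non-uniform ratio. This is only a minimal sketch under stated assumptions: the class name `DualReplayBuffer`, the split into "ordinary" and "critical" transitions, and the `critical_fraction` parameter are all hypothetical and are not taken from the paper, whose actual buffer design is not given in this excerpt.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Two FIFO buffers sampled at a fixed, non-uniform ratio (illustrative only)."""

    def __init__(self, capacity=10000, critical_fraction=0.5, seed=None):
        # Assumption: "critical" transitions (e.g. collisions or near misses)
        # are stored apart from ordinary driving transitions.
        self.ordinary = deque(maxlen=capacity)
        self.critical = deque(maxlen=capacity)
        self.critical_fraction = critical_fraction
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done, critical=False):
        """Store one transition in the buffer selected by the `critical` flag."""
        transition = (state, action, reward, next_state, done)
        target = self.critical if critical else self.ordinary
        target.append(transition)

    def __len__(self):
        return len(self.ordinary) + len(self.critical)

    def sample(self, batch_size):
        """Draw a minibatch, taking the requested share from the critical buffer."""
        # Cap each share by what is actually stored in that buffer.
        n_crit = min(round(batch_size * self.critical_fraction), len(self.critical))
        n_ord = min(batch_size - n_crit, len(self.ordinary))
        batch = self.rng.sample(list(self.critical), n_crit)
        batch += self.rng.sample(list(self.ordinary), n_ord)
        self.rng.shuffle(batch)
        return batch
```

With `critical_fraction=0.25` and a batch size of 8, two transitions would come from the critical buffer and six from the ordinary one, provided both buffers hold enough samples. Oversampling rare but informative transitions in this way is one plausible route to faster convergence; whether it matches the authors' design cannot be confirmed from the text here.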

The rest of this paper is structured as follows. Section 2 presents the problem formulation of pedestrian collision avoidance and the models of the vehicle kinematics and dynamics. Section 3 describes the Markov Decision Process (MDP) model for the pedestrian collision avoidance problem and proposes an improved DQN algorithm to solve the MDP. Then, Section 4 presents the human-machine cooperative driving scheme with DQN-based pedestrian collision avoidance. Experimental results using the PreScan platform are discussed in Section 5. Finally, the concluding remarks and future work are given in Section 6.


Problem formulation and research backgrounds

In this section, we formulate the problem of pedestrian collision avoidance. Then, the MDP model, which is the foundation of the proposed DQN-based pedestrian collision avoidance approach, is introduced. Finally, the models of the vehicle’s kinematics and dynamics, which are used in the later simulation environment, are briefly introduced.

DQN-based pedestrian collision avoidance approach

In this section, the problem of pedestrian collision avoidance is formulated as an MDP model. Then, we propose an improved DQN-based approach to solve the MDP problem and finally learn a near-optimal policy.
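For orientation, the standard DQN learning target and loss can be written as below. This is the generic formulation from the original DQN method, not the improved variant proposed in the paper, and the symbols are assumptions introduced here for illustration: $\gamma$ is the discount factor, $\theta$ and $\theta^{-}$ are the online and target network parameters, and $\mathcal{D}$ is the replay memory.

```latex
% Standard DQN temporal-difference target and loss (generic form, not the paper's improved variant)
y =
\begin{cases}
  r, & \text{if } s' \text{ is terminal},\\
  r + \gamma \max_{a'} Q(s', a'; \theta^{-}), & \text{otherwise},
\end{cases}
\qquad
L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}\!\left[\bigl(y - Q(s, a; \theta)\bigr)^{2}\right]
```

Minimizing $L(\theta)$ over sampled transitions drives $Q$ toward a fixed point of the Bellman optimality equation, and the greedy policy with respect to the learned $Q$ is the near-optimal policy referred to above.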

Human-machine cooperative driving scheme

The online cooperative control algorithm for pedestrian collision avoidance in the human-machine cooperative driving scheme is shown in Algorithm 2. The algorithm runs in real time and transfers control ownership only when the detected situation is dangerous to the pedestrian.
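The control-transfer logic can be pictured with the following minimal sketch. It is not Algorithm 2 from the paper: the danger test used here, a time-to-collision threshold, is a stand-in assumption, and the names `Observation`, `time_to_collision`, `arbitrate`, and `ttc_threshold_s` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    gap_m: float              # longitudinal distance from vehicle to pedestrian
    closing_speed_mps: float  # positive when the gap is shrinking


def time_to_collision(obs):
    """Return the time to collision in seconds, or infinity if the gap is not closing."""
    if obs.closing_speed_mps <= 0.0:
        return float("inf")
    return obs.gap_m / obs.closing_speed_mps


def arbitrate(obs, driver_action, policy, ttc_threshold_s=2.0):
    """Choose the command to execute and report who holds control.

    Assumption: the situation counts as dangerous when the time to collision
    drops below `ttc_threshold_s`; the paper's own criterion may differ.
    """
    if time_to_collision(obs) < ttc_threshold_s:
        # Dangerous: hand control to the learned avoidance policy.
        return policy(obs), "machine"
    # Safe: the human driver keeps control and the command passes through.
    return driver_action, "driver"
```

Here `policy` is any callable that maps an observation to a driving action, for example the greedy action of a trained Q-network. Under the default threshold, a 30 m gap closing at 10 m/s leaves control with the driver, while a 10 m gap at the same speed hands it to the machine.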

For clarity, the proposed human-machine cooperative driving scheme for avoiding an unsafely crossing pedestrian is illustrated in Fig. 6. The trigger point P (whose calculation will be discussed

Simulation and performance evaluation

In this section, the performance of our method is evaluated. First, the simulation setup and parameters are introduced. Then, the simulation results of the DQN-based pedestrian collision avoidance approach are presented. Finally, the performance of our human-machine cooperative driving scheme L-HMC is evaluated, and topics for further research are analyzed.

Conclusion and future works

In this paper, the deep reinforcement learning-based pedestrian collision avoidance method provides a feasible and effective technical solution for the human-machine cooperative driving scheme of intelligent vehicles. To accelerate the convergence rate of offline DQN training, two replay buffers were designed in the improved DQN-based pedestrian collision avoidance method. We proposed the online cooperative control algorithm to improve the ability of pedestrian collision avoidance in the case

CRediT authorship contribution statement

Junxiang Li: Writing - original draft, Software. Liang Yao: Data curation, Software. Xin Xu: Conceptualization, Formal analysis. Bang Cheng: Software. Junkai Ren: Writing - original draft, Software, Writing - review & editing.

Acknowledgement

This work is supported by the National Natural Science Foundation of China under Grants 61751311, U1564214, 61825305, and the National Key R&D Program of China under grant 2018YFB1305105.

References (47)

  • Y. Fan et al., An autonomous dynamic collision avoidance control method for unmanned surface vehicle in unknown ocean environment, Int. J. Adv. Rob. Syst. (2019)
  • K.A. Brookhuis et al., Behavioural impacts of advanced driver assistance systems–an overview, Eur. J. Transp. Infrastruct. Res. (2019)
  • M. Dirik et al., Visual-servoing based global path planning using interval type-2 fuzzy logic control, Axioms (2019)
  • J. Rivera et al., Design and implementation of intelligent controllers in soft processors for the walking of a biped robot, Computación y Sistemas (2018)
  • M. Sanchez et al., Generalized type-2 fuzzy systems for controlling a mobile robot and a performance comparison with interval type-2 and type-1 fuzzy systems, Expert Syst. Appl. (2015)
  • S. Zieba et al., Using adjustable autonomy and human–machine cooperation to make a human–machine system resilient–application to a ground robotic system, Inf. Sci. (Ny) (2011)
  • K. Yang et al., Application of Stackelberg game theory for shared steering torque control in lane change maneuver, IEEE Intell. Veh. Symposium Proc. (2018)
  • A.T. Nguyen et al., Driver-automation cooperative approach for shared steering control under multiple system constraints: design and experiments, IEEE Trans. Ind. Electron. (2017)
  • Y. Wang et al., Emergency Steering Evasion Torque Assistance Based on Optimized Trajectory, Technical Report (2019)
  • A.M. Bencloucif et al., Cooperative trajectory planning for haptic shared control between driver and automation in highway driving, IEEE Trans. Ind. Electron. (2019)
  • L. Zuo et al., A hierarchical path planning approach and least-squares policy iteration for mobile robots, Neurocomputing (2015)
  • K. Chu et al., Local path planning for off-road autonomous driving with avoidance of static obstacles, IEEE Trans. Intell. Transp. Syst. (2012)
  • M.D. Phung et al., Enhanced discrete particle swarm optimization path planning for uav vision-based surface inspection, Autom. Constr. (2017)