Deep reinforcement learning based energy management strategy for range extend fuel cell hybrid electric vehicle

https://doi.org/10.1016/j.enconman.2023.116678

Highlights

  • A new topology and operating pattern using the fuel cell as a range extender is proposed.

  • An energy management strategy framework based on deep reinforcement learning with dual DDPG switching is proposed.

  • A previous action guidance mechanism enhances the global optimum search and the stability of convergence.

  • Validation results show that fuel economy and the durability of the battery and fuel cell are improved in both modes.

Abstract

To meet the power and long-range driving requirements of the vehicle, this paper presents, for the first time, a dual-mode operation scheme for a range extend fuel cell hybrid vehicle, with an in-depth study of the pure electric mode and the range extend mode. The deep deterministic policy gradient algorithm is a well-known deep reinforcement learning algorithm that can solve complex nonlinear problems. To achieve the optimal power distribution among the energy sources in the two modes, a dual deep deterministic policy gradient framework is proposed for the first time in this paper. In addition, a previous action guidance mechanism is proposed to enable the networks to approximate the action value function more efficiently during training. The training results show that the previous action guidance mechanism improves learning convergence and exploration ability. The validation results show that the proposed strategy improves operating economy by about 30% compared with the rule-based strategy, reduces the average fuel cell output fluctuation to less than 100 W, and greatly reduces fuel cell lifetime loss. It is hoped that the proposed structure, operating patterns, and energy management strategy will provide new ideas for future research.

Introduction

Nowadays, the large quantities of polluting gases emitted by conventional fuel vehicles directly endanger human health, and the greenhouse effect, which is causing glacier melt, sea level rise, and extreme climate events such as La Niña, poses serious challenges to human survival. These harmful effects have prompted the scientific community to turn to sustainable energy sources [1], [2]. The popularity of electric vehicles offers a promising solution to the growing global greenhouse effect [3]. In recent years, scholars have predicted that hydrogen fuel cells can overcome the shortcomings of both conventional oil vehicles and pure electric vehicles, making hydrogen the transportation fuel of the future [4]; consequently, hydrogen fuel cells are increasingly studied in hybrid electric vehicles.

A fuel cell hybrid electric vehicle (FCHEV) is a hybrid electric vehicle equipped with a fuel cell system (FCS) [5]. Batteries offer high energy density and a good energy buffering effect [6], while supercapacitors offer high power density and long cycle life [7]. Used together as auxiliary energy sources, lithium batteries and supercapacitors give fuel cell vehicles both high energy density and fast dynamic response, yielding outstanding advantages such as higher energy efficiency and lower emissions compared to internal combustion engine vehicles. Hybrid power systems combining fuel cells with different types of batteries are described in [8], [9], [10]: fuel cell and NiMH battery [8], fuel cell and lead-acid battery [9], and fuel cell and lithium battery [10]. Rodatz P. et al. proposed a hybrid power system composed of a fuel cell and a supercapacitor [11]. Zhu M. et al. proposed a hybrid power system with the fuel cell as the main energy source and the battery and supercapacitor as auxiliary energy sources [12]. To address the problems of insufficient short-distance driving power, long-distance range anxiety, and the slow dynamic response of fuel cells, this study designs the fuel cell as a range extender, forming the hybrid power system of the FCHEV together with a lithium battery and a supercapacitor.

While the increase in the number of energy sources with different characteristics satisfies the energy and power requirements, it also makes the FCHEV power system considerably more complex, so a powerful and advanced energy management strategy (EMS) is required. The energy management problem of a hybrid vehicle can be stated as the problem of distributing power among the different energy sources during vehicle operation. EMSs are usually divided into three categories: rule-based, optimization-based, and learning-based [13], [14], [15].

Rule-based EMSs, which must be developed with rich expertise and engineering knowledge, can be divided into deterministic rule-based methods and fuzzy rule-based methods. Deterministic rule-based methods include thermostat (on/off) control strategies, state machine control strategies, etc.; fuzzy rule-based methods include the conventional fuzzy strategy, the fuzzy adaptive strategy, etc. The advantage of this kind of EMS is an easier development process, but its robustness and universality cannot be guaranteed, and its flexibility and optimality under different driving conditions are also limited [16].
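To make the deterministic rule concrete, the following is a minimal sketch of a thermostat-style on/off rule. The SOC thresholds and rated fuel cell power are illustrative assumptions, not the calibrated values of any specific vehicle.

def thermostat_ems(soc, p_demand, soc_low=0.3, soc_high=0.8, p_fc_rated=10_000.0):
    """Return (fuel-cell power, battery power) in watts for one step."""
    if soc < soc_low:
        p_fc = p_fc_rated                     # battery depleted: fuel cell on at rated power
    elif soc > soc_high:
        p_fc = 0.0                            # battery full: fuel cell off
    else:
        p_fc = min(p_demand, p_fc_rated)      # load-following inside the band
    p_batt = p_demand - p_fc                  # battery buffers the remainder
    return p_fc, p_batt

Such a rule is trivial to implement and verify, which illustrates why rule-based EMSs are easy to develop, but the fixed thresholds also show why optimality cannot be guaranteed across driving conditions.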

Optimization-based EMSs can be divided into global optimization and real-time optimization methods. Global optimization methods include dynamic programming (DP), genetic algorithms, linear programming, etc.; real-time optimization methods include equivalent fuel consumption minimization, model predictive control (MPC), etc. DP, as a representative global optimization strategy, divides a large problem into small subproblems and builds the optimal solution step by step from the bottom up, and it is often applied to vehicle EMS control problems; in practice, however, global optimization EMSs often run into the 'curse of dimensionality' because of the controller's large computational workload [17]. Among real-time optimization algorithms, MPC is a representative method that can be applied to linear and nonlinear systems, and its finite-horizon receding optimization keeps the controller's computational workload bounded. However, its control performance largely depends on the accuracy of the model prediction, so it has not been widely adopted in industry [18].
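As an illustration of DP's bottom-up principle, the following is a minimal backward-induction sketch for a power-split problem over a known driving cycle. The grids, toy hydrogen cost, and battery model are placeholder assumptions, not the models used in this paper; the triple loop over time, states, and actions also shows why the computational load grows quickly with dimensionality.

import numpy as np

soc_grid = np.linspace(0.2, 0.9, 71)        # discretized battery SOC states
p_fc_grid = np.linspace(0.0, 10e3, 21)      # discretized fuel-cell power actions [W]
p_demand = np.full(100, 8e3)                # demanded power per step (toy cycle) [W]
dt, capacity_j = 1.0, 5e6                   # step length [s], battery energy [J]

def h2_cost(p_fc):
    return (p_fc / 1e3) ** 2 * dt           # toy hydrogen cost, convex in FC power

V = np.zeros(len(soc_grid))                 # terminal cost-to-go
policy = np.zeros((len(p_demand), len(soc_grid)))
for t in reversed(range(len(p_demand))):    # bottom-up over time steps
    V_new = np.empty_like(V)
    for i, soc in enumerate(soc_grid):
        best, best_a = np.inf, 0.0
        for p_fc in p_fc_grid:
            p_batt = p_demand[t] - p_fc
            soc_next = soc - p_batt * dt / capacity_j
            if not (soc_grid[0] <= soc_next <= soc_grid[-1]):
                continue                    # infeasible SOC transition
            cost = h2_cost(p_fc) + np.interp(soc_next, soc_grid, V)
            if cost < best:
                best, best_a = cost, p_fc
        V_new[i], policy[t, i] = best, best_a
    V = V_new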

Learning-based EMSs, such as reinforcement learning (RL), can learn from historical experience and gradually optimize the control scheme by interacting with the environment, providing an alternative solution to challenging energy management control problems in real-world environments [19]. In contrast to rule-based EMSs, reinforcement learning algorithms are data-driven: they do not rely on rules set in advance from expert experience and can optimize iteratively using only the data they collect [20]. The core idea of reinforcement learning comes from dynamic programming, and a well-trained agent can achieve control performance close to the global optimum; compared with dynamic programming, RL also offers good online optimization and learning capability [21]. In addition, combining RL with other tools such as neural networks can effectively reduce the computational load while enabling real-time vehicle control [22].

Q-learning (QL) is a well-known and effective RL algorithm that has been widely applied to vehicle energy management. Its main idea is to store Q-values in a Q-table indexed by states and actions, and then select the action with the highest Q-value. Xu B. et al. [23] used a QL strategy to minimize the energy consumption and battery aging of a battery/supercapacitor electric vehicle, and the results showed that, compared with a pure lithium battery vehicle without a supercapacitor, the QL strategy slowed battery degradation while increasing the vehicle's range. However, the 'curse of dimensionality' makes QL unsuitable for optimization problems with high-dimensional state and action spaces. To solve this problem, Wu J. et al. [24] applied a deep Q-learning (DQL) algorithm based on deep reinforcement learning (DRL) to the EMS, which shortens training time and accelerates convergence compared with the QL strategy. Approximating the Q-value with a deep neural network avoids the limitations of state space discretization, but the performance of the EMS is still affected by action space discretization. To better handle continuous control tasks, Wu Y. et al. [25] used the deep deterministic policy gradient (DDPG) algorithm for the energy management of series-parallel plug-in hybrid electric buses (PHEBs) and confirmed that, compared with most RL-based EMSs, DDPG avoids discretization errors, obtains more reliable control strategies, and performs close to globally optimal dynamic programming. Li W. et al. [26] likewise used DDPG to explore a cloud-based multi-objective energy management strategy for hybrid architectures and confirmed that DDPG not only converges faster but also achieves better results in minimizing energy losses and aging costs. However, DRL algorithms such as DQL and DDPG with actor-critic network architectures suffer from difficult policy convergence. To address this, Li X. et al. [27] combined human experience with machine learning to design a supervised expert loss function for the DRL model, and the results demonstrated that the supervised expert loss mechanism provided favorable guidance for policy improvement and improved learning convergence. Liu Y. et al. [28] used an expert-experience-guided DDPG algorithm for the path planning of a multi-degree-of-freedom fruit-picking robot, and simulation results showed that expert guidance can effectively improve model performance and learning efficiency at the beginning of training. This kind of expert-guided DRL is also applicable to the EMS of fuel cell hybrid vehicles. Hu H. et al. [29] proposed an expert-experience-guided DDPG to manage the energy among the three energy sources of a fuel cell hybrid vehicle (fuel cell, lithium battery, and supercapacitor), and the results reduced the equivalent hydrogen consumption while improving FCS efficiency compared with a fuzzy-control-based EMS.
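The tabular idea behind QL can be summarized in a few lines. Below is a minimal sketch with a hypothetical placeholder environment (step) and an arbitrary discretization; only the temporal-difference update line reflects the actual algorithm.

import numpy as np

n_states, n_actions = 50, 11                 # e.g. discretized SOC levels x FC power levels
Q = np.zeros((n_states, n_actions))          # Q-table of state-action values
alpha, gamma, eps = 0.1, 0.95, 0.1           # learning rate, discount, exploration rate

def step(s, a):
    """Hypothetical environment: returns (next state, reward)."""
    s_next = (s + a - n_actions // 2) % n_states
    reward = -abs(a - n_actions // 2)        # toy cost: penalize large FC power
    return s_next, reward

rng = np.random.default_rng(0)
s = 0
for _ in range(10_000):
    # epsilon-greedy action selection from the Q-table
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    # Q-learning temporal-difference update
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

The table grows multiplicatively with each added state or action dimension, which is exactly the 'curse of dimensionality' that motivates the move to DQL and DDPG.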

To the best of the authors' knowledge, in current studies of EMSs for FCHEVs, researchers have treated the fuel cell as the main energy source and have not yet used it as a range extender, whether in the combination of hydrogen fuel cells and batteries (FC + BT) or of hydrogen fuel cells, batteries, and supercapacitors (FC + BT + SC). However, an FCHEV with extended-range capability may be a destination for future vehicles, and this paper aims to fill the above research gap. A range extend fuel cell hybrid electric vehicle (REFCHEV) is proposed for the first time, and a dual DDPG EMS framework is proposed to explore the optimal continuous power distribution scheme for the different modes of the REFCHEV. A previous action guided DDPG (PA-DDPG) is investigated for the first time, which better guides the agent's learning and exploration by introducing previous actions into the reward function. The REFCHEV was trained under a combination of suburban, urban, and highway conditions. The training results show that the proposed EMS can handle continuous power allocation in the different modes, improve the operating efficiency and economy of the whole system, and extend the life span of the lithium battery and fuel cell.
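While the paper defines its own reward, the idea of introducing previous actions into the reward can be sketched as a term that depends on the last action. The following is a hedged illustration only: the cost terms (h2_cost, soc_dev), weights, and quadratic form of the guidance penalty are placeholder assumptions, not the paper's formulation.

def pa_reward(h2_cost, soc_dev, action, prev_action,
              w_h2=1.0, w_soc=0.5, w_pa=0.2):
    """Negative weighted cost; the last term discourages large jumps
    from the previous action, smoothing fuel-cell output."""
    guidance = (action - prev_action) ** 2   # previous-action guidance term
    return -(w_h2 * h2_cost + w_soc * soc_dev ** 2 + w_pa * guidance)

A term of this kind both shapes exploration toward smoother policies and directly penalizes fuel cell output fluctuation, which is consistent with the durability gains reported in the abstract.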

In this study, a REFCHEV is proposed in the first part. The remainder of the article is structured as follows: Section 2 introduces the working patterns and powertrain modelling of the REFCHEV. Section 3 describes the DDPG algorithm structure, framework, and details. Section 4 reports the simulation results of the Simulink model of the REFCHEV under different operating conditions. Section 5 summarizes the research.

Section snippets

Configuration of REFCHEV powertrain and working pattern

In this study, a lithium battery, a supercapacitor, and a fuel cell are selected as the energy sources of the hybrid electric vehicle, but the fuel cell is not used as the main energy source with the battery and supercapacitor as auxiliary sources. Instead, in most cases the battery works as the main energy source, cooperating with the supercapacitor for daily commuting, while the fuel cell, as a range extender, runs only when the battery charge is insufficient; this configuration is called the REFCHEV. The powertrain of the REFCHEV is shown in Fig. 1.
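A minimal sketch of this dual-mode operating logic is given below, assuming hypothetical SOC thresholds with hysteresis; the paper's actual switching conditions may differ.

def select_mode(soc_batt, mode, soc_on=0.25, soc_off=0.35):
    """Hysteresis between pure-electric and range-extend modes."""
    if mode == "pure_electric" and soc_batt <= soc_on:
        return "range_extend"                # battery low: fuel cell joins in
    if mode == "range_extend" and soc_batt >= soc_off:
        return "pure_electric"               # battery recovered: fuel cell off
    return mode

The hysteresis band (soc_on < soc_off) prevents the range extender from chattering on and off when the SOC hovers near a single threshold.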

EMS based on deep deterministic policy gradient

According to RL theory, the learning process can be seen as a trial-and-error process: the agent selects an action a_t in the environment, the environment transitions from state s_t to a new state s_{t+1} after the action, and an instantaneous reward r_t is returned to the agent, which then selects the next action a_{t+1} based on the instantaneous reward r_t and the current state of the environment. The agent relies on its own experience to acquire knowledge and then improve the policy.
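The interaction loop described here can be sketched as follows; env and agent are hypothetical stand-ins for the REFCHEV powertrain model and the DDPG agent, respectively.

def run_episode(env, agent, max_steps=1000):
    s = env.reset()                          # initial state s_0
    for t in range(max_steps):
        a = agent.act(s)                     # agent selects action a_t
        s_next, r, done = env.step(a)        # environment returns s_{t+1} and reward r_t
        agent.observe(s, a, r, s_next)       # experience is stored and used to improve the policy
        s = s_next
        if done:
            break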

Driving cycle and simulation results

The study proposes that the pure electric mode of the REFCHEV can meet daily commuting demand and the range extend mode can meet long-distance driving demand, so this section designs a special working condition to simulate daily commuting and long-distance driving and to illustrate the two operating patterns. Finally, the simulation results of the PA-DDPG EMS and the rule-based EMS are compared and discussed.

Conclusions

In this paper, a REFCHEV is proposed for the first time; the parameters of each energy source are determined and its operating patterns are explained. Unlike pure electric vehicles and ordinary fuel cell hybrid vehicles, the battery pack capacity of the REFCHEV can cover a week or more of commuting in most cities, and the range extend mode provides energy security for long-distance driving, effectively solving the problems of short-distance driving power and long-distance range.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (38)

