Autonomous PEV Charging Scheduling Using Dyna-Q Reinforcement Learning
IEEE Transactions on Vehicular Technology (IF 6.8) Pub Date: 2020-11-01, DOI: 10.1109/tvt.2020.3026004
Fan Wang, Jie Gao, Mushu Li, Lian Zhao

This paper proposes a demand response method to reduce the long-term charging cost of a single plug-in electric vehicle (PEV) while overcoming obstacles such as the stochastic nature of the user's driving behaviour, traffic conditions, energy usage, and energy prices. The problem is formulated as a Markov Decision Process (MDP) with an unknown transition probability matrix and solved using deep reinforcement learning (RL) techniques. The proposed method requires no initial data on the PEV driver's behaviour and shows an improvement in learning speed over a purely model-free reinforcement learning method. Our strategy uses Dyna-Q reinforcement learning, a combination of model-based and model-free learning: every time a real experience is obtained, the model is updated, and the RL agent learns both from the real experience and from "imagined" experiences generated by the model. Because the state space is vast, a table-lookup method is impractical, so a value approximation method using deep neural networks is employed to estimate the long-term expected reward of all state-action pairs. An average of historical prices and a long short-term memory (LSTM) network are used to predict future prices. Simulation results demonstrate the effectiveness of this approach and its ability to reach an optimal policy more quickly than existing PEV charging schemes while avoiding state of charge (SOC) depletion during trips.
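A minimal sketch of the Dyna-Q loop described above: direct Q-learning from real experience, model learning, and extra planning updates from "imagined" experiences drawn from the model. It assumes a tabular Q-function and a toy SOC environment purely for illustration; the paper itself replaces the table with a deep-neural-network value approximator, and `ToyChargingEnv`, the state/action encoding, and all hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
# Minimal Dyna-Q sketch (tabular, illustrative only). The paper replaces the
# Q-table below with a deep neural network because the real PEV state space is
# too large; ToyChargingEnv is a made-up stand-in, not the charging MDP.
import random
from collections import defaultdict

class ToyChargingEnv:
    """Hypothetical 1-D SOC world: states 0..n-1, actions 0 = idle, 1 = charge."""
    def __init__(self, n_states=10):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:                       # charging raises SOC at a small cost
            self.state = min(self.state + 1, self.n_states - 1)
            reward = -1.0
        else:                                 # idling: driving may drain SOC
            self.state = max(self.state - random.choice([0, 1]), 0)
            reward = 0.0
        done = self.state == self.n_states - 1
        if done:
            reward += 10.0                    # reward for reaching full SOC
        return self.state, reward, done

def dyna_q(env, episodes=200, max_steps=100, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    q = defaultdict(float)                    # Q[(state, action)]
    model = {}                                # Model[(state, action)] -> (reward, next_state)
    actions = (0, 1)

    def update(state, action, reward, next_state):
        # One-step Q-learning target: r + gamma * max_a' Q(s', a')
        target = reward + gamma * max(q[(next_state, b)] for b in actions)
        q[(state, action)] += alpha * (target - q[(state, action)])

    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: q[(s, b)])
            s2, r, done = env.step(a)

            update(s, a, r, s2)               # (1) direct RL from the real experience
            model[(s, a)] = (r, s2)           # (2) model learning: store the transition

            # (3) planning: extra updates from "imagined" experiences sampled from the model
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                update(ps, pa, pr, ps2)

            if done:
                break
            s = s2
    return q

if __name__ == "__main__":
    learned_q = dyna_q(ToyChargingEnv())
    print({k: round(v, 2) for k, v in sorted(learned_q.items())[:6]})
```

In the paper's setting, the tabular `q` would be replaced by a neural-network value approximator trained on the same mix of real and model-generated transitions, with the electricity price entering the reward via the historical average and the LSTM forecast.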

Updated: 2020-11-01