Prioritizing Useful Experience Replay for Heuristic Dynamic Programming-Based Learning Systems
IEEE Transactions on Cybernetics ( IF 11.8 ) Pub Date : 2019-11-01 , DOI: 10.1109/tcyb.2018.2853582
Zhen Ni , Naresh Malla , Xiangnan Zhong

The adaptive dynamic programming controller usually needs a long training period because its data-usage efficiency is low: samples are discarded once used. Prioritized experience replay (ER) promotes important experiences and makes learning the control process more efficient. This paper proposes integrating the efficient learning capability of prioritized ER into heuristic dynamic programming (HDP). First, a one-time-step-backward state-action pair is used to design the ER tuple, which avoids the need for a model network. Second, a systematic approach is proposed for integrating ER into both the critic and action networks of the HDP controller design. The proposed approach is tested on two case studies: a cart-pole balancing task and a triple-link pendulum balancing task. For a fair comparison, the traditional HDP and the proposed approach are given the same initial weight parameters and the same initial starting states in the same simulation environment. Compared with the traditional HDP approach, the proposed approach reduces the average number of trials required to succeed by 60.56% for the cart-pole task and by 56.89% for the triple-link balancing task. Results of ER-based HDP are also included for comparison. Moreover, a theoretical convergence analysis is presented to guarantee the stability of the proposed control design.
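The abstract describes a prioritized ER buffer whose tuples pair the one-step-backward state-action with the current state, replayed into the critic and action network updates. A minimal sketch of such a buffer is below, assuming standard proportional prioritization by TD-error magnitude; the tuple fields, `capacity`, `alpha`, and the epsilon constant are all illustrative assumptions, not the paper's exact design.

```python
import random
from collections import namedtuple

# Hypothetical ER tuple: the one-time-step-backward state-action pair
# together with the resulting state and reward (field names assumed).
Transition = namedtuple("Transition", ["prev_state", "prev_action", "state", "reward"])

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized ER (parameters are assumptions)."""

    def __init__(self, capacity=1000, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities bias sampling
        self.buffer = []
        self.priorities = []

    def add(self, transition, td_error):
        # Priority from TD-error magnitude, as in standard prioritized ER;
        # a small epsilon keeps every stored sample reachable.
        p = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.buffer) >= self.capacity:
            # Drop the oldest experience when the buffer is full.
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size):
        # Draw transitions with probability proportional to priority;
        # these would feed the critic and action network updates.
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        idx = random.choices(range(len(self.buffer)), weights=weights, k=batch_size)
        return [self.buffer[i] for i in idx]
```

In an HDP loop, each control step would call `add` with the latest one-step-backward tuple and its TD error, then replay a small sampled batch through the critic and action networks alongside the fresh sample.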

Updated: 2019-11-01