Introspective Q-learning and learning from demonstration
Knowledge Engineering Review (IF 0.814) Pub Date: 2019-01-01, DOI: 10.1017/s0269888919000031
Mao Li; Tim Brys; Daniel Kudenko

One challenge faced by reinforcement learning (RL) agents is that in many environments the reward signal is sparse, leading to slow improvement of the agent’s performance in early learning episodes. Potential-based reward shaping can help to resolve this sparse-reward problem by incorporating an expert’s domain knowledge into the learning process through a potential function. Past work on reinforcement learning from demonstration (RLfD) mapped a (sub-optimal) human expert’s demonstration directly to a potential function, which can speed up RL. In this paper we propose an introspective RL agent that speeds up learning significantly further. An introspective RL agent records its state–action decisions and experience during learning in a priority queue. Decisions judged to be of good quality by a Monte Carlo estimate are kept in the queue, while poorer decisions are rejected. The queue is then used as a demonstration to speed up RL via reward shaping. A human expert’s demonstration can be used to initialize the priority queue before the learning process starts. Experimental validation in the 4-dimensional CartPole domain and the 27-dimensional Super Mario AI domain shows that our approach significantly outperforms both non-introspective RL and state-of-the-art RLfD approaches in both domains.
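To make the mechanism in the abstract concrete, here is a minimal tabular sketch of the idea: a Q-learner that rates each (state, action) decision by its discounted Monte Carlo return, keeps only the best decisions in a bounded priority queue, and uses the queue as a demonstration-induced potential for reward shaping. The class name, hyperparameters, and the exact-match potential are illustrative assumptions, not the authors' implementation (the paper's potential is derived from similarity to demonstrated state–action pairs).

```python
import heapq
import itertools
import random
from collections import defaultdict


class IntrospectiveQLearner:
    """Sketch of introspective Q-learning: best self-generated decisions
    (by Monte Carlo return) are kept in a bounded priority queue and
    reused as a demonstration for potential-based reward shaping."""

    def __init__(self, actions, queue_size=200, alpha=0.1, gamma=0.99,
                 epsilon=0.1, shaping_scale=1.0):
        self.q = defaultdict(float)    # tabular Q(s, a)
        self.actions = list(actions)
        self.queue = []                # min-heap of (return, tiebreak, (s, a))
        self.queue_size = queue_size
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.shaping_scale = shaping_scale
        self._tiebreak = itertools.count()

    def potential(self, state, action):
        # Demonstration-induced potential. Simplified here to an exact-match
        # bonus; the paper uses similarity to demonstrated (s, a) pairs.
        return self.shaping_scale if any(sa == (state, action)
                                         for _, _, sa in self.queue) else 0.0

    def choose_action(self, state):
        # epsilon-greedy exploration over the current Q estimates.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def record_episode(self, trajectory):
        # Introspection step: rate each decision by its discounted Monte
        # Carlo return G_t; keep it only if the queue has room or it beats
        # the worst stored entry. Calling this on a human expert's episodes
        # initializes the queue before learning starts.
        g = 0.0
        for state, action, reward in reversed(trajectory):
            g = reward + self.gamma * g
            entry = (g, next(self._tiebreak), (state, action))
            if len(self.queue) < self.queue_size:
                heapq.heappush(self.queue, entry)
            elif g > self.queue[0][0]:
                heapq.heapreplace(self.queue, entry)

    def update(self, s, a, r, s2):
        # Q-learning step with look-ahead shaping on state-action pairs:
        # F = gamma * phi(s', a') - phi(s, a), added to the env reward.
        a2 = max(self.actions, key=lambda b: self.q[(s2, b)])
        f = self.gamma * self.potential(s2, a2) - self.potential(s, a)
        td_target = r + f + self.gamma * self.q[(s2, a2)]
        self.q[(s, a)] += self.alpha * (td_target - self.q[(s, a)])
```

In this sketch, seeding the queue from a human expert, as the abstract describes, amounts to calling record_episode on the expert's trajectories before training; thereafter the agent's own high-return episodes gradually replace or supplement the demonstrated decisions.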
Updated: 2020-01-04

 
