Hierarchical Deep Q-Network from Imperfect Demonstrations in Minecraft, Cognitive Systems Research


Hierarchical Deep Q-Network from Imperfect Demonstrations in Minecraft
Cognitive Systems Research (IF 2.1), Pub Date: 2021-01-01, DOI: 10.1016/j.cogsys.2020.08.012
Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, Kirill Aksenov, Vasilii Davydov, Aleksandr I. Panov

We present the Hierarchical Deep Q-Network (HDQfD), which took first place in the MineRL competition. HDQfD learns from imperfect demonstrations and exploits the hierarchical structure of expert trajectories. We introduce a procedure for extracting an effective sequence of meta-actions and subgoals from demonstration data. We also present a structured, task-dependent replay buffer and an adaptive prioritization technique that together allow the HDQfD agent to gradually erase poor-quality expert data from the buffer. In this paper, we describe the HDQfD algorithm in detail and report experimental results in the Minecraft domain.
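The abstract does not specify how the replay buffer is built, so the following is only a minimal Python sketch of the general idea of phasing out demonstration data. It assumes a simpler mechanism than the paper's: expert and agent transitions are stored separately, and the expert share of each sampled batch is annealed toward zero rather than expert data being erased through prioritization. The class `MixedReplayBuffer` and its parameters `expert_ratio` and `decay` are hypothetical names, not taken from the paper.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Hypothetical buffer that keeps expert and agent transitions apart.

    Assumption: the expert share of each batch shrinks over time, which
    approximates (but is not) the adaptive prioritization in the paper.
    """

    def __init__(self, capacity, expert_ratio=1.0, decay=0.99, seed=0):
        self.expert = []                     # demonstration transitions
        self.agent = deque(maxlen=capacity)  # transitions collected by the agent
        self.expert_ratio = expert_ratio     # fraction of a batch drawn from expert data
        self.decay = decay                   # multiplicative annealing factor
        self.rng = random.Random(seed)

    def add_expert(self, transition):
        self.expert.append(transition)

    def add_agent(self, transition):
        self.agent.append(transition)

    def anneal(self):
        """Reduce the expert share, e.g. once per training epoch."""
        self.expert_ratio *= self.decay

    def sample(self, batch_size):
        """Draw a batch (with replacement) mixing expert and agent data."""
        n_expert = round(batch_size * self.expert_ratio)
        if not self.agent:      # nothing collected yet: use demonstrations only
            n_expert = batch_size
        if not self.expert:     # no demonstrations: use agent data only
            n_expert = 0
        batch = [self.rng.choice(self.expert) for _ in range(n_expert)]
        batch += [self.rng.choice(self.agent) for _ in range(batch_size - n_expert)]
        return batch
```

A task-dependent variant could keep one such buffer per subtask, for example in a dictionary keyed by a subtask name, and call `anneal()` on each buffer as training on that subtask progresses. This too is an assumption about one possible layout, not a description of the authors' implementation.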


Updated: 2021-01-01
