Learning to Discover Task-Relevant Features for Interpretable Reinforcement Learning
IEEE Robotics and Automation Letters (IF 4.6) · Pub Date: 2021-07-21 · DOI: 10.1109/lra.2021.3091885
Qiyuan Zhang, Xiaoteng Ma, Yiqin Yang, Chenghao Li, Jun Yang, Yu Liu, Bin Liang

Reinforcement Learning (RL) agents are often fed high-dimensional observations to achieve ideal performance in complex environments. Unfortunately, such a massive observation space usually contains useless or even adverse features, leading to low sample efficiency. Existing methods rely on domain knowledge and cross-validation to discover features that are informative for decision-making. To minimize the impact of prior knowledge, we propose a temporal-adaptive feature attention algorithm (TAFA). We adopt a non-linear attention module that automatically selects task-relevant components of hand-crafted state features without any domain knowledge. Our experiments on MuJoCo and TORCS tasks show that the agent achieves performance competitive with state-of-the-art methods while identifying the most task-relevant features at no extra cost. We believe our work takes a step towards the interpretability of RL. Our code is available at https://github.com/QiyuanZhang19/Temporal-Adaptive-Feature-Attention/tree/master.
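To make the core idea concrete, here is a minimal sketch (not the authors' actual TAFA implementation; all function and variable names are hypothetical) of a non-linear feature-attention gate: a small MLP scores each component of the observation, a softmax turns the scores into attention weights, and the observation is reweighted element-wise so that the resulting weights can be read off as per-feature relevance:

```python
import numpy as np

def feature_attention(state, W1, b1, W2, b2):
    """Illustrative feature-attention gate (hypothetical, not the paper's code).

    An MLP maps the observation to one score per input feature; a softmax
    converts scores into attention weights; the observation is then
    reweighted element-wise. The weights themselves are interpretable as
    per-feature task relevance.
    """
    h = np.tanh(state @ W1 + b1)         # non-linear hidden layer
    scores = h @ W2 + b2                 # one score per input feature
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    weights = exp / exp.sum()
    return weights * state, weights      # gated features, relevance weights

# Toy usage with random parameters (shapes: d input features, h hidden units)
rng = np.random.default_rng(0)
d, h_dim = 6, 8
state = rng.normal(size=d)
W1, b1 = rng.normal(size=(d, h_dim)), np.zeros(h_dim)
W2, b2 = rng.normal(size=(h_dim, d)), np.zeros(d)
attended, weights = feature_attention(state, W1, b1, W2, b2)
```

In the paper's setting, the gated features would feed the policy/value network and the attention parameters would be trained jointly with the RL objective; the sketch above only shows the forward pass of such a gate.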

Updated: 2021-07-21