Invariant Transform Experience Replay: Data Augmentation for Deep Reinforcement Learning
IEEE Robotics and Automation Letters (IF 5.2), Pub Date: 2020-10-01, DOI: 10.1109/lra.2020.3013937
Yijiong Lin, Jiancong Huang, Matthieu Zimmer, Yisheng Guan, Juan Rojas, Paul Weng

Deep Reinforcement Learning (RL) is a promising approach for adaptive robot control, but its application to robotics is currently hindered by high sample requirements. To alleviate this issue, we propose to exploit the symmetries present in robotic tasks. Intuitively, symmetries of observed trajectories define transformations that leave the space of feasible RL trajectories invariant and can therefore be used to generate new feasible trajectories for training. Based on this data augmentation idea, we formulate a general framework, called Invariant Transform Experience Replay, which we present with two techniques: (i) Kaleidoscope Experience Replay, which exploits reflectional symmetries, and (ii) Goal-augmented Experience Replay, which takes advantage of lax goal definitions. On the Fetch tasks from OpenAI Gym, our experimental results show significant increases in learning rates and success rates. In particular, we attain 13×, 3×, and 5× speedups in the pushing, sliding, and pick-and-place tasks, respectively, in the multi-goal setting. Performance gains are also observed in similar tasks with obstacles, and we successfully deployed a trained policy on a real Baxter robot. Our work demonstrates that invariant transformations of RL trajectories are a promising methodology for speeding up learning in deep RL. Code, video, and supplementary materials are available at [1].
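As a rough illustration of the augmentation idea described in the abstract, the Python sketch below (not taken from the paper) shows how a single goal-conditioned transition could be mirrored across a reflectional symmetry of the task (the Kaleidoscope idea) and relabeled with nearby goals lying within the success tolerance (the goal-augmentation idea). The symmetry plane (y = 0), the transition layout, the function names, and the tolerance value are all illustrative assumptions, not the authors' implementation.

import numpy as np

# Minimal sketch, assuming goal-conditioned transitions of the form
# (obs, action, goal, achieved_goal) where entries are 3-D Cartesian
# positions and the task is symmetric about the y = 0 plane.

def reflect_y(vec):
    """Reflect a 3-D point or displacement across the y = 0 plane."""
    reflected = np.array(vec, dtype=float)
    reflected[1] = -reflected[1]
    return reflected

def kaleidoscope_augment(transition):
    """Kaleidoscope-style augmentation: mirror an observed transition
    across a reflectional symmetry of the task to obtain a new feasible one."""
    obs, action, goal, achieved = transition
    return (reflect_y(obs), reflect_y(action), reflect_y(goal), reflect_y(achieved))

def goal_augment(transition, tolerance=0.05, n_samples=4, rng=np.random):
    """Goal-augmented replay: since success only requires the achieved goal
    to lie within a tolerance of the desired goal, relabel the transition
    with goals sampled inside that tolerance ball around the achieved goal."""
    obs, action, goal, achieved = transition
    augmented = []
    for _ in range(n_samples):
        direction = rng.uniform(-1.0, 1.0, size=3)
        direction /= max(np.linalg.norm(direction), 1e-8)
        offset = direction * tolerance * rng.uniform(0.0, 1.0)
        augmented.append((obs, action, achieved + offset, achieved))
    return augmented

# Usage: each real transition stored in the replay buffer can be expanded
# into several synthetic but still feasible ones before off-policy training.
transition = (np.array([0.1, 0.2, 0.3]), np.array([0.0, 0.1, 0.0]),
              np.array([0.4, -0.1, 0.2]), np.array([0.35, -0.05, 0.2]))
replay_batch = [transition, kaleidoscope_augment(transition)]
replay_batch += goal_augment(transition)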

Updated: 2020-10-01