Grounded action transformation for sim-to-real reinforcement learning
Machine Learning (IF 4.3) | Pub Date: 2021-05-13 | DOI: 10.1007/s10994-021-05982-z
Josiah P. Hanna, Siddharth Desai, Haresh Karnan, Garrett Warnell, Peter Stone

Reinforcement learning in simulation is a promising alternative to the prohibitive sample cost of reinforcement learning in the physical world. Unfortunately, policies learned in simulation often perform worse than hand-coded policies when applied to the target physical system. Grounded simulation learning (gsl) is a general framework that promises to address this issue by altering the simulator to better match the real world (Farchy et al. 2013, in Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)). This article introduces a new algorithm for gsl—Grounded Action Transformation (GAT)—and applies it to learning control policies for a humanoid robot. We evaluate our algorithm in controlled experiments where we show it to allow policies learned in simulation to transfer to the real world. We then apply our algorithm to learning a fast bipedal walk on a humanoid robot and demonstrate a 43.27% improvement in forward walk velocity compared to a state-of-the-art hand-coded walk. This striking empirical success notwithstanding, further empirical analysis shows that gat may struggle when the real world has stochastic state transitions. To address this limitation we generalize gat to the stochastic gat (sgat) algorithm and empirically show that sgat leads to successful real world transfer in situations where gat may fail to find a good policy. Our results contribute to a deeper understanding of grounded simulation learning and demonstrate its effectiveness for applying reinforcement learning to learn robot control policies entirely in simulation.
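The core idea behind grounded action transformation can be sketched as follows: before an agent's action is executed in the simulator, it is replaced by a "grounded" action chosen so that the simulator's next state matches what the real world would produce. A minimal illustrative sketch of this transformation is below; the model functions and their toy linear dynamics are hypothetical stand-ins for the learned forward model of real-world dynamics and the learned inverse model of the simulator, not the paper's actual implementation.

```python
import numpy as np

def forward_model_real(state, action):
    """Hypothetical learned model of real-world dynamics: predicts the
    next state the physical system would reach. Toy linear placeholder."""
    return state + 0.8 * action

def inverse_model_sim(state, next_state):
    """Hypothetical learned inverse model of the simulator: returns the
    action that moves the simulator from state to next_state. Here the
    simulator's (placeholder) dynamics are s' = s + a, so the inverse
    is just the state difference."""
    return next_state - state

def grounded_action(state, action):
    """Action transformation g(s, a): predict where the real world would
    go under (s, a), then pick the simulator action that reaches that
    same next state, so simulated rollouts better match reality."""
    predicted_real_next = forward_model_real(state, action)
    return inverse_model_sim(state, predicted_real_next)

state = np.array([0.0, 0.0])
action = np.array([1.0, -0.5])
print(grounded_action(state, action))  # prints [ 0.8 -0.4]
```

With these placeholder dynamics, the policy's action (1.0, -0.5) is scaled down to (0.8, -0.4) before execution, since the real system responds more weakly than the simulator; the learned models in the paper play this corrective role for the humanoid robot.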




Updated: 2021-05-14