ES-ENAS: Combining Evolution Strategies with Neural Architecture Search at No Extra Cost for Reinforcement Learning
arXiv - CS - Neural and Evolutionary Computing | Pub Date: 2021-01-19 | arXiv: 2101.07415
Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Daiyi Peng, Deepali Jain, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Yuxiang Yang

We introduce ES-ENAS, a simple neural architecture search (NAS) algorithm for reinforcement learning (RL) policy design, created by combining Evolution Strategies (ES) and Efficient NAS (ENAS) in a highly scalable and intuitive way. Our main insight is that ES is already a distributed blackbox algorithm, so we may simply insert an ENAS model controller into the central aggregator of ES and obtain its weight-sharing properties for free. This relatively simple marriage of two different lines of research bridges the gap between NAS in supervised learning settings and the RL scenario, making ours one of the first applications of controller-based NAS techniques to RL. We demonstrate the utility of our method by training combinatorial neural network architectures, via edge pruning and weight sharing, for continuous-control RL problems. We also incorporate a wide variety of popular techniques from the modern NAS literature, including multiobjective optimization and varying controller methods, to showcase their promise in RL and discuss possible extensions. We achieve >90% network compression on multiple tasks, which may be of special interest for mobile robotics with limited storage and computational resources.
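The mechanism is concrete enough to sketch. Because each ES iteration already sends perturbed copies of the shared weights to distributed workers and aggregates their rewards, the central aggregator can additionally sample one architecture per worker and reuse the very same rewards as the controller's training signal. Below is a minimal Python sketch of this loop under toy assumptions: RandomController, evaluate_policy, and all hyperparameters are hypothetical stand-ins for illustration, not the paper's implementation.

import numpy as np

class RandomController:
    # Placeholder controller: samples edge-pruning masks uniformly at random.
    # A real ENAS-style controller would be a model updated from reward feedback.
    def sample(self, num_edges, keep=0.1):
        return np.random.rand(num_edges) < keep   # keep ~10% of edges (>90% compression)
    def update(self, arch_rewards):
        pass   # a trained controller would consume (architecture, reward) pairs here

def es_enas_step(theta, controller, evaluate_policy,
                 num_workers=32, sigma=0.1, lr=0.01):
    # One aggregator iteration: each worker receives a Gaussian perturbation of
    # the shared weights theta AND a sampled architecture; one blackbox rollout
    # reward per worker then updates both theta (ES) and the controller (ENAS).
    grad = np.zeros_like(theta)
    arch_rewards = []
    for _ in range(num_workers):
        arch = controller.sample(theta.size)                  # architecture from controller
        eps = np.random.randn(*theta.shape)                   # ES perturbation direction
        reward = evaluate_policy(theta + sigma * eps, arch)   # blackbox RL episode
        grad += reward * eps                                  # accumulate ES gradient estimate
        arch_rewards.append((arch, reward))
    controller.update(arch_rewards)                           # weight-sharing signal "for free"
    return theta + lr * grad / (num_workers * sigma)          # standard ES update

# Toy usage: maximize a quadratic "reward" over the weights kept by the mask.
theta = np.zeros(64)
for _ in range(100):
    theta = es_enas_step(theta, RandomController(),
                         lambda w, a: -np.sum((w[a] - 1.0) ** 2))

The uniform sampler above only marks where the controller update would occur; the abstract's "varying controller methods" (e.g., a policy-gradient-trained controller as in ENAS) would replace its no-op update with learning from the collected (architecture, reward) pairs.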

Updated: 2021-01-20