Autonomous navigation of stratospheric balloons using reinforcement learning
Nature (IF 64.8), Pub Date: 2020-12-02, DOI: 10.1038/s41586-020-2939-8
Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda, Ziyu Wang

Efficiently navigating a superpressure balloon in the stratosphere [1] requires the integration of a multitude of cues, such as wind speed and solar elevation, and the process is complicated by forecast errors and sparse wind measurements. Coupled with the need to make decisions in real time, these factors rule out the use of conventional control techniques [2,3]. Here we describe the use of reinforcement learning [4,5] to create a high-performing flight controller. Our algorithm uses data augmentation [6,7] and a self-correcting design to overcome the key technical challenge of reinforcement learning from imperfect data, which has proved to be a major obstacle to its application to physical systems [8]. We deployed our controller to station Loon superpressure balloons at multiple locations across the globe, including a 39-day controlled experiment over the Pacific Ocean. Analyses show that the controller outperforms Loon's previous algorithm and is robust to the natural diversity in stratospheric winds. These results demonstrate that reinforcement learning is an effective solution to real-world autonomous control problems in which neither conventional methods nor human intervention suffice, offering clues about what may be needed to create artificially intelligent agents that continuously interact with real, dynamic environments.

Updated: 2020-12-02