Learning to flock through reinforcement.
Physical Review E (IF 2.2), Pub Date: 2020-07-02, DOI: 10.1103/physreve.102.012601
Mihir Durve, Fernando Peruani, Antonio Celani

Flocks of birds, schools of fish, and insect swarms are examples of the coordinated motion of a group that arises spontaneously from the action of many individuals. Here, we study flocking behavior from the viewpoint of multiagent reinforcement learning. In this setting, a learning agent tries to keep contact with the group using as sensory input the velocity of its neighbors. This goal is pursued by each learning individual by exerting a limited control on its own direction of motion. By means of standard reinforcement learning algorithms we show that (i) a learning agent exposed to a group of teachers, i.e., hard-wired flocking agents, learns to follow them, and (ii) in the absence of teachers, a group of independently learning agents evolves towards a state where each agent knows how to flock. In both scenarios, the emergent policy (or navigation strategy) corresponds to the polar velocity alignment mechanism of the well-known Vicsek model. These results (a) show that such a velocity alignment may have naturally evolved as an adaptive behavior that aims at minimizing the rate of neighbor loss, and (b) prove that this alignment does not only favor (local) polar order, but it corresponds to the best policy or strategy to keep group cohesion when the sensory input is limited to the velocity of neighboring agents. In short, to stay together, steer together.
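
For readers unfamiliar with the alignment mechanism the abstract refers to, the following is a minimal sketch of the Vicsek-style heading update as it is commonly stated: an agent adopts the mean direction of its neighbors' velocities, perturbed by angular noise. It is not the authors' reinforcement learning code, and it does not model the limited turning control described above. The function names `vicsek_heading` and `polar_order` and the parameter `eta` are illustrative assumptions.

```python
import math
import random


def vicsek_heading(neighbor_headings, eta=0.0, rng=random):
    """Return the mean direction of the given headings (radians) plus noise.

    Assumption: noise is drawn uniformly from [-eta/2, eta/2], as in the
    usual statement of the Vicsek model. In that model the neighbor list
    normally includes the agent's own heading.
    """
    # Average the unit velocity vectors, not the raw angles, so that
    # headings near +pi and -pi do not cancel incorrectly.
    sx = sum(math.cos(a) for a in neighbor_headings)
    sy = sum(math.sin(a) for a in neighbor_headings)
    mean_direction = math.atan2(sy, sx)
    return mean_direction + rng.uniform(-eta / 2.0, eta / 2.0)


def polar_order(headings):
    """Polar order parameter: norm of the average unit velocity, in [0, 1]."""
    n = len(headings)
    sx = sum(math.cos(a) for a in headings) / n
    sy = sum(math.sin(a) for a in headings) / n
    return math.hypot(sx, sy)


if __name__ == "__main__":
    # Two neighbors heading east and north: the aligned heading is north-east.
    print(vicsek_heading([0.0, math.pi / 2]))
    # A perfectly aligned group has polar order 1.
    print(polar_order([0.3, 0.3, 0.3]))
```

In this sketch, a polar order close to 1 indicates a group moving in a common direction, while a value close to 0 indicates disordered headings.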

Updated: 2020-07-02