Local motion simulation using deep reinforcement learning
Transactions in GIS (IF 2.568), Pub Date: 2020-03-30, DOI: 10.1111/tgis.12620
Dong Xu 1, 2 , Xiao Huang 2 , Zhenlong Li 2 , Xiang Li 1

Traditional local motion simulation focuses largely on avoiding collisions in the next frame. However, due to its lack of forward looking, the global trajectory of agents usually seems unreasonable. As a method of optimizing the overall reward, deep reinforcement learning (DRL) can better correct the problems that exist in the traditional local motion simulation method. In this article, we propose a local motion simulation method integrating optimal reciprocal collision avoidance (ORCA) and DRL, referred to as ORCA‐DRL. The main idea of ORCA‐DRL is to perform local collision avoidance detection via ORCA and smooth the trajectory at the same time via DRL. We use a deep neural network (DNN) as the state‐to‐action mapping function, where the state information is detected by virtual visual sensors and the action space includes two continuous spaces: speed and direction. To improve data utilization and speed up the training process, we use the proximal policy optimization based on the actor–critic (AC) framework to update the DNN parameters. Three scenes (circle, hallway, and crossing) are designed to evaluate the performance of ORCA‐DRL. The results reveal that, compared with the ORCA, our proposed ORCA‐DRL method can: (a) reduce the total number of frames, leading to less time for agents to reach their destination; and (b) effectively avoid local optima, evidenced by smoothed global trajectories.
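The abstract says the DNN parameters are updated with proximal policy optimization but does not say which variant. As a rough illustration only, the sketch below computes the widely used clipped surrogate objective of PPO for a batch of samples. The function name `ppo_clip_objective`, the clipping value 0.2, and the use of plain Python lists are assumptions made for this example and are not taken from the paper.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of PPO (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates,
    e.g. from the critic. All names and defaults here are hypothetical.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage. Clipping only limits how much a sample can increase the objective, which is what lets the same batch be reused for several gradient steps.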

Updated: 2020-03-30