UAV-enabled computation migration for complex missions: A reinforcement learning approach
IET Communications ( IF 1.6 ) Pub Date : 2020-09-24 , DOI: 10.1049/iet-com.2019.1188
Shichao Zhu 1, 2 , Lin Gui 1 , Nan Cheng 3 , Qi Zhang 1 , Fei Sun 1 , Xiupu Lang 1
The implementation of computation offloading is challenging in remote areas where traditional edge infrastructure is sparsely deployed. In this study, the authors propose an unmanned aerial vehicle (UAV)-enabled edge computing framework, in which a group of UAVs fly around to provide near-user edge computing services. They study the computation migration problem for complex missions, which can be decomposed into typical task-flows that capture the inter-dependency of tasks. Each time a task appears, it should be allocated to a proper UAV for execution, which is defined as computation migration or task migration. Since the UAV-ground communication data rate is strongly associated with the UAV's location, selecting a proper UAV to execute each task largely benefits the mission response time. They formulate the computation migration decision-making problem as a Markov decision process, in which the state contains observations extracted from the environment. To cope with the dynamics of the environment, they propose an advantage actor–critic reinforcement learning approach to learn a near-optimal policy on the fly. Simulation results show that the proposed approach has a desirable convergence property and significantly reduces the average response time of missions compared with the benchmark greedy method.
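The abstract's pipeline (observe the environment, pick a UAV per task, update an advantage actor–critic from the reward) can be illustrated with a minimal sketch. Everything below is assumed for illustration: the linear actor and critic, the five toy state features, the reward (negative response time, with a hypothetical "closest" UAV 0), and all hyper-parameters; the paper's actual state design, reward, and network architecture are not specified here.

```python
import numpy as np

def train(steps=4000, n_uavs=3, state_dim=5, seed=0):
    """Toy advantage actor-critic for per-task UAV selection (illustrative only)."""
    rng = np.random.default_rng(seed)
    lr_actor, lr_critic, gamma = 0.05, 0.05, 0.9
    theta = np.zeros((state_dim, n_uavs))   # linear softmax actor
    w = np.zeros(state_dim)                 # linear critic

    def observe():
        # first feature is a constant bias; the rest stand in for
        # observations extracted from the environment (e.g. UAV
        # positions, task queue lengths)
        return np.concatenate(([1.0], rng.normal(size=state_dim - 1)))

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    s = observe()
    for _ in range(steps):
        p = softmax(s @ theta)
        a = rng.choice(n_uavs, p=p)          # migrate the task to UAV a
        # toy reward: UAV 0 is "closest", so its response time is
        # shortest and its (negative) reward is least severe
        r = -(a + 1) + rng.normal(0, 0.1)
        s2 = observe()
        adv = r + gamma * (s2 @ w) - (s @ w)  # TD error as the advantage
        w += lr_critic * adv * s              # critic update
        g = -np.outer(s, p)
        g[:, a] += s                          # gradient of log pi(a|s)
        theta += lr_actor * adv * g           # actor update
        s = s2
    return theta, w

theta, w = train()
bias_state = np.eye(5)[0]                     # bias feature only
print(np.argmax(bias_state @ theta))          # index of the preferred UAV
```

With the toy reward above, the policy drifts toward the UAV with the shortest response time; in the paper's setting the reward would instead come from the measured mission response time.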

Updated: 2020-09-25