A Self-Attention-Based Deep Reinforcement Learning Approach for AGV Dispatching Systems.
IEEE Transactions on Neural Networks and Learning Systems (IF 10.4), Pub Date: 2022-11-30, DOI: 10.1109/tnnls.2022.3222206
Qinglai Wei 1 , Yutian Yan 1 , Jie Zhang 1 , Jun Xiao 2 , Cong Wang 3

The automated guided vehicle (AGV) dispatching problem is that of developing a rule that assigns transportation tasks to vehicles. This article proposes a new deep reinforcement learning approach with a self-attention mechanism to dispatch tasks to AGVs dynamically. The AGV dispatching system is modeled as a less complicated Markov decision process (MDP) that uses vehicle-initiated rules to dispatch a workcenter to an idle AGV. To deal with the highly dynamic environment, the self-attention mechanism is introduced to calculate the importance of different pieces of information. Invalid action masking is applied to reduce invalid actions. A multimodal structure is employed to fuse features from various sources. Comparative experiments are performed to show the effectiveness of the proposed method. The properties of the learned policies are also investigated under different environment settings. The policies are found to explore and learn the properties of different systems, and also to ease traffic congestion. Under certain environment settings, the policy converges to a heuristic rule that assigns the idle AGV to the workcenter with the shortest queue length, which shows the adaptability of the proposed method.
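As a rough illustration of two ideas named in the abstract, the sketch below shows invalid action masking over a policy's action scores and the shortest-queue heuristic that the learned policy is reported to converge to in some settings. It is not the authors' implementation: the function names, the example numbers, and the assumption that a workcenter with no pending task counts as an invalid action are all hypothetical and chosen only for illustration.

```python
import math


def masked_softmax(logits, valid):
    """Softmax over action scores with invalid actions forced to probability 0."""
    if not any(valid):
        raise ValueError("at least one action must be valid")
    # Replace the score of every invalid action with -inf so exp() yields 0.
    masked = [x if ok else -math.inf for x, ok in zip(logits, valid)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


def shortest_queue_rule(queue_lengths, valid):
    """Heuristic baseline: pick the valid workcenter with the shortest queue."""
    candidates = [i for i, ok in enumerate(valid) if ok]
    if not candidates:
        raise ValueError("at least one action must be valid")
    return min(candidates, key=lambda i: queue_lengths[i])


# Hypothetical scenario: four workcenters; workcenter 2 has nothing to
# transport, so dispatching the idle AGV to it is treated as invalid.
logits = [1.0, 2.0, 5.0, 0.5]   # made-up policy scores, one per workcenter
valid = [True, True, False, True]
queues = [3, 1, 0, 4]           # made-up queue lengths

probs = masked_softmax(logits, valid)
choice = shortest_queue_rule(queues, valid)
```

Masking before the softmax, rather than penalizing invalid choices through the reward, means the masked action can never be sampled, even though it has the highest raw score in this example.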

Updated: 2022-11-30