Reinforcement learning approach for resource allocation in humanitarian logistics
Expert Systems with Applications (IF 8.5). Pub Date: 2021-02-09. DOI: 10.1016/j.eswa.2021.114663
Lina Yu, Canrong Zhang, Jingyan Jiang, Huasheng Yang, Huayan Shang

When a disaster strikes, it is critical to allocate limited relief resources to those in need. This paper studies resource allocation in humanitarian logistics in terms of three key performance indicators: efficiency, effectiveness, and equity. These metrics are represented by three separate costs: an accessibility-based delivery cost, a starting-state-based deprivation cost, and a terminal penalty cost. A multi-objective, multi-period mixed-integer nonlinear programming model is proposed, and a Q-learning algorithm, a type of reinforcement learning, is developed to solve this complex optimization problem. The components of the algorithm, namely the learning agent and its actions, the environment and its states, and the reward functions, are presented in detail, and the algorithm's parameter settings are discussed in the experimental section. In addition, the solution quality of the proposed algorithm is compared with that of an exact dynamic programming method and a heuristic algorithm. The experimental results show that the proposed algorithm is more computationally efficient than dynamic programming and more accurate than the heuristic. Moreover, by adjusting the number of training episodes K, the Q-learning algorithm provides near-optimal or even optimal solutions to the resource allocation problem in practical applications.
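To make the Q-learning framing concrete, the following is a minimal sketch of tabular Q-learning on a toy single-commodity allocation instance. The sites, demands, costs, and hyperparameters below are invented for illustration and are not the paper's model; the per-unit delivery costs loosely stand in for the accessibility-based delivery cost, and the end-of-horizon penalty on unmet demand stands in for the deprivation/terminal penalty terms.

```python
import random
from collections import defaultdict

random.seed(0)

# Toy instance (illustrative numbers only, not from the paper):
# two demand points, a limited supply, per-unit delivery costs, and a
# terminal penalty per unit of demand left unmet when supply runs out.
DEMAND = (2, 3)
SUPPLY = 4
DELIVERY_COST = (1.0, 2.0)      # stands in for accessibility-based delivery cost
DEPRIVATION_COST = (5.0, 8.0)   # stands in for deprivation / terminal penalty

ALPHA, GAMMA, EPSILON, EPISODES = 0.1, 1.0, 0.1, 5000
ACTIONS = range(len(DEMAND))

def step(state, action):
    """Deliver one unit to site `action`; return (next_state, reward, done)."""
    unmet = list(state[:-1])
    supply = state[-1] - 1
    reward = -DELIVERY_COST[action]
    if unmet[action] > 0:
        unmet[action] -= 1
    done = supply == 0
    if done:  # charge the terminal penalty for any remaining unmet demand
        reward -= sum(u * c for u, c in zip(unmet, DEPRIVATION_COST))
    return tuple(unmet) + (supply,), reward, done

Q = defaultdict(float)  # state-action value table

for _ in range(EPISODES):
    state, done = DEMAND + (SUPPLY,), False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.choice(list(ACTIONS))
        else:
            a = max(ACTIONS, key=lambda x: Q[(state, x)])
        nxt, r, done = step(state, a)
        target = r if done else r + GAMMA * max(Q[(nxt, x)] for x in ACTIONS)
        Q[(state, a)] += ALPHA * (target - Q[(state, a)])
        state = nxt

# Greedy rollout of the learned allocation policy
state, total, plan, done = DEMAND + (SUPPLY,), 0.0, [], False
while not done:
    a = max(ACTIONS, key=lambda x: Q[(state, x)])
    plan.append(a)
    state, r, done = step(state, a)
    total += r
print("allocation order:", plan, "total cost:", -total)
```

In this toy instance total demand exceeds supply, so the learned policy must trade delivery cost against the deprivation penalty when deciding which site to shortchange; the paper's multi-period, multi-objective setting is far richer, but the agent/state/action/reward structure is the same.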




Updated: 2021-02-25