Reinforcement Learning Provides a Flexible Approach for Realistic Supply Chain Safety Stock Optimisation
arXiv - CS - Multiagent Systems. Pub Date: 2021-07-02, DOI: arxiv-2107.00913
Edward Elson Kosasih, Alexandra Brintrup

Although safety stock optimisation has been studied for more than 60 years, most companies still use simplistic means to calculate necessary safety stock levels, partly due to the mismatch between existing analytical methods' emphasis on deriving provably optimal solutions and companies' preference to sacrifice optimality in favour of more realistic problem settings. A newly emerging method from the field of Artificial Intelligence (AI), namely Reinforcement Learning (RL), offers promise in finding optimal solutions while accommodating more realistic problem features. Unlike analytical models, RL treats the problem as a black-box simulation environment, mitigating the risk of oversimplifying reality. As such, assumptions on the stock keeping policy can be relaxed and a larger number of problem variables can be accommodated. While RL has been popular in other domains, its applications in safety stock optimisation remain scarce. In this paper, we investigate three RL methods, namely Q-Learning, Temporal Difference Advantage Actor-Critic and Multi-agent Temporal Difference Advantage Actor-Critic, for optimising safety stock in a linear chain of independent agents. We find that RL can simultaneously optimise both the safety stock level and the order quantity parameters of an inventory policy, unlike classical safety stock optimisation models, in which only the safety stock level is optimised while the order quantity is predetermined by simple rules. This allows RL to model more complex supply chain procurement behaviour. However, RL takes longer to arrive at solutions, and future research is needed to identify and improve the trade-offs between the use of AI and mathematical models.
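To illustrate the idea of jointly learning the safety stock level and the order quantity from a black-box simulation, the sketch below shows a tabular Q-learning agent (one of the three methods the paper investigates) acting on a toy single-echelon inventory environment. It is purely illustrative and not taken from the paper: the demand distribution, cost parameters, state discretisation, and the (reorder point, order quantity) action encoding are assumptions made here for the example.

# Illustrative sketch only: a tabular Q-learning agent for a hypothetical
# single-echelon inventory environment. The environment and all parameters
# below are assumptions, not the paper's setup. The point is that the agent
# learns a joint (s, Q) action -- reorder point s and order quantity Q --
# rather than fixing Q by a simple rule as classical safety stock models do.
import random
from collections import defaultdict

MAX_INV = 20                         # inventory is clipped to [0, MAX_INV]
REORDER_POINTS = range(0, 8)         # candidate safety-stock / reorder levels s
ORDER_QTYS = range(1, 9)             # candidate order quantities Q
ACTIONS = [(s, q) for s in REORDER_POINTS for q in ORDER_QTYS]
HOLDING_COST, SHORTAGE_COST = 1.0, 5.0

def step(inventory, action):
    """One period of a toy (s, Q) inventory simulation: order if stock is at
    or below s, then face stochastic demand; return next state and reward."""
    s, q = action
    if inventory <= s:
        inventory = min(inventory + q, MAX_INV)   # instantaneous replenishment
    demand = random.randint(0, 6)                 # hypothetical demand distribution
    shortage = max(demand - inventory, 0)
    inventory = max(inventory - demand, 0)
    reward = -(HOLDING_COST * inventory + SHORTAGE_COST * shortage)
    return inventory, reward

def train(episodes=2000, horizon=50, alpha=0.1, gamma=0.95, eps=0.1):
    Q = defaultdict(float)                        # Q[(state, action)] table
    for _ in range(episodes):
        inv = MAX_INV // 2
        for _ in range(horizon):
            # epsilon-greedy selection over joint (s, Q) actions
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(inv, x)])
            nxt, r = step(inv, a)
            best_next = max(Q[(nxt, x)] for x in ACTIONS)
            Q[(inv, a)] += alpha * (r + gamma * best_next - Q[(inv, a)])
            inv = nxt
    return Q

if __name__ == "__main__":
    Q = train()
    state = 10   # greedy policy at a mid-range inventory level, for illustration
    print("learned (s, Q) at inventory 10:", max(ACTIONS, key=lambda a: Q[(state, a)]))

The actor-critic variants discussed in the paper would replace the tabular Q values with learned policy and value functions, but the interaction loop with the simulation environment is the same.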

Updated: 2021-07-05