Model-based controlled learning of MDP policies with an application to lost-sales inventory control


arXiv - CS - Machine Learning, Pub Date: 2020-11-30, DOI: arxiv-2011.15122
Willem van Jaarsveld

Recent literature has established that neural networks can represent good MDP policies across a range of stochastic dynamic models in supply chain and logistics. To overcome limitations of the model-free algorithms typically employed to learn such neural network policies, a model-based algorithm is proposed that incorporates variance reduction techniques. For the classical lost-sales inventory model, the algorithm learns neural network policies that are superior to those learned using model-free algorithms, while also outperforming heuristic benchmarks. The algorithm may be an interesting candidate to apply to other stochastic dynamic problems in supply chain and logistics.
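To make the setting concrete, the following is a minimal sketch of a lost-sales inventory simulation in which two policies are compared on the same demand sample path (common random numbers, one standard variance reduction technique). It is not the algorithm proposed in the paper: the base-stock policy, the uniform integer demand, the cost parameters, and the names `simulate_cost` and `cost_difference` are all assumptions chosen for illustration.

```python
import random


def simulate_cost(base_stock, demands, lead_time=2,
                  holding_cost=1.0, penalty_cost=4.0):
    """Average per-period cost of a base-stock policy with lost sales.

    All parameters are illustrative assumptions, not values from the paper.
    """
    on_hand = base_stock
    pipeline = [0] * lead_time  # outstanding orders; pipeline[0] arrives next
    total = 0.0
    for demand in demands:
        # Order up to base_stock, based on the inventory position.
        position = on_hand + sum(pipeline)
        pipeline.append(max(0, base_stock - position))
        # Receive the oldest outstanding order.
        on_hand += pipeline.pop(0)
        # Serve demand from stock; unmet demand is lost, not backordered.
        sales = min(on_hand, demand)
        lost = demand - sales
        on_hand -= sales
        total += holding_cost * on_hand + penalty_cost * lost
    return total / len(demands)


def cost_difference(s_a, s_b, seed, periods=200, common=True):
    """Cost of policy s_a minus cost of policy s_b on sampled demand.

    With common=True both policies face the same demand path (common
    random numbers); otherwise each gets an independent path.
    """
    rng = random.Random(seed)
    demands_a = [rng.randint(0, 10) for _ in range(periods)]
    if common:
        demands_b = demands_a
    else:
        demands_b = [rng.randint(0, 10) for _ in range(periods)]
    return simulate_cost(s_a, demands_a) - simulate_cost(s_b, demands_b)
```

Evaluating `cost_difference` over many seeds with `common=True` and again with `common=False` gives two samples of the same quantity. Under common random numbers the estimates typically vary less, because demand noise shared by both policies cancels in the difference.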


Updated: 2020-12-01
