Modular production control using deep reinforcement learning: proximal policy optimization
Journal of Intelligent Manufacturing (IF 5.9), Pub Date: 2021-05-22, DOI: 10.1007/s10845-021-01778-z
Sebastian Mayer, Tobias Classen, Christian Endisch

EU regulations on \(\text{CO}_2\) limits and the trend towards individualization are pushing the automotive industry towards greater flexibility and robustness in production. One approach to address these challenges is modular production, where workstations are decoupled by automated guided vehicles, requiring new control concepts. Modular production control aims at the throughput-optimal coordination of products, workstations, and vehicles. For this NP-hard problem, conventional control approaches lack computing efficiency, do not find optimal solutions, or do not generalize. In contrast, Deep Reinforcement Learning offers powerful and generalizable algorithms that can deal with varying environments and high complexity. One of these algorithms is Proximal Policy Optimization, which this article applies to modular production control. Experiments in several modular production control settings demonstrate stable, reliable, optimal, and generalizable learning behavior. The agent successfully adapts its strategies to the given problem configuration. We explain how this learning behavior is achieved, focusing in particular on the agent's action, state, and reward design.
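For context (this formula is the standard one from Schulman et al., 2017, not a detail taken from the article itself), Proximal Policy Optimization maximizes the clipped surrogate objective

\[
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\; \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\big)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\]

where \(\hat{A}_t\) is an advantage estimate and \(\epsilon\) is the clipping parameter. Clipping the probability ratio \(r_t(\theta)\) keeps each policy update close to the previous policy, which is the mechanism behind the stable learning behavior reported here.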




Updated: 2021-05-22