A decision-making method for assembly sequence planning with dynamic resources
International Journal of Production Research (IF 9.2), Pub Date: 2021-06-14, DOI: 10.1080/00207543.2021.1937748
Wenbo Wu, Zhengdong Huang, Jiani Zeng, Kuan Fan

With the advent of mass customisation, solving the assembly sequence planning (ASP) problem not only involves a hard non-convex optimisation problem but also requires a fast response to changes in assembly resources. This paper proposes a deep reinforcement learning (DRL) approach to the ASP problem that aims to improve response speed by exploiting the reusability and expandability of past decision-making experience. First, the connector-based ASP problem is described in matrix form, and its objective function is set to minimise assembly cost under the precedence constraints. Second, an instance generation algorithm is developed for policy training, and a mask algorithm is adopted to screen out infeasible assembly operations at each decision-making step. Then, the Monte Carlo sampling method is used to evaluate the ASP policy. The policy is learned by an actor–critic-based DRL algorithm that contains two networks, a policy network and an evaluation network. Next, the network structures are introduced, and the networks are trained with a mini-batch algorithm. Finally, four cases are studied to validate the method, and the results are discussed. The results demonstrate that the proposed method can solve the ASP problem accurately and efficiently in an environment with dynamic resource changes.
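The masking and Monte Carlo evaluation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the precedence-matrix convention (`precedence[i][j]` is 1 when operation `i` must precede operation `j`), the uniform random policy standing in for the learned policy network, the tool-change cost, and all names (`feasible_mask`, `sample_sequence`, `sequence_cost`, `monte_carlo_cost`, `PRECEDENCE`, `TOOLS`) are assumptions made only for illustration.

```python
import random

# Hypothetical 4-operation instance: precedence[i][j] == 1 means
# operation i must be completed before operation j (an assumed convention).
PRECEDENCE = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
# Hypothetical resource assignment: the tool each operation needs.
TOOLS = ["wrench", "wrench", "driver", "driver"]


def feasible_mask(precedence, done):
    """Mark an operation True if it is not done and all its predecessors are done."""
    n = len(precedence)
    return [
        (j not in done) and all(i in done for i in range(n) if precedence[i][j])
        for j in range(n)
    ]


def sample_sequence(precedence, rng):
    """Sample one feasible sequence; a uniform choice stands in for a learned policy."""
    n = len(precedence)
    done, order = set(), []
    while len(order) < n:
        mask = feasible_mask(precedence, done)
        candidates = [j for j in range(n) if mask[j]]
        if not candidates:
            raise ValueError("precedence constraints contain a cycle")
        choice = rng.choice(candidates)
        done.add(choice)
        order.append(choice)
    return order


def sequence_cost(order, tools):
    """Assumed cost model: count tool changes between consecutive operations."""
    return sum(1 for a, b in zip(order, order[1:]) if tools[a] != tools[b])


def monte_carlo_cost(precedence, tools, samples=1000, seed=0):
    """Estimate the policy's expected cost as the mean cost of sampled sequences."""
    rng = random.Random(seed)
    costs = [
        sequence_cost(sample_sequence(precedence, rng), tools)
        for _ in range(samples)
    ]
    return sum(costs) / len(costs)


if __name__ == "__main__":
    print(feasible_mask(PRECEDENCE, set()))
    print(monte_carlo_cost(PRECEDENCE, TOOLS))
```

In this toy instance only operation 0 is feasible at the start, and the mask unlocks operations 1 and 2 once it is done. If the available tools change, only `TOOLS` needs to be updated before re-evaluating, which loosely mirrors the dynamic-resource setting the abstract describes.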




Updated: 2021-06-14