Multi-agent reinforcement learning for online scheduling in smart factories

https://doi.org/10.1016/j.rcim.2021.102202

Highlights

  • A distributed architecture for smart factories is proposed by integrating heterogeneous manufacturing units (e.g., warehouses, machines, and material handlers).

  • We design an artificial intelligence (AI) scheduler with novel neural networks for each unit to schedule low-volume-high-mix workorders in real time.

  • New composite reward functions are designed to improve AI schedulers’ decision-making abilities in shortening makespan and balancing workloads.

  • AI schedulers make scheduling policies on their own and collaborate to handle unexpected events such as urgent workorders and machine failures.

  • The proposed methodology improves scheduling effectiveness and operating robustness of manufacturing systems.

Abstract

Rapid advances in sensing and communication technologies connect previously isolated manufacturing units and generate large amounts of data. The trend toward mass customization brings a higher level of disturbance and uncertainty to production planning. Traditional manufacturing systems analyze data and schedule orders in a centralized architecture, which is inefficient and unreliable because of its overdependence on central controllers and limited communication channels. Internet of things (IoT) and cloud technologies make it possible to build a distributed manufacturing architecture such as the multi-agent system (MAS). Recently, artificial intelligence (AI) methods have been used to solve scheduling problems in the manufacturing setting. However, it is difficult for scheduling algorithms to process high-dimensional data in a distributed system with heterogeneous manufacturing units. Therefore, this paper presents a new cyber-physical integration in smart factories for online scheduling of low-volume-high-mix orders. First, manufacturing units are interconnected through the cyber-physical system (CPS) by IoT technologies. Attributes of machining operations are stored and transmitted by radio frequency identification (RFID) tags. Second, we propose an AI scheduler with novel neural networks for each unit (e.g., warehouse, machine) to schedule dynamic operations with real-time sensor data. Each AI scheduler can collaborate with other schedulers by learning from their scheduling experiences. Third, we design new reward functions to improve the decision-making abilities of multiple AI schedulers based on reinforcement learning (RL). The proposed methodology is evaluated and validated in a smart factory by real-world case studies. Experimental results show that the new architecture for smart factories not only improves the learning and scheduling efficiency of multiple AI schedulers but also effectively deals with unexpected events such as rush orders and machine failures.

Introduction

The trend toward mass customization generates large numbers of low-volume-high-mix orders, which requires manufacturing systems to be more flexible in handling variations in design specifications. Traditional factories are ill-suited to fluctuating markets with many uncertainties. Smart factories integrate heterogeneous manufacturing resources (e.g., material, orders, and machines) based on internet of things (IoT) technologies and coordinate these resources by scheduling algorithms. The statuses of resources are captured by sensors and transmitted over the internet in real time. As a result, the smart factory turns into a data-rich environment, which provides an unprecedented opportunity to improve the “smartness” of factories. However, big data from smart factories are high-dimensional and updated dynamically due to resource complexity and unexpected disturbances (e.g., material shortages, rush orders, and machine failures). Traditional offline scheduling algorithms follow fixed procedures with predetermined attributes, so they are ineffective in handling high-dimensional data from a dynamic environment.

The “smartness” of a factory can be improved in two ways: architectural redesign and scheduling optimization. Rapid advances in IoT promote the development of technologies for data sensing, transmission, and analysis. Different architectures for smart factories have been proposed to collect sensor data from machines and orders. For example, a factory can be divided into functional layers such as execution, adaptation, communication, and computation. Big data are accumulated and analyzed by the computation layer to monitor the factory or to update scheduling policies made by simulation-based methods. However, the centralized architecture is ineffective in data processing when large numbers of operations are initialized simultaneously. Therefore, distributed architectures have been proposed to decompose the tasks of the central computing layer and assign them to different computing units in a manufacturing workshop. To handle uncertainties in the low-volume-high-mix manufacturing setting, early studies periodically rescheduled resources to obtain new optimal schedules offline, which requires extra computing time. Cloud and sensing technologies connect heterogeneous manufacturing resources (e.g., orders, machines, warehouses, and material handling equipment) and realize online scheduling through simulation, e.g., with a multi-agent system (MAS). However, most simulation-based methods are limited in their ability to use real-time sensor data for online scheduling. Recently, artificial intelligence (AI) has attracted increasing interest for solving dynamic scheduling problems by learning from data and accumulated experience. However, it remains difficult to utilize high-dimensional sensor data for production scheduling, especially when multiple objectives are considered in a distributed architecture.

This paper presents a new architecture for cyber-physical integration to implement data-driven online scheduling by coordinating multiple AI schedulers in smart factories. We design an AI scheduler for each physical unit in a smart factory to schedule orders based on real-time statuses of operations and machines. Decision-making algorithms run on distributed computing units in the workshop rather than the central server, which reduces unnecessary communication and improves scheduling efficiency. Each AI scheduler has four novel neural networks that take high-dimensional data for online decision making. New composite reward functions are designed to help AI schedulers optimize multiple objectives such as minimizing makespan and balancing workloads. To achieve global optimization in a dynamic environment with multiple decision makers, the proposed AI scheduler learns from not only its own operating data but also the experiences of other AI schedulers.
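To make the idea of a composite reward concrete, the following Python sketch combines a makespan-related signal with a workload-balancing signal as a weighted sum. It is only an illustration under stated assumptions: the paper's actual reward definitions are not reproduced in this excerpt, so the three terms (`r_wait`, `r_util`, `r_balance`), their normalizations, the `max_wait` bound, and the function names are all hypothetical. Equal weights are used as the default.

```python
from statistics import mean, pstdev


def makespan(machine_finish_times):
    """Makespan is the completion time of the last machine to finish."""
    return max(machine_finish_times)


def workload_imbalance(machine_workloads):
    """Coefficient of variation of workloads; 0.0 means perfectly balanced."""
    avg = mean(machine_workloads)
    return pstdev(machine_workloads) / avg if avg > 0 else 0.0


def composite_reward(waiting_time, utilization, machine_workloads,
                     max_wait, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of three terms, each scaled to [0, 1] (assumed forms).

    waiting_time      -- time the work order waited before processing
    utilization       -- machine utilization in [0, 1]
    machine_workloads -- accumulated workload per machine
    max_wait          -- hypothetical bound used to normalize waiting time
    """
    w1, w2, w3 = weights
    r_wait = 1.0 - min(waiting_time / max_wait, 1.0)   # shorter wait is better
    r_util = utilization                               # busier machine is better
    r_balance = 1.0 / (1.0 + workload_imbalance(machine_workloads))
    return w1 * r_wait + w2 * r_util + w3 * r_balance


if __name__ == "__main__":
    balanced = composite_reward(5.0, 0.5, [5, 5, 5], max_wait=10.0)
    skewed = composite_reward(5.0, 0.5, [1, 5, 9], max_wait=10.0)
    print(balanced > skewed)
```

With the waiting-time and utilization terms held fixed, a balanced workload distribution scores higher than a skewed one, which is the behavior a balancing term is meant to encourage.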

The proposed methodology is implemented in a smart factory and evaluated with a series of real-world experiments. We benchmark the performance of multiple AI schedulers against a variety of common scheduling methods such as metaheuristic methods (e.g., genetic algorithm (GA)), contract net protocol (CNP), and centralized reinforcement learning (RL). Experimental results show that the new manufacturing architecture with AI schedulers can not only improve online scheduling performance for orders but also effectively handle unexpected events (e.g., rush orders, machine failures) in real time. The novel methodology for cyber-physical integration can be generally implemented in most manufacturing systems to make factories smarter in the low-volume-high-mix manufacturing setting.

The remaining sections of this paper are organized as follows. Section 2 reviews relevant literature on building and scheduling a manufacturing workshop. Section 3 presents the proposed methodology of cyber-physical integration in a smart factory to implement the collaboration of multiple AI schedulers based on RL. Section 4 designs a testbed to evaluate the performance of the smart factory with AI schedulers. Section 5 shows experimental results and compares different methods for operating a smart factory. Section 6 concludes the paper and discusses future research.

Section snippets

Manufacturing system architecture

A manufacturing system includes various equipment (e.g., machines, warehouses, and material handlers) to complete orders according to design requirements. We define the “work order” as a customized order which contains only one part, i.e., a work order represents an individualized part. A work order, containing one or more operations, is stored in the inventory (i.e., warehouse) and then handled among machines by the material handling system (MHS). The architecture design for a manufacturing

Cyber-physical architecture

A cyber-physical system connects all heterogeneous units and transmits data between the cloud and workshops. The proposed architecture is shown in Fig. 1 to illustrate the components and interconnections of the smart factory. IoT connects all units through the local area network (LAN) and shares data with the cloud. The smart factory is composed of machines, material handlers (e.g., auto-guided vehicle (AGV), robot), a warehouse, and a supervisor. Work order generating and process planning are

Experimental design

As shown in Fig. 10, various experiments are designed to evaluate the learning and scheduling performance of multiple AI schedulers in the proposed smart factory. A testbed is designed to realize IoT manufacturing by utilizing cyber-physical technologies. We benchmark the performance of AI schedulers against a variety of traditional scheduling methods such as GA, CNP, and centralized RL.
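For readers unfamiliar with the CNP baseline, the sketch below shows one round of a generic contract net protocol in Python: a task is announced, each machine agent returns a bid (or declines), and the task is awarded to the best bidder. This is a textbook-style illustration, not the configuration used in the experiments; the `MachineAgent` class, its `speed` and `available` fields, and the bid rule (estimated completion time) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class MachineAgent:
    name: str
    speed: float              # hypothetical: work units processed per time unit
    busy_until: float = 0.0   # time at which the current queue is finished
    available: bool = True    # False models a failed machine

    def bid(self, work, now):
        """Return the estimated completion time, or None to decline."""
        if not self.available:
            return None
        start = max(now, self.busy_until)
        return start + work / self.speed


def award(machines, work, now=0.0):
    """Announce a task, collect bids, and award it to the lowest bid.

    Returns (winner_name, completion_time), or None if nobody bids.
    """
    bids = {}
    for machine in machines:
        offer = machine.bid(work, now)
        if offer is not None:
            bids[machine.name] = offer
    if not bids:
        return None
    winner = min(bids, key=bids.get)
    for machine in machines:
        if machine.name == winner:
            machine.busy_until = bids[winner]   # winner commits to the task
    return winner, bids[winner]


if __name__ == "__main__":
    shop = [MachineAgent("M1", speed=1.0), MachineAgent("M2", speed=2.0)]
    print(award(shop, work=4.0))
    print(award(shop, work=2.0))
```

In this toy setup a failed machine simply declines to bid, so the task goes to the remaining machines. Each award is a greedy, myopic decision based on the current bids only.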

Learning performance of RL-based scheduling methods

The first set of experiments aims to evaluate the learning performance of RL scheduling methods. Schedulers attempt to improve their decision-making abilities with the help of different reward functions. For the proposed methodology, the weights in Eq. (10) are each set to 1/3 (i.e., w1 = w2 = w3 = 1/3). Both r(W) in Eq. (8) and r(U) in Eq. (9) are related to makespan, which is regarded as the only objective in most RL scheduling methods. However, if most work orders need not wait in a not busy

Conclusions and future work

This paper presents a new distributed architecture with multiple AI schedulers for online scheduling of work orders in a smart factory. Each machine is equipped with a customized IPC, which interacts with the corresponding machine and links the machine to the manufacturing system. AI schedulers are operated on distributed IPCs, which improves their learning and scheduling efficiency. The MARL method organizes multiple schedulers to improve their decision-making abilities under uncertainties. A

CRediT authorship contribution statement

Tong Zhou: Conceptualization, Methodology, Software, Investigation, Writing - original draft. Dunbing Tang: Project administration, Funding acquisition, Writing - review & editing. Haihua Zhu: Supervision, Funding acquisition. Zequn Zhang: Conceptualization, Resources.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work is supported by the National Key Research and Development Program of China (No. 2020YFB1710500), the National Natural Science Foundation of China (No. 52075257), and the Fundamental Research Funds for the Central Universities (No. NP2020304). The authors express great thanks and appreciation to the editors and anonymous reviewers whose constructive comments significantly improved the paper.
