Journal of Process Control

Volume 118, October 2022, Pages 139-152

Meta-reinforcement learning for the tuning of PI controllers: An offline approach

https://doi.org/10.1016/j.jprocont.2022.08.002

Highlights

  • An expert agent recognizes and controls any process in a large class.

  • Offline training leads to sample-efficient online operation and adaptation.

  • The online agent adapts automatically as plant characteristics drift.

  • The agent adjusts PI parameters instead of directly actuating the plant.

  • The meta-RL design generalizes to various plant and controller structures.

Abstract

Meta-learning is a branch of machine learning which trains neural network models to synthesize a wide variety of data in order to rapidly solve new problems. In process control, many systems have similar and well-understood dynamics, which suggests it is feasible to create a generalizable controller through meta-learning. In this work, we formulate a meta-reinforcement learning (meta-RL) control strategy that can be used to tune proportional–integral controllers. Our meta-RL agent has a recurrent structure that accumulates “context” to learn a system’s dynamics through a hidden state variable in closed-loop. This architecture enables the agent to automatically adapt to changes in the process dynamics. In tests reported here, the meta-RL agent was trained entirely offline on first order plus time delay systems, and produced excellent results on novel systems drawn from the same distribution of process dynamics used for training. A key design element is the ability to leverage model-based information offline during training in simulated environments while maintaining a model-free policy structure for interacting with novel processes where there is uncertainty regarding the true process dynamics. Meta-learning is a promising approach for constructing sample-efficient intelligent controllers.

Introduction

Reinforcement learning (RL) is a branch of machine learning that formulates a goal-oriented “policy” for taking actions in a stochastic environment [1]. This general framework has attracted the interest of the process control community [2]. For example, one can consider feedback control problems without the need for a process model in this setting. Despite its appeal, an overarching challenge in RL is its need for a significant amount of data to learn a useful policy.

Meta-learning, or “learning to learn”, is an active area of research in which the objective is to learn an underlying structure governing a distribution of possible tasks [3]. In process control applications, meta-learning is appealing because many systems have similar dynamics or a known structure, which suggests training over a distribution could improve the sample efficiency when learning any single task. Moreover, extensive online learning is impractical for training over a large number of systems; by focusing on learning an underlying structure for the tasks, we can more readily adapt to a new system.

This paper proposes a method for improving the online sample efficiency of RL agents. Our approach is to train a “meta” RL agent offline by exposing it to a broad distribution of different dynamics. The agent synthesizes its experience from different environments to quickly learn an optimal policy for its present environment. The training is performed completely offline and the result is a single RL agent that can quickly adapt its policy to a new environment in a model-free fashion.
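To make “training over a broad distribution of different dynamics” concrete, the sketch below samples random first order plus time delay (FOPTD) tasks and simulates each one with a simple Euler discretization. The parameter ranges, sampling period, and loop structure are illustrative assumptions, not the settings used in this work.

```python
import numpy as np

def sample_foptd_task(rng):
    """Draw one FOPTD 'task': gain K, time constant tau, and time delay theta.
    The ranges here are illustrative, not the training distribution of the paper."""
    K = rng.uniform(0.5, 2.0)
    tau = rng.uniform(5.0, 50.0)
    theta = rng.uniform(0.0, 0.5) * tau
    return K, tau, theta

def simulate_foptd(K, tau, theta, u, dt=1.0):
    """Euler discretization of K * exp(-theta s) / (tau s + 1) driven by input u."""
    delay = int(round(theta / dt))
    y = np.zeros(len(u))
    for t in range(1, len(u)):
        u_delayed = u[t - 1 - delay] if t - 1 - delay >= 0 else 0.0
        y[t] = y[t - 1] + (dt / tau) * (-y[t - 1] + K * u_delayed)
    return y

rng = np.random.default_rng(0)
for episode in range(3):                    # offline meta-training loop (sketch)
    K, tau, theta = sample_foptd_task(rng)  # a new "task" each episode
    u = np.ones(200)                        # e.g. a unit input step
    y = simulate_foptd(K, tau, theta, u)
    # ...collect (setpoint, u, y) trajectories and update the meta-RL agent...
```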

We apply this general method to the industrially-relevant problem of autonomous controller tuning. We show how our trained agent can adaptively fine-tune proportional–integral (PI) controller parameters when the underlying dynamics drift or are not contained in the distribution used for training. We apply the same agent to novel dynamics featuring nonlinearities and different time scales. Moreover, perhaps the most appealing consequence of this method is that it removes the need to tailor a training procedure to each system – for example, through extensive online training or transfer learning, hyperparameter tuning, or system identification – because the adaptive policy is pre-computed and represented in a single model.

In this work, we propose the use of meta-reinforcement learning (meta-RL) for process control applications. We create a recurrent neural network (RNN) based policy. The hidden state of the RNN serves as an encoding of the system dynamics, which provides the network with “context” for its policy. The controller is trained using a distribution of different processes referred to as “tasks”. We use this framework to develop a versatile controller which can quickly adapt to effectively control any process from a prescribed distribution of processes rather than a single task.
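A minimal sketch of such a recurrent policy is shown below, using a GRU whose hidden state accumulates closed-loop context. The observation contents, layer sizes, and output interpretation are assumptions chosen for illustration, not the exact architecture used in this work.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Sketch of an RNN policy: the GRU hidden state summarizes the
    closed-loop history and acts as an implicit encoding of the dynamics."""
    def __init__(self, obs_dim=3, hidden_dim=64, n_gains=2):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_gains)  # e.g. proposed (Kp, Ki)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim), e.g. [setpoint error, output, input]
        out, hidden = self.gru(obs_seq, hidden)
        gains = self.head(out)          # one action per time step
        return gains, hidden            # hidden carries the accumulated "context"

policy = RecurrentPolicy()
obs = torch.zeros(1, 1, 3)              # one observation at one time step
action, h = policy(obs)                 # h is fed back at the next step in closed loop
```

At deployment, the hidden state is carried forward between control intervals, so the same fixed network weights produce different behaviour on different processes as the accumulated context changes.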

This paper extends McClement et al. [4] with the following additional contributions:

  • A simplified and improved meta-RL algorithm: while [4] required online training, the meta-RL agent in this work is trained entirely offline in advance;

  • Completely new simulation studies, including industrially-relevant examples dealing with PI controllers and nonlinear dynamics; and

  • A method for leveraging known, model-based system information offline during training, with model-free online deployment.

This framework addresses key priorities in industrial process control, particularly:

  • Initial tuning and commissioning of a PID controller,

  • Adaptive updates of the PID controller as the process changes over time, and

  • Scalable maintenance of PID controllers across many different systems without case-by-case tuning.

This paper is organized as follows: In Section 2 we summarize key concepts from RL and meta-RL; in Section 3 we describe our algorithm for meta-RL and its practical implementation for process control applications. We demonstrate our approach through numerical examples in Section 4, and conclude in Section 5.

We review some related work at the intersection of RL and process control. For a more thorough overview, the reader is referred to the survey papers by Shin et al. [5] and Lee et al. [6], or the tutorial-style papers by Nian et al. [2] and Spielberg et al. [7].

Some initial studies by Hoskins and Himmelblau [8], Kaisare et al. [9], Lee and Lee [10], and Lee and Wong [11] in the 1990s and 2000s demonstrated the appeal of reinforcement learning and approximate dynamic programming for process control applications. More recently, there has been significant interest in deep RL methods for process control [12], [13], [14], [15], [16], [17], [18].

Spielberg et al. [7] adapted the deep deterministic policy gradient (DDPG) algorithm for setpoint tracking problems in a model-free fashion. Meanwhile, Wang et al. [19] developed a deep RL algorithm based on proximal policy optimization [20]. Petsagkourakis et al. [21] used transfer learning to adapt a policy developed in simulation to novel systems. Variations of DDPG, such as twin-delayed DDPG (TD3) [22] or a Monte-Carlo based strategy, have also shown promising results in complex control tasks [23], [24]. Other approaches to RL-based control utilize a fixed controller structure such as PID [25], [26], [27], [28]; some of these have been applied to physical systems [29], [30], [31].

The present work differs significantly from the approaches mentioned so far. Other routes to more sample-efficient RL in process control utilize apprenticeship learning, transfer learning, or model-based strategies augmented with deep RL algorithms [21], [32], [33]. Our method differs in two ways. First, training and deployment are simplified with our meta-RL agent through its synthesized training over a large distribution of systems: only one model needs to be trained, rather than training models on a system-by-system basis. Second, the meta-RL agent in our framework does not rely on precise system identification; only a crude understanding of the process dynamics is required. By training across a distribution of process dynamics, the meta-RL agent learns to control a wide variety of processes with no online or task-specific training required. Although the meta-RL agent is trained in simulation, the key to our approach is that the policy only utilizes process data, and thus achieves efficient model-free control on novel dynamics. A similar concept has been reported in the robotics literature, where a robust policy for a single agent is trained offline by leveraging “privileged” information about the system dynamics [34]. Most similar to the present work is a robotics paper in which a recurrent PPO policy was trained with randomized dynamics to improve adaptation from simulated environments to real ones [35].
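One concrete way to exploit model-based information offline while keeping the deployed policy model-free is an asymmetric arrangement in which the value function used during training receives “privileged” simulator parameters, while the policy conditions only on process data. The sketch below illustrates this split; it is an assumed arrangement for illustration and should not be read as the exact training setup of this paper.

```python
import torch
import torch.nn as nn

class PrivilegedCritic(nn.Module):
    """Training-only value function: it may consume 'privileged' simulator
    information (here, the true FOPTD parameters) in addition to process data."""
    def __init__(self, obs_dim=3, act_dim=2, task_dim=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + task_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act, task_params):
        # task_params = (K, tau, theta) is available only in simulation
        return self.net(torch.cat([obs, act, task_params], dim=-1))

# The policy (e.g. the recurrent network sketched earlier) never sees task_params,
# so the deployed controller remains model-free: it conditions only on observed
# inputs, outputs, and setpoints.
```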

Section snippets

Reinforcement learning

In this section, we give a brief overview of deep RL and highlight some popular meta-RL methods. We refer the reader to Nian et al. [2] and Spielberg et al. [7] for tutorial overviews of deep RL with applications to process control. We use the standard RL terminology that can be found in Sutton and Barto [36]. Huisman et al. [37] give a unified survey of deep meta-learning.

The RL framework consists of an agent and an environment. For each state s_t ∈ S (the state space) the agent encounters, it

Meta-RL for process control

We apply the meta-RL framework to the problem of tuning proportional–integral (PI) controllers. The formulation can be applied to any fixed-structure controller, but due to their prevalence, we focus on PI controllers as a practical illustration.
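Because the agent adjusts controller parameters rather than actuating the plant directly, the low-level loop remains a standard discrete PI update whose gains are periodically overwritten. A minimal sketch follows; the gain values, sampling period, and update schedule shown are illustrative assumptions.

```python
class PIController:
    """Discrete PI controller whose gains can be overwritten by the tuning agent."""
    def __init__(self, kp, ki, dt=1.0):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def set_gains(self, kp, ki):
        # called by the meta-RL agent, e.g. once per adjustment interval
        self.kp, self.ki = kp, ki

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral

pi = PIController(kp=0.5, ki=0.1)
u = pi.step(setpoint=1.0, measurement=0.0)   # plant input for this sample
# ...later, the agent proposes new gains based on closed-loop data:
pi.set_gains(kp=0.8, ki=0.15)
```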

Asymptotic performance of the meta-RL tuning algorithm

Fig. 3 depicts the asymptotic performance of the meta-RL tuning method. The intervals of K, τ, and θ/τ in Table 1 define a 3D box in which each point corresponds to a different FOPTD system. After using the meta-RL agent to generate a PI controller for every such system, we could apply a setpoint step, observe the closed-loop response (see Fig. 4), and compute its mean-squared deviation from the target trajectory in Eq. (16). The results could, in principle, be used to produce a

Conclusion

This work presents a meta-RL approach to tuning fixed-structure controllers in closed-loop without explicit system identification and demonstrates the approach using PI controllers. The method can be used to automate the initial tuning of controllers and, in continuous operation, to adaptively update controller parameters as process dynamics change over time. Assuming the magnitude of the process gain and time constant are known, the meta-RL tuning algorithm can be applied to any

CRediT authorship contribution statement

Daniel G. McClement: Conceptualization, Methodology, Formal analysis, Investigation, Software, Visualization, Writing – original draft, Writing – review & editing. Nathan P. Lawrence: Conceptualization, Software, Writing – original draft, Writing – review & editing. Johan U. Backström: Writing – review & editing. Philip D. Loewen: Project administration, Supervision, Writing – review & editing. Michael G. Forbes: Funding acquisition, Supervision, Writing – review & editing. R. Bhushan Gopaluni:

Acknowledgments

We gratefully acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and Honeywell Connected Plant.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (54)

  • Ge, Y., et al., An approximate dynamic programming method for the optimal control of alkali-surfactant-polymer flooding, J. Process Control (2018).
  • Pandian, B.J., et al., Control of a bioreactor using a new partially supervised reinforcement learning algorithm, J. Process Control (2018).
  • Dogru, O., et al., Online reinforcement learning for a continuous space system with experimental validation, J. Process Control (2021).
  • Wang, Y., et al., A novel approach to feedback control with deep reinforcement learning, IFAC-PapersOnLine (2018).
  • Petsagkourakis, P., et al., Reinforcement learning for batch bioprocess optimization, Comput. Chem. Eng. (2020).
  • Yoo, H., et al., Reinforcement learning based optimal control of batch processes using Monte-Carlo deep deterministic policy gradient with phase segmentation, Comput. Chem. Eng. (2021).
  • Shipman, W.J., et al., Reinforcement learning and deep neural networks for PI controller tuning, IFAC-PapersOnLine (2019).
  • Carlucho, I., et al., Incremental Q-learning strategy for adaptive PID control of mobile robots, Expert Syst. Appl. (2017).
  • Dogru, O., et al., Reinforcement learning approach to autonomous PID tuning, Comput. Chem. Eng. (2022).
  • Lawrence, N.P., et al., Deep reinforcement learning with shallow controllers: An experimental application to PID tuning, Control Eng. Pract. (2022).
  • Skogestad, S., Simple analytic rules for model reduction and PID controller tuning, J. Process Control (2003).
  • Sutton, R.S., Learning to predict by the methods of temporal differences, Mach. Learn. (1988).
  • Finn, C., et al., Model-agnostic meta-learning for fast adaptation of deep networks (2017).
  • Spielberg, S., et al., Toward self-driving processes: A deep reinforcement learning approach to control, AIChE J. (2019).
  • Kaisare, N.S., et al., Simulation based strategy for nonlinear optimal control: application to a microbial cell reactor, Internat. J. Robust Nonlinear Control (2003).
  • Cui, Y., et al., Factorial kernel dynamic policy programming for vinyl acetate monomer plant model control.
  • Schulman, J., et al., Proximal policy optimization algorithms (2017).

1 Authors provided equal supervision.
