Automatica

Volume 146, December 2022, 110581

Adaptive optimal output tracking of continuous-time systems via output-feedback-based reinforcement learning

https://doi.org/10.1016/j.automatica.2022.110581

Abstract

Reinforcement learning provides a powerful tool for designing satisfactory controllers through interactions with the environment. Although off-policy learning algorithms have recently been designed for tracking problems, most of these results either rely on full-state feedback or only guarantee bounded tracking errors, which may not be flexible or desirable for real-world engineering problems. To address these issues, we propose an output-feedback-based reinforcement learning approach that finds the optimal control solution using input–output data and ensures asymptotic tracking control of continuous-time systems. More specifically, we first propose a dynamical controller revised from standard output regulation theory and use it to formulate an optimal output tracking problem. Then, a state observer is used to re-express the system state. Consequently, we address the rank issue of the parameterization matrix and analyze the state re-expression error, both of which are crucial for transforming the off-policy learning into an output-feedback form. A comprehensive simulation study demonstrates the effectiveness of the proposed approach.

Introduction

Reinforcement learning (RL) (Lewis, Vrabie, and Syrmos, 2012, Sutton and Barto, 1998, Vrabie et al., 2013), also known as approximate/adaptive dynamic programming (ADP) (Bertsekas, 1995, Lewis, Vrabie, and Vamvoudakis, 2012, Powell, 2007), has become a powerful technique in computational intelligence that makes decisions through feedback interactions with an unknown environment. From the perspective of the control community, RL integrates adaptive control and optimal control, with the feature that an adaptive controller solves an optimal control problem (Lewis & Vrabie, 2009), which distinguishes it from the model-based methods documented in Ioannou and Sun (2012) and Krstić et al. (1995).

RL-based research has been undergoing a major transition from regulation control to tracking control (Chen et al., 2022, Chen and Xie, 2021, Jiang and Jiang, 2017, Lee et al., 2012, Lee et al., 2014, Lewis, Vrabie, and Syrmos, 2012, Vrabie et al., 2013), which covers more practical problems by allowing the system output to track a trajectory. What remains unchanged is that RL algorithms not only stabilize the closed-loop system, but also adaptively achieve a satisfactory tracking performance (Lewis, Vrabie, & Syrmos, 2012). There are two ways to approach the optimal tracking problem via RL for continuous-time (CT) systems, which are the focus of this paper. One way is to build an augmented system containing the original system and the command generator, based on which a discounting factor in the cost function and integral RL are used to formulate an optimal tracking control (Modares & Lewis, 2014a). This result was later extended to nonlinear systems (Kamalapurkar et al., 2015, Liu et al., 2017, Luo et al., 2015, Qin et al., 2014), constrained-input systems (Modares & Lewis, 2014b), H∞ control (Modares et al., 2015), and multi-agent systems (Chen et al., 2020, Modares et al., 2018, Modares, Nageshrao, et al., 2016). The other way is to bring output regulation theory and optimal control together (Krener, 1992, Saberi et al., 2003). Using output regulation theory and ADP, Gao and Jiang (2016) proposed an asymptotic tracking controller, wherein the system dynamics are assumed partially known. The work of Chen et al. (2019) removed such prior knowledge of the system dynamics and achieved optimal tracking control using integral RL and experience replay.

However, most of the RL algorithms mentioned above require full-state measurements. In real-world control problems, the full state is usually not available. Compared to full-state feedback, output feedback is more flexible and desirable for control implementation.

For output-feedback-based RL, Lewis and Vamvoudakis (2011) parameterized the system state in the Bellman equation and proposed an RL-based method for regulating discrete-time (DT) systems using input and output data. Later, output-feedback-based tracking control (Kiumarsi et al., 2015) was reported for DT systems, based on which H∞ control (Fan et al., 2018) was considered. The work of Rizvi and Lin (2018b) parameterized the system state through a DT state observer (Tao, 2003). As for CT systems, Jiang and Jiang (2010) considered an on-policy iteration using a state observer. Using static output-feedback control and an adaptive state observer, a suboptimal controller was given in Zhu et al. (2015). Methods for discrete approximation of CT systems were given in Gao et al. (2016). Based on Modares and Lewis (2014a) and Lewis and Vamvoudakis (2011), Modares, Lewis, and Jiang (2016) considered output-feedback tracking control for CT systems without the static output-feedback stabilizability condition required in Zhu et al. (2015), by setting the eigenvalues of the command generator within a predetermined region. Unlike the discrete approximation in Gao et al. (2016), Rizvi and Lin (2018a) considered output-feedback optimal control for regulating CT systems using a state observer to reparameterize the system state, which is a CT version of Rizvi and Lin (2018b).

Even though some progress has been made, most of the current output-feedback optimal controllers are designed for regulating CT systems. However, in many applications, it is also important to drive the system output to follow a prespecified trajectory with zero error (see examples in flight control (Stevens et al., 2015) and robotics (Lewis et al., 2003)).

In this paper, we aim to study an optimal output tracking problem of CT systems. We propose an output-feedback-based RL approach, wherein input–output data along the system trajectories are used to find the optimal solution, without solving the output regulation equations, and to ensure asymptotic tracking control of CT systems. The contributions of this paper are summarized in the following three aspects.

  • (1) Compared to Gao et al. (2016) and Modares, Lewis, and Jiang (2016), we propose an output regulation and off-policy RL-based controller for the optimal output tracking problem.

  • (2) Compared to the state re-expression in Rizvi and Lin (2018a), we derive a condition for the parameterization matrix to be full row rank, based on which an approximate optimal control gain is uniquely obtained through output-feedback-based RL.

  • (3) The state re-expression error can be made small by running the system for a sufficiently long time before the data collection begins, based on which we propose an additional model-free pre-collection phase to supplement the off-policy learning for CT systems.

The remaining parts of this paper are organized as follows. In Section 2, we formulate an optimal output tracking control problem of CT systems by proposing a dynamical controller and review some preliminaries of optimal control. In Section 3, we present our main result on the optimal output tracking of CT systems via output-feedback-based RL. In Section 4, we provide a simulation example to demonstrate the effectiveness of the proposed approach. Conclusions are drawn in Section 5.

Notations: Throughout this paper, $\mathbb{C}^-$ stands for the open left-half complex plane. For a matrix $X$, $\lambda(X)$ denotes its spectrum; $X > (\geq)\, 0$ means that the matrix $X$ is positive (semi-)definite, and $X < (\leq)\, 0$ means that $X$ is negative (semi-)definite. For a matrix $X \in \mathbb{R}^{m \times n}$, $\operatorname{vec}(X) = [x_1^T, x_2^T, \ldots, x_n^T]^T$ denotes the vectorization of $X$, with $x_i \in \mathbb{R}^m$ being the $i$th column of $X$. For a symmetric matrix $X \in \mathbb{R}^{m \times m}$, $\operatorname{vecs}(X) = [x_{11}, 2x_{12}, \ldots, 2x_{1m}, x_{22}, 2x_{23}, \ldots, 2x_{m-1,m}, x_{mm}]^T \in \mathbb{R}^{\frac{1}{2}m(m+1)}$, where $x_{ij}$ denotes the entry in the $i$th row and $j$th column of the matrix $X$. For a column vector $v \in \mathbb{R}^n$, $\operatorname{vecv}(v) = [v_1^2, v_1 v_2, \ldots, v_1 v_n, v_2^2, v_2 v_3, \ldots, v_{n-1} v_n, v_n^2]^T \in \mathbb{R}^{\frac{1}{2}n(n+1)}$. $I$ and $O$ are an identity matrix and a zero matrix with appropriate dimensions, respectively. For a matrix $X$, $\det(X)$ denotes the determinant of the matrix and $\operatorname{adj}(X)$ the adjugate of the matrix. For the matrix pair $(A \in \mathbb{R}^{m \times m},\, X \in \mathbb{R}^{m \times n})$, the controllability matrix is defined as $\mathcal{C}_X(A) = [X, AX, A^2X, \ldots, A^{m-1}X]$.
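For readers who prefer an operational view of these operators, the following NumPy sketch implements vec, vecs, vecv, and the controllability matrix as defined above; the function names are illustrative choices of ours, not notation from the paper.

import numpy as np

def vec(X):
    """Stack the columns of X into a single column vector (vec operator)."""
    return np.asarray(X).reshape(-1, order="F")

def vecs(X):
    """Half-vectorization of a symmetric m x m matrix:
    [x11, 2*x12, ..., 2*x1m, x22, 2*x23, ..., 2*x_{m-1,m}, x_mm]^T."""
    X = np.asarray(X)
    m = X.shape[0]
    return np.concatenate(
        [np.concatenate(([X[i, i]], 2.0 * X[i, i + 1:])) for i in range(m)]
    )

def vecv(v):
    """Quadratic monomials of v:
    [v1^2, v1*v2, ..., v1*vn, v2^2, ..., v_{n-1}*vn, vn^2]^T."""
    v = np.asarray(v).ravel()
    return np.concatenate([v[i] * v[i:] for i in range(v.size)])

def ctrb_matrix(A, X):
    """Controllability matrix C_X(A) = [X, A X, A^2 X, ..., A^(m-1) X]."""
    A, X = np.asarray(A), np.asarray(X)
    blocks, AkX = [], X
    for _ in range(A.shape[0]):
        blocks.append(AkX)
        AkX = A @ AkX
    return np.hstack(blocks)

With these definitions, a symmetric $X$ satisfies $v^T X v = \operatorname{vecs}(X)^T \operatorname{vecv}(v)$, which is the identity typically exploited when quadratic value functions are rewritten as linear regressions in ADP.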

Section snippets

Problem formulation

In this paper, a class of CT linear systems is described by
$$\dot{x} = Ax + Bu, \tag{1}$$
$$y = Cx, \tag{2}$$
where $x \in \mathbb{R}^n$ is the system state, $u \in \mathbb{R}^m$ is the control input, $y \in \mathbb{R}^p$ is the system output, and $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{p \times n}$ are unknown constant matrices. The command signal is generated through the following dynamics
$$\dot{x}_d = S x_d, \tag{3}$$
$$y_d = R x_d, \tag{4}$$
where $x_d \in \mathbb{R}^q$ is the command generator state, $y_d \in \mathbb{R}^p$ is the output of the command generator, and $S \in \mathbb{R}^{q \times q}$, $R \in \mathbb{R}^{p \times q}$ are unknown constant matrices.
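As a concrete, purely illustrative instance of the setup in (1)–(4), the sketch below integrates a plant and a command generator by forward Euler and returns the tracking error; the matrices, dimensions, and step size are hypothetical placeholders of ours, whereas in the paper these matrices are unknown to the designer.

import numpy as np

# Hypothetical placeholder matrices (unknown to the designer in the paper).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])      # plant dynamics
B = np.array([[0.0], [1.0]])                   # input matrix
C = np.array([[1.0, 0.0]])                     # output matrix
S = np.array([[0.0, 1.0], [-1.0, 0.0]])        # command generator: harmonic oscillator
R = np.array([[1.0, 0.0]])                     # reference output map

def step(x, x_d, u, dt=1e-3):
    """One forward-Euler step of the plant (1)-(2) and command generator (3)-(4)."""
    x = x + dt * (A @ x + B @ u)
    x_d = x_d + dt * (S @ x_d)
    y, y_d = C @ x, R @ x_d
    return x, x_d, y, y_d, y - y_d             # last entry is the tracking error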

For the system (1)–(4), some standard assumptions are made for the design

Main results on output-feedback-based reinforcement learning

In this section, we give an optimal tracking control of CT systems via off-policy learning in an output-feedback form.

To do this, an appropriate behavior policy is required for off-policy learning. We design the policy as
$$\dot{\bar z} = F\bar z - Gy + G\vartheta, \qquad u = T\bar z + u_c,$$
where $\bar z \in \mathbb{R}^{pq}$ is the state, $\vartheta \in \mathbb{R}^{p}$ is an exploration noise, $F \in \mathbb{R}^{pq \times pq}$ and $G \in \mathbb{R}^{pq \times p}$ were given in (5), and $u_c \in \mathbb{R}^{m}$ is a free variable to be specified later. Here, the term $G\vartheta$ has a certain structural similarity to that in Lee et al., 2012, Lee et al., 2014,
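A minimal sketch of how such a behavior policy might be realized for data collection is given below, assuming the sign convention in the equation as reconstructed above and a sum-of-sinusoids exploration noise (a common but not prescribed choice); F, G, and T are taken as given from (5), and u_c remains the free input supplied by the learning procedure.

import numpy as np

def exploration_noise(t, p, amp=0.1):
    """Sum-of-sinusoids exploration signal in R^p (an illustrative, not prescribed, choice)."""
    freqs = np.array([1.0, 3.0, 7.0, 11.0])
    return np.full(p, amp * np.sum(np.sin(freqs * t)))

def behavior_policy_step(z_bar, y, t, F, G, T, u_c, dt=1e-3):
    """One forward-Euler step of the behavior policy; the sign between F*z_bar and
    G*y follows our reconstruction of the displayed equation and is an assumption."""
    vartheta = exploration_noise(t, y.size)
    z_bar_dot = F @ z_bar - G @ y + G @ vartheta
    u = T @ z_bar + u_c                        # behavior control applied to the plant
    return z_bar + dt * z_bar_dot, u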

Simulation study

We use the short-period approximation to the F-16 dynamics linearized about the nominal flight condition, and the dynamics are augmented to include the elevator actuator and angle-of-attack filter (Stevens et al., 2015). The system states of interest are vectorized as $x = [\alpha, q, \delta_e]^T$, where $\alpha$, $q$, and $\delta_e$ denote the angle of attack, pitch rate, and elevator deflection angle, respectively. The system output is the pitch rate. Thus, the F-16 dynamics are modeled as (1), (2) (Stevens et al., 2015),
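For concreteness, the snippet below encodes the state ordering and pitch-rate output selection described here, together with a hypothetical sinusoidal pitch-rate command written in the exosystem form (3)–(4); the output matrix, command frequency, and amplitude are our assumptions for illustration and are not taken from Stevens et al. (2015) or from the paper.

import numpy as np

# State ordering x = [alpha, q, delta_e]^T; the measured output is the pitch rate q.
C = np.array([[0.0, 1.0, 0.0]])                # assumed output matrix selecting q

# Hypothetical sinusoidal pitch-rate command in exosystem form (3)-(4).
omega = 1.0                                     # assumed command frequency (rad/s)
S = np.array([[0.0, omega], [-omega, 0.0]])     # command generator dynamics
R = np.array([[0.2, 0.0]])                      # command output map (amplitude set by R and x_d(0))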

Conclusion

This paper proposed an output-feedback-based RL approach for optimal tracking control of CT systems with unknown system dynamics. We have formulated the tracking control problem by proposing a dynamical controller modified from standard output regulation theory. The optimal solution has been approximated online using the input–output data. We have derived a condition for the system parameterization matrix to be full row rank, based on which the optimal control gain is uniquely obtained through

Acknowledgments

The authors would like to thank the Associate Editor and anonymous reviewers for their constructive feedback that significantly improved the quality of this paper.


References (48)

  • Chen, C., et al. (2021). A data-driven prescribed convergence rate design for robust tracking of discrete-time systems. Journal of Guangdong University of Technology.
  • Cichocki, A., et al. (1992). Neural networks for solving systems of linear equations and related problems. IEEE Transactions on Circuits and Systems I.
  • Fan, J., Li, Z., Jiang, Y., Chai, T., & Lewis, F. L. (2018). Model-free linear discrete-time system H∞ control using...
  • Gao, W., et al. (2016). Adaptive dynamic programming and adaptive optimal output regulation of linear systems. IEEE Transactions on Automatic Control.
  • Gao, W., et al. (2018). Adaptive dynamic programming and cooperative output regulation of discrete-time multi-agent systems. International Journal of Control, Automation and Systems.
  • Huang, J. (2004). Nonlinear output regulation: Theory and applications.
  • Ioannou, P. A., et al. (2012). Robust adaptive control.
  • Jiang, Y., & Jiang, Z.-P. (2010). Approximate dynamic programming for output feedback control. In Proceedings of the...
  • Jiang, Y., et al. (2017). Robust adaptive dynamic programming.
  • Jiang, Y., et al. (2020). Optimal output regulation of linear discrete-time systems with unknown dynamics using reinforcement learning. IEEE Transactions on Cybernetics.
  • Khalil, H. K. (2002). Nonlinear systems.
  • Kiumarsi, B., et al. (2015). Optimal tracking control of unknown discrete-time linear systems using input–output measured data. IEEE Transactions on Cybernetics.
  • Kleinman, D. (1968). On an iterative technique for Riccati equation computations. IEEE Transactions on Automatic Control.
  • Krener, A. J. (1992). The construction of optimal linear and nonlinear regulators. Systems, Models and Feedback: Theory and Applications.

    Ci Chen received the B.E. and Ph.D. degrees from School of Automation, Guangdong University of Technology, Guangzhou, China, in 2011 and 2016, respectively. From 2016 to 2018, he has been with The University of Texas at Arlington and The University of Tennessee at Knoxville as a Research Associate. He was awarded the Wallenberg-NTU Presidential Postdoctoral Fellowship. From 2018 to 2021, he was with School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore and with Department of Automatic Control, Lund University, Sweden as a researcher. He is now a professor with School of Automation, Guangdong University of Technology, Guangzhou, China. His research interests include reinforcement learning, resilient control, and computational intelligence. He is an Associate Editor for IEEE Transactions on Neural Networks and Learning Systems, an Editor for International Journal of Robust and Nonlinear Control, and an Associate Editor for Advanced Control for Applications: Engineering and Industrial Systems.

    Lihua Xie received the Ph.D. degree in electrical engineering from University of Newcastle, Australia, in 1992. Since 1992, he has been with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, where he is currently a Professor and the Director of Delta-NTU Corporate Laboratory for Cyber Physical Systems and the Center for Advanced Robotics Technology Innovation. He served as the Head of Division of Control and Instrumentation from July 2011 to June 2014. He held teaching appointments with the Department of Automatic Control, Nanjing University of Science and Technology from 1986 to 1989.

    His research interests include robust control and estimation, networked control systems, multi-agent networks, localization, and unmanned systems. He is an Editor-in-Chief for Unmanned Systems and has served as an Editor of IET Book Series in Control and an Associate Editor of a number of journals, including AUTOMATICA, IEEE TRANSACTIONS ON AUTOMATIC CONTROL, IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, IEEE TRANSACTIONS ON NETWORK CONTROL SYSTEMS, and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II. He was an IEEE Distinguished Lecturer from January 2012 to December 2014. He is a Fellow of Academy of Engineering Singapore, IFAC, IEEE, and CAA.

    Kan Xie received the Ph.D. degree in control science and engineering from Guangdong University of Technology, Guangzhou, China, in 2017. He joined the Institute of Intelligent Information Processing, Guangdong University of Technology, where he is currently an Associate Professor. His research interests include machine learning, nonnegative signal processing, blind signal processing, smart grid, and Internet of Things.

    Frank L. Lewis is Member, National Academy of Inventors. Fellow IEEE, Fellow IFAC, Fellow AAAS, Fellow U.K. Institute of Measurement & Control, PE Texas, U.K. Chartered Engineer. UTA Distinguished Scholar Professor, UTA Distinguished Teaching Professor, and Moncrief-O’Donnell Chair at the University of Texas at Arlington Research Institute.

    He obtained the Bachelor’s Degree in Physics/EE and the MSEE at Rice University, the MS in Aeronautical Engineering from Univ. W. Florida, and the Ph.D. at Ga. Tech. He works in feedback control, intelligent systems, cooperative control systems, and nonlinear systems. He is author of 7 U.S. patents, numerous journal special issues, journal papers, and 20 books, including Optimal Control, Aircraft Control, Optimal Estimation, and Robot Manipulator Control which are used as university textbooks world-wide. He received the Fulbright Research Award, NSF Research Initiation Grant, ASEE Terman Award, Int. Neural Network Soc. Gabor Award, U.K. Inst Measurement & Control Honeywell Field Engineering Medal, IEEE Computational Intelligence Society Neural Networks Pioneer Award, AIAA Intelligent Systems Award. Received Outstanding Service Award from Dallas IEEE Section, selected as Engineer of the year by Ft. Worth IEEE Section. Was listed in Ft. Worth Business Press Top 200 Leaders in Manufacturing. Texas Regents Outstanding Teaching Award 2013.

    Shengli Xie received the B.S. degree in mathematics from Jilin University, Changchun, China, in 1983, the M.S. degree in mathematics from Central China Normal University, Wuhan, China, in 1995, and the Ph.D. degree in control theory and applications from South China University of Technology, Guangzhou, China, in 1997. He is currently a Full Professor and the Head of the Institute of Intelligent Information Processing, Guangdong University of Technology, Guangzhou. He has coauthored two books and more than 150 research papers in refereed journals and conference proceedings and was awarded Highly Cited Researcher in 2020. His research interests include blind signal processing, machine learning, and Internet of Things. He was awarded the Second Prize of National Natural Science Award of China in 2009. He is a Fellow of IEEE, and serves as an Associate Editor for IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS. He is a Foreign Full Member (Academician) of the Russian Academy of Engineering.

    This work was supported in part by the National Natural Science Foundation of China under Grants 61973087 and U1911401, in part by State Key Laboratory of Synthetical Automation for Process Industries, China (2020-KF-21-02), in part by the Wallenberg-NTU Presidential Postdoctoral Fellowship, and in part by the Research and Development Program of Key Science and Technology Fields in Guangzhou City under Grant 202206030005. The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor Raul Ordonez under the direction of Editor Miroslav Krstic.
