Adaptive optimal output tracking of continuous-time systems via output-feedback-based reinforcement learning☆
Introduction
Reinforcement learning (RL) (Lewis, Vrabie, and Syrmos, 2012, Sutton and Barto, 1998, Vrabie et al., 2013) or approximate/adaptive dynamic programming (ADP) (Bertsekas, 1995, Lewis, Vrabie, and Vamvoudakis, 2012, Powell, 2007) has become a powerful technique in computational intelligence that makes decisions by learning from feedback interactions with an unknown environment. From the perspective of the control community, RL integrates adaptive control and optimal control, with the feature that an adaptive controller solves an optimal control problem (Lewis & Vrabie, 2009), which differs from the model-based methods documented in Ioannou and Sun (2012) and Krstić et al. (1995).
RL-based research has been undergoing a major transition from regulation control to tracking control (Chen et al., 2022, Chen and Xie, 2021, Jiang and Jiang, 2017, Lee et al., 2012, Lee et al., 2014, Lewis, Vrabie, and Syrmos, 2012, Vrabie et al., 2013), which covers more practical problems by allowing the system output to track a reference trajectory. What remains unchanged is that RL algorithms not only stabilize the closed-loop system but also adaptively achieve satisfactory tracking performance (Lewis, Vrabie, & Syrmos, 2012). There are two ways to approach the optimal tracking problem via RL for continuous-time (CT) systems, which are the focus of this paper. One way is to build an augmented system containing the original system and the command generator, based on which a discounting factor in the cost function and integral RL are used to formulate an optimal tracking control (Modares & Lewis, 2014a). This result was later extended to nonlinear systems (Kamalapurkar et al., 2015, Liu et al., 2017, Luo et al., 2015, Qin et al., 2014), constrained-input systems (Modares & Lewis, 2014b), control (Modares et al., 2015), and multi-agent systems (Chen et al., 2020, Modares et al., 2018, Modares, Nageshrao, et al., 2016). The other way is to bring output regulation theory and optimal control together (Krener, 1992, Saberi et al., 2003). Using output regulation theory and ADP, Gao and Jiang (2016) proposed an asymptotic tracking controller, wherein the system dynamics are assumed to be partially known. The work of Chen et al. (2019) removed such prior knowledge of the system dynamics and achieved optimal tracking control using integral RL and experience replay.
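Both strands above ultimately compute the solution of an algebraic Riccati equation (ARE) by iterating on Lyapunov equations; the integral-RL variants perform the same iteration from trajectory data without knowing the model. As a model-based reference point only, here is a minimal sketch of Kleinman's policy iteration (the plant matrices below are hypothetical, not taken from any cited work):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman(A, B, Q, R, K0, iters=20):
    """Model-based policy iteration for the CT LQR (Kleinman's method).

    Policy evaluation solves a Lyapunov equation for the current gain;
    policy improvement sets K = R^{-1} B^T P.
    """
    K = K0  # must be stabilizing
    for _ in range(iters):
        Ak = A - B @ K
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        K = np.linalg.solve(R, B.T @ P)
    return P, K

# Hypothetical second-order plant; A is Hurwitz, so K0 = 0 is stabilizing.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
P, K = kleinman(A, B, Q, R, K0=np.zeros((1, 2)))
```

Off-policy integral RL replaces the Lyapunov solve with a least-squares problem built from input–state (or, in this paper, input–output) data, so the model matrices never appear explicitly.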
However, most of the RL algorithms mentioned above require full-state measurements, which are usually unavailable in real-world control problems. Compared to full-state feedback, output feedback is more flexible and more desirable for control implementation.
For output-feedback-based RL, Lewis and Vamvoudakis (2011) parameterized the system state in the Bellman equation and proposed an RL-based method for regulating discrete-time (DT) systems using input and output data. Later, output-feedback-based tracking control (Kiumarsi et al., 2015) was reported for DT systems, based on which control (Fan et al., 2018) was considered. The work of Rizvi and Lin (2018b) parameterized the system state through a DT state observer (Tao, 2003). As for CT systems, Jiang and Jiang (2010) considered on-policy iteration using a state observer. Using static output-feedback control and an adaptive state observer, a suboptimal controller was given in Zhu et al. (2015). Methods based on discrete approximation of CT systems were given in Gao et al. (2016). Building on Modares and Lewis (2014a) and Lewis and Vamvoudakis (2011), Modares, Lewis, and Jiang (2016) considered output-feedback tracking control for CT systems without the static output-feedback stabilizability condition required in Zhu et al. (2015), by setting the eigenvalues of the command generator within a predetermined region. Unlike the discrete approximation in Gao et al. (2016), Rizvi and Lin (2018a) considered output-feedback optimal control for regulating CT systems using a state observer to reparameterize the system state, which is a CT version of Rizvi and Lin (2018b).
Even though some progress has been made, most existing output-feedback optimal controllers are designed for regulating CT systems. In many applications, however, it is also important to drive the system output to follow a prespecified trajectory with zero error (see examples in flight control (Stevens et al., 2015) and robotics (Lewis et al., 2003)).
In this paper, we study the optimal output tracking problem for CT systems. We propose an output-feedback-based RL approach in which input–output data collected along system trajectories are used to find the optimal solution, without solving the output regulation equations, and to ensure asymptotic tracking for CT systems. The contributions of this paper are summarized in the following three aspects.
- (1) Compared to Gao et al. (2016) and Modares, Lewis, and Jiang (2016), we propose a controller for the optimal output tracking problem that combines output regulation theory with off-policy RL.
- (2) Compared to the state re-expression in Rizvi and Lin (2018a), we derive a condition under which the parameterization matrix has full row rank, based on which an approximate optimal control gain is uniquely obtained through output-feedback-based RL.
- (3) The state re-expression error can be made small by running the system for a sufficiently long time before data collection begins; based on this observation, we propose an additional model-free pre-collection phase to supplement off-policy learning for CT systems.
The remaining parts of this paper are organized as follows. In Section 2, we formulate an optimal output tracking control problem of CT systems by proposing a dynamical controller and review some preliminaries of optimal control. In Section 3, we present our main result on the optimal output tracking of CT systems via output-feedback-based RL. In Section 4, we provide a simulation example to demonstrate the effectiveness of the proposed approach. Conclusions are drawn in Section 5.
Notations: Throughout this paper, ℂ⁻ stands for the open left-half complex plane. For a matrix A, σ(A) denotes its spectrum; A > 0 (A ≥ 0) means that the matrix A is positive (semi-)definite; and A < 0 (A ≤ 0) denotes that A is negative (semi-)definite. For a matrix A = [a_1, a_2, …, a_n] ∈ ℝ^(m×n), vec(A) = [a_1ᵀ, a_2ᵀ, …, a_nᵀ]ᵀ denotes a vector-valued function of the matrix, with a_i being the ith column of A. For a symmetric matrix P ∈ ℝ^(m×m), vecs(P) = [p_11, 2p_12, …, 2p_1m, p_22, 2p_23, …, 2p_(m−1)m, p_mm]ᵀ ∈ ℝ^(m(m+1)/2), where p_ij denotes the entry in the ith row and jth column of the matrix P. For a column vector v ∈ ℝⁿ, v̄ = [v_1², v_1v_2, …, v_1v_n, v_2², v_2v_3, …, v_n²]ᵀ ∈ ℝ^(n(n+1)/2). I and 0 are an identity matrix and a zero matrix with appropriate dimensions, respectively. For a matrix A, det(A) denotes the determinant of the matrix and adj(A) the adjugate of the matrix. For the matrix pair (A, B), the controllability matrix is defined as [B, AB, …, A^(n−1)B].
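These vectorization operators can be sanity-checked numerically. The sketch below assumes the standard definitions from the ADP literature (the exact symbols in the paper's notation may differ); the key identity exploited in RL Bellman equations is xᵀPx = vecs(P)ᵀx̄:

```python
import numpy as np

def vec(A):
    # vec(A): stack the columns of A into one long vector
    return A.reshape(-1, order="F")

def vecs(P):
    # vecs(P): upper-triangular entries of symmetric P, off-diagonals doubled
    m = P.shape[0]
    return np.array([(1.0 if i == j else 2.0) * P[i, j]
                     for i in range(m) for j in range(i, m)])

def vecv(x):
    # x_bar: all monomials x_i * x_j with i <= j, matching the vecs ordering
    n = len(x)
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

P = np.array([[2.0, 1.0], [1.0, 3.0]])
x = np.array([1.0, 2.0])
quad = x @ P @ x           # quadratic form x^T P x = 18.0
same = vecs(P) @ vecv(x)   # identical value via the vectorized identity
```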
Problem formulation
In this paper, a class of CT linear systems is defined as

ẋ = Ax + Bu,  (1)
y = Cx,  (2)

where x is the system state, u is the control input, y is the system output, and A, B, C are unknown constant matrices. The command signal is generated through the following dynamics

v̇ = Ev,  (3)
y_d = Fv,  (4)

where v is the state of the command generator, y_d is the output of the command generator, and E, F are unknown constant matrices.
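For reference, when the matrices are known, the output regulation equations XE = AX + BU, CX = F can be solved directly, and a feedforward term built from (X, U) achieves tracking; the paper's contribution is to obtain the same behavior without forming these equations. A minimal sketch with hypothetical matrices (the symbols follow the standard setup and are assumptions here, not the paper's example):

```python
import numpy as np

# Hypothetical instances of (1)-(4); the paper treats these as unknown.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # Hurwitz plant
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
E = np.array([[0.0, 1.0], [-1.0, 0.0]])    # generates sinusoidal commands
F = np.array([[1.0, 0.0]])
n, m, q = 2, 1, 2

# Vectorize X E - A X - B U = 0 and C X = F into one linear system.
In, Iq = np.eye(n), np.eye(q)
top = np.hstack([np.kron(E.T, In) - np.kron(Iq, A), -np.kron(Iq, B)])
bot = np.hstack([np.kron(Iq, C), np.zeros((q, m * q))])
rhs = np.concatenate([np.zeros(n * q), F.flatten(order="F")])
sol = np.linalg.lstsq(np.vstack([top, bot]), rhs, rcond=None)[0]
X = sol[:n * q].reshape(n, q, order="F")
U = sol[n * q:].reshape(m, q, order="F")

# With Hurwitz A, the feedforward u = U v alone yields y -> y_d:
# d/dt (x - Xv) = A (x - Xv), which decays to zero.
dt = 1e-3
x, v = np.zeros(2), np.array([1.0, 0.0])
for _ in range(int(10.0 / dt)):
    x = x + dt * (A @ x + B @ (U @ v))
    v = v + dt * (E @ v)
err = (C @ x - F @ v).item()   # tracking error after 10 s
```

In general a stabilizing feedback on x − Xv is also needed; it is omitted here because the hypothetical A is already Hurwitz.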
For the system (1)–(4), some standard assumptions are made for the design
Main results on output-feedback-based reinforcement learning
In this section, we present an optimal tracking controller for CT systems via off-policy learning in an output-feedback form.
To do this, an appropriate behavior policy is required for off-policy learning. We design the policy as where is the state, is an exploration noise, and were given in (5), and is a free variable to be specified later. Here, the term has a certain structural similarity to that in Lee et al., 2012, Lee et al., 2014,
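The (truncated) snippet above specifies the behavior policy as a feedback term plus an exploration noise. A minimal sketch of one common choice, sum-of-sinusoids noise on top of a stabilizing gain, follows; the frequencies and the gain K0 are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

# Random frequencies for sum-of-sinusoids exploration (an assumption;
# any sufficiently rich, bounded signal can serve as exploration noise).
rng = np.random.default_rng(1)
freqs = rng.uniform(0.1, 10.0, size=10)

def exploration_noise(t):
    # bounded by 1 in magnitude, rich in frequency content
    return np.sin(freqs * t).sum() / len(freqs)

def behavior_policy(z, t, K0):
    # u = -K0 z + e(t): the behavior policy generates informative data,
    # while a different target policy is evaluated off-policy.
    return -K0 @ z + exploration_noise(t)
```

In use, `behavior_policy(z, t, K0)` would be applied to the system while input–output data are logged for the off-policy learning phase.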
Simulation study
We use the short-period approximation of the F-16 dynamics linearized about the nominal flight condition, augmented to include the elevator actuator and the angle-of-attack filter (Stevens et al., 2015). The system states of interest are vectorized as x = [α, q, δ_e]ᵀ, where α, q, and δ_e denote the angle of attack, pitch rate, and elevator deflection angle, respectively. The system output is the pitch rate. Thus, the F-16 dynamics are modeled as (1), (2) (Stevens et al., 2015),
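To reproduce the flavor of this experiment, the sketch below sets up a short-period-style model with the same state/output structure and computes a model-based LQR gain as a baseline. The numeric entries are assumptions of plausible magnitude, NOT the paper's F-16 data (see Stevens et al., 2015 for the actual model):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative short-period-style model: state [alpha, q, delta_e]
# (angle of attack, pitch rate, elevator deflection). Values are made up.
A = np.array([[-1.0,  0.9,   0.0],
              [ 0.8, -1.1,  -0.2],
              [ 0.0,  0.0, -20.0]])    # fast first-order elevator actuator
B = np.array([[0.0], [0.0], [20.0]])   # elevator command drives the actuator
C = np.array([[0.0, 1.0, 0.0]])        # measured output: pitch rate

Q = C.T @ C          # penalize pitch-rate deviation
R = np.eye(1)        # penalize control effort

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)        # model-based LQR gain (baseline)
```

The paper's RL scheme recovers such a gain from input–output data alone, with an additional feedforward part handling the command generator.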
Conclusion
This paper proposed an output-feedback-based RL for optimal tracking control of CT systems with unknown system dynamics. We have formulated the tracking control problem by proposing a dynamical controller modified from the standard output regulation theory. The optimal solution has been approximated online using the input–output data. We have derived a condition for the system parameterization matrix to be full row rank, based on which the optimal control gain is uniquely obtained through
Acknowledgments
The authors would like to thank the Associate Editor and anonymous reviewers for their constructive feedback, which significantly improved the quality of this paper.
References (48)
- Output-feedback adaptive optimal control of interconnected systems based on robust adaptive dynamic programming. Automatica (2016).
- Approximate optimal trajectory tracking for continuous-time nonlinear systems. Automatica (2015).
- Integral Q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems. Automatica (2012).
- Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica (2014).
- Optimal model-free output synchronization of heterogeneous systems using off-policy reinforcement learning. Automatica (2016).
- Dynamic programming and optimal control, vol. 1 (1995).
- Linear system theory and design (1999).
- Homotopic policy iteration-based learning design for unknown linear continuous-time systems. Automatica (2022).
- Off-policy learning for adaptive optimal output synchronization of heterogeneous multi-agent systems. Automatica (2020).
- Reinforcement learning-based adaptive optimal exponential tracking control of linear systems with unknown dynamics. IEEE Transactions on Automatic Control (2019).
- A data-driven prescribed convergence rate design for robust tracking of discrete-time systems. Journal of Guangdong University of Technology.
- Neural networks for solving systems of linear equations and related problems. IEEE Transactions on Circuits and Systems I.
- Adaptive dynamic programming and adaptive optimal output regulation of linear systems. IEEE Transactions on Automatic Control.
- Adaptive dynamic programming and cooperative output regulation of discrete-time multi-agent systems. International Journal of Control, Automation and Systems.
- Nonlinear output regulation: Theory and applications.
- Robust adaptive control.
- Robust adaptive dynamic programming.
- Optimal output regulation of linear discrete-time systems with unknown dynamics using reinforcement learning. IEEE Transactions on Cybernetics.
- Nonlinear systems.
- Optimal tracking control of unknown discrete-time linear systems using input–output measured data. IEEE Transactions on Cybernetics.
- On an iterative technique for Riccati equation computations. IEEE Transactions on Automatic Control.
- The construction of optimal linear and nonlinear regulators. In: Systems, Models and Feedback: Theory and Applications.
Ci Chen received the B.E. and Ph.D. degrees from School of Automation, Guangdong University of Technology, Guangzhou, China, in 2011 and 2016, respectively. From 2016 to 2018, he has been with The University of Texas at Arlington and The University of Tennessee at Knoxville as a Research Associate. He was awarded the Wallenberg-NTU Presidential Postdoctoral Fellowship. From 2018 to 2021, he was with School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore and with Department of Automatic Control, Lund University, Sweden as a researcher. He is now a professor with School of Automation, Guangdong University of Technology, Guangzhou, China. His research interests include reinforcement learning, resilient control, and computational intelligence. He is an Associate Editor for IEEE Transactions on Neural Networks and Learning Systems, an Editor for International Journal of Robust and Nonlinear Control, and an Associate Editor for Advanced Control for Applications: Engineering and Industrial Systems.
Lihua Xie received the Ph.D. degree in electrical engineering from University of Newcastle, Australia, in 1992. Since 1992, he has been with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, where he is currently a Professor and the Director of Delta-NTU Corporate Laboratory for Cyber Physical Systems and the Center for Advanced Robotics Technology Innovation. He served as the Head of Division of Control and Instrumentation from July 2011 to June 2014. He held teaching appointments with the Department of Automatic Control, Nanjing University of Science and Technology from 1986 to 1989.
His research interests include robust control and estimation, networked control systems, multi-agent networks, localization, and unmanned systems. He is an Editor-in-Chief for Unmanned Systems and has served as an Editor of IET Book Series in Control and an Associate Editor of a number of journals, including AUTOMATICA, IEEE TRANSACTIONS ON AUTOMATIC CONTROL, IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, IEEE TRANSACTIONS ON NETWORK CONTROL SYSTEMS, and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II. He was an IEEE Distinguished Lecturer from January 2012 to December 2014. He is a Fellow of Academy of Engineering Singapore, IFAC, IEEE, and CAA.
Kan Xie received the Ph.D. degree in control science and engineering from Guangdong University of Technology, Guangzhou, China, in 2017. He joined the Institute of Intelligent Information Processing, Guangdong University of Technology, where he is currently an Associate Professor. His research interests include machine learning, nonnegative signal processing, blind signal processing, smart grid, and Internet of Things.
Frank L. Lewis is Member, National Academy of Inventors. Fellow IEEE, Fellow IFAC, Fellow AAAS, Fellow U.K. Institute of Measurement & Control, PE Texas, U.K. Chartered Engineer. UTA Distinguished Scholar Professor, UTA Distinguished Teaching Professor, and Moncrief-O’Donnell Chair at the University of Texas at Arlington Research Institute.
He obtained the Bachelor’s Degree in Physics/EE and the MSEE at Rice University, the MS in Aeronautical Engineering from Univ. W. Florida, and the Ph.D. at Ga. Tech. He works in feedback control, intelligent systems, cooperative control systems, and nonlinear systems. He is author of 7 U.S. patents, numerous journal special issues, journal papers, and 20 books, including Optimal Control, Aircraft Control, Optimal Estimation, and Robot Manipulator Control which are used as university textbooks world-wide. He received the Fulbright Research Award, NSF Research Initiation Grant, ASEE Terman Award, Int. Neural Network Soc. Gabor Award, U.K. Inst Measurement & Control Honeywell Field Engineering Medal, IEEE Computational Intelligence Society Neural Networks Pioneer Award, AIAA Intelligent Systems Award. Received Outstanding Service Award from Dallas IEEE Section, selected as Engineer of the year by Ft. Worth IEEE Section. Was listed in Ft. Worth Business Press Top 200 Leaders in Manufacturing. Texas Regents Outstanding Teaching Award 2013.
Shengli Xie received the B.S. degree in mathematics from Jilin University, Changchun, China, in 1983, the M.S. degree in mathematics from Central China Normal University, Wuhan, China, in 1995, and the Ph.D. degree in control theory and applications from South China University of Technology, Guangzhou, China, in 1997. He is currently a Full Professor and the Head of the Institute of Intelligent Information Processing, Guangdong University of Technology, Guangzhou. He has coauthored two books and more than 150 research papers in refereed journals and conference proceedings and was awarded Highly Cited Researcher in 2020. His research interests include blind signal processing, machine learning, and Internet of Things. He was awarded the Second Prize of National Natural Science Award of China in 2009. He is a Fellow of IEEE, and serves as an Associate Editor for IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS. He is a Foreign Full Member (Academician) of the Russian Academy of Engineering.
☆ This work was supported in part by the National Natural Science Foundation of China under Grants 61973087 and U1911401, in part by the State Key Laboratory of Synthetical Automation for Process Industries, China (2020-KF-21-02), in part by the Wallenberg-NTU Presidential Postdoctoral Fellowship, and in part by the Research and Development Program of Key Science and Technology Fields in Guangzhou City under Grant 202206030005. The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor Raul Ordonez under the direction of Editor Miroslav Krstic.