Automatica

Volume 146, December 2022, 110581

Adaptive optimal output tracking of continuous-time systems via output-feedback-based reinforcement learning

https://doi.org/10.1016/j.automatica.2022.110581

Abstract

Reinforcement learning provides a powerful tool for designing satisfactory controllers through interactions with the environment. Although off-policy learning algorithms have recently been designed for tracking problems, most of these results either rely on full-state feedback or only guarantee bounded tracking errors, which may not be flexible or desirable for real-world engineering problems. To address these issues, we propose an output-feedback-based reinforcement learning approach that finds the optimal control solution using input–output data and ensures asymptotic tracking control of continuous-time systems. More specifically, we first propose a dynamical controller revised from standard output regulation theory and use it to formulate an optimal output tracking problem. Then, a state observer is used to re-express the system state. Consequently, we address the rank issue of the parameterization matrix and analyze the state re-expression error, both of which are crucial for transforming the off-policy learning into an output-feedback form. A comprehensive simulation study demonstrates the effectiveness of the proposed approach.

Introduction

Reinforcement learning (RL) (Lewis, Vrabie, and Syrmos, 2012, Sutton and Barto, 1998, Vrabie et al., 2013), also known as approximate/adaptive dynamic programming (ADP) (Bertsekas, 1995, Lewis, Vrabie, and Vamvoudakis, 2012, Powell, 2007), has become a powerful technique in computational intelligence that makes decisions through feedback interactions with an unknown environment. From the perspective of the control community, RL integrates adaptive control and optimal control, with the feature that an adaptive controller solves an optimal control problem (Lewis & Vrabie, 2009), which distinguishes it from the model-based methods documented in Ioannou and Sun (2012) and Krstić et al. (1995).

RL-based research has been undergoing a major transition from regulation control to tracking control (Chen et al., 2022, Chen and Xie, 2021, Jiang and Jiang, 2017, Lee et al., 2012, Lee et al., 2014, Lewis, Vrabie, and Syrmos, 2012, Vrabie et al., 2013), which covers more practical problems by allowing the system output to track a trajectory. What remains unchanged is that RL algorithms not only stabilize the closed-loop system, but also adaptively achieve a satisfactory tracking performance (Lewis, Vrabie, & Syrmos, 2012). There are two ways to approach the optimal tracking problem via RL for continuous-time (CT) systems, which are the focus of this paper. One way is to build an augmented system containing the original system and the command generator, based on which a discounting factor in the cost function and integral RL are used to formulate an optimal tracking control (Modares & Lewis, 2014a). This result was later extended to nonlinear systems (Kamalapurkar et al., 2015, Liu et al., 2017, Luo et al., 2015, Qin et al., 2014), constrained-input systems (Modares & Lewis, 2014b), H∞ control (Modares et al., 2015), and multi-agent systems (Chen et al., 2020, Modares et al., 2018, Modares, Nageshrao, et al., 2016). The other way is to bring output regulation theory and optimal control together (Krener, 1992, Saberi et al., 2003). Using output regulation theory and ADP, Gao and Jiang (2016) proposed an asymptotic tracking controller, wherein the system dynamics are assumed partially known. The work of Chen et al. (2019) removed such prior knowledge of the system dynamics and achieved optimal tracking control using integral RL and experience replay.

However, most of the RL algorithms mentioned above require full-state measurements. In real-world control problems, the full state is usually not available. Compared to full-state feedback, output feedback is more flexible and desirable for control implementation.

For output-feedback-based RL, Lewis and Vamvoudakis (2011) parameterized the system state in the Bellman equation and proposed an RL-based method for regulating discrete-time (DT) systems using input and output data. Later, output-feedback-based tracking control (Kiumarsi et al., 2015) was reported for DT systems, based on which H∞ control (Fan et al., 2018) was considered. The work of Rizvi and Lin (2018b) parameterized the system state through a DT state observer (Tao, 2003). As for CT systems, Jiang and Jiang (2010) considered an on-policy iteration using a state observer. Using static output-feedback control and an adaptive state observer, a suboptimal controller was given in Zhu et al. (2015). Methods for discrete approximation of CT systems were given in Gao et al. (2016). Based on Modares and Lewis (2014a) and Lewis and Vamvoudakis (2011), Modares, Lewis, and Jiang (2016) considered output-feedback tracking control for CT systems without the static output-feedback stabilizability condition required in Zhu et al. (2015), by setting the eigenvalues of the command generator within a predetermined region. Unlike the discrete approximation in Gao et al. (2016), Rizvi and Lin (2018a) considered output-feedback optimal control for regulating CT systems using a state observer to reparameterize the system state, which is a CT version of Rizvi and Lin (2018b).

Even though some progress has been made, most of the current output-feedback optimal controllers are designed for regulating CT systems. However, in many applications, it is also important to drive the system output to follow a prespecified trajectory with zero error (see examples in flight control (Stevens et al., 2015) and robotics (Lewis et al., 2003)).

In this paper, we aim to study an optimal output tracking problem of CT systems. We propose an output-feedback-based RL approach, wherein input–output data along the system trajectories are used to find the optimal solution, without solving the output regulation equations, and to ensure asymptotic tracking control of CT systems. The contributions of this paper are summarized in the following three aspects.

  • (1) Compared to Gao et al. (2016) and Modares, Lewis, and Jiang (2016), we propose an output regulation and off-policy RL-based controller for the optimal output tracking problem.

  • (2) Compared to the state re-expression in Rizvi and Lin (2018a), we derive a condition for the parameterization matrix to be full row rank, based on which an approximate optimal control gain is uniquely obtained through output-feedback-based RL.

  • (3) The state re-expression error can be made small by running the system for a sufficiently long time before the data collection begins, based on which we propose an additional model-free pre-collection phase to supplement the off-policy learning for CT systems.

The remaining parts of this paper are organized as follows. In Section 2, we formulate an optimal output tracking control problem of CT systems by proposing a dynamical controller and review some preliminaries of optimal control. In Section 3, we present our main result on the optimal output tracking of CT systems via output-feedback-based RL. In Section 4, we provide a simulation example to demonstrate the effectiveness of the proposed approach. Conclusions are drawn in Section 5.

Notations: Throughout this paper, $\mathbb{C}^-$ stands for the open left-half complex plane. For a matrix $X$, $\lambda(X)$ denotes its spectrum; $X > (\geq)\, 0$ means that the matrix $X$ is positive (semi-)definite, and $X < (\leq)\, 0$ means that $X$ is negative (semi-)definite. For a matrix $X \in \mathbb{R}^{m \times n}$, $\operatorname{vec}(X) = [x_1^T, x_2^T, \ldots, x_n^T]^T$ denotes the vectorization of $X$, with $x_i \in \mathbb{R}^m$ being the $i$th column of $X$. For a symmetric matrix $X \in \mathbb{R}^{m \times m}$, $\operatorname{vecs}(X) = [x_{11}, 2x_{12}, \ldots, 2x_{1m}, x_{22}, 2x_{23}, \ldots, 2x_{m-1,m}, x_{mm}]^T \in \mathbb{R}^{\frac{1}{2}m(m+1)}$, where $x_{ij}$ denotes the entry in the $i$th row and $j$th column of the matrix $X$. For a column vector $v \in \mathbb{R}^n$, $\operatorname{vecv}(v) = [v_1^2, v_1 v_2, \ldots, v_1 v_n, v_2^2, v_2 v_3, \ldots, v_{n-1} v_n, v_n^2]^T \in \mathbb{R}^{\frac{1}{2}n(n+1)}$. $I$ and $O$ are an identity matrix and a zero matrix with appropriate dimensions, respectively. For a matrix $X$, $\det(X)$ denotes the determinant of the matrix and $\operatorname{adj}(X)$ the adjugate of the matrix. For the matrix pair $(A \in \mathbb{R}^{m \times m},\, X \in \mathbb{R}^{m \times n})$, the controllability matrix is defined as $\mathcal{C}_X(A) = [X, AX, A^2X, \ldots, A^{m-1}X]$.
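For readers who prefer an operational view of these operators, the following NumPy sketch implements vec, vecs, vecv, and the controllability matrix as defined above; the function names are illustrative choices of ours, not notation from the paper.

import numpy as np

def vec(X):
    """Stack the columns of X into a single column vector (vec operator)."""
    return np.asarray(X).reshape(-1, order="F")

def vecs(X):
    """Half-vectorization of a symmetric m x m matrix:
    [x11, 2*x12, ..., 2*x1m, x22, 2*x23, ..., 2*x_{m-1,m}, x_mm]^T."""
    X = np.asarray(X)
    m = X.shape[0]
    return np.concatenate(
        [np.concatenate(([X[i, i]], 2.0 * X[i, i + 1:])) for i in range(m)]
    )

def vecv(v):
    """Quadratic monomials of v:
    [v1^2, v1*v2, ..., v1*vn, v2^2, ..., v_{n-1}*vn, vn^2]^T."""
    v = np.asarray(v).ravel()
    return np.concatenate([v[i] * v[i:] for i in range(v.size)])

def ctrb_matrix(A, X):
    """Controllability matrix C_X(A) = [X, A X, A^2 X, ..., A^(m-1) X]."""
    A, X = np.asarray(A), np.asarray(X)
    blocks, AkX = [], X
    for _ in range(A.shape[0]):
        blocks.append(AkX)
        AkX = A @ AkX
    return np.hstack(blocks)

With these definitions, a symmetric $X$ satisfies $v^T X v = \operatorname{vecs}(X)^T \operatorname{vecv}(v)$, which is the identity typically exploited when quadratic value functions are rewritten as linear regressions in ADP.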

Section snippets

Problem formulation

In this paper, a class of CT linear systems is described by
$$\dot{x} = Ax + Bu, \tag{1}$$
$$y = Cx, \tag{2}$$
where $x \in \mathbb{R}^n$ is the system state, $u \in \mathbb{R}^m$ is the control input, $y \in \mathbb{R}^p$ is the system output, and $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{p \times n}$ are unknown constant matrices. The command signal is generated through the following dynamics
$$\dot{x}_d = S x_d, \tag{3}$$
$$y_d = R x_d, \tag{4}$$
where $x_d \in \mathbb{R}^q$ is the command generator state, $y_d \in \mathbb{R}^p$ is the output of the command generator, and $S \in \mathbb{R}^{q \times q}$, $R \in \mathbb{R}^{p \times q}$ are unknown constant matrices.
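As a concrete, purely illustrative instance of the setup in (1)–(4), the sketch below integrates a plant and a command generator by forward Euler and returns the tracking error; the matrices, dimensions, and step size are hypothetical placeholders of ours, whereas in the paper these matrices are unknown to the designer.

import numpy as np

# Hypothetical placeholder matrices (unknown to the designer in the paper).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])      # plant dynamics
B = np.array([[0.0], [1.0]])                   # input matrix
C = np.array([[1.0, 0.0]])                     # output matrix
S = np.array([[0.0, 1.0], [-1.0, 0.0]])        # command generator: harmonic oscillator
R = np.array([[1.0, 0.0]])                     # reference output map

def step(x, x_d, u, dt=1e-3):
    """One forward-Euler step of the plant (1)-(2) and command generator (3)-(4)."""
    x = x + dt * (A @ x + B @ u)
    x_d = x_d + dt * (S @ x_d)
    y, y_d = C @ x, R @ x_d
    return x, x_d, y, y_d, y - y_d             # last entry is the tracking error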

For the system (1)–(4), some standard assumptions are made for the design

Main results on output-feedback-based reinforcement learning

In this section, we give an optimal tracking control of CT systems via off-policy learning in an output-feedback form.

To do this, an appropriate behavior policy is required for off-policy learning. We design the policy as
$$\dot{\bar z} = F\bar z - Gy + G\vartheta, \qquad u = T\bar z + u_c,$$
where $\bar z \in \mathbb{R}^{pq}$ is the state, $\vartheta \in \mathbb{R}^{p}$ is an exploration noise, $F \in \mathbb{R}^{pq \times pq}$ and $G \in \mathbb{R}^{pq \times p}$ were given in (5), and $u_c \in \mathbb{R}^{m}$ is a free variable to be specified later. Here, the term $G\vartheta$ has a certain structural similarity to that in Lee et al., 2012, Lee et al., 2014,
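A minimal sketch of how such a behavior policy might be realized for data collection is given below, assuming the sign convention in the equation as reconstructed above and a sum-of-sinusoids exploration noise (a common but not prescribed choice); F, G, and T are taken as given from (5), and u_c remains the free input supplied by the learning procedure.

import numpy as np

def exploration_noise(t, p, amp=0.1):
    """Sum-of-sinusoids exploration signal in R^p (an illustrative, not prescribed, choice)."""
    freqs = np.array([1.0, 3.0, 7.0, 11.0])
    return np.full(p, amp * np.sum(np.sin(freqs * t)))

def behavior_policy_step(z_bar, y, t, F, G, T, u_c, dt=1e-3):
    """One forward-Euler step of the behavior policy; the sign between F*z_bar and
    G*y follows our reconstruction of the displayed equation and is an assumption."""
    vartheta = exploration_noise(t, y.size)
    z_bar_dot = F @ z_bar - G @ y + G @ vartheta
    u = T @ z_bar + u_c                        # behavior control applied to the plant
    return z_bar + dt * z_bar_dot, u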

Simulation study

We use the short-period approximation to the F-16 dynamics linearized about the nominal flight condition, and the dynamics are augmented to include the elevator actuator and angle-of-attack filter (Stevens et al., 2015). The system states of interest are vectorized as $x = [\alpha, q, \delta_e]^T$, where $\alpha$, $q$, and $\delta_e$ denote the angle of attack, pitch rate, and elevator deflection angle, respectively. The system output is the pitch rate. Thus, the F-16 dynamics are modeled as (1), (2) (Stevens et al., 2015),
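For concreteness, the snippet below encodes the state ordering and pitch-rate output selection described here, together with a hypothetical sinusoidal pitch-rate command written in the exosystem form (3)–(4); the output matrix, command frequency, and amplitude are our assumptions for illustration and are not taken from Stevens et al. (2015) or from the paper.

import numpy as np

# State ordering x = [alpha, q, delta_e]^T; the measured output is the pitch rate q.
C = np.array([[0.0, 1.0, 0.0]])                # assumed output matrix selecting q

# Hypothetical sinusoidal pitch-rate command in exosystem form (3)-(4).
omega = 1.0                                     # assumed command frequency (rad/s)
S = np.array([[0.0, omega], [-omega, 0.0]])     # command generator dynamics
R = np.array([[0.2, 0.0]])                      # command output map (amplitude set by R and x_d(0))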

Conclusion

This paper proposed an output-feedback-based RL approach for optimal tracking control of CT systems with unknown system dynamics. We have formulated the tracking control problem by proposing a dynamical controller modified from standard output regulation theory. The optimal solution has been approximated online using the input–output data. We have derived a condition for the system parameterization matrix to be full row rank, based on which the optimal control gain is uniquely obtained through

Acknowledgments

The authors would like to thank the Associate Editor and anonymous reviewers for their constructive feedback that significantly improved the quality of this paper.


References (48)

  • Chen, C., et al. (2021). A data-driven prescribed convergence rate design for robust tracking of discrete-time systems. Journal of Guangdong University of Technology.
  • Cichocki, A., et al. (1992). Neural networks for solving systems of linear equations and related problems. IEEE Transactions on Circuits and Systems I.
  • Fan, J., Li, Z., Jiang, Y., Chai, T., & Lewis, F. L. (2018). Model-free linear discrete-time system H∞ control using...
  • Gao, W., et al. (2016). Adaptive dynamic programming and adaptive optimal output regulation of linear systems. IEEE Transactions on Automatic Control.
  • Gao, W., et al. (2018). Adaptive dynamic programming and cooperative output regulation of discrete-time multi-agent systems. International Journal of Control, Automation and Systems.
  • Huang, J. (2004). Nonlinear output regulation: Theory and applications.
  • Ioannou, P. A., et al. (2012). Robust adaptive control.
  • Jiang, Y., & Jiang, Z.-P. (2010). Approximate dynamic programming for output feedback control. In Proceedings of the...
  • Jiang, Y., et al. (2017). Robust adaptive dynamic programming.
  • Jiang, Y., et al. (2020). Optimal output regulation of linear discrete-time systems with unknown dynamics using reinforcement learning. IEEE Transactions on Cybernetics.
  • Khalil, H. K. (2002). Nonlinear systems.
  • Kiumarsi, B., et al. (2015). Optimal tracking control of unknown discrete-time linear systems using input–output measured data. IEEE Transactions on Cybernetics.
  • Kleinman, D. (1968). On an iterative technique for Riccati equation computations. IEEE Transactions on Automatic Control.
  • Krener, A. J. (1992). The construction of optimal linear and nonlinear regulators. Systems, Models and Feedback: Theory and Applications.

    Ci Chen received the B.E. and Ph.D. degrees from School of Automation, Guangdong University of Technology, Guangzhou, China, in 2011 and 2016, respectively. From 2016 to 2018, he has been with The University of Texas at Arlington and The University of Tennessee at Knoxville as a Research Associate. He was awarded the Wallenberg-NTU Presidential Postdoctoral Fellowship. From 2018 to 2021, he was with School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore and with Department of Automatic Control, Lund University, Sweden as a researcher. He is now a professor with School of Automation, Guangdong University of Technology, Guangzhou, China. His research interests include reinforcement learning, resilient control, and computational intelligence. He is an Associate Editor for IEEE Transactions on Neural Networks and Learning Systems, an Editor for International Journal of Robust and Nonlinear Control, and an Associate Editor for Advanced Control for Applications: Engineering and Industrial Systems.

    Lihua Xie received the Ph.D. degree in electrical engineering from University of Newcastle, Australia, in 1992. Since 1992, he has been with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, where he is currently a Professor and the Director of Delta-NTU Corporate Laboratory for Cyber Physical Systems and the Center for Advanced Robotics Technology Innovation. He served as the Head of Division of Control and Instrumentation from July 2011 to June 2014. He held teaching appointments with the Department of Automatic Control, Nanjing University of Science and Technology from 1986 to 1989.

    His research interests include robust control and estimation, networked control systems, multi-agent networks, localization, and unmanned systems. He is an Editor-in-Chief for Unmanned Systems and has served as an Editor of IET Book Series in Control and an Associate Editor of a number of journals, including AUTOMATICA, IEEE TRANSACTIONS ON AUTOMATIC CONTROL, IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, IEEE TRANSACTIONS ON NETWORK CONTROL SYSTEMS, and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II. He was an IEEE Distinguished Lecturer from January 2012 to December 2014. He is a Fellow of Academy of Engineering Singapore, IFAC, IEEE, and CAA.

    Kan Xie received the Ph.D. degree in control science and engineering from Guangdong University of Technology, Guangzhou, China, in 2017. He joined the Institute of Intelligent Information Processing, Guangdong University of Technology, where he is currently an Associate Professor. His research interests include machine learning, nonnegative signal processing, blind signal processing, smart grid, and Internet of Things.

    Frank L. Lewis is Member, National Academy of Inventors. Fellow IEEE, Fellow IFAC, Fellow AAAS, Fellow U.K. Institute of Measurement & Control, PE Texas, U.K. Chartered Engineer. UTA Distinguished Scholar Professor, UTA Distinguished Teaching Professor, and Moncrief-O’Donnell Chair at the University of Texas at Arlington Research Institute.

    He obtained the Bachelor’s Degree in Physics/EE and the MSEE at Rice University, the MS in Aeronautical Engineering from Univ. W. Florida, and the Ph.D. at Ga. Tech. He works in feedback control, intelligent systems, cooperative control systems, and nonlinear systems. He is author of 7 U.S. patents, numerous journal special issues, journal papers, and 20 books, including Optimal Control, Aircraft Control, Optimal Estimation, and Robot Manipulator Control which are used as university textbooks world-wide. He received the Fulbright Research Award, NSF Research Initiation Grant, ASEE Terman Award, Int. Neural Network Soc. Gabor Award, U.K. Inst Measurement & Control Honeywell Field Engineering Medal, IEEE Computational Intelligence Society Neural Networks Pioneer Award, AIAA Intelligent Systems Award. Received Outstanding Service Award from Dallas IEEE Section, selected as Engineer of the year by Ft. Worth IEEE Section. Was listed in Ft. Worth Business Press Top 200 Leaders in Manufacturing. Texas Regents Outstanding Teaching Award 2013.

    Shengli Xie received the B.S. degree in mathematics from Jilin University, Changchun, China, in 1983, the M.S. degree in mathematics from Central China Normal University, Wuhan, China, in 1995, and the Ph.D. degree in control theory and applications from South China University of Technology, Guangzhou, China, in 1997. He is currently a Full Professor and the Head of the Institute of Intelligent Information Processing, Guangdong University of Technology, Guangzhou. He has coauthored two books and more than 150 research papers in refereed journals and conference proceedings and was awarded Highly Cited Researcher in 2020. His research interests include blind signal processing, machine learning, and Internet of Things. He was awarded the Second Prize of National Natural Science Award of China in 2009. He is a Fellow of IEEE, and serves as an Associate Editor for IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS. He is a Foreign Full Member (Academician) of the Russian Academy of Engineering.

    This work was supported in part by the National Natural Science Foundation of China under Grants 61973087 and U1911401, in part by State Key Laboratory of Synthetical Automation for Process Industries, China (2020-KF-21-02), in part by the Wallenberg-NTU Presidential Postdoctoral Fellowship, and in part by the Research and Development Program of Key Science and Technology Fields in Guangzhou City under Grant 202206030005. The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor Raul Ordonez under the direction of Editor Miroslav Krstic.
