Optimal control of nonlinear systems with dynamic programming

Isaac Tawiah; Yinglei Song

doi:10.1515/ijnsns-2017-0182

Published by De Gruyter February 15, 2021

From the journal International Journal of Nonlinear Sciences and Numerical Simulation

https://doi.org/10.1515/ijnsns-2017-0182

Abstract

In this paper, a generalized technique for solving a class of nonlinear optimal control problems is proposed. The optimization problem is formulated using the cost-to-go functional approach, and the optimal solution can be obtained by Bellman’s technique. Specifically, a continuous nonlinear system is first discretized, which yields a set of equality constraints. We show that, under a certain condition, the optimal solution of a problem in this class can be approximated to any precision by a solution of the set of equality constraints, and that the system is guaranteed to be stable under a control signal obtained from that solution. An iterative approach is then applied to solve the set of equality constraints numerically. The technique is tested on a nonlinear control problem from the class, and simulation results show that the approach is effective, converges quickly, and yields an accurate optimal solution.
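
The two ingredients named in the abstract, discretization of a continuous nonlinear system and a Bellman cost-to-go recursion, can be illustrated with a minimal sketch. This is not the authors' algorithm: the preview does not give their equality constraints or iterative solver, so the sketch substitutes a plain grid-based backward recursion. The scalar dynamics `f`, the quadratic costs, the grids, the step size `h`, and the function name `solve_dp` are all assumptions chosen only for illustration.

```python
import numpy as np

def f(x, u):
    # Hypothetical scalar dynamics dx/dt = -x^3 + u (illustration only).
    return -x**3 + u

def solve_dp(h=0.05, n_steps=40):
    """Backward Bellman recursion on a grid for the Euler-discretized system."""
    xs = np.linspace(-2.0, 2.0, 81)   # assumed state grid
    us = np.linspace(-2.0, 2.0, 41)   # assumed control grid
    X, U = np.meshgrid(xs, us, indexing="ij")   # both of shape (n_x, n_u)

    # Forward-Euler discretization: x_{k+1} = x_k + h * f(x_k, u_k).
    x_next = X + h * f(X, U)
    # Assumed quadratic running cost, integrated over one step.
    stage = h * (X**2 + U**2)

    V = np.empty((n_steps + 1, xs.size))    # cost-to-go at each stage
    policy = np.empty((n_steps, xs.size))   # minimizing control at each stage
    V[-1] = xs**2                           # assumed terminal cost

    rows = np.arange(xs.size)
    for k in range(n_steps - 1, -1, -1):
        # Interpolate the next-stage cost-to-go at the successor states.
        V_next = np.interp(x_next.ravel(), xs, V[k + 1]).reshape(x_next.shape)
        Q = stage + V_next
        best = np.argmin(Q, axis=1)         # best control index per state
        V[k] = Q[rows, best]
        policy[k] = us[best]
    return xs, us, V, policy

if __name__ == "__main__":
    xs, us, V, policy = solve_dp()
    i = np.argmin(np.abs(xs - 1.0))
    print("cost-to-go at x = 1:", V[0, i], "first control:", policy[0, i])
```

Calling `solve_dp()` returns the grids, the cost-to-go table `V`, and the tabulated control `policy`. A grid-based recursion like this scales poorly with state dimension, which is one reason a method that solves a set of equality constraints iteratively, as the abstract describes, may be preferable.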

Keywords: cost-to-go function; discretization; dynamic programming; nonlinear systems; optimal control

Corresponding author: Yinglei Song, School of Electronics and Information Science, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212003, China, E-mail: syinglei2013@163.com

Funding source: Government of Jiangsu Province

Award Identifier / Grant number: 1034901501

Acknowledgment

The authors are grateful for the constructive comments and suggestions from the anonymous reviewer on an earlier version of this paper. This work is fully supported by the Fund of Specially Appointed Professor of Jiangsu Province under grant number 1034901501.

Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This work is fully supported by the Fund of Specially Appointed Professor of Jiangsu Province (Funder DOI: https://doi.org/10.13039/501100002949) under grant number 1034901501.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Received: 2017-08-21

Revised: 2018-08-21

Accepted: 2021-01-14

Published Online: 2021-02-15

Published in Print: 2021-04-27
