
Online model-free controller for flexible wing aircraft: a policy iteration-based reinforcement learning approach

  • Regular Paper
  • Published in: International Journal of Intelligent Robotics and Applications

Abstract

The aerodynamic model of a flexible wing aircraft is highly nonlinear, with continuously time-varying dynamics under kinematic constraints. The nonlinearities stem from the aerodynamic forces and the continuous deformations of the flexible wing. Despite various experimental attempts and theoretical setups to model these dynamics, an accurate formulation has not been achieved. The control paradigms of the aircraft are concerned with the electro-mechanical coupling between the pilot and the wing, and it is challenging to design a flight controller for such aircraft while complying with these constraints. In this paper, an innovative machine learning technique is employed to design a robust online model-free control scheme for flexible wing aircraft. The controller maintains internal asymptotic stability of the aircraft in real-time, using a selected set of measurements or states, in an uncertain dynamical environment. It intelligently incorporates the varying dynamics, geometric parameters, and physical constraints of the aircraft into optimal control strategies. The adaptive learning structure employs a policy iteration approach, taking advantage of Bellman optimality principles, to converge to an optimal control solution. Artificial neural networks are adopted to implement the adaptive learning algorithm in real-time without prior knowledge of the aerodynamic model of the aircraft. The control scheme is generalized and shown to function effectively for different pilot/wing control mechanisms. It also demonstrates the ability to overcome the undesired stability problems caused by coupling the pilot's dynamics with the flexible wing's frame of motion.
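The policy-iteration scheme referenced in the abstract can be written, for a linear time-invariant approximation \(\dot{x}=Ax+Bu\) with quadratic cost weights \(Q\succeq 0\) and \(R\succ 0\) (standard notation, introduced here for illustration rather than taken from the paper), as alternating policy evaluation and policy improvement:

$$\begin{aligned} (A-BK_j)^{\top }P_j+P_j(A-BK_j)+Q+K_j^{\top }RK_j&=0 \quad \text {(policy evaluation)},\\ K_{j+1}&=R^{-1}B^{\top }P_j \quad \text {(policy improvement)}. \end{aligned}$$

Each evaluation step is a Lyapunov equation in \(P_j\); when the initial gain \(K_0\) is stabilizing, the iterates converge to the solution of the underlying Riccati equation.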



References

  • Abouheaf, M., Gueaieb, W.: Model-free value iteration solution for dynamic graphical games. In: 2018 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), pp. 1–6 (2018)

  • Abouheaf, M., Gueaieb, W.: Multi-agent reinforcement learning approach based on reduced value function approximations. In: IEEE International Symposium on Robotics and Intelligent Sensors (IRIS), pp. 111–116 (2017)

  • Abouheaf, M., Gueaieb, W.: Reinforcement learning solution with costate approximation for a flexible wing aircraft. In: 2018 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), pp. 1–6 (2018)

  • Abouheaf, M., Lewis, F.: Approximate dynamic programming solutions of multi-agent graphical games using actor-critic network structures. In: International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2013)

  • Abouheaf, M.I., Mahmoud, M.S.: Online policy iteration solution for dynamic graphical games. In: 2016 13th International Multi-Conference on Systems, Signals and Devices (SSD), pp. 787–797 (2016)

  • Abouheaf, M., Lewis, F.: Dynamic Graphical Games: Online Adaptive Learning Solutions Using Approximate Dynamic Programming, vol. 1, pp. 1–48. World Scientific, Singapore (2014)


  • Abouheaf, M.I., Mahmoud, M.S.: Chapter 5—online adaptive learning control schemes for microgrids. In: Mahmoud, M.S. (ed.) Microgrid, pp. 137–171. Butterworth-Heinemann, New York (2017)


  • Abouheaf, M.I., Mahmoud, M.S.: Policy iteration and coupled Riccati solutions for dynamic graphical games. Int. J. Digit. Signals Smart Syst. 1(2), 143–162 (2017)

  • Al-Tamimi, A., Lewis, F.L., Abu-Khalaf, M.: Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 38(4), 943–949 (2008)


  • Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)


  • Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-dynamic programming: an overview. Proc. IEEE Conf. Decis. Control 1, 560–564 (1995)


  • Blake, D.: Modelling the aerodynamics, stability and control of the hang glider. MA thesis, College of Aeronautics, Cranfield Institute of Technology (1991)

  • Cook, M.V., Kilkenny, E.A.: An experimental investigation of the aerodynamics of the hang glider. In: Proceedings of an International Conference on Aerodynamics (1986)

  • Cook, M.V., Spottiswoode, M.: Modelling the flight dynamics of the hang glider. Aeronaut. J. 109(1102), I–XX (2005)

  • Cook, M.V.: The theory of the longitudinal static stability of the hang-glider. Aeronaut. J. 98(978), 292–304 (1994)


  • Cook, M.V.: Flight Dynamics Principles: A Linear Systems Approach to Aircraft Stability and Control. Aerospace Engineering, 3rd edn. Butterworth-Heinemann, Oxford (2013)


  • de Matteis, G.: Response of hang gliders to control. Aeronaut. J. 94(938), 289–294 (1990)

  • de Matteis, G.: Dynamics of hang gliders. J. Guid. Control Dyn. 14(6), 1145–1152 (1991)

  • Howard, R.A.: Dynamic Programming and Markov Processes. PhD thesis, Department of Electrical Engineering, Massachusetts Institute of Technology (1960)

  • Kilkenny, E.A.: An evaluation of a mobile aerodynamic test facility for hang glider wings. Technical Report 8330, College of Aeronautics, Cranfield Institute of Technology (1983)

  • Kilkenny, E.A.: An experimental study of the longitudinal aerodynamic and static stability characteristics of hang gliders. PhD thesis, Cranfield University (September 1986)

  • Kilkenny, E.A.: Full scale wind tunnel tests on hang glider pilots. Technical Report 8416, College of Aeronautics, Cranfield Institute of Technology (1984)

  • Kroo, I.: Aerodynamics, Aeroelasticity and Stability of Hang Gliders. Stanford University, Stanford (1983)


  • Lewis, F.L., Vrabie, D., Syrmos, V.L.: Optimal Control, 3rd edn. Wiley, New York (2012)

  • Ochi, Y.: Modeling of flight dynamics and pilot’s handling of a hang glider. In: AIAA Modeling and Simulation Technologies Conference, pp. 1758–1776. American Institute of Aeronautics and Astronautics (2017)

  • Ochi, Y.: Modeling of the longitudinal dynamics of a hang glider. In: AIAA Modeling and Simulation Technologies Conference, pp. 1591–1608. American Institute of Aeronautics and Astronautics (2015)

  • Powton, J.: A theoretical study of the non-linear aerodynamic pitching moment characteristics of the hang glider and its influence on stability and control. MA thesis, College of Aeronautics, Cranfield Institute of Technology (1995)

  • Rollins, R.: Study of experimental data to assess the longitudinal stability and control of the hang glider. MA thesis, College of Aeronautics, Cranfield Institute of Technology (2000)

  • Si, J., Barto, A., Powell, W., Wunsch, D.: Handbook of Learning and Approximate Dynamic Programming. The Institute of Electrical and Electronics Engineers, Inc., Piscataway (2004)

  • Spottiswoode, M.: A theoretical study of the lateral-directional dynamics, stability and control of the hang glider. MA thesis, College of Aeronautics, Cranfield Institute of Technology (2001)

  • Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 1st edn. MIT Press, Cambridge (1998)

  • Vamvoudakis, K.G., Lewis, F.L., Hudas, G.R.: Multi-agent differential graphical games: online adaptive learning solution for synchronization with optimality. Automatica 48(8), 1598–1611 (2012)

  • Vrabie, D., Pastravanu, O., Abu-Khalaf, M., Lewis, F.L.: Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 45(2), 477–484 (2009)


  • Werbos, P.J.: A menu of designs for reinforcement learning over time. In: Miller III, W.T., Sutton, R.S., Werbos, P.J. (eds.) Neural Networks for Control, pp. 67–95. MIT Press, Cambridge (1990)

  • Werbos, P.J.: Neurocontrol and supervised learning: an overview and evaluation. In: White, D.A., Sofge, D.A. (eds.) Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pp. 65–89. Van Nostrand Reinhold, New York (1992)

  • Weiss, G. (ed.): Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge (1999)



Author information

Correspondence to Mohammed Abouheaf.


This work was partially supported by Ontario Centers of Excellence (OCE).

Appendices

Appendix 1: State space matrices of Model 1

This position-control model is derived at a trim speed of \(10.8\ \hbox {m\,s}^{-1}\) (Cook and Spottiswoode 2005).

$$\begin{aligned} A^{Lon}&= \begin{bmatrix} -0.1730 & 0.6538 & 0.1388 & -9.7222\\ -1.4208 & -2.2535 & 10.7370 & 1.3093\\ 0.2685 & -0.4402 & -1.4113 & 0\\ 0 & 0 & 1 & 0 \end{bmatrix},\quad B^{Lon}= \begin{bmatrix} 0\\ 0\\ 7.46\\ 0 \end{bmatrix}\\ A^{Lat}&= \begin{bmatrix} -0.2195 & -0.1580 & -10.798 & 9.722 & -1.3098\\ -1.4670 & -21.318 & 7.5163 & 0 & 0\\ 0.2906 & 3.7362 & -2.1119 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 \end{bmatrix},\quad B^{Lat}= \begin{bmatrix} 0\\ 3.6136\\ -0.4311\\ 0\\ 0 \end{bmatrix} \end{aligned}$$
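As a quick numerical sanity check (a sketch added here, not part of the original appendix), the longitudinal matrices of Model 1 can be entered in numpy and screened via their eigenvalues; the trace of \(A^{Lon}\), here \(-3.8378\), equals the sum of the real parts of the eigenvalues.

```python
import numpy as np

# Longitudinal state-space matrices of Model 1 (Cook and Spottiswoode 2005)
A_lon = np.array([
    [-0.1730,  0.6538,  0.1388, -9.7222],
    [-1.4208, -2.2535, 10.7370,  1.3093],
    [ 0.2685, -0.4402, -1.4113,  0.0000],
    [ 0.0000,  0.0000,  1.0000,  0.0000],
])
B_lon = np.array([[0.0], [0.0], [7.46], [0.0]])

# The sum of the real parts of the eigenvalues equals tr(A_lon) = -3.8378.
eigs = np.linalg.eigvals(A_lon)
print("eigenvalues:", np.round(eigs, 4))
print("sum of real parts:", round(eigs.real.sum(), 4))
```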

Appendix 2: State space matrices of Model 2

The dynamics of the second model are defined by Ochi (2017):

$$\begin{aligned} A^{Lon}&= \begin{bmatrix} -0.92236 & 2.8006 & -10.188 & -7.5967\times 10^{-3} & 14.006 & -9.2715\\ -0.66112 & -1.7587 & 7.6085 & -4.0687\times 10^{-3} & 5.1213 & -3.1743\\ 0.18844 & 0.86754 & -5.1420 & -8.9318\times 10^{-5} & 0.14777 & 0\\ 0.55841 & -2.2953 & 10.204 & -1.4080\times 10^{-2} & -17.394 & 0\\ 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}\\ B^{Lon}&= \begin{bmatrix} -0.045395 & 0.019802\\ -0.015056 & 0.0065679\\ -0.053977 & 0.023546\\ 0.10987 & -0.047928\\ 0 & 0\\ 0 & 0 \end{bmatrix}\\ A^{Lat}&= \begin{bmatrix} -0.72750 & 4.9509 & -9.6760 & 4.9579\times 10^{-4} & 0 & -24.023 & -8.0919 & 9.2715\\ -1.7156 & -26.591 & 9.2368 & -4.1874\times 10^{-8} & 0 & -0.0035684 & -0.037915 & 0\\ 0.089999 & -0.10056 & -0.49893 & -2.4473\times 10^{-6} & 0 & 0.11877 & 0.041214 & 0\\ 0.98924 & 25.065 & -8.5206 & -1.1526\times 10^{-2} & 0 & -27.819 & -9.3364 & 0\\ -0.67411 & -9.0439 & 3.6429 & 2.2839\times 10^{-6} & 0 & 0.19150 & -0.071821 & 0\\ 0 & 0 & 0 & 1 & -0.36595 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1.0649 & 0 & 0 & 0\\ 0 & 1 & 0.34238 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}\\ B^{Lat}&= \begin{bmatrix} -0.077600 & 3.0442\times 10^{-6} & 0.015364\\ 0.017368 & -1.0473\times 10^{-3} & -0.0029820\\ -0.00018834 & 2.2290\times 10^{-3} & -0.00093505\\ -0.10596 & 1.4671\times 10^{-4} & 0.020917\\ 0.065159 & -2.9239\times 10^{-2} & -0.00014671\\ 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix} \end{aligned}$$

Appendix 3: Flexible wing aircraft parameters

Tables 1, 2 and 3 list the experimental data of the Hiway Demon hang glider (Cook and Spottiswoode 2005; Ochi 2017). The superscript \(*\) denotes the trim condition, \(V_c\) is the air speed, \(\alpha _w\) is the angle of attack of the wing, \(\gamma _w\) is the flight path angle, \(\alpha _p\) is the angle of attack of the pilot, and \(J_{XZp}\) is the product of inertia.

Table 1 The Hang Glider (Hiway Demon) configuration data
Table 2 Moments of inertia
Table 3 Trim condition

Appendix 4: Online adaptive learning solution (sample code)

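The sample-code figures of the original article are not reproduced here. As a rough substitute, the sketch below implements the classical model-based analogue of the policy-iteration loop (Kleinman's algorithm) on a hypothetical second-order plant; the paper's actual scheme is model-free and actor-critic-based, which this sketch does not attempt to replicate.

```python
import numpy as np

def lyap_solve(Ak, Qk):
    """Solve Ak^T P + P Ak + Qk = 0 by Kronecker vectorization."""
    n = Ak.shape[0]
    I = np.eye(n)
    M = np.kron(I, Ak.T) + np.kron(Ak.T, I)
    vec_p = np.linalg.solve(M, -Qk.flatten(order="F"))
    return vec_p.reshape((n, n), order="F")

def policy_iteration(A, B, Q, R, K0, iters=20):
    """Alternate policy evaluation and policy improvement (Kleinman's algorithm)."""
    K = K0
    for _ in range(iters):
        Ak = A - B @ K                    # closed-loop dynamics under current gain
        Qk = Q + K.T @ R @ K              # quadratic stage cost under current gain
        P = lyap_solve(Ak, Qk)            # policy evaluation (Lyapunov equation)
        K = np.linalg.solve(R, B.T @ P)   # policy improvement
    return K, P

# Hypothetical open-loop-stable second-order plant (illustration only)
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K, P = policy_iteration(A, B, Q, R, K0=np.zeros((1, 2)))
print("converged gain:", np.round(K, 5))   # ~[[0.23607, 0.23607]]
```

Starting from the admissible gain \(K_0=0\) (the toy plant is open-loop stable), each pass evaluates the current policy through a Lyapunov equation and then improves the gain, converging to the LQR solution of the associated Riccati equation.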


Cite this article

Abouheaf, M., Gueaieb, W. Online model-free controller for flexible wing aircraft: a policy iteration-based reinforcement learning approach. Int J Intell Robot Appl 4, 21–43 (2020). https://doi.org/10.1007/s41315-019-00105-3

