
Online model-free controller for flexible wing aircraft: a policy iteration-based reinforcement learning approach

  • Regular Paper
  • Published in: International Journal of Intelligent Robotics and Applications

Abstract

The aerodynamic model of a flexible wing aircraft is highly nonlinear, with continuously time-varying dynamics under kinematic constraints. The nonlinearities stem from the aerodynamic forces and the continuous deformations of the flexible wing. Despite various experimental attempts and theoretical setups to model these dynamics, an accurate formulation has not been achieved. The control paradigms of the aircraft are concerned with the electro-mechanical coupling between the pilot and the wing, and it is challenging to design a flight controller for such aircraft while complying with these constraints. In this paper, an innovative machine learning technique is employed to design a robust online model-free control scheme for flexible wing aircraft. The controller maintains internal asymptotic stability of the aircraft in real-time, using a selected set of measurements or states, in an uncertain dynamical environment. It intelligently incorporates the varying dynamics, geometric parameters, and physical constraints of the aircraft into optimal control strategies. The adaptive learning structure employs a policy iteration approach, taking advantage of Bellman optimality principles, to converge to an optimal control solution. Artificial neural networks are adopted to implement the adaptive learning algorithm in real-time without prior knowledge of the aerodynamic model of the aircraft. The control scheme is generalized and shown to function effectively for different pilot/wing control mechanisms. It also demonstrates the ability to overcome the undesired stability problems caused by coupling the pilot's dynamics with the flexible wing's frame of motion.
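The policy-iteration scheme referenced in the abstract can be written, for a linear time-invariant approximation \(\dot{x}=Ax+Bu\) with quadratic cost weights \(Q\succeq 0\) and \(R\succ 0\) (standard notation, introduced here for illustration rather than taken from the paper), as alternating policy evaluation and policy improvement:

$$\begin{aligned} (A-BK_j)^{\top }P_j+P_j(A-BK_j)+Q+K_j^{\top }RK_j&=0 \quad \text {(policy evaluation)},\\ K_{j+1}&=R^{-1}B^{\top }P_j \quad \text {(policy improvement)}. \end{aligned}$$

Each evaluation step is a Lyapunov equation in \(P_j\); when the initial gain \(K_0\) is stabilizing, the iterates converge to the solution of the underlying Riccati equation.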



References

  • Abouheaf, M., Gueaieb, W.: Model-free value iteration solution for dynamic graphical games. In: 2018 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), pp. 1–6 (2018)

  • Abouheaf, M., Gueaieb, W.: Multi-agent reinforcement learning approach based on reduced value function approximations. In: IEEE International Symposium on Robotics and Intelligent Sensors (IRIS), pp. 111–116 (2017)

  • Abouheaf, M., Gueaieb, W.: Reinforcement learning solution with costate approximation for a flexible wing aircraft. In: 2018 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), pp. 1–6 (2018)

  • Abouheaf, M., Lewis, F.: Approximate dynamic programming solutions of multi-agent graphical games using actor-critic network structures. In: International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2013)

  • Abouheaf, M.I., Mahmoud, M.S.: Online policy iteration solution for dynamic graphical games. In: 2016 13th International Multi-Conference on Systems, Signals and Devices (SSD), pp. 787–797 (2016)

  • Abouheaf, M., Lewis, F.: Dynamic Graphical Games: Online Adaptive Learning Solutions Using Approximate Dynamic Programming, vol. 1, pp. 1–48. World Scientific, Singapore (2014)


  • Abouheaf, M.I., Mahmoud, M.S.: Chapter 5—online adaptive learning control schemes for microgrids. In: Mahmoud, M.S. (ed.) Microgrid, pp. 137–171. Butterworth-Heinemann, New York (2017)


  • Abouheaf, M.I., Mahmoud, M.S.: Policy iteration and coupled Riccati solutions for dynamic graphical games. Int. J. Digit. Signals Smart Syst. 1(2), 143–162 (2017)

  • Al-Tamimi, A., Lewis, F.L., Abu-Khalaf, M.: Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 38(4), 943–949 (2008)


  • Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)


  • Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-dynamic programming: an overview. Proc. IEEE Conf. Decis. Control 1, 560–564 (1995)


  • Blake, D.: Modelling the aerodynamics, stability and control of the hang glider. MA thesis, College of Aeronautics, Cranfield Institute of Technology (1991)

  • Cook, M.V., Kilkenny, E.A.: An experimental investigation of the aerodynamics of the hang glider. In: Proceedings of an International Conference on Aerodynamics (1986)

  • Cook, M.V., Spottiswoode, M.: Modelling the flight dynamics of the hang glider. Aeronaut. J. 109(1102), I–XX (2005)

  • Cook, M.V.: The theory of the longitudinal static stability of the hang-glider. Aeronaut. J. 98(978), 292–304 (1994)


  • Cook, M.V.: Flight Dynamics Principles: A Linear Systems Approach to Aircraft Stability and Control. Aerospace Engineering, 3rd edn. Butterworth-Heinemann, Oxford (2013)


  • de Matteis, G.: Response of hang gliders to control. Aeronaut. J. 94(938), 289–294 (1990)

  • de Matteis, G.: Dynamics of hang gliders. J. Guid. Control Dyn. 14(6), 1145–1152 (1991)

  • Howard, R.A.: Dynamic Programming and Markov Processes. PhD thesis, Department of Electrical Engineering, Massachusetts Institute of Technology (1960)

  • Kilkenny, E.A.: An evaluation of a mobile aerodynamic test facility for hang glider wings. Technical Report 8330, College of Aeronautics, Cranfield Institute of Technology (1983)

  • Kilkenny, E.A.: An experimental study of the longitudinal aerodynamic and static stability characteristics of hang gliders. PhD thesis, Cranfield University (September 1986)

  • Kilkenny, E.A.: Full scale wind tunnel tests on hang glider pilots. Technical Report 8416, College of Aeronautics, Cranfield Institute of Technology (1984)

  • Kroo, I.: Aerodynamics, Aeroelasticity and Stability of Hang Gliders. Stanford University, Stanford (1983)


  • Lewis, F.L., Vrabie, D., Syrmos, V.L.: Optimal Control, 3rd edn. Wiley, New York (2012)

  • Ochi, Y.: Modeling of flight dynamics and pilot’s handling of a hang glider. In: AIAA Modeling and Simulation Technologies Conference, pp. 1758–1776. American Institute of Aeronautics and Astronautics (2017)

  • Ochi, Y.: Modeling of the longitudinal dynamics of a hang glider. In: AIAA Modeling and Simulation Technologies Conference, pp. 1591–1608. American Institute of Aeronautics and Astronautics (2015)

  • Powton, J.: A theoretical study of the non-linear aerodynamic pitching moment characteristics of the hang glider and its influence on stability and control. MA thesis, College of Aeronautics, Cranfield Institute of Technology (1995)

  • Rollins, R.: Study of experimental data to assess the longitudinal stability and control of the hang glider. MA thesis, College of Aeronautics, Cranfield Institute of Technology (2000)

  • Si, J., Barto, A., Powell, W., Wunsch, D.: Handbook of Learning and Approximate Dynamic Programming. The Institute of Electrical and Electronics Engineers, Inc., Piscataway (2004)

  • Spottiswoode, M.: A theoretical study of the lateral-directional dynamics, stability and control of the hang glider. MA thesis, College of Aeronautics, Cranfield Institute of Technology (2001)

  • Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 1st edn. MIT Press, Cambridge (1998)

  • Vamvoudakis, K.G., Lewis, F.L., Hudas, G.R.: Multi-agent differential graphical games: online adaptive learning solution for synchronization with optimality. Automatica 48(8), 1598–1611 (2012)

  • Vrabie, D., Pastravanu, O., Abu-Khalaf, M., Lewis, F.L.: Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 45(2), 477–484 (2009)


  • Werbos, P.J.: A menu of designs for reinforcement learning over time. In: Miller III, W.T., Sutton, R.S., Werbos, P.J. (eds.) Neural Networks for Control, pp. 67–95. MIT Press, Cambridge (1990)

  • Werbos, P.J.: Neurocontrol and supervised learning: an overview and evaluation. In: White, D.A., Sofge, D.A. (eds.) Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pp. 65–89. Van Nostrand Reinhold, New York (1992)

  • Weiss, G. (ed.): Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge (1999)



Author information

Correspondence to Mohammed Abouheaf.


This work was partially supported by Ontario Centers of Excellence (OCE).

Appendices

Appendix 1: State space matrices of Model 1

This position-control model is derived at a trim speed of \(10.8\ \hbox {m\,s}^{-1}\) (Cook and Spottiswoode 2005).

$$\begin{aligned} A^{Lon}&= \begin{bmatrix} -0.1730 & 0.6538 & 0.1388 & -9.7222\\ -1.4208 & -2.2535 & 10.7370 & 1.3093\\ 0.2685 & -0.4402 & -1.4113 & 0\\ 0 & 0 & 1 & 0 \end{bmatrix},\quad B^{Lon}= \begin{bmatrix} 0\\ 0\\ 7.46\\ 0 \end{bmatrix}\\ A^{Lat}&= \begin{bmatrix} -0.2195 & -0.1580 & -10.798 & 9.722 & -1.3098\\ -1.4670 & -21.318 & 7.5163 & 0 & 0\\ 0.2906 & 3.7362 & -2.1119 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 \end{bmatrix},\quad B^{Lat}= \begin{bmatrix} 0\\ 3.6136\\ -0.4311\\ 0\\ 0 \end{bmatrix} \end{aligned}$$
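As a quick numerical sanity check (a sketch added here, not part of the original appendix), the longitudinal matrices of Model 1 can be entered in numpy and screened via their eigenvalues; the trace of \(A^{Lon}\), here \(-3.8378\), equals the sum of the real parts of the eigenvalues.

```python
import numpy as np

# Longitudinal state-space matrices of Model 1 (Cook and Spottiswoode 2005)
A_lon = np.array([
    [-0.1730,  0.6538,  0.1388, -9.7222],
    [-1.4208, -2.2535, 10.7370,  1.3093],
    [ 0.2685, -0.4402, -1.4113,  0.0000],
    [ 0.0000,  0.0000,  1.0000,  0.0000],
])
B_lon = np.array([[0.0], [0.0], [7.46], [0.0]])

# The sum of the real parts of the eigenvalues equals tr(A_lon) = -3.8378.
eigs = np.linalg.eigvals(A_lon)
print("eigenvalues:", np.round(eigs, 4))
print("sum of real parts:", round(eigs.real.sum(), 4))
```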

Appendix 2: State space matrices of Model 2

The dynamics of the second model are defined by Ochi (2017):

$$\begin{aligned} A^{Lon}&= \begin{bmatrix} -0.92236 & 2.8006 & -10.188 & -7.5967\times 10^{-3} & 14.006 & -9.2715\\ -0.66112 & -1.7587 & 7.6085 & -4.0687\times 10^{-3} & 5.1213 & -3.1743\\ 0.18844 & 0.86754 & -5.1420 & -8.9318\times 10^{-5} & 0.14777 & 0\\ 0.55841 & -2.2953 & 10.204 & -1.4080\times 10^{-2} & -17.394 & 0\\ 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}\\ B^{Lon}&= \begin{bmatrix} -0.045395 & 0.019802\\ -0.015056 & 0.0065679\\ -0.053977 & 0.023546\\ 0.10987 & -0.047928\\ 0 & 0\\ 0 & 0 \end{bmatrix}\\ A^{Lat}&= \begin{bmatrix} -0.72750 & 4.9509 & -9.6760 & 4.9579\times 10^{-4} & 0 & -24.023 & -8.0919 & 9.2715\\ -1.7156 & -26.591 & 9.2368 & -4.1874\times 10^{-8} & 0 & -0.0035684 & -0.037915 & 0\\ 0.089999 & -0.10056 & -0.49893 & -2.4473\times 10^{-6} & 0 & 0.11877 & 0.041214 & 0\\ 0.98924 & 25.065 & -8.5206 & -1.1526\times 10^{-2} & 0 & -27.819 & -9.3364 & 0\\ -0.67411 & -9.0439 & 3.6429 & 2.2839\times 10^{-6} & 0 & 0.19150 & -0.071821 & 0\\ 0 & 0 & 0 & 1 & -0.36595 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1.0649 & 0 & 0 & 0\\ 0 & 1 & 0.34238 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}\\ B^{Lat}&= \begin{bmatrix} -0.077600 & 3.0442\times 10^{-6} & 0.015364\\ 0.017368 & -1.0473\times 10^{-3} & -0.0029820\\ -0.00018834 & 2.2290\times 10^{-3} & -0.00093505\\ -0.10596 & 1.4671\times 10^{-4} & 0.020917\\ 0.065159 & -2.9239\times 10^{-2} & -0.00014671\\ 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix} \end{aligned}$$

Appendix 3: Flexible wing aircraft parameters

Tables 1, 2 and 3 list the experimental data of the Hiway Demon hang glider (Cook and Spottiswoode 2005; Ochi 2017). The superscript \(*\) denotes the trim condition, \(V_c\) is the air speed, \(\alpha _w\) is the angle of attack of the wing, \(\gamma _w\) is the flight path angle, \(\alpha _p\) is the angle of attack of the pilot, and \(J_{XZp}\) is the product of inertia.

Table 1 The Hang Glider (Hiway Demon) configuration data
Table 2 Moments of inertia
Table 3 Trim condition

Appendix 4: Online adaptive learning solution (sample code)

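The sample-code figures of the original article are not reproduced here. As a rough substitute, the sketch below implements the classical model-based analogue of the policy-iteration loop (Kleinman's algorithm) on a hypothetical second-order plant; the paper's actual scheme is model-free and actor-critic-based, which this sketch does not attempt to replicate.

```python
import numpy as np

def lyap_solve(Ak, Qk):
    """Solve Ak^T P + P Ak + Qk = 0 by Kronecker vectorization."""
    n = Ak.shape[0]
    I = np.eye(n)
    M = np.kron(I, Ak.T) + np.kron(Ak.T, I)
    vec_p = np.linalg.solve(M, -Qk.flatten(order="F"))
    return vec_p.reshape((n, n), order="F")

def policy_iteration(A, B, Q, R, K0, iters=20):
    """Alternate policy evaluation and policy improvement (Kleinman's algorithm)."""
    K = K0
    for _ in range(iters):
        Ak = A - B @ K                    # closed-loop dynamics under current gain
        Qk = Q + K.T @ R @ K              # quadratic stage cost under current gain
        P = lyap_solve(Ak, Qk)            # policy evaluation (Lyapunov equation)
        K = np.linalg.solve(R, B.T @ P)   # policy improvement
    return K, P

# Hypothetical open-loop-stable second-order plant (illustration only)
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K, P = policy_iteration(A, B, Q, R, K0=np.zeros((1, 2)))
print("converged gain:", np.round(K, 5))   # ~[[0.23607, 0.23607]]
```

Starting from the admissible gain \(K_0=0\) (the toy plant is open-loop stable), each pass evaluates the current policy through a Lyapunov equation and then improves the gain, converging to the LQR solution of the associated Riccati equation.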


Cite this article

Abouheaf, M., Gueaieb, W. Online model-free controller for flexible wing aircraft: a policy iteration-based reinforcement learning approach. Int J Intell Robot Appl 4, 21–43 (2020). https://doi.org/10.1007/s41315-019-00105-3

