Reinforcement Learning versus Conventional Control for Controlling a Planar Bi-rotor Platform with Tail Appendage

  • Regular Paper
  • Published in: Journal of Intelligent & Robotic Systems

Abstract

In this paper, we study conventional and learning-based control approaches for multi-rotor platforms, with and without an actuated “tail” appendage. One contribution is a comprehensive experimental comparison between proven control-theoretic approaches and more recent learning-based ones. As a second contribution, we consider an actuated tail appendage as a deviation from the typical multi-rotor morphology: it complicates the control problem but promises useful applications. Our study explores the impact of such a tail on overall position control for both the conventional and the learning-based controllers. For the conventional controller, we use a multi-loop architecture in which an inner loop regulates attitude while an outer loop controls the position of the platform. For the learning-based controller, a multi-layer neural network is used to learn a nonlinear state-feedback controller. To improve its learning and generalization performance, we adopt a curriculum learning approach that gradually increases the difficulty of the training samples. For the experiments, a planar bi-rotor platform is modeled in a 2D simulation environment; the planar model avoids mathematical complications while preserving the main attributes of the problem, making the results more broadly applicable. We observe that both types of controllers achieve reasonable performance and can solve the position control task, but neither shows a clear advantage over the other. The learning-based controller is not interpretable and suffers from long training times, whereas the multi-loop controller requires a handcrafted architecture (which the learning-based controller does not) but provides guaranteed stable behavior.
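To make the cascaded architecture concrete, the following is a minimal sketch of a multi-loop position controller for a planar bi-rotor, assuming small-angle dynamics; the gains and platform parameters are hypothetical, and the paper's actual gains, saturation limits, and tail dynamics are not reproduced here:

```python
import numpy as np

# Hypothetical gains and platform parameters, for illustration only.
KP_POS = np.array([1.0, 4.0])   # position P gains (x, y)
KD_POS = np.array([1.5, 3.0])   # position D gains (x, y)
KP_ATT, KD_ATT = 20.0, 5.0      # attitude PD gains
MASS, G, ARM = 1.0, 9.81, 0.2   # mass [kg], gravity [m/s^2], rotor arm [m]

def multi_loop_control(pos, vel, theta, omega, pos_ref):
    """Cascade: outer position loop -> thrust and pitch commands,
    inner attitude loop -> differential rotor thrusts."""
    # Outer loop: PD on position yields a desired acceleration.
    acc_des = KP_POS * (pos_ref - pos) - KD_POS * vel
    f_total = MASS * (G + acc_des[1])                 # vertical thrust demand
    theta_des = np.clip(-acc_des[0] / G, -0.5, 0.5)   # small-angle pitch command
    # Inner loop: PD on attitude yields a body torque ...
    torque = KP_ATT * (theta_des - theta) - KD_ATT * omega
    # ... realized as a thrust difference between the two rotors.
    f_left = 0.5 * f_total - 0.5 * torque / ARM
    f_right = 0.5 * f_total + 0.5 * torque / ARM
    return f_left, f_right
```

Because the horizontal dynamics are reachable only through the pitch angle, the outer loop runs on a slower time scale than the inner loop; this time-scale separation is what makes the handcrafted cascade stable in practice.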



Funding

This research is partially supported by METU-BAP under project no. GAP-312-2018-2705. H. I. Ugurlu is supported by a scholarship from the Scientific and Technological Research Council of Turkey (TÜBİTAK). S. Kalkan is supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) through the BIDEB 2219 International Postdoctoral Research Scholarship Program and by the BAGEP Award of the Science Academy.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Halil Ibrahim Ugurlu. The first draft of the manuscript was written by Halil Ibrahim Ugurlu and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Halil Ibrahim Ugurlu.

Additional information

Availability of data and materials

The code and data will be available at https://github.com/halil93ibrahim/gym-bi-rotor.git.
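For readers who want to reproduce the curriculum described in the abstract, the following is a minimal training sketch using PPO from Stable Baselines3. The environment id `BiRotor-v0` and the `set_difficulty` hook are hypothetical stand-ins; the actual interface of the gym-bi-rotor repository may differ:

```python
import gym
from stable_baselines3 import PPO

# Hypothetical environment id; gym-bi-rotor's actual registration may differ.
env = gym.make("BiRotor-v0")
model = PPO("MlpPolicy", env, verbose=0)

# Curriculum: gradually widen the initial-state/goal distribution so that
# early training only sees easy position-control episodes.
for difficulty in (0.1, 0.25, 0.5, 1.0):
    env.unwrapped.set_difficulty(difficulty)  # hypothetical setter
    model.learn(total_timesteps=100_000, reset_num_timesteps=False)

model.save("birotor_curriculum_policy")
```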

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ugurlu, H.I., Kalkan, S. & Saranli, A. Reinforcement Learning versus Conventional Control for Controlling a Planar Bi-rotor Platform with Tail Appendage. J Intell Robot Syst 102, 77 (2021). https://doi.org/10.1007/s10846-021-01412-3

