Reinforcement Learning versus Conventional Control for Controlling a Planar Bi-rotor Platform with Tail Appendage

  • Regular Paper
  • Published in: Journal of Intelligent & Robotic Systems

Abstract

In this paper, we study conventional and learning-based control approaches for multi-rotor platforms, with and without an actuated “tail” appendage. One contribution is a comprehensive experimental comparison between proven control-theoretic approaches and more recent learning-based ones. As a second contribution, we consider an actuated tail appendage as a deviation from the typical multi-rotor morphology: it complicates the control problem but promises useful applications. Our study explores the impact of such a tail on overall position control for both the conventional and the learning-based controllers. For the conventional controller, we use a multi-loop architecture in which an inner loop regulates attitude while an outer loop controls the position of the platform. For the learning-based controller, a multi-layer neural network is used to learn a nonlinear state-feedback controller. To improve its learning and generalization performance, we adopt a curriculum learning approach that gradually increases the difficulty of the training samples. For the experiments, a planar bi-rotor platform is modeled in a 2D simulation environment; the planar model avoids mathematical complications while preserving the main attributes of the problem, making the results more broadly applicable. We observe that both types of controllers achieve reasonable performance and can solve the position control task, but neither shows a clear advantage over the other. The learning-based controller is not interpretable and suffers from long training times, whereas the multi-loop controller requires a handcrafted architecture (which the learning-based controller does not) but provides guaranteed stable behavior.
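To make the cascaded architecture concrete, the following is a minimal sketch of a multi-loop position controller for a planar bi-rotor, assuming small-angle dynamics; the gains and platform parameters are hypothetical, and the paper's actual gains, saturation limits, and tail dynamics are not reproduced here:

```python
import numpy as np

# Hypothetical gains and platform parameters, for illustration only.
KP_POS = np.array([1.0, 4.0])   # position P gains (x, y)
KD_POS = np.array([1.5, 3.0])   # position D gains (x, y)
KP_ATT, KD_ATT = 20.0, 5.0      # attitude PD gains
MASS, G, ARM = 1.0, 9.81, 0.2   # mass [kg], gravity [m/s^2], rotor arm [m]

def multi_loop_control(pos, vel, theta, omega, pos_ref):
    """Cascade: outer position loop -> thrust and pitch commands,
    inner attitude loop -> differential rotor thrusts."""
    # Outer loop: PD on position yields a desired acceleration.
    acc_des = KP_POS * (pos_ref - pos) - KD_POS * vel
    f_total = MASS * (G + acc_des[1])                 # vertical thrust demand
    theta_des = np.clip(-acc_des[0] / G, -0.5, 0.5)   # small-angle pitch command
    # Inner loop: PD on attitude yields a body torque ...
    torque = KP_ATT * (theta_des - theta) - KD_ATT * omega
    # ... realized as a thrust difference between the two rotors.
    f_left = 0.5 * f_total - 0.5 * torque / ARM
    f_right = 0.5 * f_total + 0.5 * torque / ARM
    return f_left, f_right
```

Because the horizontal dynamics are reachable only through the pitch angle, the outer loop runs on a slower time scale than the inner loop; this time-scale separation is what makes the handcrafted cascade stable in practice.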



Funding

This research is partially supported by METU-BAP under project no. GAP-312-2018-2705. H. I. Ugurlu is supported by a scholarship from the Scientific and Technological Research Council of Turkey (TÜBİTAK). S. Kalkan is supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) through the BIDEB 2219 International Postdoctoral Research Scholarship Program and by the BAGEP Award of the Science Academy.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Halil Ibrahim Ugurlu. The first draft of the manuscript was written by Halil Ibrahim Ugurlu and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Halil Ibrahim Ugurlu.

Additional information

Availability of data and materials

The code and data will be available at https://github.com/halil93ibrahim/gym-bi-rotor.git.
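For readers who want to reproduce the curriculum described in the abstract, the following is a minimal training sketch using PPO from Stable Baselines3. The environment id `BiRotor-v0` and the `set_difficulty` hook are hypothetical stand-ins; the actual interface of the gym-bi-rotor repository may differ:

```python
import gym
from stable_baselines3 import PPO

# Hypothetical environment id; gym-bi-rotor's actual registration may differ.
env = gym.make("BiRotor-v0")
model = PPO("MlpPolicy", env, verbose=0)

# Curriculum: gradually widen the initial-state/goal distribution so that
# early training only sees easy position-control episodes.
for difficulty in (0.1, 0.25, 0.5, 1.0):
    env.unwrapped.set_difficulty(difficulty)  # hypothetical setter
    model.learn(total_timesteps=100_000, reset_num_timesteps=False)

model.save("birotor_curriculum_policy")
```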

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ugurlu, H.I., Kalkan, S. & Saranli, A. Reinforcement Learning versus Conventional Control for Controlling a Planar Bi-rotor Platform with Tail Appendage. J Intell Robot Syst 102, 77 (2021). https://doi.org/10.1007/s10846-021-01412-3

