Abstract
We improve the traditional Q(\( \lambda \))-learning algorithm by adding an obstacle area expansion strategy. The new algorithm, named OAE-Q(\( \lambda \))-learning, is applied to path planning in complex environments. The contributions of OAE-Q(\( \lambda \))-learning are as follows: (1) it expands concave obstacle areas in the environment, so that the agent avoids repeating invalid actions after falling into such an area; (2) it removes the expanded obstacle areas from the state space, which reduces the learning state space and accelerates the convergence of the algorithm. Extensive experimental results validate the effectiveness and feasibility of OAE-Q(\( \lambda \))-learning for path planning in complex environments.
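The abstract does not give the expansion procedure itself, but the idea of folding concave pockets into the obstacle area can be illustrated on a grid map. The sketch below is a plausible reading, not the authors' exact method: a free cell whose 4-neighbourhood is blocked on three or more sides is a dead end the agent could only enter and back out of, so it is marked as an obstacle and thereby removed from the learnable state space; the rule is applied repeatedly until the pocket is filled.

```python
def expand_obstacles(grid, protected=()):
    """Illustrative obstacle-area expansion on a grid map.

    grid: list of lists, 1 = obstacle, 0 = free cell.
    protected: cells (e.g. start/goal) that must never be absorbed.
    Returns an expanded copy; the input grid is left unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    g = [row[:] for row in grid]
    changed = True
    while changed:  # repeat until no concave pocket cell remains
        changed = False
        for r in range(rows):
            for c in range(cols):
                if g[r][c] == 1 or (r, c) in protected:
                    continue
                blocked = 0
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    # map borders count as blocked, like walls
                    if not (0 <= nr < rows and 0 <= nc < cols) or g[nr][nc] == 1:
                        blocked += 1
                if blocked >= 3:  # dead-end cell: only one way out
                    g[r][c] = 1
                    changed = True
    return g


# A U-shaped (concave) obstacle: the interior pocket gets absorbed,
# while ordinary free cells along the border stay learnable.
grid = [[0, 0, 0, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
expanded = expand_obstacles(grid)
```

After expansion, the two pocket cells inside the U become obstacles, so the agent never enters the trap during learning; states on the open border are untouched.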
Abbreviations
OAE-Q(\( \lambda \))-learning: Q(\( \lambda \))-learning based on the obstacle area expansion strategy
Acknowledgements
The research of this paper is supported by the National Natural Science Foundation of China.
Funding
The authors are partially supported by NSFC (61573285).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Chen, H., Ji, Y. & Niu, L. Reinforcement learning path planning algorithm based on obstacle area expansion strategy. Intel Serv Robotics 13, 289–297 (2020). https://doi.org/10.1007/s11370-020-00313-y