Optimal control of a two‐wheeled self‐balancing robot by reinforcement learning
International Journal of Robust and Nonlinear Control (IF 3.2), Pub Date: 2020-07-27, DOI: 10.1002/rnc.5058
Linyuan Guo, Syed Ali Asad Rizvi, Zongli Lin

This article concerns optimal control of the linear motion, tilt motion, and yaw motion of a two‐wheeled self‐balancing robot (TWSBR). Traditional optimal control methods for the TWSBR usually require a precise model of the system, while other existing control methods achieve stabilization in the face of parameter uncertainties. In practical applications, it is often desirable to realize optimal control without precise knowledge of the system parameters. This article proposes a new feedback‐based reinforcement learning method to solve the linear quadratic regulation (LQR) control problem for the TWSBR. The proposed control scheme is completely online and does not require any knowledge of the system parameters. The proposed input decoupling mechanism and pre‐feedback law overcome the computational difficulties commonly encountered in implementing the learning algorithms. Both state feedback and output feedback optimal control are presented. Numerical simulation shows that the proposed optimal control scheme stabilizes the system and converges to the LQR solution obtained by solving the algebraic Riccati equation.
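
For reference, the LQR problem and the algebraic Riccati equation mentioned in the abstract can be written in their standard textbook form. This is a sketch that assumes a continuous-time linear model; the abstract does not state the formulation, and the symbols A, B, Q, R, P, and K* are generic notation rather than the paper's own.

```latex
% Standard continuous-time LQR problem (generic notation, assumed here).
% Linear dynamics and quadratic cost:
\dot{x}(t) = A x(t) + B u(t), \qquad
J = \int_{0}^{\infty} \left( x^{\top} Q x + u^{\top} R u \right) dt,
\qquad Q \succeq 0,\; R \succ 0.

% Algebraic Riccati equation for the cost matrix P:
A^{\top} P + P A + Q - P B R^{-1} B^{\top} P = 0.

% Optimal state feedback law:
u^{*}(t) = -K^{*} x(t), \qquad K^{*} = R^{-1} B^{\top} P.
```

Solving the Riccati equation directly requires A and B. A model-free learning scheme of the kind the abstract describes instead estimates the optimal gain from measured data, and K* computed from the equation serves as the benchmark it should converge to.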

Updated: 2020-07-27