A projected primal-dual gradient optimal control method for deep reinforcement learning
Journal of Mathematics in Industry, Pub Date: 2020-04-07, DOI: 10.1186/s13362-020-00075-3
Simon Gottschalk , Michael Burger , Matthias Gerdts

In this contribution, we start with a policy-based Reinforcement Learning ansatz using neural networks. The underlying Markov Decision Process consists of a transition probability representing the dynamical system and a policy realized by a neural network that maps the current state to the parameters of a distribution. From this distribution, the next control can be sampled. In this setting, the neural network is replaced by an ODE, based on a recently discussed interpretation of neural networks. The resulting infinite-dimensional optimization problem is transformed into an optimization problem similar to well-known optimal control problems. Afterwards, the necessary optimality conditions are established, and from these a new numerical algorithm is derived. The operating principle is demonstrated with two examples. In the first, a moving point is steered through an obstacle course to a desired end position in a 2D plane. The second example demonstrates the applicability to more complex problems: the aim is to steer the fingertip of a human arm model with five degrees of freedom and 29 Hill muscle models to a desired end position.
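To make the described setup concrete, the following Python sketch illustrates the three ingredients named in the abstract: a state-dependent distribution whose parameters come from a parameterized model, sampling of the next control, and a projected gradient step on the policy parameters. It assumes a Gaussian policy, a forward-Euler discretization of the ODE that replaces the network, and a simple box constraint for the projection; all function names, dimensions, and constants are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the paper's code): Gaussian policy whose mean is produced
# by integrating a simple ODE (the continuous-depth view of a neural network),
# control sampling, and a projected gradient update on the parameters.
# All names, shapes, and the box constraint are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

def ode_forward(theta, state, steps=10, h=0.1):
    """Integrate x' = tanh(W x + b) with forward Euler; the terminal state
    plays the role of the network output (the distribution parameters)."""
    W, b = theta
    x = state.copy()
    for _ in range(steps):
        x = x + h * np.tanh(W @ x + b)
    return x

def sample_control(theta, state, sigma=0.1):
    """Sample the next control from the state-dependent Gaussian policy."""
    mu = ode_forward(theta, state)
    return rng.normal(mu, sigma)

def project(theta, bound=1.0):
    """Project the parameters back onto a box constraint
    (the 'projected' part of a projected gradient step)."""
    W, b = theta
    return np.clip(W, -bound, bound), np.clip(b, -bound, bound)

def projected_step(theta, grad, lr=1e-2):
    """One projected gradient ascent step on an estimated objective gradient."""
    (W, b), (gW, gb) = theta, grad
    return project((W + lr * gW, b + lr * gb))

# Tiny usage example with an assumed 2-dimensional state.
theta = (rng.normal(scale=0.5, size=(2, 2)), np.zeros(2))
u = sample_control(theta, np.array([0.5, -0.3]))
```

In this sketch the gradient passed to `projected_step` is left abstract; in the paper it is derived from the necessary optimality conditions of the resulting optimal control problem rather than estimated by generic stochastic gradients.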
