当前位置: X-MOL 学术arXiv.cs.LG › 论文详情
Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)
Hyperspace Neighbor Penetration Approach to Dynamic Programming for Model-Based Reinforcement Learning Problems with Slowly Changing Variables in A Continuous State Space
arXiv - CS - Machine Learning Pub Date : 2021-06-10 , DOI: arxiv-2106.05497
Vincent Zha, Ivey Chiu, Alexandre Guilbault, Jaime Tatis

Slowly changing variables in a continuous state space constitute an important category of reinforcement learning and see its application in many domains, such as modeling a climate control system where temperature, humidity, etc. change slowly over time. However, this subject is less addressed in recent studies. Classical methods with certain variants, such as Dynamic Programming with Tile Coding which discretizes the state space, fail to handle slowly changing variables because those methods cannot capture the tiny changes in each transition step, as it is computationally expensive or impossible to establish an extremely granular grid system. In this paper, we introduce a Hyperspace Neighbor Penetration (HNP) approach that solves the problem. HNP captures in each transition step the state's partial "penetration" into its neighboring hyper-tiles in the gridded hyperspace, thus does not require the transition to be inter-tile in order for the change to be captured. Therefore, HNP allows for a very coarse grid system, which makes the computation feasible. HNP assumes near linearity of the transition function in a local space, which is commonly satisfied. In summary, HNP can be orders of magnitude more efficient than classical method in handling slowly changing variables in reinforcement learning. We have made an industrial implementation of NHP with a great success.

中文翻译:

连续状态空间中变量缓慢变化的基于模型的强化学习问题的动态规划的超空间邻居渗透方法

在连续状态空间中缓慢变化的变量构成了强化学习的一个重要类别,并在许多领域看到了它的应用,例如对温度、湿度等随时间缓慢变化的气候控制系统进行建模。然而,这个主题在最近的研究中较少涉及。具有某些变体的经典方法,例如使用 Tile Coding 离散化状态空间的动态编程,无法处理缓慢变化的变量,因为这些方法无法捕获每个转换步骤中的微小变化,因为计算成本高或不可能建立极其细粒度的网格系统。在本文中,我们介绍了一种解决该问题的超空间邻居渗透(HNP)方法。HNP 在每个转换步骤中捕获状态的部分“渗透” 进入网格化超空间中的相邻超瓦片,因此不需要过渡是瓦片间的,以便捕获变化。因此,HNP 允许非常粗的网格系统,这使得计算可行。HNP 假设局部空间中的转换函数接近线性,这通常是满足的。总之,在处理强化学习中缓慢变化的变量方面,HNP 可以比经典方法高效几个数量级。我们已经成功地实现了 NHP 的工业化实施。这是普遍满意的。总之,在处理强化学习中缓慢变化的变量方面,HNP 可以比经典方法高效几个数量级。我们已经成功地实现了 NHP 的工业化实施。这是普遍满意的。总之,在处理强化学习中缓慢变化的变量方面,HNP 可以比经典方法高效几个数量级。我们已经成功地实现了 NHP 的工业化实施。
更新日期:2021-06-11
down
wechat
bug