Low-rank MDP Approximation via Moment Coupling
arXiv - CS - Data Structures and Algorithms Pub Date : 2020-09-18 , DOI: arxiv-2009.08966
Amy B.Z. Zhang, Itai Gurvich

We propose a novel method, based on local moment matching, to approximate the value function of a Markov Decision Process. The method is grounded in recent work by Braverman et al. (2020) that relates the solution of the Bellman equation to that of a PDE where, in the spirit of the central limit theorem, the transition matrix is reduced to its local first and second moments. Solving the PDE is not required by our method. Instead, we construct a "sister" Markov chain whose two local transition moments are (approximately) identical to those of the focal chain. Because they share these moments, the original chain and its "sister" are coupled through the PDE, a coupling that facilitates optimality guarantees. We show how this view can be embedded into the existing aggregation framework of ADP, providing a disciplined mechanism to tune the aggregation and disaggregation probabilities. This embedding into aggregation also reveals how the approximation's accuracy depends on a certain local linearity of the value function. The computational gains arise from the reduction of the effective state space from $N$ to $N^{\frac{1}{2}+\epsilon}$, as one might intuitively expect from approximations grounded in the central limit theorem.
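
To make the notion of local moments and of a coarse "sister" chain concrete, the following is a minimal sketch and not the authors' construction. It assumes a one-dimensional state space $\{0, \dots, N-1\}$, a hypothetical lazy random walk as the focal chain, a coarse grid of roughly $\sqrt{N}$ meta-states, and aggregation probabilities given by plain linear interpolation. All function and variable names are illustrative.

```python
import numpy as np

def random_walk(N, up=0.3, down=0.2):
    """Hypothetical focal chain: lazy random walk on {0, ..., N-1}."""
    P = np.zeros((N, N))
    for x in range(N):
        P[x, min(x + 1, N - 1)] += up      # step up (blocked at the top)
        P[x, max(x - 1, 0)] += down        # step down (blocked at the bottom)
        P[x, x] += 1.0 - up - down         # stay put
    return P

def local_moments(P, positions):
    """First and second local moments of the jump y - x from each state x."""
    diff = positions[None, :] - positions[:, None]   # diff[x, y] = y - x
    mu = (P * diff).sum(axis=1)
    sigma2 = (P * diff**2).sum(axis=1)
    return mu, sigma2

def interpolation_matrix(positions, grid):
    """Aggregation probabilities: split each fine state between its two
    neighbouring grid points so that the expected grid position equals x."""
    Phi = np.zeros((len(positions), len(grid)))
    for i, x in enumerate(positions):
        j = np.searchsorted(grid, x, side="right") - 1
        j = min(j, len(grid) - 2)          # keep the last point in the last cell
        left, right = grid[j], grid[j + 1]
        w = (x - left) / (right - left)
        Phi[i, j] = 1.0 - w
        Phi[i, j + 1] = w
    return Phi

N, M = 101, 11                             # M is roughly sqrt(N); an assumption
states = np.arange(N, dtype=float)
grid = np.linspace(0, N - 1, M)            # grid points 0, 10, ..., 100

P = random_walk(N)
Phi = interpolation_matrix(states, grid)

# Disaggregation: each meta-state is represented by its own grid point.
D = np.zeros((M, N))
D[np.arange(M), grid.astype(int)] = 1.0

P_sister = D @ P @ Phi                     # M x M chain on the coarse grid

mu, sigma2 = local_moments(P, states)
mu_sister, sigma2_sister = local_moments(P_sister, grid)
```

With linear interpolation, the first local moment of the coarse chain agrees exactly with that of the focal chain at the grid points, because interpolation preserves the expected position. The second moment is inflated by the interpolation step, so this sketch matches only one of the two moments; it does not attempt the further tuning of aggregation and disaggregation probabilities that matching both would require.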

Updated: 2020-09-21