First-Order Methods for Wasserstein Distributionally Robust MDP, arXiv - CS - Computer Science and Game Theory

First-Order Methods for Wasserstein Distributionally Robust MDP
arXiv - CS - Computer Science and Game Theory, Pub Date: 2020-09-14, DOI: arxiv-2009.06790
Julien Grand-Clement, Christian Kroer

Markov Decision Processes (MDPs) are known to be sensitive to parameter specification. Distributionally robust MDPs alleviate this issue by allowing for ambiguity sets, which specify a set of possible distributions over parameter sets. The goal is to find an optimal policy with respect to the worst-case parameter distribution. We propose a first-order methods framework for solving distributionally robust MDPs, and instantiate it for several types of Wasserstein ambiguity sets. By developing efficient proximal updates, our algorithms achieve a convergence rate of $O(NA^{2.5}S^{3.5}\log(S)\log(\epsilon^{-1})\epsilon^{-1.5})$, where $N$ is the number of kernels in the support of the nominal distribution, $S$ is the number of states, and $A$ is the number of actions (this rate varies slightly based on the Wasserstein setup). Our dependence on $N$, $A$ and $S$ is significantly better than that of existing methods; compared to Value Iteration, it is better by a factor of $O(N^{2.5}AS)$. Numerical experiments on random instances and on instances inspired by a machine replacement example show that our algorithm is significantly more scalable than state-of-the-art approaches.


Updated: 2020-09-16
