Simple Strategies in Multi-Objective MDPs (Technical Report)
arXiv - CS - Logic in Computer Science. Pub Date: 2019-10-24, arXiv:1910.11024
Florent Delgrange, Joost-Pieter Katoen, Tim Quatmann, Mickael Randour

We consider the simultaneous verification of multiple expected reward objectives on Markov decision processes (MDPs). Computing the Pareto front enables a trade-off analysis among the objectives. We focus on strategies that are easy to employ and implement, i.e., strategies that are pure (no randomization) and have bounded memory. We show that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and we provide a mixed-integer linear programming (MILP) encoding to solve the corresponding problem. The bounded-memory case reduces to the stationary one via a product construction. Experimental results using Storm and Gurobi show the feasibility of our algorithms.
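To make the achievability question concrete, the following is a minimal sketch (not from the paper) that brute-force enumerates all pure stationary strategies of a hypothetical toy two-objective MDP and collects the expected total reward vectors they achieve. The MDP, state names, and reward values are invented for illustration; the paper's actual method uses an MILP encoding rather than enumeration, which is only tractable for tiny models.

```python
from itertools import product

# Hypothetical toy two-objective MDP (illustrative only):
# each state maps actions to (reward vector, successor distribution);
# 'goal' is absorbing with zero reward.
MDP = {
    "s0": {"a": ((1.0, 0.0), {"s1": 1.0}),
           "b": ((0.0, 2.0), {"goal": 1.0})},
    "s1": {"c": ((2.0, 1.0), {"goal": 1.0})},
    "goal": {"stay": ((0.0, 0.0), {"goal": 1.0})},
}

def expected_rewards(strategy, init="s0", iters=100):
    """Expected total reward vector under a pure stationary strategy,
    computed by value iteration (the goal is reached with probability 1,
    so total expected rewards are finite)."""
    V = {s: (0.0, 0.0) for s in MDP}
    for _ in range(iters):
        V = {s: tuple(MDP[s][strategy[s]][0][k]
                      + sum(p * V[t][k]
                            for t, p in MDP[s][strategy[s]][1].items())
                      for k in range(2))
             for s in MDP}
    return V[init]

# Enumerate all pure stationary strategies: one action per state.
states = [s for s in MDP if s != "goal"]
points = set()
for choice in product(*[MDP[s].keys() for s in states]):
    strat = dict(zip(states, choice))
    strat["goal"] = "stay"
    points.add(expected_rewards(strat))

print(sorted(points))  # → [(0.0, 2.0), (3.0, 1.0)]
```

A point is achievable by a pure stationary strategy exactly when it is weakly dominated by one of the enumerated vectors; here, e.g., (3, 1) is achievable but (3, 2) is not. The paper's NP-completeness result explains why this check cannot in general avoid the exponential blow-up that naive enumeration makes explicit.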

Updated: 2020-02-18