Efficient Strategy Synthesis for MDPs with Resource Constraints, arXiv - CS - Artificial Intelligence


arXiv - CS - Artificial Intelligence. Pub Date: 2021-05-05, DOI: arxiv-2105.02099
František Blahoudek, Petr Novotný, Melkior Ornik, Pranay Thangeda, Ufuk Topcu

We consider qualitative strategy synthesis for the formalism called consumption Markov decision processes. This formalism can model the dynamics of an agent that operates under resource constraints in a stochastic environment. The presented algorithms run in time polynomial in the size of the model's representation, and they synthesize strategies ensuring that a given set of goal states will be reached (once or infinitely many times) with probability 1 without resource exhaustion. In particular, when the amount of resource becomes too low to safely continue the mission, the strategy changes the course of the agent towards one of a designated set of reload states, where the agent replenishes the resource to full capacity; with a sufficient amount of resource, the agent attempts to fulfill the mission again. We also present two heuristics that attempt to reduce the expected time the agent needs to fulfill the given mission, a parameter important in practical planning. The presented algorithms were implemented, and numerical examples demonstrate (i) the effectiveness (in terms of computation time) of the planning approach based on consumption Markov decision processes and (ii) the positive impact of the two heuristics on planning in a realistic example.
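The idea of a resource level that is "too low to safely continue" can be illustrated with a small sketch. The code below is not the paper's algorithm; it is a simplified fixed-point computation written for illustration, and it ignores the capacity bound. It assumes a hypothetical encoding in which each state maps to a list of actions, each action has a non-negative integer resource cost and a non-empty list of possible successor states, and it computes, for every state, the least resource level that guarantees reaching some reload state against the worst-case outcome. The function name and the toy model are assumptions made for this example.

```python
from math import inf


def min_level_to_reach_reload(states, actions, reloads):
    """Least resource level per state that surely reaches a reload state.

    states  : iterable of state names
    actions : dict state -> list of (cost, successors); hypothetical encoding,
              costs are non-negative integers, successors is a non-empty list
    reloads : set of reload states
    """
    need = {s: inf for s in states}
    changed = True
    while changed:  # iterate until no value can be lowered any further
        changed = False
        for s in states:
            for cost, succs in actions.get(s, []):
                # Worst case over possible outcomes; once the agent arrives
                # in a reload state it needs nothing more.
                worst = max(0 if t in reloads else need[t] for t in succs)
                candidate = cost + worst
                if candidate < need[s]:
                    need[s] = candidate
                    changed = True
    return need


# Toy model (hypothetical): "base" is the only reload state.
states = ["base", "a", "b"]
reloads = {"base"}
actions = {
    "base": [(1, ["a"])],
    "a": [(1, ["b", "base"]), (2, ["base"])],
    "b": [(3, ["base"])],
}

need = min_level_to_reach_reload(states, actions, reloads)
print(need)
```

In this toy model, an agent in state "a" holding fewer units than `need["a"]` cannot guarantee reaching "base" at all, so a strategy in the spirit of the abstract would have to turn towards a reload state before its level drops below that threshold.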


Updated: 2021-05-06
