Synthesis of Discounted-Reward Optimal Policies for Markov Decision Processes Under Linear Temporal Logic Specifications
arXiv - CS - Formal Languages and Automata Theory. Pub Date: 2020-11-01. DOI: arxiv-2011.00632
Krishna C. Kalagarla, Rahul Jain, Pierluigi Nuzzo

We present a method to find an optimal policy with respect to a reward function for a discounted Markov decision process under general linear temporal logic (LTL) specifications. Previous work has focused either on maximizing a cumulative reward objective for finite-duration tasks, specified by syntactically co-safe LTL, or on maximizing an average reward for persistent (e.g., surveillance) tasks. This paper extends and generalizes these results by introducing a pair of occupancy measures that express the LTL satisfaction objective and the expected discounted reward objective, respectively. These occupancy measures are then connected to a single policy via a novel reduction, resulting in a mixed-integer linear program whose solution provides an optimal policy. Our formulation can also be extended to include additional constraints with respect to secondary reward functions. We illustrate the effectiveness of our approach in the context of robotic motion planning for complex missions under uncertainty and performance objectives.
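The paper's exact MILP is not reproduced here, but the linear-programming backbone it builds on is standard: a discounted occupancy measure x(s, a) counts expected discounted state-action visits, linear flow constraints tie it to the MDP dynamics, and an optimal stationary policy is recovered by normalizing x. The sketch below solves this baseline occupancy-measure LP for a toy discounted MDP in Python with NumPy and SciPy; the transition model P, reward r, discount gamma, and initial distribution mu0 are illustrative assumptions, and the paper's second (LTL satisfaction) occupancy measure and integer variables are omitted.

    # Minimal sketch (not the paper's MILP): the standard occupancy-measure LP
    # for a discounted MDP. All model data below are illustrative assumptions.
    import numpy as np
    from scipy.optimize import linprog

    def solve_discounted_mdp(P, r, gamma, mu0):
        """P: (S, A, S) transition probabilities, r: (S, A) rewards."""
        S, A, _ = P.shape
        # Variables: occupancy measure x(s, a), flattened to length S*A.
        # Objective: maximize sum_{s,a} r(s,a) x(s,a); linprog minimizes, so negate.
        c = -r.reshape(S * A)
        # Flow constraints, one per state s':
        #   sum_a x(s', a) - gamma * sum_{s,a} P(s'|s,a) x(s, a) = mu0(s')
        A_eq = np.zeros((S, S * A))
        for sp in range(S):
            for s in range(S):
                for a in range(A):
                    A_eq[sp, s * A + a] = float(s == sp) - gamma * P[s, a, sp]
        res = linprog(c, A_eq=A_eq, b_eq=mu0, bounds=(0, None))
        x = res.x.reshape(S, A)
        # Recover a stationary policy: pi(a|s) proportional to x(s, a).
        pi = x / np.maximum(x.sum(axis=1, keepdims=True), 1e-12)
        return pi, -res.fun  # policy and optimal expected discounted return

    # Toy two-state, two-action MDP (purely illustrative).
    P = np.zeros((2, 2, 2))
    P[0, 0] = [0.9, 0.1]; P[0, 1] = [0.2, 0.8]
    P[1, 0] = [0.5, 0.5]; P[1, 1] = [0.1, 0.9]
    r = np.array([[0.0, 1.0], [2.0, 0.0]])
    pi, value = solve_discounted_mdp(P, r, gamma=0.95, mu0=np.array([1.0, 0.0]))
    print(pi, value)

In the paper's setting, a second occupancy measure over a product of the MDP with an automaton for the LTL formula would be coupled to the same policy, which is what introduces the integer variables and turns the LP into a MILP.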

Updated: 2020-11-03