Bounded Synthesis and Reinforcement Learning of Supervisors for Stochastic Discrete Event Systems with LTL Specifications


arXiv - CS - Systems and Control. Pub Date: 2021-05-07, DOI: arxiv-2105.03081
Ryohei Oura, Toshimitsu Ushio, Ami Sakakibara

In this paper, we consider supervisory control of stochastic discrete event systems (SDESs) under linear temporal logic (LTL) specifications. Applying bounded synthesis, we reduce supervisor synthesis to the problem of satisfying a safety condition. First, we consider the problem of synthesizing a directed controller under the safety condition. We assign a negative reward to the unsafe states and introduce an expected return with a state-dependent discount factor. Using a dynamic programming method in which the expected return serves as the value function, we compute a winning region and a directed controller with the maximum satisfaction probability. Next, we construct a permissive supervisor from the optimal value function. We show that the supervisor achieves the maximum satisfaction probability and maximizes the reachable set within the winning region. Finally, for an unknown SDES, we propose a two-stage model-free reinforcement learning method that efficiently learns the winning region and the directed controllers with the maximum satisfaction probability. We also demonstrate the effectiveness of the proposed method through simulation.
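The dynamic programming step described in the abstract can be illustrated with a small sketch. It is not the authors' formulation: the four-state model `P`, the reward of -1 on entering an unsafe state, and the discount factor (1 on safe states, 0 on unsafe states) are all assumptions chosen for illustration. Under these assumptions the value of a state equals minus the minimum probability of ever reaching an unsafe state, so the satisfaction probability of the safety condition is `1 + V[s]`, and the winning region is taken to be the set of safe states with value 0.

```python
# Hypothetical toy model: P[state][event] = {next_state: probability}.
# State 3 is the only unsafe state and is treated as absorbing.
P = {
    0: {"a": {1: 0.8, 3: 0.2}, "b": {2: 0.5, 3: 0.5}},
    1: {"a": {1: 1.0}},
    2: {"a": {2: 1.0}},
    3: {},
}
UNSAFE = {3}


def reward(s):
    # Assumed reward: -1 on entering an unsafe state, 0 otherwise.
    return -1.0 if s in UNSAFE else 0.0


def discount(s):
    # Assumed state-dependent discount: stop accumulating once unsafe.
    return 0.0 if s in UNSAFE else 1.0


def backup(V, s, e):
    # One-step expected return of choosing event e in state s.
    return sum(p * (reward(t) + discount(t) * V[t]) for t, p in P[s][e].items())


def value_iteration(tol=1e-12, max_iters=10_000):
    V = {s: 0.0 for s in P}
    for _ in range(max_iters):
        delta = 0.0
        for s in P:
            if s in UNSAFE or not P[s]:
                continue  # unsafe or dead-end states keep value 0
            best = max(backup(V, s, e) for e in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Directed controller: one greedy event per safe state.
    policy = {
        s: max(P[s], key=lambda e: backup(V, s, e))
        for s in P
        if P[s] and s not in UNSAFE
    }
    return V, policy


V, policy = value_iteration()
# Safe states from which the unsafe set is avoided with probability 1.
winning = {s for s in P if s not in UNSAFE and abs(V[s]) < 1e-9}
print(V, policy, winning)
```

In this toy model, state 0 cannot avoid the unsafe state with certainty, so it falls outside the winning region, while the greedy choice there still maximizes the probability of staying safe.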


Updated: 2021-05-10
