L∗-based learning of Markov decision processes (extended version)
Formal Aspects of Computing (IF 1.4) Pub Date: 2021-03-31, DOI: 10.1007/s00165-021-00536-5
Martin Tappler, Bernhard K. Aichernig, Giovanni Bacci, Maria Eichlseder, Kim G. Larsen

Abstract

Automata learning techniques automatically generate system models from test observations. Typically, these techniques fall into two categories: passive and active. On the one hand, passive learning assumes no interaction with the system under learning and uses a predetermined training set, e.g., system logs. On the other hand, active learning techniques collect training data by actively querying the system under learning, allowing one to steer the discovery of meaningful information about the system under learning, leading to effective learning strategies. A notable example of an active learning technique for regular languages is Angluin's L∗ algorithm. The L∗ algorithm describes the strategy of a student who learns the minimal deterministic finite automaton of an unknown regular language L by asking a succinct number of queries to a teacher who knows L.
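
To make the teacher-student interaction concrete, the following Python sketch shows the two query types that L∗ relies on, membership queries and equivalence queries, against a toy regular language. The Teacher class, its method names, and the brute-force equivalence check are illustrative assumptions, not taken from the paper.

from typing import Callable, Optional, Sequence


class Teacher:
    """Minimally adequate teacher for a regular language L (illustrative sketch).

    The target language is given as a membership predicate; equivalence is
    checked by brute force over short words, which only works for toy
    examples but mirrors the two query types that L* relies on.
    """

    def __init__(self, in_language: Callable[[str], bool], alphabet: Sequence[str]):
        self.in_language = in_language
        self.alphabet = alphabet

    def membership_query(self, word: str) -> bool:
        # "Is this word in L?"
        return self.in_language(word)

    def equivalence_query(self, hypothesis: Callable[[str], bool],
                          max_len: int = 6) -> Optional[str]:
        # "Does the hypothesis accept exactly L?"  Returns a counterexample
        # on disagreement, or None if none is found up to length max_len.
        words = [""]
        for _ in range(max_len):
            words = words + [w + a for w in words for a in self.alphabet]
        for w in sorted(set(words), key=len):
            if hypothesis(w) != self.in_language(w):
                return w
        return None


# Toy target language: words over {a, b} with an even number of 'a's.
teacher = Teacher(lambda w: w.count("a") % 2 == 0, alphabet="ab")
print(teacher.membership_query("ab"))             # False (one 'a')
print(teacher.equivalence_query(lambda w: True))  # shortest counterexample: 'a'

In L∗ proper, the student organizes membership-query answers in an observation table and proposes a hypothesis automaton whenever the table is closed and consistent; the equivalence query then either confirms the hypothesis or returns a counterexample that refines the table.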

In this work, we study L∗-based learning of deterministic Markov decision processes, a class of Markov decision processes where an observation following an action uniquely determines the successor state. For this purpose, we first assume an ideal setting with a teacher who provides perfect information to the student. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling execution traces of the system via testing.
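
As a rough illustration of the determinism condition, the sketch below encodes a small labelled MDP as dictionaries and checks that, for every state and action, no two successors reachable with positive probability carry the same observation, so that an action together with the observed label identifies the successor uniquely. The state names, the coin-flip dynamics, and the helper functions are illustrative assumptions rather than the paper's formalization.

import random
from typing import Dict, List, Tuple

# A small deterministic labelled MDP, sketched as plain dictionaries.
# Each state carries an observation; each (state, action) pair maps to a
# probability distribution over successor states.
label: Dict[str, str] = {"init": "start", "heads": "heads", "tails": "tails"}

trans: Dict[Tuple[str, str], List[Tuple[float, str]]] = {
    ("init", "flip"): [(0.5, "heads"), (0.5, "tails")],
    ("heads", "flip"): [(0.5, "heads"), (0.5, "tails")],
    ("tails", "flip"): [(0.5, "heads"), (0.5, "tails")],
}


def is_deterministic() -> bool:
    # Determinism: for every state and action, no two successors reachable
    # with positive probability share the same observation.
    for successors in trans.values():
        obs = [label[s] for p, s in successors if p > 0]
        if len(obs) != len(set(obs)):
            return False
    return True


def step(state: str, action: str) -> Tuple[str, str]:
    # Sample a successor; the returned observation uniquely identifies it.
    successors = trans[(state, action)]
    probs = [p for p, _ in successors]
    states = [s for _, s in successors]
    nxt = random.choices(states, weights=probs)[0]
    return nxt, label[nxt]


assert is_deterministic()
print(step("init", "flip"))  # e.g. ('heads', 'heads') or ('tails', 'tails')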

Experiments performed on an implementation of our sampling-based algorithm suggest that our method achieves better accuracy than state-of-the-art passive learning techniques using the same amount of test observations. In contrast to existing learning algorithms that assume a predefined number of states, our algorithm learns the complete model structure, including the state space.


