A Bayesian Approach to Policy Recognition and State Representation Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8). Pub Date: 2017-06-01, DOI: 10.1109/tpami.2017.2711024
Adrian Sosic, Abdelhak M. Zoubir, Heinz Koeppl

Learning from demonstration (LfD) is the process of building behavioral models of a task from demonstrations provided by an expert. These models can be used, e.g., for system control by generalizing the expert demonstrations to previously unencountered situations. Most LfD methods, however, make strong assumptions about the expert behavior, e.g., they assume the existence of a deterministic optimal ground truth policy or require direct monitoring of the expert's controls, which limits their practical use as part of a general system identification framework. In this work, we consider the LfD problem in a more general setting where we allow for arbitrary stochastic expert policies, without reasoning about the optimality of the demonstrations. Following a Bayesian methodology, we model the full posterior distribution of possible expert controllers that explain the provided demonstration data. Moreover, we show that our methodology can be applied in a nonparametric context to infer the complexity of the state representation used by the expert, and to learn task-appropriate partitionings of the system state space.
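The Bayesian treatment of arbitrary stochastic expert policies described above can be illustrated with a minimal sketch. This is not the authors' model, merely the conjugate Dirichlet-multinomial idea for a tabular policy: each observed (state, action) pair updates per-state Dirichlet pseudo-counts, yielding a full posterior over stochastic policies with no optimality assumption on the expert. All function and variable names here are hypothetical.

```python
import numpy as np

def policy_posterior(demos, n_states, n_actions, alpha=1.0):
    """Dirichlet posterior over a tabular stochastic expert policy.

    demos: list of (state, action) pairs observed from the expert.
    alpha: symmetric Dirichlet prior concentration (pseudo-counts).
    Returns the per-state Dirichlet parameters and the posterior-mean
    policy; the posterior captures all stochastic policies consistent
    with the data, not a single deterministic optimum.
    """
    # Prior pseudo-counts for every state-action pair.
    counts = np.full((n_states, n_actions), alpha)
    for s, a in demos:
        counts[s, a] += 1.0  # conjugate update: add one observed count
    # Posterior mean of each state's action distribution.
    posterior_mean = counts / counts.sum(axis=1, keepdims=True)
    return counts, posterior_mean

# Toy demonstration data: expert acts stochastically in state 0.
demos = [(0, 1), (0, 1), (0, 0), (1, 2)]
params, mean_policy = policy_posterior(demos, n_states=2, n_actions=3)
# mean_policy[0] = [1/3, 1/2, 1/6]: a stochastic policy estimate for state 0.
```

The posterior (rather than a point estimate) also quantifies uncertainty: states visited rarely in the demonstrations keep a nearly uniform, high-variance action distribution, which is what the modeled "full posterior distribution of possible expert controllers" provides.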
