Policy space identification in configurable environments
Machine Learning (IF 7.5), Pub Date: 2021-09-05, DOI: 10.1007/s10994-021-06033-3
Alberto Maria Metelli, Guglielmo Manneschi, Marcello Restelli

We study the problem of identifying the policy space available to an agent in a learning process, given access to a set of demonstrations generated by the agent playing the optimal policy in the considered space. We introduce an approach based on frequentist statistical testing to identify the set of policy parameters that the agent can control, within a larger parametric policy space. After presenting two identification rules (combinatorial and simplified), applicable under different assumptions on the policy space, we provide a probabilistic analysis of the simplified one in the case of linear policies belonging to the exponential family. To improve the performance of our identification rules, we make use of the recently introduced framework of Configurable Markov Decision Processes, exploiting the opportunity of configuring the environment to induce the agent to reveal which parameters it can control. Finally, we provide an empirical evaluation, on both discrete and continuous domains, to demonstrate the effectiveness of our identification rules.
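As a rough illustration of the statistical-testing idea in the abstract, the sketch below tests each parameter of a policy separately with a generalized likelihood ratio statistic. It is not the authors' implementation and does not reproduce either of their identification rules. It assumes a Gaussian linear policy with known noise variance, treats "the agent cannot control parameter i" as the null hypothesis that parameter i equals zero, and uses a chi-square threshold with one degree of freedom. All function names (`solve`, `rss`, `identify_controllable`) are hypothetical.

```python
import random

# Approximate 0.95 quantile of the chi-square distribution with 1 degree of freedom.
CHI2_1DOF_95 = 3.841


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def rss(features, actions, keep):
    """Residual sum of squares of the least-squares fit using only the columns in `keep`."""
    if not keep:
        return sum(a * a for a in actions)
    X = [[row[j] for j in keep] for row in features]
    k = len(keep)
    XtX = [[sum(x[p] * x[q] for x in X) for q in range(k)] for p in range(k)]
    Xty = [sum(x[p] * a for x, a in zip(X, actions)) for p in range(k)]
    theta = solve(XtX, Xty)
    return sum(
        (a - sum(t * v for t, v in zip(theta, x))) ** 2 for x, a in zip(X, actions)
    )


def identify_controllable(features, actions, sigma=1.0, threshold=CHI2_1DOF_95):
    """Return the indices whose null hypothesis "theta_i = 0" is rejected.

    Assumption (not from the source): actions = theta . features + N(0, sigma^2),
    so the likelihood ratio statistic is (RSS_restricted - RSS_full) / sigma^2.
    """
    d = len(features[0])
    full = list(range(d))
    rss_full = rss(features, actions, full)
    controllable = []
    for i in full:
        restricted = [j for j in full if j != i]
        stat = (rss(features, actions, restricted) - rss_full) / sigma ** 2
        if stat > threshold:
            controllable.append(i)
    return controllable


if __name__ == "__main__":
    # Synthetic demonstrations: the agent controls parameters 0 and 2 only.
    random.seed(0)
    true_theta = [1.5, 0.0, -2.0]
    feats = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
    acts = [
        sum(t * v for t, v in zip(true_theta, row)) + random.gauss(0, 1)
        for row in feats
    ]
    print(identify_controllable(feats, acts))
```

Testing each parameter on its own keeps the number of tests linear in the dimension of the parameter space. The price is a per-parameter false-positive rate set by the threshold, so a parameter the agent cannot control is occasionally flagged as controllable.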




Updated: 2021-09-06