Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems
Machine Learning (IF 7.5) Pub Date: 2021-03-14, DOI: 10.1007/s10994-020-05939-8
Amarildo Likmeta , Alberto Maria Metelli , Giorgia Ramponi , Andrea Tirinzoni , Matteo Giuliani , Marcello Restelli

In real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understanding how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging, as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and we present three application scenarios: (1) the high-level decision-making problem in the highway driving scenario, (2) inferring the user preferences in a social network (Twitter), and (3) the management of the water release in the Como Lake. For each of these scenarios, we provide formalization, experiments, and a discussion to interpret the obtained results.
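To make the batch IRL setting concrete, the following is a minimal illustrative sketch, not the authors' algorithm: it assumes a linear reward r(a) = w · φ(a) and a Boltzmann-rational expert, and recovers the reward weights from a fixed dataset of demonstrated actions by maximum likelihood, with no further interaction with the environment. All names, features, and the toy two-action setting are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-action problem: per-action reward features and a "true" reward
# weight vector that the (Boltzmann-rational) expert implicitly optimizes.
phi = np.array([[1.0, 0.0],
                [0.0, 1.0]])           # feature vector of each action
w_true = np.array([1.5, 0.0])

# Fixed batch of expert demonstrations, sampled once; the learner never
# interacts with the environment again (the "truly batch" constraint).
p_true = np.exp(phi @ w_true) / np.exp(phi @ w_true).sum()
demos = rng.choice(2, size=10_000, p=p_true)
counts = np.bincount(demos, minlength=2)
n = len(demos)

def neg_log_lik(w):
    """Negative log-likelihood of the demos under p(a) ∝ exp(w · φ(a))."""
    z = phi @ w
    return float(n * np.log(np.exp(z).sum()) - counts @ z)

# Gradient descent on the (convex) negative log-likelihood.
w = np.zeros(2)
for _ in range(500):
    z = phi @ w
    p = np.exp(z) / np.exp(z).sum()
    grad = phi.T @ (n * p - counts)    # gradient of the neg. log-likelihood
    w -= 1e-4 * grad

# The reward is identifiable only up to an additive constant, so compare
# reward *differences* rather than absolute values.
print("recovered weights:", w)
print("reward gap (true vs. recovered):", w_true[0] - w_true[1], w[0] - w[1])
```

The recovered gap w[0] - w[1] approaches the true gap of 1.5; the absolute level of w is unidentifiable from demonstrations alone, which is one of the classical ambiguities the batch IRL algorithms in the paper must also contend with.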



Updated: 2021-03-15