When Shall I Be Empathetic? The Utility of Empathetic Parameter Estimation in Multi-Agent Interactions
arXiv - CS - Computer Science and Game Theory. Pub Date: 2020-11-03, DOI: arxiv-2011.02047
Yi Chen, Lei Zhang, Tanner Merry, Sunny Amatya, Wenlong Zhang, and Yi Ren

Human-robot interactions (HRI) can be modeled as dynamic or differential games with incomplete information, where each agent holds private reward parameters. Due to the open challenge in finding perfect Bayesian equilibria of such games, existing studies often consider approximated solutions composed of parameter estimation and motion planning steps, in order to decouple the belief and physical dynamics. In parameter estimation, current approaches often assume that the reward parameters of the robot are known by the humans. We argue that by falsely conditioning on this assumption, the robot performs non-empathetic estimation of the humans' parameters, leading to undesirable values even in the simplest interactions. We test this argument by studying a two-vehicle uncontrolled intersection case with short reaction time. Results show that when both agents are unknowingly aggressive (or non-aggressive), empathy leads to more effective parameter estimation and higher reward values, suggesting that empathy is necessary when the true parameters of agents mismatch with their common belief. The proposed estimation and planning algorithms are therefore more robust than the existing approaches, by fully acknowledging the nature of information asymmetry in HRI. Lastly, we introduce value approximation techniques for real-time execution of the proposed algorithms.
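The contrast between non-empathetic and empathetic estimation can be illustrated with a toy Bayesian sketch. This is not the paper's algorithm: the two-type parameter space, the logistic action model, and every number below (`TYPES`, `P_OTHER_GOES`, `COLLISION_COST`) are hypothetical choices made only for illustration. The non-empathetic estimator conditions on the human knowing the robot's true parameter, while the empathetic one also treats the human's belief about the robot as unknown and marginalizes over it.

```python
import math
from itertools import product

# Hypothetical reward weights for the two agent types (illustration only).
TYPES = {"non_aggressive": 1.0, "aggressive": 3.0}
# Hypothetical probability that an agent of each type enters the intersection.
P_OTHER_GOES = {"non_aggressive": 0.1, "aggressive": 0.9}
COLLISION_COST = 4.0  # hypothetical penalty weight


def p_go(theta_h, believed_theta_r):
    """Probability that the human goes, given their own type and the type
    they believe the robot has (logistic choice between go and yield)."""
    utility = TYPES[theta_h] - COLLISION_COST * P_OTHER_GOES[believed_theta_r]
    return 1.0 / (1.0 + math.exp(-utility))


def likelihood(actions, theta_h, believed_theta_r):
    """Likelihood of a sequence of observed human actions ('go' or 'yield')."""
    p = p_go(theta_h, believed_theta_r)
    result = 1.0
    for action in actions:
        result *= p if action == "go" else 1.0 - p
    return result


def non_empathetic_posterior(actions, true_theta_r):
    """Posterior over the human's type, assuming the human knows the
    robot's true type (uniform prior over human types)."""
    weights = {h: likelihood(actions, h, true_theta_r) for h in TYPES}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def empathetic_posterior(actions):
    """Posterior over the human's type, jointly inferring what the human
    believes about the robot and marginalizing that belief out
    (uniform prior over both)."""
    joint = {(h, r): likelihood(actions, h, r)
             for h, r in product(TYPES, TYPES)}
    total = sum(joint.values())
    marginal = {h: 0.0 for h in TYPES}
    for (h, _), w in joint.items():
        marginal[h] += w / total
    return marginal


if __name__ == "__main__":
    observed = ["yield"]
    print(non_empathetic_posterior(observed, true_theta_r="non_aggressive"))
    print(empathetic_posterior(observed))
```

In this toy model the two estimators assign different posteriors to the same observation, because the empathetic one allows that the human may have yielded out of a mistaken belief about the robot rather than because of their own reward parameter.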

Updated: 2020-11-05