Improving Confidence in the Estimation of Values and Norms
arXiv - CS - Artificial Intelligence · Pub Date: 2020-04-02 · DOI: arxiv-2004.01056
Luciano Cavalcante Siebert, Rijk Mercuur, Virginia Dignum, Jeroen van den Hoven, Catholijn Jonker
Autonomous agents (AA) will increasingly be interacting with us in our daily
lives. While we want the benefits attached to AAs, it is essential that their
behavior is aligned with our values and norms. Hence, an AA will need to
estimate the values and norms of the humans it interacts with, which is not
straightforward when only the agent's behavior can be observed. This paper
analyses to what extent an AA is able to estimate the values and norms of a
simulated human agent (SHA) based on its actions in the ultimatum game. We
present two methods to reduce ambiguity in profiling the SHAs: one based on
search space exploration and another based on counterfactual analysis. We found
that both methods are able to increase the confidence in estimating human
values and norms, but differ in their applicability, the latter being more
efficient when the number of interactions with the agent is to be minimized.
These insights are useful to improve the alignment of AAs with human values and
norms.
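As a rough illustration of the ambiguity problem the abstract describes, consider profiling a responder in the ultimatum game: several candidate value profiles can be consistent with the same observed behavior, and an actively chosen next interaction (a counterfactual-style query) can discriminate between them. The sketch below is our own toy model, not the paper's: the profile names, acceptance thresholds, and probe-scoring rule are all illustrative assumptions.

```python
# Toy sketch (hypothetical model, not the paper's): a responder in the
# ultimatum game accepts an offer iff its share meets a profile-specific
# minimum acceptable share.

PIE = 10  # total amount to split

# Candidate value profiles: minimum share the responder will accept.
candidates = {"selfless": 1, "fair": 4, "greedy": 7}

def responds(threshold, offer):
    """Responder accepts iff the offered share meets its threshold."""
    return offer >= threshold

def consistent(observations, threshold):
    """A profile is viable if it reproduces every observed decision."""
    return all(responds(threshold, offer) == accepted
               for offer, accepted in observations)

# Passive observation: a single accepted offer leaves ambiguity.
observations = [(5, True)]  # an offer of 5 was accepted
remaining = [name for name, t in candidates.items()
             if consistent(observations, t)]
# "selfless" and "fair" both survive; only "greedy" is ruled out.

def best_probe(remaining):
    """Pick the offer whose predicted accept/reject outcome splits the
    remaining hypotheses as evenly as possible (most informative query)."""
    def split_score(offer):
        accepts = sum(responds(candidates[n], offer) for n in remaining)
        return abs(accepts - len(remaining) / 2)  # lower = better split
    return min(range(PIE + 1), key=split_score)

probe = best_probe(remaining)  # the next offer the AA should try
```

In this toy setup, asking about a low offer separates the two surviving profiles in a single extra interaction, whereas passively waiting for more observations may never disambiguate them; this mirrors, very loosely, why the counterfactual-based method is more efficient when interactions must be minimized.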
Updated: 2020-04-03
自治代理 (AA) 将越来越多地在我们的日常生活中与我们互动。虽然我们想要 AA 附带的好处,但他们的行为必须符合我们的价值观和规范。因此,AA 将需要估计与之交互的人类的价值观和规范,这在仅观察代理的行为时并不是一项简单的任务。本文分析了 AA 能够根据其在最后通牒游戏中的行为来估计模拟人类代理 (SHA) 的值和规范的程度。我们提出了两种方法来减少分析 SHA 时的歧义:一种基于搜索空间探索,另一种基于反事实分析。我们发现这两种方法都能够增加估计人类价值观和规范的信心,但它们的适用性不同,当与代理的交互次数最少时,后者更有效。这些见解有助于改善 AA 与人类价值观和规范的一致性。