Existentialist risk and value misalignment
Philosophical Studies Pub Date: 2024-04-25, DOI: 10.1007/s11098-024-02142-6
Ariela Tubert , Justin Tiehen

We argue that two long-term goals of AI research stand in tension with one another. The first involves creating AI that is safe, where this is understood as solving the problem of value alignment. The second involves creating artificial general intelligence, meaning AI that operates at or beyond human capacity across all or many intellectual domains. Our argument focuses on the human capacity to make what we call “existential choices”, choices that transform who we are as persons, including transforming what we most deeply value or desire. It is a capacity for a kind of value misalignment, in that the values held prior to making such choices can be significantly different from (misaligned with) the values held after making them. Because of the connection to existentialist philosophers who highlight these choices, we call the resulting form of risk “existentialist risk.” It is, roughly, the risk that results from AI taking an active role in authoring its own values rather than passively going along with the values given to it. On our view, human-like intelligence requires a human-like capacity for value misalignment, which is in tension with the possibility of guaranteeing value alignment between AI and humans.



Updated: 2024-04-25