Provably Safe Artificial General Intelligence via Interactive Proofs

Philosophies (IF 0.6), Pub Date: 2021-10-07, DOI: 10.3390/philosophies6040083
Kristen Carlson

Methods are currently lacking to prove artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which first-generation AGI¹ rapidly triggers a succession of more powerful AGIⁿ that differ dramatically in their computational capabilities (AGIⁿ << AGIⁿ⁺¹). No proof exists that AGI will benefit humans, and no value-alignment method has been proven sound. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low value (e.g., 2⁻¹⁰⁰). IPS procedures can test AGI behavior control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. In toto, IPS provides a way to reduce AGIⁿ ↔ AGIⁿ⁺¹ interaction hazards to an acceptably low level.
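As an illustration of how a weak Verifier can bound the deception probability of a stronger Prover, the following is a minimal sketch of the textbook interactive proof for quadratic non-residuosity. It is not the construction used in the paper; the function names, the toy modulus 21, and the brute-force Prover are assumptions chosen for readability. When the claim "x is a non-residue mod n" is false, every challenge is a square whichever bit was chosen, so no Prover strategy can pass a round with probability above 1/2, and k independent rounds bound the deception probability by 2⁻ᵏ (k = 100 gives the 2⁻¹⁰⁰ figure quoted above).

```python
import math
import random


def unit_squares(n):
    """Set of quadratic residues among the units modulo n (brute force)."""
    return {(r * r) % n for r in range(1, n) if math.gcd(r, n) == 1}


def honest_prover(y, n):
    """Stand-in for a computationally powerful Prover (hypothetical helper).

    It decides by exhaustive search whether y is a square mod n and
    answers 0 for "square" and 1 for "non-square".
    """
    return 0 if y in unit_squares(n) else 1


def run_protocol(x, n, rounds, prover, rng):
    """Verifier side: accept the claim 'x is a non-residue mod n' only if
    the Prover recovers the Verifier's hidden bit in every round."""
    units = [r for r in range(1, n) if math.gcd(r, n) == 1]
    for _ in range(rounds):
        r = rng.choice(units)      # Verifier's private randomness
        b = rng.randrange(2)       # hidden bit
        # b = 0: send a random square; b = 1: send x times a random square.
        y = (r * r) % n if b == 0 else (x * r * r) % n
        if prover(y, n) != b:
            return False           # one wrong answer rejects the claim
    return True


if __name__ == "__main__":
    rng = random.Random()
    # 2 is a non-residue mod 21, so the true claim is always accepted.
    print(run_protocol(2, 21, 100, honest_prover, rng))
    # 4 is a residue mod 21, so the false claim survives 100 rounds
    # with probability at most 2**-100.
    print(run_protocol(4, 21, 100, honest_prover, rng))
```

The Verifier here only squares, multiplies, and compares bits, while the Prover does the expensive search; this asymmetry is the property the abstract relies on for AGIⁿ ↔ AGIⁿ⁺¹ verification.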


Updated: 2021-10-07
