Article
Version 1
Preserved in Portico This version is not peer-reviewed
Provably Safe Artificial General Intelligence via Interactive Proofs
Version 1
: Received: 20 September 2021 / Approved: 21 September 2021 / Online: 21 September 2021 (11:35:34 CEST)
A peer-reviewed article of this Preprint also exists.
Carlson, K. Provably Safe Artificial General Intelligence via Interactive Proofs. Philosophies 2021, 6, 83. Carlson, K. Provably Safe Artificial General Intelligence via Interactive Proofs. Philosophies 2021, 6, 83.
Abstract
Methods are currently lacking to prove artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which first generation AGI1 rapidly triggers a succession of more powerful AGIn that differ dramatically in their computational capabilities (AGIn≪AGIn+1). No proof exists that AGI will benefit humans or of a sound value-alignment method. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., 2-100). IPS procedures can test AGI behavior control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. In toto, IPS provides a way to reduce AGIn↔AGIn+1 interaction hazards to an acceptably low level.
Keywords
Artificial general intelligence; AGI; AI safety; AI value alignment; AI containment; interactive proof systems; multiple-prover systems
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Comments (0)
We encourage comments and feedback from a broad range of readers. See criteria for comments and our Diversity statement.
Leave a public commentSend a private comment to the author(s)
* All users must log in before leaving a comment