If you've been learning about AI alignment for even a little while, you might've had a thought like "Why can't we just do [this thing that seems like it would solve the problem]?"
Unfortunately, many AI alignment and safety proposals that initially look like solutions to AI risk turn out to have hidden difficulties. It’s surprisingly easy to come up with “solutions” which don’t actually solve the problem. Some intuitive alignment proposals (which are generally agreed to be inadequate) include:
- Why can’t we just turn the AI off if it starts to misbehave?
- Why can’t we just tell the AI to figure out what we want and then do that?
- Why can’t we just tell the AI to figure out right from wrong itself?
- Why can’t we just use a more powerful AI to control a potentially dangerous AI?
- Why can’t we just treat AI like any other dangerous technological tool?
At a high level, some common pitfalls with alignment proposals are that the proposed solution:

- …requires human observers to be smarter than the AI. Many safety measures only work while an AI is relatively weak, but break once it reaches a certain level of capability (for many reasons, e.g. deceptive alignment).
- …appears to make sense in natural language, but when properly unpacked is not philosophically clear enough to be usable.
- …is philosophically coherent, but we have no idea how to turn it into computer code (or whether that’s even possible).
- …is something we simply can’t do.
- …can be done, but doesn’t solve the problem.
- …only solves a subcomponent of the problem, leaving the core problem unresolved.
- …solves the problem only as long as the AI stays “in distribution” with respect to its original training data (distributional shift will break it).
- …might work eventually, but can’t be expected to work on the first try (and we’ll likely only get one try at aligning a superintelligence!).
See the related questions below for some proposals which come up often, or check out John Wentworth’s sequence on this topic.