Why would misaligned AI pose a threat that we can’t deal with?

Human civilization is pretty robust. New technologies, cultural changes, and malign actors have sometimes caused great harm. However, in many (but not all) cases, we’ve adapted to their consequences better than expected. Even the worst cases haven’t irreversibly ruined human civilization (though some have come close).

But misaligned AI systems, if sufficiently powerful, would want to prevent us from interfering with their plans, in order to make the consequences of their actions permanent. Greater-than-human intelligence would allow them to improve themselves, invent new technologies at a much faster pace than we have, and out-strategize us by thinking and adapting their plans at humanly incomprehensible speeds.

This means that a strategy of trial and error – where we gradually learn our lessons in dealing with weaker misaligned systems and apply them to stronger misaligned systems – may not be good enough. We don’t know how quickly systems will gain capabilities, whether solutions that work on weaker systems will generalize to stronger ones, or how to coordinate between AI creators to only experiment in small incremental steps. We may need to succeed on the first critical try.

Various concrete solutions that have been proposed have deep problems. These include deploying AI only in limited contexts and relying on competition between different kinds of misaligned agents to produce a good outcome.

For details on some of these points, see the related questions.