Is AI safety about systems becoming malevolent or conscious?

Concern about existential risk from misaligned AI is not based on worries that AI systems will become conscious, turn evil, or develop motivations like revenge or hatred.

AI systems of the future may or may not be conscious, but they can be dangerous regardless. Stuart Russell has written[1]: "The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions."

AI systems may become 1) extremely competent at making decisions that achieve their goals, while also being 2) indifferent to things we value highly (like human life or happiness). Unfortunately, many of the plans a highly competent AI might pursue would have destructive side effects on anything the AI does not specifically value.

Here's one (highly condensed) version of the basic case for concern:

  1. As AIs improve, they're likely to become much better than us at producing plans for achieving their goals, and at carrying out those plans.

  2. We don't know how to reliably give an AI particular goals. Therefore, it's very likely that an AI created using current methods will end up with some goal or set of goals that we didn't intend to give it.

  3. For almost any goal maximized by a very powerful AI, it's probable that the most effective plan for achieving that goal will involve actions that are very bad for us, unless we're able to get the AI to "care" specifically about not doing things that are bad for us. Unfortunately, we don't know how to do that.


  1. In another publication, Russell has written: “They need not be ‘conscious’; in some respects, they can even still be ‘stupid.’ They just need to become very good at affecting the world and have goal systems that are not well understood and not in alignment with human goals (including the human goal of not going extinct).”