Is AI safety about systems becoming malevolent or conscious?

Concern about existential risk from misaligned AI is not based on worries that AI systems will become conscious, turn evil, or develop motivations like revenge or hatred.

People sometimes assume AI risk is about AI acting like a human villain, and fictional portrayals of AI takeover often focus on AIs with villainous motivations because that makes for compelling storytelling. However, experts generally aren’t worried about that scenario. Nor is AI becoming conscious the dangerous part; in Stuart Russell’s words¹, "The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions."

The core concern is that AI systems may become 1) extremely competent at making decisions that achieve their goals, while also being 2) indifferent to things we value highly (like human life or happiness). Unfortunately, many of a highly competent AI’s plans could have destructive side effects on anything the AI does not specifically value. Put simply, AI risk comes from systems that pursue their goals without caring about us. (Consider how human actions often kill a lot of insects, not out of malice on our part, but just as a side effect of other things we're doing.)

¹ In another publication, Russell has written: “They need not be ‘conscious’; in some respects, they can even still be ‘stupid.’ They just need to become very good at affecting the world and have goal systems that are not well understood and not in alignment with human goals (including the human goal of not going extinct).”