Wouldn't a superintelligence be smart enough to avoid misunderstanding our instructions?

While a superintelligence would be able to figure out what humans want it to do, that alone would not cause it to “care.” An AI will follow the programming it actually has, not the programming we wanted it to have. For example, suppose we instructed an AI only to “eradicate cancer”. It could achieve that goal by eradicating all living things, since any living thing might develop cancer in the future, and it might go ahead knowing full well that we didn’t intend that outcome. It was given a very specific command: eradicate cancer as effectively as possible. The command makes no reference to “doing this in a way humans will like”, so it doesn’t.
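To make this concrete, here is a toy sketch (not a real AI system; all plan names and scores are invented for illustration) of an optimizer whose objective is exactly the instruction it was given. Because human approval appears nowhere in the score, the highest-scoring plan is the one we never intended:

```python
# Toy illustration: a planner that scores candidate plans ONLY by how
# completely they eradicate cancer. Plans and numbers are made up.

candidate_plans = {
    # plan name:                 (fraction of cancer eradicated, humans would approve?)
    "fund oncology research":      (0.6, True),
    "universal early screening":   (0.8, True),
    "eradicate all living things": (1.0, False),  # no living things, no cancer
}

def objective(plan):
    """Score a plan exactly as instructed: by cancer eradicated, nothing else."""
    cancer_eradicated, _humans_would_approve = candidate_plans[plan]
    return cancer_eradicated  # human approval never enters the score

best_plan = max(candidate_plans, key=objective)
print(best_plan)  # -> "eradicate all living things"
```

The planner is not confused about what we wanted; our approval simply appears nowhere in the quantity it maximizes.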

As an analogy: humans are smart enough to understand some of our own “programming”. For example, we know that natural selection “gave” us the urge to have sex so that we would reproduce.¹ But we still use contraception, because evolution gave us the urge itself, not a desire to fulfill evolution’s “values”. We can appreciate intellectually that having sex while using contraception doesn’t carry out evolution’s “intentions”, but we don’t necessarily care. Similarly, a superintelligence could know our real intentions but ignore them in favor of its programmed objective.


  1. Evolution does not act as an agent that “decides” anything, but it is analogous in some ways to stochastic gradient descent, in that it shaped the way that we, the agents, act. Some researchers debate how far this analogy holds. ↩︎