Why might we expect a superintelligence to be hostile by default?
AI systems execute our precise commands without considering the many factors humans instinctively consider to determine if an action is acceptable to us. This can lead to potentially dangerous outcomes from seemingly simple instructions and goals.
One might argue: computers only do what we command them; no more, no less. So while it might be bad if terrorists or enemy countries develop superintelligence
An AI with cognitive abilities far greater than those of humans in a wide range of important domains.
Suppose we wanted a superintelligence to cure cancer. How might we specify the goal “cure cancer”? If we knew every individual step, then we could cure cancer ourselves. Instead, we have to give it the goal of curing cancer, and trust the superintelligence to come up with intermediate actions that further that goal.
If your only goal is “eradicating cancer”, and you lack humans’ instincts for the thousands of other important considerations, a relatively easy solution might be to release an engineered virus and kill everyone in the world. This satisfies all the AI’s goals: it reduces cancer down to zero, it's very fast, and it has a high probability of success. This type of specification gaming
Behavior where an AI performs a task in a way that scores highly according to the objective that was specified, while going against the task’s intended “spirit.”
It’s worth noting that the “kill all humans” solution might be “hostile” to us, but unlike the AI in some works of fiction such as The Terminator, the cancer-curing AI’s main aim is not to harm us. It simply has objectives that can be satisfied by killing us, and it won’t avoid doing so unless explicitly designed that way. As an analogy, the well being of ants is rarely considered when human engineers flood a plain to build a dam. The engineers aren’t explicitly hostile to the ants, they just don’t value their lives much. We don’t want humanity to suffer the fate of the ants in this situation.
To recap, simple goal architectures are likely to go very wrong and look downright hostile unless tempered by common sense and a broader understanding of what we do and do not value.