How likely is extinction from superintelligent AI?

Extinction from misaligned superintelligence is a tricky event to put a probability on: we don’t have a base rate of how many past civilizations like ours went extinct (whether from misaligned superintelligence or anything else), or a way to split all possible futures into a set of symmetrical and equally likely cases. That said, various people have tried putting numbers on their informed guesses of the chance of superintelligence leading to existential catastrophe, giving estimates ranging from under 1% to over 90%.

Eliezer Yudkowsky and Nate Soares at the Machine Intelligence Research Institute (MIRI) are examples of researchers who give high probabilities of extinction. In Yudkowsky’s view, humanity is on the bad end of a logistic success curve: because our response to the problem is seriously inadequate in multiple ways, any individual improvement won’t do much good by itself. By their mainline models, we’d need to “move up the curve” by doing better along several dimensions at once before our probability of survival would rise noticeably above its present ~0%.

Others, including Paul Christiano and Katja Grace, give lower probabilities (20%[1] and 19%[2], respectively), but still think there is a substantial risk of extinction.

Joe Carlsmith wrote a report which offers a framework for calculating the probability of power-seeking AI causing an existential catastrophe. The calculation involves multiplying factors like "how likely are AI systems to be agentic?" and "how likely is a warning shot?". Carlsmith gave a final estimate of >10%; various reviewers used the same model to arrive at different probabilities.
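The arithmetic behind this kind of framework is simple: a chain of premises, each assigned a conditional probability, multiplied together to yield a final estimate. The sketch below illustrates the structure only; the premise labels and numbers are purely hypothetical and are not Carlsmith's actual factors or estimates.

```python
# Sketch of a Carlsmith-style decomposition: the final probability is the
# product of conditional probabilities assigned to a chain of premises.
# Premise wordings and numbers below are illustrative, NOT from the report.
premises = {
    "advanced, agentic AI systems are built": 0.6,
    "some develop misaligned power-seeking goals (given built)": 0.4,
    "they cause large-scale damage (given misaligned)": 0.3,
    "the damage is an existential catastrophe (given damage)": 0.5,
}

p = 1.0
for claim, prob in premises.items():
    p *= prob

print(f"P(existential catastrophe) ≈ {p:.3f}")
```

One feature of this structure is worth noticing: because the factors multiply, disagreement about any single premise can swing the final number by a large ratio, which is part of why reviewers using the same model reached very different totals.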

Though the range of estimates is wide, even those at the low end are worryingly high. Ben Garfinkel estimates the existential risk from power-seeking AI by 2070 at only 0.4%, but nevertheless believes that major efforts to understand and reduce it are justified.


  1. Paul Christiano gives 20% as his guess for “Probability that most humans die within 10 years of building powerful AI (powerful enough to make human labor obsolete)”, alongside guesses of 22% on “Probability of AI takeover” and 55% on “humanity irreversibly mess[ing] up our future within 10 years of building powerful AI”. ↩︎

  2. Katja Grace gives 19% as her “overall probability of doom”, which includes some non-extinction scenarios. Her talk outlines her overall model. ↩︎