What is a path to AGI being an existential risk?

Here is a conjunctive path[1] to AI takeover[2] inspired by Joe Carlsmith’s report on power-seeking AI. Each step is uncertain and depends on the realization of the previous one. Unlike Carlsmith, we do not assign probabilities to each of these steps, since people's estimates of them vary widely; instead, we argue that the end result is probable enough to warrant attention. You can find a more abstract version of this argument here.

The path goes something like this:

  1. Building human-level AGI is possible in principle[3]. In this context, AGI refers to AI that can think strategically, do independent scientific research, design new computer systems, persuade people at a high level, and make and carry out plans.[4]

  2. Within the foreseeable future, humanity could have the technological capability to construct AGI.

  3. Once feasible, humanity is expected to proceed with building agentic AGI to perform tasks autonomously, because doing so will be profitable. Furthermore, some actors will rush to build it, which might trigger an AI arms race.

  4. A singleton[5] AGI is deployed by a well-intentioned actor but is misaligned (due to instrumental convergence, the orthogonality thesis, inner/outer misalignment, a treacherous turn, etc.)[6] and gains a decisive strategic advantage. Such an AI can outmaneuver humanity and reach its misaligned aims either by acting in the physical world directly or by influencing humans to act in ways that harm humanity.

  5. This leads to bad outcomes and possibly even human extinction.[7]

Carlsmith arrives at a 5% chance of human extinction along this path. You can put your own probabilities into a similar model here[8] to derive your own estimate of the probability of existential catastrophe under this model.
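
To see why each individual step matters, note that in a conjunctive model like this the overall probability is simply the product of each step's probability conditional on the earlier ones. The sketch below is purely illustrative: the first three numbers are hypothetical placeholders, and the last two reuse the 40% and 20% conditional estimates from the footnotes; these are not Carlsmith's actual per-premise figures, but they happen to multiply out to roughly his 5% headline number.

```latex
% Purely illustrative: the first three probabilities are hypothetical placeholders;
% the last two reuse the 40% and 20% conditional estimates from the footnotes below.
\[
P(\text{takeover}) = \prod_{i=1}^{5} P(\text{step } i \mid \text{steps } 1,\dots,i-1)
\approx 0.8 \times 0.8 \times 0.9 \times 0.4 \times 0.2 \approx 0.046
\]
```

Because the steps multiply, a change to any single conditional probability shifts the final figure substantially, which is why the linked model lets you supply your own numbers.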


  1. This makes the path susceptible to the multiple stage fallacy. ↩︎

  2. This path only covers AI takeover; for a case that covers misuse, see here. ↩︎

  3. Some argue that an AI does not need to be as general as humans to be existentially dangerous. One could imagine a powerful narrow AI specialized in weapon-making or political influence having these effects. ↩︎

  4. Examples include Karnofsky’s PASTA or Bensinger’s STEM-level AGI. ↩︎

  5. This might also be possible in a multipolar scenario. ↩︎

  6. Carlsmith estimates a similar scenario to be 40% likely, conditional on the previous events. ↩︎

  7. Carlsmith estimates a similar scenario to be 20% likely, conditional on the previous events. ↩︎

  8. Use this guide to learn how to use the Analytica model. ↩︎