AI safety is a research field founded with the aim of avoiding catastrophic outcomes from advanced AI, though the term has since expanded to include reducing less extreme harms from AI.

AI existential safety, or AGI safety, is about reducing the existential risk from artificial general intelligence (AGI). AGI is AI that is at least as competent as humans in all skills relevant to making a difference in the world. AGI has not been developed yet, but it likely will be within this century.

A central part of AI existential safety is ensuring that AIs actually do what we want, a problem that is harder than it may sound. This is called AI alignment (or just alignment), because it’s about aligning an AI with human values. Alignment is difficult, and building AGI is probably very dangerous, so it’s important to mitigate the risks as much as possible. Examples of work on AI existential safety include:

  • Agent foundations: Understanding what intelligence and agency are at a fundamental level

  • Outer and inner alignment: Ensuring that the objective we train the AI on captures what we actually want (outer alignment), and that the resulting system ends up pursuing that objective rather than an unintended proxy (inner alignment)

  • AI policy/strategy: Researching how best to set up institutions and mechanisms for safe AGI development, and how to make sure AI isn’t misused by bad actors

Work is also being done on preventing bad (but not existential) outcomes from current systems. Examples include:

  • Getting content recommender systems to not radicalize their users

  • Ensuring autonomous cars don’t kill people

  • Advocating for strict regulation of lethal autonomous weapons

There are areas of research that are useful for both existential and non-existential safety: for example, robustness to distribution shift and interpretability. While all forms of AI safety are important, this FAQ focuses on existential safety, as it has the potential to be dramatically more significant for humanity’s future.