What are the differences between AI safety, AI alignment, AI control, Friendly AI, AI ethics, AI existential safety, and AGI safety?


Terms like these have a fair amount of overlap and aren't always used consistently. The definitions below are how the terms are used on this website, but this isn't an authoritative guide on how these terms should be used:

AI safety generally means preventing AI systems from causing harm. The term usually refers to existential risks from AI systems, but sometimes also includes risks that are relevant at lower AI capability levels, such as near-term technical risks (e.g. from self-driving cars) and governance risks. This makes AI safety an umbrella term covering both current and future safety issues. Other terms are used in conjunction with AI safety to identify the specific risk being addressed.
AGI safety refers to safety concerns from artificial general intelligence. It overlaps with AI alignment (see below), in that misalignment would be the main cause of unsafe behavior in AGIs, but it also includes misuse and other governance issues.
AI existential safety: A risk is existential if it is comparable to or greater than human extinction in terms of its moral significance¹. AI existential safety aims to avoid those risks, whether or not the AI possesses an intelligence that is as general or as capable as that of humans.
AI alignment: Paul Christiano defines alignment as: “…the problem of getting your AI to try to do the right thing, not the problem of figuring out which thing is right. An aligned AI would try to figure out which thing is right, and like a human it may or may not succeed.”² Researchers in AI alignment focus on causing the goals of future superintelligent AI systems to align with human values. If aligned, AIs, AGIs, and artificial superintelligences (ASIs) would behave in a way that is compatible with human survival and flourishing. Alignment research is interdisciplinary and draws from computer science, mathematics, neuroscience, philosophy, and the social sciences. Some places (e.g. the Alignment Forum) use the term AI alignment to mean the project of AI existential safety or “making AI go well”, including governance and excluding non-existentially risky misalignment.
AI control is an older, less commonly used term which refers to roughly the same set of problems as AI alignment³. Some consider alignment to be only one potential method of controlling AI (focused on making sure that the AI always tries to do what we want), alongside other methods that try to ensure an AI can't do things we don't want it to do even if it is misaligned (such as tripwires and AI boxing).
AI governance refers to “identifying and enforcing norms for AI developers and AI systems themselves to follow.”⁴ Both AI governance and AI alignment have the goal of helping humanity develop beneficial AI. AI alignment focuses on the technical questions of how AI is built; AI governance focuses on the institutions and contexts in which AI is built and used.⁵ AI governance includes a broad range of subjects, from global coordination around regulating AI development to providing incentives for corporations to be more cautious in their AI research.
Friendly AI (FAI) is an older term coined and popularized by Eliezer Yudkowsky. It refers to the subset of all possible AGIs that help humans flourish while following some idealized version of human values, such as Coherent Extrapolated Volition. In recent years, the term “aligned AI” has more often been used to refer to this concept.
AI ethics is defined by Andrew Critch as principles that AI developers and systems should follow.⁶ In the most general sense, AI ethics focuses on ensuring that in our attempt to harness this technology for good, we appropriately assess its potential for societal harm. In practice, it is used to refer to work on a specific cluster of concerns, including preventing and mitigating algorithmic bias, having generative algorithms properly compensate the artists and authors they emulate, and ensuring the transparency of the models being used to make societal decisions. In this sense, AI ethics often refers to concerns about existing technology, whereas most of the terms above refer to future AIs with potentially world-altering scopes.

Many of these definitions overlap, in part because the meanings of some of these terms have drifted over time. Some people have resorted to the term “AI notkilleveryoneism” to counter this dilution and distortion, but this tongue-in-cheek label has not been widely adopted in serious work for obvious reasons.

¹ Critch, Andrew (2020). Some AI research areas and their relevance to existential safety
² Christiano, Paul (2018). Clarifying “AI alignment”
³ Christiano, Paul (2016). AI “safety” vs “control” vs “alignment”
⁴ Critch, Andrew (2020). Some AI research areas and their relevance to existential safety
⁵ Dafoe, Allan (2017). AI Governance: A Research Agenda
⁶ Critch, Andrew (2020). Some AI research areas and their relevance to existential safety