In the coming decades, AI systems could be invented that outperform humans on most tasks, including strategy, persuasion, economic productivity, scientific research and development, and AI design. We don't know how to align such systems with the intentions of their users, even when those intentions are good. This could lead to catastrophic outcomes.
The research field of AI safety was founded to prevent such disasters, and to enable humanity to use the enormous potential of advanced AI to solve problems and improve the world. There are many kinds of AI risk. This website focuses on existential risk from misaligned AI systems disempowering or killing humanity, because that risk seems both plausible and extreme in scope.
Examples of work on AI existential safety are:
- Agent foundations: Understanding what intelligence and agency are at a fundamental level
- Prosaic alignment: Developing methods like debate and iterated distillation and amplification to align more powerful versions of current AI techniques
- AI policy and governance: Setting up institutions and mechanisms that cause the major actors to implement good AI safety practices
Examples of work from the broader AI safety field are:
- Getting content recommender systems to not radicalize their users
- Ensuring autonomous cars don’t kill people
- Advocating strict regulations for lethal autonomous weapons
Some research is useful both for addressing existential risk and smaller-scale bad outcomes:
- Robustness to distribution shift: Making AI systems more able to function reliably outside of the context they were trained in
- Interpretability: Giving humans insight into the inner workings of AI systems such as neural networks
This website is designed as a single point of access where people can read summaries and find links to the best information on concepts related to AI existential safety. The goal is to help readers contribute to the effort to ensure that humanity avoids these risks and reaches a wonderful future.