What is everyone working on in AI alignment?

See this page for an overview of four major types of alignment research: agent foundations, prosaic alignment, interpretability, and brain-based AI.

This page summarizes the alignment research of specific organizations.

  • Aligned AI (website) was cofounded by Stuart Armstrong and Rebecca Gorman. They are building an “Alignment API” to detect changes in a model’s environment from its data early on, and have published on AI governance and strategy in addition to work on technical alignment.

Within AI alignment, they have published on goal misgeneralization, preference change, concept extrapolation, and value learning.

In addition to ELK, ARC’s work on alignment has included Iterated Amplification and Distillation (IDA), and developing a framework for “formal heuristic arguments”, based on mechanistic interpretability and formal proof methods.

  • Anthropic (website) was founded in 2021 by former executives from OpenAI. They are known for developing Claude, a large language model, which they waited until March 2023 to release to avoid triggering an arms race.

They pursue a portfolio of approaches to AI alignment, most notably Constitutional AI, which they developed; their work has also included RLHF, interpretability, automated red-teaming, and more.
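As a rough illustration of the critique-and-revision loop at the core of Constitutional AI, the sketch below checks a draft against written principles and revises it when one is violated. The principle, draft text, and string-based revision are invented stand-ins; in the real method, both the critique and the revision are generated by the model itself.

```python
# Toy critique-and-revision loop in the spirit of Constitutional AI.
# The principle and the string-matching "critique" are stand-ins for
# model-generated critiques of a draft response.

PRINCIPLES = [
    ("avoid insults", lambda text: "idiot" not in text),
]

def critique(text):
    """Return the names of principles the text violates."""
    return [name for name, ok in PRINCIPLES if not ok(text)]

def revise(text, violations):
    """Stand-in revision step: in the real method, the model rewrites
    its own draft in light of the critique."""
    return text.replace("idiot", "interesting") if violations else text

draft = "That is an idiot question, but I will answer it."
final = revise(draft, critique(draft))
print(critique(final))  # [] -- the revised draft satisfies the principles
```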

Recent work by ALTER includes a proposal for measuring “stubbornness” in AI agents, and applying value alignment to the use of AI in legal settings.

  • The Cambridge Computational and Biological Learning Lab (website) is a lab in the Department of Engineering at the University of Cambridge. Much of its alignment research comes out of the “Machine Learning Group”, which has “particular strengths in… Bayesian approaches to modeling and inference in statistical applications”.

Recent work by the group has included work on reward hacking, goal misgeneralisation, and interpretability.

  • The Center for AI Safety (CAIS) (website) is a San Francisco-based nonprofit directed by Dan Hendrycks, pursuing both technical and conceptual research alongside field building. They run a compute cluster specifically for ML safety research. CAIS has done research on robustness, anomaly detection, and machine ethics, and has developed several foundational benchmarks for evaluating AI safety and capabilities. It organized the May 2023 Statement on AI Risk.

  • The Center for Human-Compatible AI (CHAI) (website) is a research group founded by Stuart Russell. It is based at UC Berkeley, but has extensive collaborations with other academic institutions. The group works on developing “provably beneficial AI systems”, emphasizing representing uncertainty in AI objectives and getting AIs to defer to human judgment.

CHAI researchers have worked on corrigibility, preference inference, transparency, oversight, agent foundations, robustness, and more.

  • The Center on Long-Term Risk (website) is a research group focused on avoiding s-risk scenarios, in which AI agents deliberately cause great suffering due to cooperation failure or conflict. To this end, their research largely focuses on game theory and decision theory.
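The kind of cooperation failure this research targets can be seen in the one-shot Prisoner’s Dilemma, where mutual defection is the unique Nash equilibrium even though both players prefer mutual cooperation. A minimal sketch, using textbook payoffs and nothing specific to the Center’s own models:

```python
# One-shot Prisoner's Dilemma: (row payoff, column payoff);
# C = cooperate, D = defect.
PAYOFFS = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def best_response(opponent):
    """Row player's payoff-maximizing move against a fixed opponent
    move (the game is symmetric, so this serves for both players)."""
    return max("CD", key=lambda m: PAYOFFS[(m, opponent)][0])

def is_nash(row, col):
    """A profile is a Nash equilibrium if neither player gains by
    unilaterally deviating."""
    return best_response(col) == row and best_response(row) == col

equilibria = [(r, c) for r in "CD" for c in "CD" if is_nash(r, c)]
print(equilibria)  # [('D', 'D')], even though ('C', 'C') pays (3, 3)
```

Defecting is each player's best response no matter what the other does, so the only stable outcome is the one both players disprefer, which is the basic shape of the conflict scenarios studied here.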

  • The Centre for the Study of Existential Risk (CSER) (website) at the University of Cambridge focuses on interdisciplinary research to mitigate existential threats, including those from biotechnology, climate change, global injustice, and AI.

From 2018 to 2021 they investigated “generality” in AI, exploring how it can be defined, how it relates to computing power, the tradeoffs between generality and capability, and a shift from increasing AI capabilities to expanding the breadth of tasks AI can perform. They collaborate with the Leverhulme Centre for the Future of Intelligence on AI:FAR.

  • Conjecture (website) was formed in 2022 by researchers from EleutherAI. Their innovation lab division focuses on products; it has worked on tools for coding efficiency and human-like voice interaction. Conjecture has also done work on AI governance.

Their alignment agenda focuses on building Cognitive Emulation (CoEm) – AI that uses reasoning processes which emulate human reasoning, so that its reasoning is more transparent.

  • Elicit (website; blog) is an automated research assistant tool. The team building it spun off of Ought in September 2023. The Elicit team aims to advance AI alignment by using AI to “scale up good reasoning”, to arrive at “true beliefs and good decisions”.

They aim to produce process-based systems, and to this end have done research on factored cognition and task decomposition.
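Factored cognition can be sketched as splitting a task into subtasks that are solved in isolation and then composed, so that no single step has to handle the whole problem. The hard-coded decomposition below is a toy, not Elicit’s actual pipeline:

```python
# Toy factored cognition: a task is split into subtasks, each solved
# in isolation, and the sub-answers are composed. The decomposition
# table is hard-coded; in practice a model would propose it.

def decompose(task):
    """Split a task into independent subtasks, or return the task
    itself if it is atomic."""
    return {
        "sum 1..100": [f"sum {a}..{a + 24}" for a in (1, 26, 51, 76)],
    }.get(task, [task])

def solve_leaf(subtask):
    """Solve an atomic subtask such as 'sum 26..50'."""
    _, rng = subtask.split()
    lo, hi = map(int, rng.split(".."))
    return sum(range(lo, hi + 1))

def solve(task):
    subtasks = decompose(task)
    if subtasks == [task]:                   # atomic: solve directly
        return solve_leaf(task)
    return sum(solve(s) for s in subtasks)   # compose sub-answers

print(solve("sum 1..100"))  # 5050
```

Because each subtask is small enough to check on its own, the overall answer can be trusted step by step rather than as one opaque output, which is the appeal of process-based systems.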

  • EleutherAI (website) is a non-profit research lab that started as a Discord server in 2020, created by Connor Leahy, Sid Black, and Leo Gao.

EleutherAI has primarily worked on training LLMs, and some of its releases were the most capable open-source models available at the time. They provide open access to these models and their codebases. They also research interpretability, corrigibility, and mesa-optimization.

  • Encultured AI (website) was founded by Andrew Critch and Nick Hay. From 2022 to 2023 they were a “video game company focused on enabling the safe introduction of AI technologies into [their] game world”, to provide a platform for testing AI safety and alignment solutions.

From 2024 onwards, they have been joined by their investor Jaan Tallinn in a shift towards healthcare applications of AI.

  • FAR AI (website), the Fund for Alignment Research, works to “incubate and accelerate research agendas that are too resource-intensive for academia but not yet ready for commercialisation by industry”.

Their alignment research has included adversarial robustness, interpretability, and preference learning.

  • The Future of Humanity Institute (FHI) (website) is an Oxford University research center directed by Nick Bostrom. It has five research groups, which include AI safety, AI governance, and digital minds alongside macrostrategy and biosecurity.

Their safety work has included work on identifying principles to guide AI behavior and detecting novel risks alongside work on governance.

  • Google DeepMind’s alignment research includes work on model evaluation, value learning, task decomposition, and robustness.

  • The Machine Intelligence Research Institute (MIRI) (website) began work on AI as the “Singularity Institute”. Originally founded by Eliezer Yudkowsky in 2000 with the aim of accelerating progress towards AGI, they are now known for their early shift to focus on AI existential safety, which raised awareness of the issue. They are also known for being pessimistic about existential risk from AI.

Their research has been “non-disclosed by default” since 2018, but they have two research agendas – a 2014 agent foundations agenda and a 2016 machine learning agenda.

Their alignment research spans many areas including interpretability, human-AI interaction, and multi-agent systems.

  • The NYU Alignment Research Group (website) is a research group led by Sam Bowman which overlaps and collaborates with other groups at NYU. It does “empirical work with language models that aims to address longer-term concerns…”

Their research agenda includes work on scalable oversight (such as debate, amplification, and recursive reward modeling), studying the behavior of language models, and designing experimental protocols that test for alignment.
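To give a flavor of debate as a scalable-oversight protocol, the toy below has a judge who can verify only a single addition oversee a claim about a long sum: one debater publishes running partial sums, the opposing debater points to the first step it disputes, and the judge checks just that step. This is an illustrative sketch, not the group’s actual protocol:

```python
# Toy debate for scalable oversight: the judge can check only ONE
# addition, yet honest claims about a long sum win and any single
# inserted error is caught.

def first_bad_step(addends, partials):
    """Opposing debater: index of the first claimed partial sum that
    does not follow from the previous one, or None if all check out."""
    prev = 0
    for i, (a, p) in enumerate(zip(addends, partials)):
        if p != prev + a:
            return i
        prev = p
    return None

def judge_accepts(addends, partials):
    """The judge re-checks only the disputed step; with no disputed
    step, the claim stands."""
    i = first_bad_step(addends, partials)
    if i is None:
        return True
    prev = partials[i - 1] if i else 0
    return partials[i] == prev + addends[i]

addends = [3, 1, 4, 1, 5]
honest = [3, 4, 8, 9, 14]
dishonest = [3, 4, 8, 10, 15]  # error slipped in at step 3
print(judge_accepts(addends, honest), judge_accepts(addends, dishonest))
# True False
```

The point is that a weak judge plus adversarial debaters can oversee a claim far too long for the judge to verify directly.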

  • Obelisk (website) is the AGI laboratory of Astera Institute. They focus on computational neuroscience, and work on developing “brain-like AI” – AI inspired by the architecture of human brains.

Their current work includes building a computational model based on neuroscience, research furthering neuroscience itself, an evolutionary computation framework, and a training environment that scales in complexity.

  • Ought (website) is a California-based product-driven research lab. Elicit, an organization building an AI research assistant, was incubated at Ought. While building Elicit they were focused on factored cognition and supervising LLM processes (instead of outcomes).

Their mission is to “scale up good reasoning” so that machine learning “help[s] as much with thinking and reflection as it does with tasks that have clear short-term outcomes”.

  • OpenAI (website) is probably best known for ChatGPT, an LLM chatbot, but has also created DALL-E, a text-to-image generator. It was originally founded as a non-profit in 2015, transitioning to a capped for-profit in 2019.

Their alignment work focuses on using human feedback, scalable oversight, and automating alignment research.
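The preference-modeling step behind approaches like RLHF can be sketched as fitting a reward model so that, under a Bradley–Terry model, responses humans preferred receive higher reward. Below, the “reward model” is a single weight on a one-dimensional feature and the comparison data are invented, so this is only a schematic of the idea:

```python
# Toy preference modeling for RLHF: fit a reward model on pairwise
# human comparisons by gradient ascent on the Bradley-Terry
# log-likelihood. The 1-D feature and comparisons are invented.
import math

# (feature of preferred response, feature of rejected response)
comparisons = [(0.9, 0.1), (0.8, 0.3), (0.7, 0.2)]

w = 0.0  # reward(x) = w * x
for _ in range(200):
    grad = 0.0
    for good, bad in comparisons:
        # P(good preferred) under Bradley-Terry with rewards w*x
        p_good = 1 / (1 + math.exp(-w * (good - bad)))
        grad += (1 - p_good) * (good - bad)  # d/dw of log-likelihood
    w += 0.5 * grad

# The fitted reward model ranks the preferred responses higher.
print(w > 0, w * 0.9 > w * 0.1)  # True True
```

In full RLHF, a policy is then optimized against this learned reward; scalable oversight and automated alignment research aim at settings where such direct human comparisons stop being reliable.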

  • Orthogonal works primarily on the “question-answer counterfactual interval” (QACI) alignment proposal.

  • Redwood Research (website) is led by Buck Shlegeris and Nate Thomas. They have run the machine learning bootcamp MLAB and the research program REMIX.

They focus on prosaic alignment techniques “motivated by theoretical arguments for how they might scale”, including work on AI control, interpretability, and “causal scrubbing”.
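The intuition behind causal scrubbing can be sketched as follows: if a hypothesis says a model’s output depends on an internal activation only through some feature, then resampling that activation from any other input sharing the feature should leave behavior unchanged. The two-stage “model” below is invented purely for illustration:

```python
# Toy causal-scrubbing-style check: the hypothesis is that only the
# PARITY of stage1's activation matters downstream, so resampling the
# activation from a same-parity "donor" input should not change the
# model's output. This two-stage model is invented for illustration.

def stage1(x):
    return x % 2              # internal activation: the parity of x

def stage2(h):
    return "odd" if h else "even"

def model(x):
    return stage2(stage1(x))

def scrubbed(x, donor):
    """Replace stage1's activation on x with the activation from a
    donor input the hypothesis treats as equivalent (same parity)."""
    return stage2(stage1(donor))

# The hypothesis survives: resampling from same-parity donors never
# changes the output on these inputs.
ok = all(model(x) == scrubbed(x, x + 2) for x in range(10))
print(ok)  # True
```

A hypothesis that wrongly claimed, say, that only the sign of the activation mattered would be falsified by the same test, since some resamplings would flip the output.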

  • Timaeus’s research agenda aims to draw from the mathematical field of Singular Learning Theory (SLT) to detect and interpret phase transitions – moments where a machine learning model appears to change the way it thinks.
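As a toy illustration of what detecting a phase transition might involve, the sketch below flags steps in a synthetic training-loss curve whose one-step drop is far larger than typical. Actual work in this area estimates SLT quantities such as the local learning coefficient rather than using a simple heuristic like this:

```python
# Toy phase-transition detector: flag steps in a training-loss curve
# whose one-step drop is much larger than the typical (median) drop.
# The loss curve and threshold factor are synthetic and illustrative.

losses = [2.0, 1.9, 1.85, 1.8, 1.2, 1.15, 1.1]  # sharp drop at step 4

def transitions(losses, factor=5.0):
    drops = [a - b for a, b in zip(losses, losses[1:])]
    typical = sorted(drops)[len(drops) // 2]  # median one-step drop
    return [i + 1 for i, d in enumerate(drops) if d > factor * typical]

print(transitions(losses))  # [4]
```

The sudden drop is the observable signature; the research question is what internal reorganization of the model produces it.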