What is Anthropic's alignment research agenda?

Anthropic is a major AI lab that "develop[s] large-scale AI systems so that we can study their safety properties [and] use these insights to create safer, steerable, and more reliable models". They are currently focused on scaling supervision, mechanistic interpretability, process-oriented learning, and understanding and evaluating how AI systems learn and generalize.

Anthropic has worked on a number of approaches to alignment, including Constitutional AI, reinforcement learning from human feedback (RLHF), and interpretability research on transformer circuits.

Anthropic has also published their overall perspective on the problem in "Core Views on AI Safety: When, Why, What, and How".