What is superalignment?

The superalignment team is a division within OpenAI whose goal is to figure out how to "steer and control AI systems much smarter than us." It is co-led by Jan Leike and Ilya Sutskever.

The team's primary strategy is to develop tools for training and aligning AIs that are themselves capable enough to help with alignment research. They plan to use these AIs to take over an increasing share of alignment research tasks from human researchers, while always keeping humans "in the loop." They expect other labs to build AIs capable of doing research on AI capabilities, and they want a framework to be ready when that happens so those AIs can be redirected toward alignment research.

To help ensure that other labs adopt any successful techniques they discover, the team intends to devote much of its effort to testing its solutions publicly.