What is the difference between inner and outer alignment?

The paper Risks from Learned Optimization in Advanced Machine Learning Systems makes the distinction between inner and outer alignment. Outer alignment means making the optimization target of the training process (the “outer optimization target”, e.g., the loss in supervised learning) aligned with what we want. Inner alignment means making the optimization target of the trained system (the “inner optimization target”) aligned with the outer optimization target. A challenge here is that the inner optimization target has no explicit representation in current systems and can differ substantially from the outer optimization target (see, for example, Goal Misgeneralization in Deep Reinforcement Learning).
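To make the distinction concrete, here is a minimal Python sketch of the goal-misgeneralization pattern, loosely inspired by the CoinRun example discussed in that literature: the outer objective rewards reaching a coin, but because the coin always appeared at the right edge during training, the policy that training produced pursues the simpler inner objective "go right". The grid world, policy, and reward function below are hypothetical stand-ins for illustration, not the setup used in the paper.

```python
# Toy illustration of inner vs. outer alignment (hypothetical environment,
# not the actual setup from the goal-misgeneralization paper).

GRID_SIZE = 5  # positions 0..4 on a one-dimensional grid

def outer_reward(agent_pos, coin_pos):
    """Outer optimization target: reward the agent for ending up on the coin."""
    return 1.0 if agent_pos == coin_pos else 0.0

def trained_policy(agent_pos):
    """What training actually produced.

    During training the coin always sat at the rightmost cell, so a policy
    that simply moves right maximizes the outer reward. Its implicit inner
    objective is "go right", not "reach the coin".
    """
    return min(agent_pos + 1, GRID_SIZE - 1)

def rollout(coin_pos, steps=10):
    """Run the trained policy and score the final position with the outer reward."""
    agent_pos = 0
    for _ in range(steps):
        agent_pos = trained_policy(agent_pos)
    return outer_reward(agent_pos, coin_pos)

# In training-like conditions the two objectives coincide...
print("coin at right edge:", rollout(coin_pos=GRID_SIZE - 1))  # 1.0
# ...but off-distribution they come apart: the agent still runs to the right
# edge, ignoring the coin, and the outer reward exposes the misaligned
# inner objective.
print("coin in the middle:", rollout(coin_pos=2))               # 0.0
```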

See also this post for an intuitive explanation of inner and outer alignment.
