What is the difference between inner and outer alignment?
The distinction comes from the paper Risks from Learned Optimization in Advanced Machine Learning Systems. Both terms concern machine learning: an approach to AI in which, instead of designing an algorithm directly, we have the system search through possible algorithms based on how well they do on some training data.

Outer alignment is the problem of making sure that the precise formulation of what we train the AI to do matches what we intend it to do. For example, in reinforcement learning (a machine learning method in which the machine gets rewards based on its actions, and is adjusted to be more likely to take actions that lead to high reward), outer alignment asks whether the reward function we wrote down actually captures what we want.

Inner alignment is the problem of making sure the trained system does not end up pursuing a different objective than the one that was specified. A central failure mode here is goal misgeneralization: pursuing a different goal during deployment than the one it was intended to learn in training.
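As a rough illustration of the reinforcement learning loop described above, here is a minimal toy sketch (everything in it, including the proxy_reward function and the action names, is made up for illustration, not taken from the source): an agent is nudged to favor whichever action the specified reward pays out for.

```python
import random

ACTIONS = ["A", "B"]
preferences = {"A": 0.0, "B": 0.0}  # learned value estimate per action

def proxy_reward(action):
    # The objective we actually specified. Outer alignment asks whether
    # this formula matches what we really intend.
    return 1.0 if action == "B" else 0.0

def pick_action():
    # Epsilon-greedy: mostly exploit the current best, sometimes explore.
    if random.random() < 0.1:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: preferences[a])

for _ in range(1000):
    action = pick_action()
    reward = proxy_reward(action)
    # Adjust to be more likely to take actions that led to high reward.
    preferences[action] += 0.1 * (reward - preferences[action])

print(preferences)  # the agent ends up favoring whatever the proxy rewards
```

Inner alignment would then ask a further question this toy can't show: whether the policy the training process actually finds is pursuing the specified reward at all, or some other goal that merely correlated with it during training.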
See also this post for an intuitive explanation of inner and outer alignment.