which combines with other subagents to compose a larger agent. For example, in shard theory, each shard is a subagent which pursues its own goal, and the goals of the system as a whole emerge from the negotiation between these shards.
A mesa-optimizer is similar to a subagent in that it also optimizes for its own goals. However, unlike a subagent, it is a separate trained model. The mesa-optimizer is shaped by the base optimizer (the “outer” optimizer, usually explicitly implemented by humans), but is not part of it. The base optimizer might be an AI system looking for the best solution to some problem defined by its human designers. In some cases, that solution will be a simple algorithm which is not an optimizer, but in other cases, the best solution is itself an optimizer: such a solution is a mesa-optimizer. This mesa-optimizer may be optimizing for a goal that differs from the problem definition given by the designers, and it need not be agent-like in a narrow sense.
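The selection story above can be made concrete with a toy sketch. Everything in it is a hypothetical illustration rather than a real training setup: the "base optimizer" is a plain selection over two hand-written candidate policies (`fixed_policy` and `planner_policy`, both invented here), scored by an invented `base_objective` that rewards picking the highest-valued option. When the best option always sits in the same place, the non-optimizing policy is good enough; when it moves around, only the candidate that searches at run time scores perfectly, so the base optimizer ends up selecting an optimizer.

```python
import random
from typing import Callable, List

# Toy assumption: an "environment" is a list of option values, and the
# designers' base objective is to pick the highest-valued option.
Env = List[int]
Policy = Callable[[Env], int]


def fixed_policy(env: Env) -> int:
    """Not an optimizer: always returns the first option."""
    return env[0]


def planner_policy(env: Env) -> int:
    """An optimizer: searches every option at run time and keeps the best."""
    best = env[0]
    for option in env[1:]:
        if option > best:
            best = option
    return best


def base_objective(policy: Policy, envs: List[Env]) -> float:
    """Fraction of environments in which the policy picks the best option."""
    return sum(policy(env) == max(env) for env in envs) / len(envs)


def base_optimizer(candidates: List[Policy], envs: List[Env]) -> Policy:
    """The "outer" optimizer: selects the best-scoring candidate.

    Python's max() returns the first of several tied items, so ties go to
    the simpler candidate listed first.
    """
    return max(candidates, key=lambda p: base_objective(p, envs))


rng = random.Random(0)
unpredictable = [[rng.randint(0, 100) for _ in range(5)] for _ in range(200)]
# Predictable variant: the best option is always in the first slot.
predictable = [sorted(env, reverse=True) for env in unpredictable]

candidates = [fixed_policy, planner_policy]
print(base_optimizer(candidates, predictable).__name__)    # fixed_policy
print(base_optimizer(candidates, unpredictable).__name__)  # planner_policy
```

Real base optimizers such as gradient descent search a continuous space of model parameters rather than choosing between two named functions; the sketch keeps only the selection structure.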
Even though we have no particular reason to expect subagents to emerge from a process of gradient descent,¹ there is a more plausible story for why mesa-optimizers would emerge. For example, if a program is designed to solve a problem in a highly unpredictable environment, the optimal solution might be to create a planner which generates new solutions in real time. This planner is itself an optimizer, since it searches through possible plans and selects the best one. The standard by which the planner judges how good a plan is serves as a proxy for the base optimizer’s goal, but is not necessarily identical to that goal.
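The gap between the planner's standard and the base goal can be shown with a small, entirely hypothetical grid-world sketch. The assumptions are invented for illustration: a 3×3 grid, a planner that brute-forces every four-move plan, a base goal of ending on the exit tile, and a planner whose proxy is "end on the green tile". During training the exit happens to be green, so the proxy and the base goal agree; once the green tile moves, the planner still optimizes its proxy and misses the exit.

```python
from itertools import product
from typing import Callable, Dict, Tuple

# Toy assumption: a 3x3 grid; the agent starts at (0, 0) and a plan is a
# fixed-length sequence of moves. Moves that would leave the grid are clamped.
Pos = Tuple[int, int]
MOVES: Dict[str, Pos] = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}
SIZE = 3


def simulate(plan: Tuple[str, ...], start: Pos = (0, 0)) -> Pos:
    """Return the final position after executing the plan."""
    x, y = start
    for move in plan:
        dx, dy = MOVES[move]
        x = min(max(x + dx, 0), SIZE - 1)
        y = min(max(y + dy, 0), SIZE - 1)
    return (x, y)


def planner(score: Callable[[Pos], int], horizon: int = 4) -> Tuple[str, ...]:
    """The inner optimizer: try every plan and keep the highest-scoring one."""
    return max(product(MOVES, repeat=horizon), key=lambda plan: score(simulate(plan)))


def reaches(target: Pos) -> Callable[[Pos], int]:
    """Score 1 if a plan ends on the target tile, else 0."""
    return lambda pos: int(pos == target)


# Training: the exit is also the green tile, so proxy and base goal agree.
train_exit = train_green = (2, 2)
train_final = simulate(planner(reaches(train_green)))
print(train_final == train_exit)  # True

# Deployment: the green tile moves, and the two objectives come apart.
exit_tile, green_tile = (2, 2), (2, 0)
final = simulate(planner(reaches(green_tile)))  # planner only consults its proxy
print(final, final == exit_tile)  # (2, 0) False
```

The base optimizer only ever observed the planner's behavior during training, where it looked identical to pursuing the base goal; nothing in that behavior revealed which of the two standards the planner was actually using.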
In short, a subagent is an agent that is a part of an agent; a mesa-optimizer is an optimizer that is optimized by an optimizer.
¹ Having some part of the model turn into an agent probably offers no advantage in achieving the base goal.
We’re a global team of specialists and volunteers from various backgrounds who want to ensure that the effects of future AI are beneficial rather than catastrophic.