What are the differences between subagents and mesa-optimizers?
A subagent is an agent which combines with other subagents to compose a larger agent. Here, an agent is a system that can be understood as taking actions towards achieving a goal, and a subagent is a system that is part of a larger agent and has its own agent-like qualities, such as pursuing its own goals.
A mesa-optimizer is similar to a subagent in that it also optimizes for its own goals. However, unlike a subagent, it is a separate trained model. The mesa-optimizer is shaped by the base optimizer. An optimizer is something that can improve some process or physical artifact so that it is fit for a certain purpose or fulfills some set of requirements. The base optimizer is the “outer” optimizer, usually explicitly implemented by humans; a mesa-optimizer, in contrast, is an algorithm that is created by optimization and that is also itself an optimizer.
Even though we have no particular reason to expect subagents to emerge from a process of gradient descent[1]
In short, a subagent is an agent that is a part of an agent; a mesa-optimizer is an optimizer that is optimized by an optimizer.
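The structural difference can be made concrete with a toy sketch. Everything in it is a hypothetical illustration rather than anything from the literature: the names `make_subagent`, `composite_agent`, `make_learned_model`, `base_objective`, and `base_optimizer` are invented here, the "agents" are plain scoring functions over five integer actions, and the base optimizer is a brute-force search standing in for a real training process such as gradient descent.

```python
from typing import Callable, List, Sequence, Tuple

ACTIONS: List[int] = [0, 1, 2, 3, 4]  # toy action space (an assumption)


# --- Subagents: parts of one larger agent, each with its own goal ---
def make_subagent(target: int) -> Callable[[int], float]:
    """A subagent scores actions by closeness to its own preferred target."""
    return lambda action: -abs(action - target)


def composite_agent(subagents: Sequence[Callable[[int], float]]) -> int:
    """The larger agent picks the action with the best total subagent score."""
    return max(ACTIONS, key=lambda a: sum(s(a) for s in subagents))


# --- Mesa-optimizer: an optimizer that is produced by another optimizer ---
def make_learned_model(mesa_target: int) -> Callable[[], int]:
    """The learned model runs its own search over actions, guided by its
    own internal objective (closeness to mesa_target)."""
    def model() -> int:
        return max(ACTIONS, key=lambda a: -abs(a - mesa_target))
    return model


def base_objective(action: int) -> float:
    """What the base optimizer scores models on (here: prefer action 3)."""
    return -abs(action - 3)


def base_optimizer() -> Tuple[int, Callable[[], int]]:
    """Outer search: try every candidate parameter and keep the model whose
    chosen action scores best on the base objective."""
    best = max(ACTIONS, key=lambda p: base_objective(make_learned_model(p)()))
    return best, make_learned_model(best)
```

In this sketch the subagents sit side by side inside one decision procedure, while the mesa-optimizer is nested: `base_optimizer` searches over models, and the model it returns performs a search of its own. In this toy the selected model's internal objective happens to match the base objective; nothing about the setup guarantees that in general.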
[1] Since having some part of the model turn into an agent probably doesn’t have an advantage in achieving the base goal.