
What are the differences between subagents and mesa-optimizers?

A subagent is an agent which combines with other subagents to compose a larger agent. For example, in shard theory, each shard is a subagent which pursues its own goal, and the goals of the system as a whole emerge from the negotiation between these shards.

A mesa-optimizer is similar to a subagent in that it also optimizes for its own goals. However, unlike a subagent, it is a separate trained model: it is shaped by the base optimizer, but is not part of it. The base optimizer might be an AI system looking for the best solution to some problem defined by its human designers. In some cases, that solution will be a simple algorithm which is not an optimizer, but in other cases, the best solution is itself an optimizer; such a solution is a mesa-optimizer. This mesa-optimizer may be optimizing for a goal that differs from the problem definition given by the designers, and it may also not be agent-like in a narrow sense.

Even though we have no particular reason to expect subagents to emerge from a process of gradient descent [1], there is a more plausible story as to why mesa-optimizers would emerge. For example, if a program is designed to solve a problem in a very unpredictable environment, the optimal solution might be to create a planner which generates new solutions in real time. This planner is itself an optimizer since it searches through possible plans and selects the best one. The standard by which the planner judges how good a plan is serves as a proxy for the base optimizer’s goal, but is not necessarily identical to that goal.
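As a rough illustration of the planner described above, here is a minimal toy sketch in Python. Everything in it is a hypothetical assumption made for illustration and does not come from any real system: the one-dimensional world, the function names, and the proxy "further right is better". It only shows a planner that searches over plans and picks the best one by a proxy standard, and how that proxy can match the base goal in one setting and come apart from it in another.

```python
from itertools import product

# Hypothetical toy world: an agent on a number line that can step
# left (-1), stay (0), or step right (+1) at each time step.
MOVES = (-1, 0, 1)


def final_position(start, plan):
    """Position reached after executing every move in the plan."""
    return start + sum(plan)


def base_objective(position, goal):
    """What the designers actually want: end as close to the goal as possible."""
    return -abs(position - goal)


def proxy_objective(position):
    """The planner's internal standard (an assumed stand-in for the base goal):
    positions further to the right count as better."""
    return position


def plan_with_proxy(start, horizon=3):
    """The planner is itself an optimizer: it enumerates every possible plan
    of the given length and selects the one that scores best on the proxy."""
    return max(
        product(MOVES, repeat=horizon),
        key=lambda plan: proxy_objective(final_position(start, plan)),
    )


plan = plan_with_proxy(start=0)
end = final_position(0, plan)

# Setting where the goal lies to the right: proxy and base goal agree.
score_goal_on_right = base_objective(end, goal=3)

# Setting where the goal lies to the left: the same plan now scores
# badly on the base goal, even though the proxy score is unchanged.
score_goal_on_left = base_objective(end, goal=-2)
```

In this sketch the planner never sees the designers' goal at all. It always walks right, which looks ideal whenever the goal happens to be on the right and poor when it is not.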

In short, a subagent is an agent that is a part of an agent; a mesa-optimizer is an optimizer that is optimized by an optimizer.


  1. This is because having some part of the model turn into an agent probably offers no advantage in achieving the base goal.


AISafety.info

We’re a global team of specialists and volunteers from various backgrounds who want to ensure that the effects of future AI are beneficial rather than catastrophic.