What is Vinge’s principle?

Vinge’s principle says that, in "rich" domains, a system that is less intelligent[1] in that domain cannot predict the exact actions that a more intelligent agent will take, even if the less intelligent system knows the more intelligent agent's goal(s).[2]

A domain is rich, relative to a given level of intelligence, if it's complicated enough that perfect decision-making isn't possible at that level of intelligence.[3] For instance, tic-tac-toe is a *non-*rich domain, since tic-tac-toe is solved: from every board state, there is at least one optimal move (sometimes several equally good ones), and it's not possible to play better than that. In other words, there's a "ceiling" on how good one can be at tic-tac-toe: if you[4] were to play tic-tac-toe against a superintelligent computer, it couldn't come up with some undiscovered strategy to beat you, because no such strategy exists.[5]
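To make the "ceiling" concrete, here is a minimal sketch of exhaustively solving tic-tac-toe with minimax search. The board encoding (a 9-character string read row by row) and the function names `winner`, `value`, and `optimal_moves` are choices made for this illustration, not part of any standard library.

```python
from functools import lru_cache

# The eight winning lines, as index triples into a 9-character board string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play(board, i, player):
    """Return a new board with `player` placed on square i."""
    return board[:i] + player + board[i + 1:]


@lru_cache(maxsize=None)
def value(board, player):
    """Outcome under optimal play by both sides, with `player` to move.

    +1 means X wins, 0 means a draw, -1 means O wins.
    """
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    if " " not in board:
        return 0
    other = "O" if player == "X" else "X"
    results = [value(play(board, i, player), other)
               for i, square in enumerate(board) if square == " "]
    # X picks the highest value, O picks the lowest.
    return max(results) if player == "X" else min(results)


def optimal_moves(board, player):
    """List every empty square that achieves the optimal value for `player`."""
    other = "O" if player == "X" else "X"
    best = value(board, player)
    return [i for i, square in enumerate(board)
            if square == " " and value(play(board, i, player), other) == best]


EMPTY = " " * 9
print(value(EMPTY, "X"))          # value of the game from the empty board
print(optimal_moves(EMPTY, "X"))  # every optimal opening square
```

Because the whole game tree fits in memory, the search returns a definite answer for every position: the empty board has value 0 (a draw), and no amount of additional intelligence can improve on the moves it lists. A rich domain is one where this kind of exhaustive search is out of reach for the agent in question.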

By contrast, a domain is "rich" (from our perspective) if:

  1. The space of potential actions and strategies is too large and/or irregular for us to find the optimal strategies (e.g. chess), or

  2. The known mechanics of the domain do not permit us to place absolute bounds on which kinds of outcomes or goals are in principle achievable (e.g. geopolitics).

In general, any domain where we continue to discover better strategies over time is necessarily rich.

Vinge’s principle is relevant to AI safety for two reasons: 1) it limits our ability to predict the actions of an AI with greater-than-human intelligence, and 2) it might limit the ability of an AI agent to safely design a more intelligent AI (or to modify itself into a more intelligent version).

  1. Intelligence here refers to the capacity to choose actions that successfully achieve one's goals within some or many domains (closely related to instrumental rationality).

  2. Vinge’s principle is named after Vinge’s Law, an idea about fiction writing which states that "characters cannot be significantly smarter than their authors… because to really know how a character like that would think, you’d have to be that smart yourself."

  3. Domains are only rich relative to some level of intelligence. For instance, tic-tac-toe isn't rich to the average adult, but might well be to a young child or a chimp; likewise, there might be domains that are rich to us but wouldn't be to some more intelligent system.

  4. This assumes you know optimal play, which is realistic for most human adults who have played a few games.

  5. Since either player in tic-tac-toe can force at least a draw with optimal play, nothing, no matter how capable, could ever beat you, whether you go first or second (assuming they're playing fair!).