What are Cartesian frames?

The theory of Cartesian frames introduces a paradigm for modeling agency. Rather than assuming the existence of Cartesian boundaries (i.e., conceptual boundaries that separate an agent from its environment), as traditional models of agency do, this framework lets us construct such boundaries. Instead of taking concepts like input, output and time to be elementary, Cartesian frames take “choice” to be elementary and let us build flexible models of the world by focusing on what an agent could do rather than what it should do. Consider the difference between two models of a self-driving car: one that focuses on determining the “correct” control signals, and one that focuses on what the car could do across a broader range of actions and interactions with the environment, including not only its immediate driving actions but also things like anticipating traffic patterns or adapting to changing road conditions.

Since the emphasis is on exploring the possibilities and capabilities of the agent rather than on predefined interactions constrained by rigid input-output mappings, the precise way in which we draw the Cartesian boundary is less crucial. This allows flexible modeling of subagents and alternative conceptual divisions. For example, we could draw the line between one sports team and the field together with the opposing team's players, or between an individual player and the field together with all the other players.

More specifically, a Cartesian frame is an object that includes a set of possible states the agent can choose to be in, a set of possible states the environment can choose to be in, and a function that maps each combination of an agent state and an environment state to the possible world that results from it. It is a frame because it represents a specific way to conceptually organize the world. For instance, the choice of an agent (Alice) and that of the environment (Bob) may result in a world where both choose to cooperate in a Prisoner’s Dilemma game. Naturally, this framing also allows for different levels of abstraction: detailed underlying worlds can be mapped to coarse high-level descriptions, such as mapping the real-valued utilities of Alice and Bob in the previous example to a description that simply classifies them as high, moderate or low.
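
As a rough illustration, the three ingredients of a frame (agent choices, environment choices, and the function that combines them into a world) can be sketched in a few lines of Python. This is only a toy encoding of the Prisoner's Dilemma example: the names AGENT, ENVIRONMENT, evaluate and coarse, the payoff numbers, and the thresholds for high, moderate and low are all assumptions made for the sake of the sketch, not part of the theory.

```python
from itertools import product

# Hypothetical encoding: Alice is the agent, Bob is the environment.
AGENT = ["cooperate", "defect"]
ENVIRONMENT = ["cooperate", "defect"]

def evaluate(a, e):
    """Combine one agent choice and one environment choice into a world.

    In this toy model a world is simply the pair of choices.
    """
    return (a, e)

# Illustrative (Alice, Bob) utilities; the numbers are assumptions.
PAYOFFS = {
    ("cooperate", "cooperate"): (3.0, 3.0),
    ("cooperate", "defect"): (0.0, 5.0),
    ("defect", "cooperate"): (5.0, 0.0),
    ("defect", "defect"): (1.0, 1.0),
}

def coarse(world):
    """Map a detailed world to a coarse label for Alice's utility.

    The thresholds are arbitrary and chosen only for illustration.
    """
    alice_utility = PAYOFFS[world][0]
    if alice_utility >= 4:
        return "high"
    if alice_utility >= 2:
        return "moderate"
    return "low"

# The detailed frame: every (agent choice, environment choice) pair and its world.
frame = {(a, e): evaluate(a, e) for a, e in product(AGENT, ENVIRONMENT)}

# The same frame viewed at a coarser level of abstraction.
coarse_frame = {choices: coarse(w) for choices, w in frame.items()}

for choices, label in coarse_frame.items():
    print(choices, "->", label)
```

The agent and the environment are treated symmetrically here: each is just a set of options, and all of the structure lives in the function that combines them.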

In Cartesian frames, what are the equivalents to concepts like input, output and time?

Choice is fundamental in a Cartesian frame, and notions such as input, output and time are derived rather than basic. The analogues of input and output in Cartesian frames are observables and controllables respectively. In essence, these let us ask ‘what can the agent learn from?’ and ‘what can the agent do or force to be true?’ instead of trying to settle precisely what the input and output should be, as in a question like "Is the output value of a painting determined by the artistic technique or the emotions it evokes in the viewer's mind?".

Observables are the analogue of inputs in a Cartesian frame. In general, they are any properties of the world that the agent can base different decisions on. For example, in the context of a traffic intersection, the observables could be weather conditions, volume of traffic, time of day, etc. Controllables are the analogue of outputs. These are outcomes that are both ensurable and preventable by the agent. For example, if the agent has one action that ensures smooth traffic flow whatever state the environment is in, and another action that prevents it whatever state the environment is in, then the outcome of having smooth traffic flow is controllable by the agent.
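
The definition of a controllable can be checked mechanically on a small frame. The sketch below is a toy model of the traffic example: the three agent actions, the two environment states and the resulting worlds are all invented for illustration. An outcome is represented as a set of worlds.

```python
# Hypothetical traffic-controller frame; every name and entry is an assumption.
AGENT = ["adaptive_signals", "fixed_timing", "all_red"]
ENVIRONMENT = ["light_traffic", "heavy_traffic"]

# The world that results from each (agent action, environment state) pair.
WORLD = {
    ("adaptive_signals", "light_traffic"): "smooth",
    ("adaptive_signals", "heavy_traffic"): "smooth",
    ("fixed_timing", "light_traffic"): "smooth",
    ("fixed_timing", "heavy_traffic"): "jammed",
    ("all_red", "light_traffic"): "jammed",
    ("all_red", "heavy_traffic"): "jammed",
}

def ensurable(outcome):
    """Some single action lands in `outcome` for every environment state."""
    return any(
        all(WORLD[(a, e)] in outcome for e in ENVIRONMENT) for a in AGENT
    )

def preventable(outcome):
    """Some single action stays out of `outcome` for every environment state."""
    return any(
        all(WORLD[(a, e)] not in outcome for e in ENVIRONMENT) for a in AGENT
    )

def controllable(outcome):
    """An outcome is controllable if it is both ensurable and preventable."""
    return ensurable(outcome) and preventable(outcome)

print(controllable({"smooth"}))            # ensured by adaptive_signals, prevented by all_red
print(controllable({"smooth", "jammed"}))  # always true, so it cannot be prevented
```

Note that fixed_timing neither ensures nor prevents smooth flow, because its result depends on what the environment does. Smooth flow is controllable only because the other two actions exist.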

This framing blurs the distinction between immediate sensory input and other agent knowledge, encompassing all information that can be logically deduced from observations. For instance, an agent that can observe a cool breeze can also observe its complement (the absence of a cool breeze). The same is true for outputs and controllables.
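
Observability can be tested in the same mechanical way. The sketch below assumes one common formalization: a set of worlds S is observable if, for any two agent options a0 and a1, the agent also has an option that behaves like a0 whenever the world is in S and like a1 otherwise. The breeze-and-clothing setup and all names in it are hypothetical.

```python
from itertools import product

# Hypothetical setup: the environment picks the weather, and the agent picks
# a policy that says what to wear in each kind of weather.
ENVIRONMENT = ["breeze", "calm"]
CLOTHES = ["jacket", "tshirt"]

# All four policies, each a dict from weather to clothing.
AGENT = [
    dict(zip(ENVIRONMENT, choice))
    for choice in product(CLOTHES, repeat=len(ENVIRONMENT))
]

def world(a, e):
    """A world records the weather and what the agent ended up wearing."""
    return (e, a[e])

def observable(S):
    """Assumed definition: for every pair (a0, a1) there is an option that
    matches a0 on worlds inside S and matches a1 on worlds outside S."""
    def follows(a, a0, a1):
        return all(
            world(a, e) == (world(a0, e) if world(a, e) in S else world(a1, e))
            for e in ENVIRONMENT
        )
    return all(
        any(follows(a, a0, a1) for a in AGENT)
        for a0 in AGENT
        for a1 in AGENT
    )

ALL_WORLDS = {world(a, e) for a in AGENT for e in ENVIRONMENT}
BREEZE = {w for w in ALL_WORLDS if w[0] == "breeze"}
NO_BREEZE = ALL_WORLDS - BREEZE
JACKET = {w for w in ALL_WORLDS if w[1] == "jacket"}

print(observable(BREEZE))     # the agent can condition its clothing on the breeze
print(observable(NO_BREEZE))  # the complement is observable as well
print(observable(JACKET))     # the agent's own clothing choice is not an observable
```

Swapping the roles of a0 and a1 turns any witness for S into a witness for its complement, which is why the two always stand or fall together under this definition.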

In Cartesian frames, time can be modeled as a sequence of partitions of the set of possible world histories (W), with one partition for each moment or stage. The partition for a given point in time groups together the world histories that agree on all events and decisions up to that point. As time progresses, the partitions become finer (i.e., more detailed), decreasing controllables while increasing observables. This perspective presents time as a trade-off between control over the world and the ability to observe and condition on it.
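
The idea of partitions becoming finer over time can be made concrete with a small sketch. It assumes a toy set of world histories made of three successive binary events. The names W, partition_at and refines are illustrative only, and the sketch shows only the refinement structure, not the full treatment of time in the theory.

```python
from itertools import product

# Hypothetical world histories: every sequence of three binary events.
W = ["".join(bits) for bits in product("01", repeat=3)]

def partition_at(t):
    """Partition W by grouping histories that agree on their first t events."""
    blocks = {}
    for w in W:
        blocks.setdefault(w[:t], set()).add(w)
    return list(blocks.values())

def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(f <= c for c in coarse) for f in fine)

partitions = [partition_at(t) for t in range(4)]

for t, p in enumerate(partitions):
    print(t, sorted(sorted(block) for block in p))

# Each later partition refines the one before it.
print(all(refines(partitions[t + 1], partitions[t]) for t in range(3)))
```

At time 0 all histories sit in one block because nothing has been settled yet. By time 3 every history is in its own block, so everything can be told apart and nothing remains to be decided.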

In a game of Tic-Tac-Toe, for example, the initial partition represents the starting position (an empty grid). Each later point in the game groups together the game histories that share the same placements of X's and O's up to that point. As the game progresses and players take turns placing their symbols, the number of controllable options decreases, as cells become occupied and unavailable for further placements. Meanwhile, the observable information expands, enabling players to assess the evolving board state, identify winning opportunities, and anticipate their opponent's strategy.

A chess game follows the same pattern. The initial partition represents the starting position, and as the game unfolds the partition refines to capture specific moments such as moves or piece captures. At the beginning, players have numerous choices (high controllables), while observables are limited to the current board state. As the game progresses, the partitions become finer, reducing controllables as viable moves dwindle. At the same time, observables expand, enabling deeper analysis of the board state.