What are Cartesian frames?

Proposing a split into two articles:

  • What are Cartesian frames?

  • In Cartesian frames, what are the equivalents to concepts like input, output and time?

What are Cartesian frames?

The theory of Cartesian frames introduces a paradigm for modeling agency. Traditional models of agency assume the existence of a Cartesian boundary (i.e., a conceptual boundary that separates an agent from its environment); this framework instead lets us construct such boundaries. Rather than taking concepts like input, output and time to be elementary, Cartesian frames treat “choice” as elementary and let us carve up the world flexibly by focusing on what an agent could do rather than what it should do. Compare two models of a self-driving car: one focuses on determining the “correct” control signals, while the other focuses on the broader range of actions and interactions with the environment available to the car, including not only its immediate driving actions but also abilities such as anticipating traffic patterns or adapting to changing road conditions.

Since the emphasis is on exploring the agent's possibilities and capabilities rather than on predefined interactions constrained by rigid input-output mappings, the precise way in which we draw the Cartesian boundary is less crucial. This allows flexible modeling of subagents and alternative conceptual divisions: we could draw the line between one sports team and the field together with the opposing team, or between an individual player and the field together with all the other players.

More specifically, a Cartesian frame is an object consisting of a set of possible states the agent can choose to be in, a set of possible states the environment can choose to be in, and a function that maps each combination of the two to the possible world that results. It is a frame because it represents one specific way to conceptually organize the world. For instance, in a Prisoner’s Dilemma game, the choice of the agent (Alice) and that of the environment (Bob) may result in a world where both cooperate. This framing also allows for different levels of abstraction, by mapping detailed underlying worlds to coarse high-level descriptions: in the previous example, the real-valued utilities of Alice and Bob could be mapped to a description that simply classifies each as high, moderate or low.

In Cartesian frames, what are the equivalents to concepts like input, output and time?

Choice is fundamental in a Cartesian frame, and notions such as input, output and time are derived rather than basic. The analogues of input and output in Cartesian frames are observables and controllables respectively. In essence, they let us ask ‘what can the agent learn from?’ and ‘what can the agent do or force to be true?’ instead of trying to settle precisely what the input and output should be, as in "Is the output value of a painting determined by the artistic technique or the emotions it evokes in the viewer's mind?".

The analogue of inputs in a Cartesian frame is observables: properties of the world that the agent can base different decisions on. For example, at a traffic intersection, the observables could be weather conditions, volume of traffic, time of day, etc. The analogue of outputs is controllables: outcomes that the agent can both ensure and prevent. For example, if the agent has one action that guarantees smooth traffic flow whatever state the environment is in, and another action that rules it out whatever state the environment is in, then smooth traffic flow is controllable by the agent.

This framing blurs the distinction between immediate sensory input and other agent knowledge, encompassing all information that can be logically deduced from observations. For instance, an agent that can observe a cool breeze can also observe its complement (the absence of a cool breeze). The same is true for outputs and controllables.

In Cartesian frames, time can be modeled by dividing the set of possible world histories (W) into partitions representing different moments or stages. Each partition corresponds to a specific point in time and groups together the world histories that agree on the events and decisions leading up to that point. As time progresses, the partitions become finer (i.e., more detailed), decreasing controllables while increasing observables. This perspective presents time as a trade-off between control over the world and the ability to observe and condition on it.

In a game of Tic-Tac-Toe, for example, the initial partition represents the starting position (an empty grid). Each move corresponds to a specific point in the game, forming a sequence of board states with agreed placements of X's and O's leading up to that point. As the game progresses and players take turns placing their symbols, the number of controllable options decreases, as cells become occupied and unavailable for further placements. Meanwhile, the observable information expands, enabling players to assess the evolving board state, identify winning opportunities, and anticipate their opponent's strategy.

// The Long Version:

Introduction

Under traditional agent models, the agent and environment are fundamentally separate. The agent observes the environment, makes decisions based on those observations, and sends out actions into the environment according to predefined interactions, such as a self-driving car focused on determining the “correct” control signals based on various sensory inputs. These models carve up the world into variables, assuming concepts like input, output, and time to be elementary and fixed.

The theory of Cartesian frames introduces a different perspective on modeling agency, centered on what an agent could do rather than what it should do. For the self-driving car, the focus shifts to the broader range of actions and potential interactions with the environment available to it. This could include not only the immediate driving actions but also the ability to anticipate traffic patterns or adapt to changing road conditions. The emphasis is on exploring the possibilities and capabilities of the agent rather than on adhering to predefined interactions, so agency can be modeled in a way that goes beyond rigid input-output mappings and accommodates more flexible and adaptive behavior in complex environments.

Instead of treating the agent as having inputs and outputs, we shift our focus to a set of possible states the agent can be in, a set of possible states the environment can be in, and an evaluation function that encodes the consequences of their combination by mapping each agent-environment pair to one of a set of possible worlds.

This can be visualised as a matrix resembling a game, such as the "Prisoner's Dilemma" where two players, Alice and Bob, independently choose to either cooperate or defect, with the payoff or utility for each player depending on the combination of their choices. Here, the rows would represent the choices of the agent (Alice), and the columns would represent the possible environments, which in this case are the choices of Bob. Each entry in the matrix corresponds to a possible world, which is a combination of choices made by Alice and Bob. By assigning utility values to these worlds, we gain insights into the relationship between choices and outcomes.
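The matrix view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels "C" and "D", the payoff numbers, and the names `evaluate` and `frame_matrix` are hypothetical and do not come from the theory itself.

```python
# Hypothetical labels: "C" = cooperate, "D" = defect.
AGENT = ["C", "D"]        # Alice's choices (rows)
ENVIRONMENT = ["C", "D"]  # Bob's choices (columns)


def evaluate(a, e):
    """Map a pair of choices to a possible world, named here by the pair itself."""
    return (a, e)


# Illustrative utilities (Alice, Bob) for each world; the numbers are assumptions.
UTILITY = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def frame_matrix(agent, environment, evaluate):
    """Build the matrix of worlds: one row per agent choice, one column per environment."""
    return [[evaluate(a, e) for e in environment] for a in agent]


matrix = frame_matrix(AGENT, ENVIRONMENT, evaluate)
for a, row in zip(AGENT, matrix):
    print(a, [UTILITY[world] for world in row])
```

Keeping the worlds separate from the utilities mirrors the text: the frame itself only says which world results from which pair of choices, and utilities are an extra layer assigned on top.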

This framework also enables multi-level descriptions of the world, and incorporates first-person perspectives alongside third-person ones, which encompass all possible worlds. We can simplify the world model by defining a set of coarse descriptions (say, V) and mapping each detailed world to one of them. In the Prisoner’s Dilemma example, take the Cartesian frame C in which each possible world is a point (r1, r2) giving the utilities of Alice and Bob. Each such point can be mapped to a coarse description in V: for instance, (4, 7) could be mapped to "Alice receives a moderate payoff, and Bob receives a high payoff." Moreover, by swapping labels and transposing the matrix, we can switch perspective between Alice and Bob and explore the game from either player's point of view.
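As a rough sketch of both ideas, the snippet below coarsens worlds of the form (r1, r2) into low/moderate/high labels and transposes the matrix to take Bob's point of view. The payoff numbers, the thresholds in `level`, and all function names are assumptions made for illustration only.

```python
# Hypothetical payoff worlds (r1, r2) = (Alice's utility, Bob's utility).
# Rows are Alice's choices, columns are Bob's choices; the numbers are made up.
FRAME = [
    [(4.0, 7.0), (1.0, 8.5)],
    [(6.5, 2.0), (2.5, 2.5)],
]


def level(utility, low=3.0, high=6.0):
    """Classify one utility as low, moderate or high (thresholds are assumptions)."""
    if utility < low:
        return "low"
    if utility < high:
        return "moderate"
    return "high"


def coarsen(world):
    """Map a detailed world (r1, r2) to a coarse description in V."""
    r1, r2 = world
    return (level(r1), level(r2))


def transpose(matrix):
    """Swap rows and columns, so that Bob's choices index the rows."""
    return [list(column) for column in zip(*matrix)]


coarse_frame = [[coarsen(world) for world in row] for row in FRAME]
bob_view = transpose(FRAME)

print(coarsen((4.0, 7.0)))  # ('moderate', 'high')
print(bob_view[0])          # worlds reachable when Bob picks his first option
```

Note that transposing leaves the worlds themselves unchanged; only the roles of "agent" and "environment" are exchanged.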

Cartesian Frame Analogues for Input and Output

“Choice” is fundamental in a Cartesian frame, and notions such as input, output and time are derived rather than basic. The analogues of input and output in Cartesian frames are observables and controllables respectively. In essence, they let us ask ‘what can the agent learn from?’ and ‘what can the agent do or force to be true?’ instead of trying to settle precisely what the input and output should be, as in "Is the output value of a painting determined by the artistic technique or the emotions it evokes in the viewer's mind?".

As an example, let W represent the possible states of a traffic intersection. A represents the actions available to an agent, such as "change the traffic light signal" or "deploy a traffic police officer." E represents the possible states of the environment, which could include variables like the volume of traffic, time of day, weather conditions, etc. The observables are, in general, any properties that the agent can base different decisions on (here, for example, the weather conditions). The controllables are outcomes that the agent can both ensure and prevent. In other words, if a subset S of W (say, the states with smooth traffic flow) is controllable, then there is an action in A that results in smooth traffic flow whatever the environment state in E, and another action in A that rules out smooth traffic flow whatever the environment state.
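The definition of a controllable can be checked mechanically on a small finite frame. The sketch below assumes a toy version of the intersection: the action names, environment states and the lookup table `EVALUATE` are all hypothetical, chosen only so that the ensure/prevent logic is easy to follow.

```python
# Hypothetical toy frame for the intersection; all names are illustrative.
AGENT = ["extend_green", "hold_all_red", "normal_cycle"]
ENVIRONMENT = ["light_traffic", "heavy_traffic"]

# Evaluation function as a lookup table: (action, environment state) -> world.
EVALUATE = {
    ("extend_green", "light_traffic"): "smooth",
    ("extend_green", "heavy_traffic"): "smooth",
    ("hold_all_red", "light_traffic"): "congested",
    ("hold_all_red", "heavy_traffic"): "congested",
    ("normal_cycle", "light_traffic"): "smooth",
    ("normal_cycle", "heavy_traffic"): "congested",
}


def ensurable(S, agent, environment, evaluate):
    """True if some single action lands in S for every environment state."""
    return any(all(evaluate[(a, e)] in S for e in environment) for a in agent)


def preventable(S, agent, environment, evaluate):
    """True if some single action lands outside S for every environment state."""
    return any(all(evaluate[(a, e)] not in S for e in environment) for a in agent)


def controllable(S, agent, environment, evaluate):
    """Controllable means both ensurable and preventable."""
    return (ensurable(S, agent, environment, evaluate)
            and preventable(S, agent, environment, evaluate))


SMOOTH = {"smooth"}
print(controllable(SMOOTH, AGENT, ENVIRONMENT, EVALUATE))  # True

# Without an action that always blocks traffic, smooth flow is no longer preventable.
LIMITED_AGENT = ["extend_green", "normal_cycle"]
print(controllable(SMOOTH, LIMITED_AGENT, ENVIRONMENT, EVALUATE))  # False
```

The order of the quantifiers matters: one action has to work against every environment state, which is stronger than having some suitable action for each state separately.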

This framing blurs the distinction between immediate sensory input and other agent knowledge, encompassing all information that can be logically deduced from observations. For instance, an agent that can observe a cool breeze can also observe its complement (the absence of a cool breeze). The same is true for outputs and controllables.
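The breeze example can be made concrete with a small check. The text above does not spell out a formal test for observability, so this sketch assumes the usual one from the Cartesian frames literature: a set S of worlds is observable if, for any two agent options a0 and a1, the agent also has an option that behaves like a0 whenever the resulting world is in S and like a1 otherwise. The jacket scenario and every name in the code are hypothetical.

```python
from itertools import product

WEATHER = ["breeze", "calm"]       # environment states
ACTIONS = ["jacket", "no_jacket"]  # what the agent ends up doing

# Agent options are policies: one action for each weather the agent might see.
POLICIES = [dict(zip(WEATHER, acts)) for acts in product(ACTIONS, repeat=len(WEATHER))]


def evaluate(policy, weather):
    """The resulting world records the weather and the action taken in it."""
    return (weather, policy[weather])


def observable(S, agent, environment, evaluate):
    """Assumed definition: for all a0, a1 there is an option that acts like a0 on S, else like a1."""
    def behaves_conditionally(a, a0, a1):
        for e in environment:
            world = evaluate(a, e)
            target = evaluate(a0, e) if world in S else evaluate(a1, e)
            if world != target:
                return False
        return True

    return all(
        any(behaves_conditionally(a, a0, a1) for a in agent)
        for a0 in agent
        for a1 in agent
    )


WORLDS = {evaluate(p, w) for p in POLICIES for w in WEATHER}
BREEZE = {world for world in WORLDS if world[0] == "breeze"}

print(observable(BREEZE, POLICIES, WEATHER, evaluate))           # True
print(observable(WORLDS - BREEZE, POLICIES, WEATHER, evaluate))  # True

# An agent restricted to weather-independent policies cannot observe the breeze.
BLIND = [p for p in POLICIES if p["breeze"] == p["calm"]]
print(observable(BREEZE, BLIND, WEATHER, evaluate))              # False
```

Under this definition, observing a set and observing its complement come to the same thing, because the roles of a0 and a1 can simply be swapped.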

Naturally, in this framing, the precise way in which the Cartesian boundary (the conceptual boundary that separates the agent from the environment) is drawn is less crucial. This allows flexible modeling of subagents and alternative conceptual divisions, such as one sports team versus the opposing team and the playing field, or an individual player versus the field and all the other players. These are merely different frames of the same world, and the boundary can be constructed as we see fit.

Modelling Time

In Cartesian frames, time can be modeled by dividing the set of possible world histories (W) into partitions representing different moments or stages. Each partition corresponds to a specific point in time and groups together the world histories that agree on the events and decisions leading up to that point. As time progresses, the partitions become finer (i.e., more detailed), decreasing controllables while increasing observables. This perspective presents time as a trade-off between control over the world and the ability to observe and condition on it.
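The idea of partitions becoming finer over time can be illustrated with a toy set of world histories. The sketch below assumes that each history is a string of three binary events and that the partition at time t groups histories sharing the same first t events; it illustrates only the refinement of partitions, not the accompanying change in observables and controllables. All names are hypothetical.

```python
from itertools import product

# Hypothetical world histories: every sequence of three binary events.
HISTORIES = ["".join(events) for events in product("01", repeat=3)]


def partition_at(histories, t):
    """Group histories that agree on their first t events."""
    blocks = {}
    for history in histories:
        blocks.setdefault(history[:t], set()).add(history)
    return [frozenset(block) for block in blocks.values()]


def refines(fine, coarse):
    """True if every block of `fine` sits inside some block of `coarse`."""
    return all(any(block <= big for big in coarse) for block in fine)


partitions = [partition_at(HISTORIES, t) for t in range(4)]
for t, partition in enumerate(partitions):
    print(t, len(partition))  # block counts: 1, 2, 4, 8
```

At time 0 all histories sit in one block because nothing has been settled yet; by time 3 every history is in its own block because nothing remains open.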

In a game of Tic-Tac-Toe, for example, the initial partition represents the starting position (an empty grid). Each move corresponds to a specific point in the game, forming a sequence of board states with agreed placements of X's and O's leading up to that point. As the game progresses and players take turns placing their symbols, the number of controllable options decreases, as cells become occupied and unavailable for further placements. Meanwhile, the observable information expands, enabling players to assess the evolving board state, identify winning opportunities, and anticipate their opponent's strategy.

Similarly, in a chess game, the initial partition represents the starting position. As the game unfolds, the partition refines to capture specific moments like moves or piece captures. At the beginning, players have numerous choices (high controllables) while observables are limited to the current board state. However, as the game progresses, the partitions become finer, reducing controllables as viable moves dwindle. Simultaneously, observables expand, enabling deeper analysis of the board state.