What is a "polytope" in a neural network?

In neural networks, a polytope is a region of the input space that captures a particular category or concept. Polytopes are a proposed fundamental building block of neural networks, and the perspective that views them this way is known as the polytope lens. In neural networks that use what is arguably the most popular set of rules for when neurons are activated (piecewise linear activation functions), these building blocks are mathematical polytopes: shapes with flat sides, such as a polygon in 2D or a polyhedron in 3D, but confusingly the building blocks have been referred to as ’neural polytopes’ even when they are not mathematical polytopes.

A neural network typically represents input data as a high-dimensional space. For example, a network trained for image classification might represent a pixel as a dimension of a vector. The network then learns to classify images into different categories based on the patterns it discovers in the input data. The polytope represents the region of the input space where a distinctive useful pattern, known as a ‘feature', is detected by the network.

Consider three neurons in this simplified example of a neural network trained to recognise colors. Each neuron in the network picks up some aspect of the color; these ones detect whether the color has cyan pigment, magenta pigment, and/or yellow pigment.

Each neuron divides the space of possible inputs into two: those that cause it to be activated (where it’s on) and those that don’t (where it’s off). Together, the three neurons divide the input space into six ‘polytopes’, and the collection of all the possible activations of the neurons are known as the model’s ‘activation space’.

Two other framings predate polytopes. One considers the fundamental building blocks of a neural network to be the neurons; the other considers them to be the directions in the model’s activation space. Both of these framings have the downside that the fundamental building block can respond to multiple, seemingly unrelated features of the input. This is known as polysemanticity.

On the other hand, polytopes have been shown to sometimes represent single categories more clearly. They provide a mathematically correct description of how the input space is split up.

Moreover, the boundaries of polytopes reflect semantic boundaries and polytopes that are ‘close’ to each other mean similar things. For example, because a mug and a glass are similar objects, there are fewer boundaries separating pictures of a mug and a glass than boundaries separating a glass and a banana.