What are inductive biases?
In machine learning, an inductive bias is a tendency for a learning algorithm to prefer some explanations over others when multiple explanations fit the data equally well.1
Imagine tracking someone's energy levels, which rise and fall in a pattern that roughly repeats every day. You could model this by drawing straight lines between observations. (That’s the gray curve in the figure below.) This model will fit the observations themselves perfectly, and will fit nearby points pretty well — in fact, by adding dense enough data in a region, you can make the approximation there arbitrarily good. But ask it to predict the future, and it will forecast that energy keeps falling forever. The assumption built into the model — "continue in a straight line" — rules out periodicity from the start.2 No amount of additional data within the observed range will make the gray curve periodic outside it.
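Here is a minimal sketch of that setup in code, with made-up numbers standing in for the figure's data and scipy's linear interpolation standing in for the gray curve. Inside the observed range the fit is close; beyond it, the forecast just continues the slope of the last segment.

```python
import numpy as np
from scipy.interpolate import interp1d

# "Energy levels" that roughly repeat every 24 hours (made-up stand-in data).
hours = np.arange(0, 63, 3.0)                 # observations up to hour 60
energy = np.sin(2 * np.pi * hours / 24)

# Straight lines between observations, extended past the data in a straight line.
model = interp1d(hours, energy, kind="linear", fill_value="extrapolate")

# Inside the observed range, the approximation is close to the true curve...
print(model(37.5), np.sin(2 * np.pi * 37.5 / 24))

# ...but into the future it keeps following the final segment's downward slope,
# while the true pattern turns around and repeats.
future = np.array([66.0, 78.0, 90.0])
print(model(future))                          # keeps falling
print(np.sin(2 * np.pi * future / 24))        # oscillates, as the real pattern does
```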
Inductive biases get baked into a model through choices like what architecture to use, how to measure the size of errors, and how to penalize complexity. A model designed around linear equations will tend to find linear patterns; a model built from decision trees will tend to carve the world into rectangular regions. Neither is "wrong." They're just suited for solving different sorts of problems.
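As a rough illustration (toy data and scikit-learn models chosen for convenience, not anything from this article's own experiments): fit the same points with a linear model and a small decision tree, and the first extends a straight trend while the second carves the input into piecewise-constant regions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 1))
y = 2.0 * X.ravel() + rng.normal(scale=1.0, size=40)   # a roughly linear trend

line = LinearRegression().fit(X, y)                    # prefers straight-line fits
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)    # prefers axis-aligned splits

X_new = np.linspace(0, 10, 6).reshape(-1, 1)
print(line.predict(X_new))   # a smooth linear trend across the range
print(tree.predict(X_new))   # a staircase: constant within each rectangular region
```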
Without some inductive bias, a learning algorithm cannot generalize at all. Infinitely many patterns pass through any finite set of points. The inductive bias is what breaks the tie — it determines which of those patterns the algorithm will choose. Two models can perform identically on the same training data yet behave completely differently when they encounter something new, because they're generalizing according to different assumptions.
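A small sketch with made-up points: the two models below both reproduce the training data, yet they disagree everywhere else, because each breaks the tie with a different assumption.

```python
import numpy as np

x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.0, 1.0, 0.0, 1.0, 0.0])

# Model A: straight lines between the training points.
def model_a(x):
    return np.interp(x, x_train, y_train)

# Model B: the degree-4 polynomial through the same points.
coeffs = np.polyfit(x_train, y_train, deg=4)
def model_b(x):
    return np.polyval(coeffs, x)

# Both reproduce the training labels (up to floating-point error)...
print(model_a(x_train))
print(model_b(x_train))

# ...but they disagree as soon as they're asked about anything new.
x_new = np.array([0.5, 2.5, 3.5, 5.0])
print(model_a(x_new))
print(model_b(x_new))
```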
The impossibility of generalizing without inductive bias is captured formally by the “no free lunch theorem”: no learning algorithm is better than any other across all possible problems. This is because “all possible problems” includes predicting totally random events, for which you can’t do better than guessing. Even worse, it includes problems that are perfectly tuned to violate whatever assumptions a learning algorithm makes. But real-world problems are not set up by trolls, and they are not totally random. They have structure, and an inductive bias toward simpler explanations tends to find that structure.
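The claim that you can't do better than guessing can be made concrete with a toy enumeration (an illustration of the averaging argument, not a proof of the theorem): averaged over every possible labeling of a few unseen inputs, any fixed set of guesses is right exactly half the time.

```python
from itertools import product

# Three unseen inputs; the predictor's guesses for them (an arbitrary choice).
guesses = (0, 1, 0)

# "All possible problems": every way the true labels could come out.
accuracies = []
for true_labels in product([0, 1], repeat=3):
    correct = sum(g == t for g, t in zip(guesses, true_labels))
    accuracies.append(correct / 3)

# Averaged over all 2**3 labelings, the guesses are right exactly half the time,
# and the same holds for any other fixed set of guesses.
print(sum(accuracies) / len(accuracies))   # 0.5
```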
Neural networks have their own characteristic inductive biases, which researchers are still working to understand. For example, networks trained with standard methods tend to learn smooth, slowly-varying patterns before sharp, rapidly-changing ones — a phenomenon called spectral bias. When low-frequency patterns allow a network to achieve good performance, higher-frequency patterns may not be learned at all. Nora Belrose and Quintin Pope argue that a treacherous turn — i.e., acting aligned when under oversight and misaligned when in power — is a high-frequency pattern and so is unlikely to occur. However, there’s disagreement about whether the kinds of high-frequency patterns that enable treachery are needed for models to be useful.
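As a rough sketch of spectral bias (a toy target, with an architecture and hyperparameters picked just for illustration): a small network fitting a signal with a slow and a fast component typically picks up the slow one long before the fast one, which you can watch by tracking the Fourier amplitudes of its predictions during training.

```python
import math
import torch

torch.manual_seed(0)

# Target with a slow component (frequency 1) and a fast component (frequency 10).
x = torch.arange(256, dtype=torch.float32).unsqueeze(1) * (2 * math.pi / 256)
y = torch.sin(x) + torch.sin(10 * x)

model = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(5001):
    loss = ((model(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            spectrum = torch.fft.rfft(model(x).squeeze()).abs()
            # Amplitude the model has learned at each target frequency; the
            # slow component typically appears long before the fast one does.
            print(step, round(float(spectrum[1]), 2), round(float(spectrum[10]), 2))
```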
Likewise, a bias that favored fast computations would penalize the extra reasoning steps that scheming requires. In contrast, a bias that favored short description lengths would make more likely the kind of utility-maximizing behavior that naturally leads to treacherous turns.