What is behavioral cloning?

Behavioral cloning is a form of imitation learning. It involves gathering observations of the behavior of an “expert demonstrator” who is good at the task being trained for, and then using supervised learning to train an AI agent to imitate the observed behavior.

Behavioral cloning differs from other forms of imitation learning (such as inverse reinforcement learning or cooperative inverse reinforcement learning) in that it aims to have the AI replicate the demonstrator's behavior as closely as possible (rather than to have the AI, e.g., infer the demonstrator's goals or implicit reward function).

One of the earliest and best-known applications of behavioral cloning was training self-driving cars, and this use case serves as a good example of how behavioral cloning works:

First, a human "demonstrator" drives a car around while we collect data about (1) the states of the environment (using sensors such as cameras and lidar) and (2) the actions that the demonstrator takes in each environmental state (such as steering wheel movements, accelerating/braking, gear shifting, etc.).
Next, we create a dataset consisting of (state, action) pairs.
Finally, we use supervised learning to train a model that takes the environmental state as an input and predicts the driver’s action.

When the accuracy of this model is high enough, we can say that the driver’s behavior has been “cloned”.
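
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not how a real driving system is built: the "expert" is a hypothetical steering rule rather than a human, the state is just two made-up numbers (lateral offset and heading error), and the cloned policy is a linear model trained by stochastic gradient descent on squared error. All function and variable names are invented for this example.

```python
import random

# Hypothetical expert: a fixed steering rule standing in for a human driver.
EXPERT_GAINS = (-0.5, -1.2)

def expert_action(state):
    """Return the expert's steering command for a (offset, heading) state."""
    offset, heading = state
    return EXPERT_GAINS[0] * offset + EXPERT_GAINS[1] * heading

def collect_demonstrations(n_samples, seed=0):
    """Steps 1 and 2: observe the expert and record (state, action) pairs."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_samples):
        state = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        dataset.append((state, expert_action(state)))
    return dataset

def policy_action(weights, state):
    """The cloned policy: a linear map from state to predicted action."""
    return weights[0] * state[0] + weights[1] * state[1]

def train_policy(dataset, epochs=200, learning_rate=0.1):
    """Step 3: supervised learning, here SGD on the squared prediction error."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for state, action in dataset:
            error = policy_action(weights, state) - action
            weights[0] -= learning_rate * error * state[0]
            weights[1] -= learning_rate * error * state[1]
    return weights

def mean_abs_error(weights, dataset):
    """Average gap between the cloned policy's actions and the expert's."""
    total = sum(abs(policy_action(weights, s) - a) for s, a in dataset)
    return total / len(dataset)

train_data = collect_demonstrations(200, seed=0)
held_out = collect_demonstrations(50, seed=1)  # states not seen in training
weights = train_policy(train_data)
print("learned weights:", weights)
print("held-out mean absolute error:", mean_abs_error(weights, held_out))
```

Because the toy expert is itself linear and noise-free, the learned weights end up very close to EXPERT_GAINS. With a real human demonstrator and camera input, the model would typically be a neural network and the match would be approximate.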

Behavioral cloning is also sometimes used to fine-tune large language models (LLMs). In this case, behavioral cloning involves imitating a human expert whose behavior produces a set of appropriate (prompt, completion) pairs. For example, after learning to predict text from the internet, an LLM can be fine-tuned to follow instructions by training it to imitate human-written responses to those instructions.
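
To make the link to (state, action) pairs concrete, here is a rough sketch of how one (prompt, completion) pair might be turned into supervised training targets. It rests on several assumptions: whitespace splitting stands in for a real tokenizer, the prompt is assumed to be non-empty, and only completion tokens are used as targets (a common choice, but not something the sources above prescribe). The function names are hypothetical.

```python
def build_example(prompt, completion):
    """Concatenate prompt and completion, marking which tokens are targets."""
    prompt_tokens = prompt.split()          # toy tokenizer (assumption)
    completion_tokens = completion.split()
    tokens = prompt_tokens + completion_tokens
    # 0 = prompt token (context only), 1 = completion token (to be imitated)
    loss_mask = [0] * len(prompt_tokens) + [1] * len(completion_tokens)
    return tokens, loss_mask

def next_token_targets(tokens, loss_mask):
    """List (context, next_token) pairs for positions inside the completion.

    Each pair plays the role of a (state, action) pair: the context is the
    state, and the demonstrator's next token is the action to imitate.
    """
    return [
        (tokens[:i], tokens[i])
        for i in range(1, len(tokens))
        if loss_mask[i] == 1
    ]

tokens, loss_mask = build_example("Translate to French: cat", "le chat")
targets = next_token_targets(tokens, loss_mask)
for context, next_token in targets:
    print(context, "->", next_token)
```

A fine-tuning loop would then train the model to assign high probability to each next token given its context, which is the same supervised objective as in the driving example.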

Overall, behavioral cloning is a straightforward technique that gives us a good baseline for what we should expect from imitation-based algorithms. However, it does have limitations; the sources below discuss some of the problems that arise in modern LLMs that are fine-tuned using behavioral cloning.

Sources

Stanford CS234: Reinforcement Learning (2019), Lecture 7 - Imitation Learning
Berkeley CS182: Reinforcement Learning (2021), Lecture 14 - Imitation Learning
leogao (2021). "Behavior Cloning is Miscalibrated."
Ortega, Pedro, et al. (2021). "Shaking the foundations: delusions in sequence models for interaction and control."
Zhou, Chunting, et al. (2020). "Detecting Hallucinated Content in Conditional Neural Sequence Generation."
Xiao, Yijun, and Wang, William. (2021). "On Hallucination and Predictive Uncertainty in Conditional Language Generation."
