What is reinforcement learning (RL)?

Reinforcement learning (RL) trains an AI by rewarding behavior we want and punishing behavior we don't want. The AI learns to repeat behaviors that earned higher rewards in the past and to avoid behaviors that earned low or negative rewards. For example, a recommendation system (such as the one on Netflix) can treat a click as a reward: it learns which recommendations you tend to click on and shows you more movies you are likely to click on.
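Here is a minimal sketch of this idea. It assumes the recommendation problem is reduced to its simplest RL form, a multi-armed bandit. The system repeatedly recommends one of three items, and a simulated user either clicks (reward 1) or doesn't (reward 0). Over time, the system shifts toward the items that have earned the most reward. The click probabilities, the `run_bandit` function, and the epsilon-greedy rule are all illustrative assumptions. This is not how Netflix's actual system works.

```python
import random

def run_bandit(click_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: learn which item gets clicked most often.

    click_probs is a hypothetical true click probability per item.
    The agent never sees it directly; it is only used to simulate the user.
    """
    rng = random.Random(seed)
    n = len(click_probs)
    counts = [0] * n    # how many times each item was recommended
    values = [0.0] * n  # running average reward (estimated click rate) per item
    for _ in range(steps):
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if rng.random() < epsilon:
            item = rng.randrange(n)
        else:
            item = max(range(n), key=lambda i: values[i])
        # Simulated user: reward 1.0 for a click, 0.0 otherwise.
        reward = 1.0 if rng.random() < click_probs[item] else 0.0
        counts[item] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        values[item] += (reward - values[item]) / counts[item]
    return values, counts

values, counts = run_bandit([0.05, 0.20, 0.10])
best = max(range(len(counts)), key=lambda i: counts[i])
print("estimated click rates:", [round(v, 2) for v in values])
print("most recommended item:", best)
```

The `epsilon` parameter controls how often the agent tries a random item instead of its current favorite. Without that exploration, the agent could get stuck recommending whichever item happened to earn the first click.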

RL is one of the three basic machine learning (ML) paradigms, alongside supervised learning (SL) and unsupervised learning (UL). Supervised learning works well when we have a labeled dataset to train our model on. However, we don't always have labels, and hiring humans to label data is expensive. RL takes a different approach. Instead of learning from a fixed, curated dataset, the model (often called an agent) generates its own experience by interacting with an environment. At each step it receives an observation, takes an action, and then receives a reward indicating how well that action matches what we want it to do.
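The following is a minimal sketch of this observe, act, and reward loop. It uses tabular Q-learning, which is one of many RL algorithms and was chosen here for brevity. The environment is a hypothetical five-cell corridor. The `step` and `q_learning` functions, the environment, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all illustrative assumptions. Note that the agent is never shown a correct action. It only sees its current cell and a reward, and it works out the best action from that signal alone.

```python
import random

# Hypothetical toy environment: a corridor of cells 0..4.
# The agent starts in cell 0; reaching cell 4 gives reward +1 and ends the episode.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right

def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[s][a]: estimated discounted future reward for taking action a in state s.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice; ties between actions are broken randomly.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(len(ACTIONS))
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # No labels: the only training signal is the observed reward.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q = q_learning()
policy = ["left" if q[s][0] > q[s][1] else "right" for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should move right from every non-terminal cell, because that is the shortest path to the only reward.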

Due to its generality, reinforcement learning is studied in many disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, and statistics.