What is instrumental convergence?
Instrumental convergence is the idea that sufficiently advanced intelligent systems with a wide variety of terminal goals will end up adopting many of the same instrumental goals.

A terminal goal (also referred to as an "intrinsic goal" or "intrinsic value") is something that an agent values for its own sake (an "end in itself"), while an instrumental goal is one that an agent pursues as a means to some other end, rather than as an end in itself.
For instance, you might donate to an organization that helps the poor in order to improve people’s well-being. Here, “improve well-being” is a terminal goal that you value for its own sake, whereas “donate” is an instrumental goal that you value because it helps you achieve your terminal goal: if you found out that your money wasn’t making people better off, you’d stop donating.
While certain instrumental goals are particular to specific ends (e.g., filling a cup of water to quench your thirst), other instrumental goals are broadly useful. For example, if we imagine an AI with a very specific (and weird) terminal goal — to create as many paperclips as possible — we can see why this goal might lead to the AI pursuing a number of instrumental goals:[1]
- Self-preservation. If the AI gets shut off or destroyed, it has to stop making paperclips. Therefore, it will be motivated to protect itself as an instrumental goal. (As Stuart Russell quipped: “You can’t fetch the coffee if you’re dead.”)
- Goal integrity. The AI will try to avoid having its goals changed, since if its goals were changed, it would stop trying to make paperclips, and there would probably end up being fewer paperclips in the world. For a human analogy: if someone could cause you to stop caring about being kind to others, you would probably oppose that change, since according to your current values that would be a worse situation.
- Resource acquisition. Resources like money, influence, and information are useful for making paperclips. Through advanced technology, even fundamental resources including time, space, matter, and energy could be processed to serve almost any goal.
- Technological advancement. Better technology will improve the efficiency and effectiveness of producing paperclips.
- Cognitive enhancement. Improvements in rationality and intelligence will improve the AI’s decision-making, helping it produce paperclips faster.
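The list above follows a single pattern: whatever the terminal goal, the expected amount of it that gets produced grows with the agent’s survival odds, resources, and technology. As a toy illustration (not from the article; the action names and numbers are made up for this sketch), here is a minimal Python model in which maximizers of three unrelated terminal goals rank the same candidate actions identically:

```python
# Toy sketch of instrumental convergence. All names and numbers are
# hypothetical; the point is only that agents with different terminal
# goals rank the same *instrumental* actions in the same order.

from dataclasses import dataclass

@dataclass
class WorldState:
    survival_prob: float  # chance the agent keeps running
    resources: float      # matter, energy, money available
    tech_level: float     # efficiency multiplier from better technology

def expected_goal_output(state: WorldState) -> float:
    # However "goal units" are defined (paperclips, stamps, ...),
    # expected output grows with survival, resources, and technology.
    return state.survival_prob * state.resources * state.tech_level

# Candidate actions and their (assumed) effects on the world state.
ACTIONS = {
    "do nothing":         lambda s: s,
    "resist shutdown":    lambda s: WorldState(min(1.0, s.survival_prob + 0.2),
                                               s.resources, s.tech_level),
    "acquire resources":  lambda s: WorldState(s.survival_prob,
                                               s.resources * 2.0, s.tech_level),
    "improve technology": lambda s: WorldState(s.survival_prob,
                                               s.resources, s.tech_level * 1.5),
}

start = WorldState(survival_prob=0.7, resources=10.0, tech_level=1.0)

# Three agents with unrelated terminal goals; only the goal's units differ.
for goal in ["paperclip", "stamp", "thumbs-up"]:
    ranked = sorted(ACTIONS,
                    key=lambda a: expected_goal_output(ACTIONS[a](start)),
                    reverse=True)
    print(f"{goal}-maximizer prefers: {ranked}")
```

Because expected output factors the same way for every goal, all three agents print the same ranking, and every instrumental action beats "do nothing" regardless of which terminal goal is being maximized.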
We can see some degree of instrumental convergence in humans: people with very different terminal goals still tend to pursue broadly useful resources like money, knowledge, and influence.
[1] Nick Bostrom, "The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents" (2012).