What is Brain-like AI safety?
“Suppose we someday build an AGI algorithm using similar principles of learning and cognition as the human brain. How would we use such an algorithm safely?”
Brain-like AI safety is an AI alignment research program pursued by Steve Byrnes. It explores the question of how to align an artificial intelligence built in a way that is, at a high level, similar to the brain. Specifically, it looks at how to align systems trained through actor-critic, model-based reinforcement learning.
One key feature of the brain that Byrnes emphasizes is that it is able to learn from scratch. Most human knowledge and abilities are not hard-coded into neural structure by evolution. Rather, the brain has a generic learning structure that acquires them over a lifetime of experience.
This is based on viewing the brain’s function as an interplay between a steering system (based in the parts of the brain whose built-in values were shaped by evolution) and a learning system (which assesses one’s thoughts). These two systems exist in a dynamic with each other, as one learns to associate certain thoughts with certain “values”: seeing an apple, for example, triggers the reward system, since an apple is a likely source of sweetness if eaten.
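The steering/learning interplay can be illustrated with a deliberately simple sketch. This is a toy, not Byrnes’ actual proposal: a hard-coded reward function stands in for the steering system’s built-in values, and a tabular TD(0) critic stands in for the learning system, which starts from scratch and gradually comes to assign value to the sight of an apple because it predicts the rewarded act of eating. All states, names, and parameters here are invented for illustration.

```python
# Toy illustration of a "steering system" (fixed reward) training a
# "learning system" (a from-scratch critic) via TD(0) value learning.
# This is a hypothetical sketch, not an implementation of Byrnes' framework.

# Steering system: hard-coded values, standing in for what evolution built in.
# Only actually eating the apple is directly rewarding (sweetness).
def steering_reward(state: str) -> float:
    return 1.0 if state == "eat_apple" else 0.0

# Learning system: a value table initialized from scratch (no innate knowledge).
values = {"see_apple": 0.0, "eat_apple": 0.0, "look_away": 0.0}

# World model: seeing an apple tends to lead to eating it.
transitions = {"see_apple": "eat_apple"}

alpha, gamma = 0.5, 0.9  # learning rate and discount factor

# TD(0) updates: the critic bootstraps value backward, so "see_apple"
# becomes valuable even though only "eat_apple" is directly rewarded.
for _ in range(50):
    for s in ("see_apple", "eat_apple", "look_away"):
        s_next = transitions.get(s)
        target = steering_reward(s) + (gamma * values[s_next] if s_next else 0.0)
        values[s] += alpha * (target - values[s])

# After learning, merely seeing the apple carries value -- the learned
# association the text describes.
print(values["see_apple"] > values["look_away"])  # True
```

The point of the sketch is the division of labor: the reward function never changes (it is "hard-coded by evolution"), while everything the critic knows about apples is acquired through experience.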