What is "metaphilosophy" and how does it relate to AI safety?
Metaphilosophy is the study of philosophy itself: its nature, its methods, and how philosophical progress is made. Wei Dai introduced this idea as a way to think about AI safety, the research field concerned with preventing risks from advanced artificial intelligence.
While humans have made progress in philosophy, we don't have an algorithm we can follow to guarantee further progress. Dai proposes that by better understanding metaphilosophical questions, we can create a formal method for making philosophical progress that can be programmed into a "white-box metaphilosophical AI".
Another approach is to have the AI learn how to make philosophical progress from human examples, which Dai calls "black-box metaphilosophical AI" because we wouldn't understand how it worked on the inside. However, this would require an advanced AI whose safety would be hard to ensure.
Alternatively, we could solve the philosophical problems ourselves. Dai has argued that this approach is also unpromising: the track record of human philosophy suggests that, without AI assistance, we are unlikely to anticipate all the relevant problems that could arise with the emergence of AGI.