What is the Center for AI Safety (CAIS)'s research agenda?
The Center for AI Safety (CAIS)1 is a non-profit that conducts technical and conceptual research on AI safety and runs field-building projects.
Their technical research focuses on improving the safety of existing AI systems, and often involves building benchmarks and testing models against those benchmarks. It includes work on:
- Robustness, for example their analysis of distribution shift, their evaluation of LLM rule-following, and their proposed data processing method for improving robustness.
- Transparency, where they have presented representation engineering (RepE) as an emerging approach to transparency (a simplified sketch of the idea follows this list).
- Machine ethics, where their most well-known work includes the ETHICS dataset and the MACHIAVELLI benchmark for evaluating language models.
- Anomaly detection, where they have established a baseline for detecting out-of-distribution examples (cases very different from those a system encountered during training), and have proposed outlier exposure (OE), a method that trains a detector on an auxiliary dataset of known anomalies (see the sketch following this list).
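To make the RepE idea concrete: one core technique in the representation engineering paper is to find a direction in a model's activation space that separates contrastive prompts, then "read" a concept off new activations by projecting onto that direction. Below is a minimal sketch of that reading step, assuming hidden-state activations `pos_acts` and `neg_acts` (hypothetical names) have already been collected for paired prompts; it illustrates the idea rather than reproducing the paper's exact recipe.

```python
import numpy as np

def reading_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Candidate 'concept direction': the first principal component of
    the differences between activations for contrastive prompt pairs
    (e.g. honest vs. dishonest completions)."""
    diffs = pos_acts - neg_acts              # shape: (n_pairs, hidden_dim)
    diffs -= diffs.mean(axis=0)              # center before PCA
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]                             # top right-singular vector

def concept_score(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project activations onto the direction to 'read' the concept."""
    return acts @ direction
```

Projections along such a direction can then be monitored or used to steer the model, which is the "control" half of RepE.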
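Similarly, the outlier exposure objective can be sketched in a few lines: ordinary cross-entropy on in-distribution data, plus a term pushing the model toward uniform (low-confidence) predictions on the auxiliary outlier data, with anomalies then scored by the maximum-softmax-probability baseline. This is a minimal sketch in PyTorch, assuming a classifier `model` and batches `x_in`, `y_in`, `x_out` (hypothetical names); it is not CAIS's exact implementation.

```python
import torch
import torch.nn.functional as F

def outlier_exposure_loss(model, x_in, y_in, x_out, lam=0.5):
    """Cross-entropy on in-distribution batches, plus a penalty pushing
    predictions on auxiliary outliers toward the uniform distribution
    (the core idea of outlier exposure)."""
    loss_in = F.cross_entropy(model(x_in), y_in)
    log_probs_out = F.log_softmax(model(x_out), dim=1)
    loss_out = -log_probs_out.mean()   # batch-mean cross-entropy to uniform
    return loss_in + lam * loss_out

def anomaly_score(model, x):
    """Maximum softmax probability baseline: low confidence => anomalous."""
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
    return -probs.max(dim=1).values
```

The intent is that low confidence on the auxiliary outliers generalizes to unseen anomaly types at test time, rather than the detector memorizing the auxiliary set.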
Their conceptual research has included:
- Surveys of the field: “Unsolved Problems in ML Safety” (2022), “X-Risk Analysis for AI Research” (2022), “An Overview of Catastrophic AI Risks” (2023), and “AI Deception: A Survey of Examples, Risks, and Potential Solutions” (2023)
Their field-building projects include:
- The May 2023 Statement on AI Risk – a statement signed by many AI scientists and other notable figures
- The CAIS Compute Cluster, which offers compute for AI safety research
- Prize incentives for safety-relevant research, such as improving ML safety benchmarks, moral uncertainty detection by ML systems, and forecasting by ML systems
- An ML Safety course and scholarships for ML students doing safety-related research
1. Not to be confused with Comprehensive AI Services, a conceptual model of artificial general intelligence proposed by Eric Drexler, also abbreviated CAIS.