What is the Center for AI Safety (CAIS)'s research agenda?
The Center for AI Safety (CAIS)1 is a non-profit that conducts technical and conceptual research on AI safety and runs field-building projects.
Their technical research focuses on improving the safety of existing AI systems, and often involves building benchmarks and testing models against those benchmarks. It includes work on:
- Robustness, for example their analysis of distribution shift, their evaluation of LLM rule-following, and their proposed data processing method for improving robustness.
- Transparency, where they have presented representation engineering (RepE) as an emerging approach to transparency (a simplified sketch of the idea follows this list).
- Machine ethics, where their most well-known work includes the ETHICS dataset and the MACHIAVELLI benchmark for evaluating language models.
- Anomaly detection, where they have established a baseline for detecting out-of-distribution examples (cases very different from those a system encountered during training), and have proposed outlier exposure (OE), a method that trains a detector on an auxiliary dataset of known anomalies (see the sketch following this list).
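To make the RepE idea concrete: one core technique in the representation engineering paper is to find a direction in a model's activation space that separates contrastive prompts, then "read" a concept off new activations by projecting onto that direction. Below is a minimal sketch of that reading step, assuming hidden-state activations `pos_acts` and `neg_acts` (hypothetical names) have already been collected for paired prompts; it illustrates the idea rather than reproducing the paper's exact recipe.

```python
import numpy as np

def reading_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Candidate 'concept direction': the first principal component of
    the differences between activations for contrastive prompt pairs
    (e.g. honest vs. dishonest completions)."""
    diffs = pos_acts - neg_acts              # shape: (n_pairs, hidden_dim)
    diffs -= diffs.mean(axis=0)              # center before PCA
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]                             # top right-singular vector

def concept_score(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project activations onto the direction to 'read' the concept."""
    return acts @ direction
```

Projections along such a direction can then be monitored or used to steer the model, which is the "control" half of RepE.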
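Similarly, the outlier exposure objective can be sketched in a few lines: ordinary cross-entropy on in-distribution data, plus a term pushing the model toward uniform (low-confidence) predictions on the auxiliary outlier data, with anomalies then scored by the maximum-softmax-probability baseline. This is a minimal sketch in PyTorch, assuming a classifier `model` and batches `x_in`, `y_in`, `x_out` (hypothetical names); it is not CAIS's exact implementation.

```python
import torch
import torch.nn.functional as F

def outlier_exposure_loss(model, x_in, y_in, x_out, lam=0.5):
    """Cross-entropy on in-distribution batches, plus a penalty pushing
    predictions on auxiliary outliers toward the uniform distribution
    (the core idea of outlier exposure)."""
    loss_in = F.cross_entropy(model(x_in), y_in)
    log_probs_out = F.log_softmax(model(x_out), dim=1)
    loss_out = -log_probs_out.mean()   # batch-mean cross-entropy to uniform
    return loss_in + lam * loss_out

def anomaly_score(model, x):
    """Maximum softmax probability baseline: low confidence => anomalous."""
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
    return -probs.max(dim=1).values
```

The intent is that low confidence on the auxiliary outliers generalizes to unseen anomaly types at test time, rather than the detector memorizing the auxiliary set.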
Their conceptual research has included:
- Surveys of the field: “Unsolved Problems in ML Safety” (2022), “X-Risk Analysis for AI Research” (2022), “An Overview of Catastrophic AI Risks” (2023), and “AI Deception: A Survey of Examples, Risks, and Potential Solutions” (2023)
Their field-building projects include:
- The May 2023 Statement on AI Risk – a statement signed by many AI scientists and other notable figures
- The CAIS Compute Cluster, which offers compute for AI safety research
- Prize incentives for safety-relevant research, such as improving ML safety benchmarks, moral uncertainty detection by ML systems, and forecasting by ML systems
- An ML Safety course and scholarships for ML students doing safety-related research
1. Not to be confused with Comprehensive AI Services, a conceptual model of artificial general intelligence proposed by Eric Drexler, also abbreviated CAIS.