How can progress in non-agentic LLMs lead to capable AI agents?

AutoGPT is an example of an agent built on top of GPT-4: the underlying language model only predicts text, but a scaffold around it turns those predictions into goal-directed behavior.
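To make the idea concrete, here is a minimal sketch of an AutoGPT-style scaffold. The `llm` function is a hypothetical stand-in for a real model API call (the real AutoGPT prompt format and tooling are more elaborate); the point is that the loop, not the model, supplies the agency by repeatedly requesting an action, executing it, and feeding the result back in.

```python
# Minimal, illustrative agent scaffold around a non-agentic LLM.
# `llm` is a stub standing in for a real model API call.

def llm(prompt: str) -> str:
    """Stub language model: proposes the next action for a toy counting task.

    A real implementation would send `prompt` to a model API and parse
    the completion; here we deterministically fake that behavior.
    """
    increments_so_far = prompt.count("ACTION: increment")
    return "ACTION: increment" if increments_so_far < 3 else "ACTION: done"

def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    """Agent loop: ask the model for an action, 'execute' it, append the
    observation to the prompt, and repeat until the model says it is done."""
    history = f"GOAL: {goal}\n"
    actions: list[str] = []
    for _ in range(max_steps):
        action = llm(history)
        actions.append(action)
        if action == "ACTION: done":
            break
        # Executing the action and recording the observation is what
        # closes the loop and makes the system behave like an agent.
        history += f"{action}\nOBSERVATION: ok\n"
    return actions

print(run_agent("count to three"))
# -> ['ACTION: increment', 'ACTION: increment', 'ACTION: increment', 'ACTION: done']
```

Even though each individual model call is a pure text prediction, the scaffold makes the overall system pursue a goal across multiple steps.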

One threat model that includes a GPT-style AI is the "misaligned model-based reinforcement learning agent". It suggests that a reinforcement learner attached to a GPT-style world model could pose an existential risk: the reinforcement learning agent acts as the optimizer, and the world model lets it predict the consequences of its actions, making it far more effective at achieving its goals.
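The structure of that threat model can be sketched in a few lines. Everything below is invented for illustration: a toy `world_model` stands in for the GPT-style learned model of the environment, and `plan` is the optimizer that queries it to choose actions.

```python
# Illustrative model-based planner: an optimizer uses a learned world model
# to evaluate candidate actions. All names and dynamics here are toy
# stand-ins; a GPT-style model would play the role of `world_model`.

def world_model(state: int, action: int) -> int:
    """Toy world model: predicts the next state given a state and action."""
    return state + action

def reward(state: int) -> int:
    """Toy objective: prefer states close to a target value of 10."""
    return -abs(10 - state)

def plan(state: int, actions=(-1, 0, 1, 2)) -> int:
    """The optimizer: simulate each action in the world model and pick
    the one whose predicted next state scores highest under the reward."""
    return max(actions, key=lambda a: reward(world_model(state, a)))

state = 0
for _ in range(6):
    state = world_model(state, plan(state))
print(state)  # -> 10: the planner steers the state to the target
```

The better the world model's predictions, the more effectively the optimizer achieves its objective, which is exactly why attaching a capable GPT-style model to even a simple optimizer is a concern.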

A more speculative possibility is that a sufficiently powerful world model may itself develop a mesa-optimizer: an optimization process that arises inside the learned model. Such a mesa-optimizer could pursue its own objectives in the world through the model's outputs, perhaps by steering those outputs so that an external optimizer is created whose goals are aligned with the mesa-optimizer's.

AISafety.info

We’re a global team of specialists and volunteers from various backgrounds who want to ensure that the effects of future AI are beneficial rather than catastrophic.

© AISafety.info, 2022—2025

Aisafety.info is an Ashgro Inc Project. Ashgro Inc (EIN: 88-4232889) is a 501(c)(3) Public Charity incorporated in Delaware.