What is Conjecture's research agenda?

Conjecture is an AI research lab focused on "building Cognitive Emulation - an AI architecture that bounds systems' capabilities and makes them reason in ways that humans can understand and control". Conjecture hopes that this approach will allow for "scalable, auditable, controllable AI systems".

Cognitive Emulation (CoEm)

Conjecture's primary alignment program is "cognitive emulation", i.e., trying to make AI systems that emulate human reasoning.[1] Conjecture calls such systems "CoEms". The most powerful current AI systems generally exhibit opaque and potentially very "alien" reasoning. By contrast, a successful CoEm would be "good at chess for the same reasons humans are good at chess".

A CoEm, in the words of Connor Leahy and Gabriel Alfour:

• Is built on understandable, discoverable and implementable ML and computational building blocks.

• Does not have so much Magic[2] inside of it that we cannot even put bounds on its possible consequences and capabilities.

• Can be sufficiently understood and bounded to ensure it does not suddenly dramatically shift its behaviors, properties and capabilities.

• Is well situated in the human(ish) capabilities regime and, when in doubt, will default to human-like failure modes rather than completely unpredictable behaviors.

• Is retargetable enough to be deployed to solve many useful problems and not deviate into dangerous behavior, [as long] as it is used by a careful user.
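
None of the following is Conjecture's actual architecture, which has not been published in detail; it is only a toy sketch of the general idea the list above points at: composing a system out of small, understandable steps whose intermediate results can be inspected and explicitly bounded, rather than relying on one opaque end-to-end model. All names in the sketch are made up for illustration.

```python
# Purely illustrative sketch (not Conjecture's architecture): a system built from
# small, individually auditable steps whose outputs can be logged and bounded,
# instead of a single opaque model call.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    name: str                      # human-readable label for auditing
    run: Callable[[str], str]      # a small, understandable transformation
    check: Callable[[str], bool]   # an explicit bound on acceptable outputs

@dataclass
class AuditablePipeline:
    steps: list[Step]
    trace: list[tuple[str, str]] = field(default_factory=list)

    def __call__(self, x: str) -> str:
        for step in self.steps:
            x = step.run(x)
            if not step.check(x):
                # Fail loudly and legibly rather than continuing with
                # behavior we cannot account for.
                raise ValueError(f"step {step.name!r} produced out-of-bounds output")
            self.trace.append((step.name, x))  # every intermediate result is inspectable
        return x

# Toy usage: each step is simple enough that a human can say *why* it works.
pipeline = AuditablePipeline(steps=[
    Step("normalize", lambda s: s.strip().lower(), lambda s: len(s) < 1000),
    Step("extract_question", lambda s: s.split("?")[0] + "?", lambda s: s.endswith("?")),
])
print(pipeline("  What is cognitive emulation?  Extra text."))
print(pipeline.trace)
```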

CoEms, by design, would not be able to achieve far-greater-than-human intelligence. However, Conjecture hopes that CoEms could be used to help find formal solutions to the problem of aligning superintelligent AGI.

Interpretability for large language models

Conjecture has also done interpretability research on large language models (LLMs), aimed at understanding the internal computations of these models. Examples include work on interpreting neural networks through the "polytope lens" and early work on using sparse autoencoders to take features out of superposition.
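
To give a sense of what interpretability work on LLMs can look like in practice, the sketch below applies a generic, widely used probe (the "logit lens") to a small open model: each layer's hidden state is projected through the model's unembedding to see which next token it favors at that depth. This is not a reproduction of Conjecture's own research; the model and prompt are arbitrary examples, and the snippet assumes the Hugging Face transformers and PyTorch libraries are installed.

```python
# Illustrative "logit lens" probe: project each layer's hidden state through the
# unembedding to see what next token the model favors at intermediate depths.
# A generic interpretability technique, not Conjecture's specific method.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small open model, chosen only for the example
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# outputs.hidden_states: tuple of (num_layers + 1) tensors, shape [batch, seq, hidden]
for layer_idx, hidden in enumerate(outputs.hidden_states):
    # Apply the final layer norm and the unembedding matrix to the last position.
    normed = model.transformer.ln_f(hidden[:, -1, :])
    logits = model.lm_head(normed)
    top_token = tokenizer.decode(logits.argmax(dim=-1))
    print(f"layer {layer_idx:2d}: most likely next token = {top_token!r}")
```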

Outreach and communication

Conjecture's CEO, Connor Leahy, does public communication about AI safety and alignment, including appearances on podcasts and news programs.


  1. This doesn't mean "simulating" human brains, neurons, etc., but rather emulating the logical structure of human thought. In other words, a CoEm would "reason in the same way" as humans, regardless of structural details of how that reasoning is implemented. ↩︎

  2. "Magic" is a tongue-in-cheek term for computation done by an AI that we don't understand. ↩︎