What is cyborgism?

In the context of AI safety, cyborgism^[1] is an agenda that proposes building systems composed of humans and LLMs^[2] that augment the humans’ capabilities in order to accelerate progress on AI alignment. In contrast with proposals to almost entirely automate alignment research using AI, this setup is meant to retain human control over the alignment research process by avoiding potentially dangerous features of researcher AIs, such as agency.

The authors of the Alignment Forum post that introduced the idea argue that, just as the steam engine was first used to build horseless carriages before its capabilities were adapted to more efficient designs (i.e., cars), our current usage of LLMs follows familiar patterns and is quite limited compared to what it could become. They suggest that instead of viewing the ways in which LLMs differ from agents as flaws to be fixed^[3], we should use their strengths as simulators^[4] to enhance our thinking, while leaving humans as the only agentic entities steering the collaborative cognition.

Proponents of cyborgism have developed human-LLM collaboration tools such as LOOM, a tool that enables users to explore multiple simultaneous branches of an LLM’s "simulations".
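
The branching idea can be illustrated with a small tree structure in which each node holds one segment of text and a human chooses which branch to extend next. This is only a minimal sketch and not how LOOM itself is implemented: the names `Node`, `expand`, and `make_toy_sampler` are hypothetical, and the toy sampler stands in for a real LLM call.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, List, Optional


@dataclass(eq=False)
class Node:
    """One segment of text in a tree of alternative continuations (hypothetical)."""
    segment: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def full_text(self) -> str:
        """Join the segments on the path from the root down to this node."""
        parts = []
        node: Optional[Node] = self
        while node is not None:
            parts.append(node.segment)
            node = node.parent
        return "".join(reversed(parts))


def expand(node: Node, sample: Callable[[str], str], n: int = 3) -> List[Node]:
    """Sample n continuations of the node's full text and attach them as children."""
    new_children = [Node(sample(node.full_text()), parent=node) for _ in range(n)]
    node.children.extend(new_children)
    return new_children


def make_toy_sampler() -> Callable[[str], str]:
    """Stand-in for an LLM: returns a numbered placeholder continuation."""
    counter = count(1)

    def sample(prompt: str) -> str:
        return f" [continuation {next(counter)}]"

    return sample


# The human stays in the steering role: generate several branches,
# pick one, and expand only that branch further.
root = Node("Once upon a time")
sample = make_toy_sampler()
branches = expand(root, sample, n=3)
chosen = branches[1]
deeper = expand(chosen, sample, n=2)
print(deeper[0].full_text())
```

In this sketch the model only proposes continuations, while the choice of which branch to keep or extend is left entirely to the user.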

[1] Cyborgism in this context is distinct from cyborg art and cyberfeminism.
[2] In principle, cyborg systems could be composed of a human and any AI model, but the agenda emerged from an understanding of LLMs and in practice is applied to them.
[3] They note that some modifications to base models, such as RLHF, make LLMs more agent-like, which amounts to capabilities research and reduces the time we have before humanity loses control.
[4] They remark that non-RLHFed base models perform better as multiverse generators than their RLHFed counterparts. RLHF constrains the model’s output in a way that is relevant to a chatbot but not to a multiverse generator.