But won't we just design AI to be helpful?
One might think that aligning an AGI is simply a matter of designing it to be helpful. But genuine alignment is a higher bar than helpful-seeming behavior: in the terminology of mesa-optimization, a mesa-optimizer is robustly aligned with the base objective only if it robustly optimizes for that objective across distributions, not just in the situations it was trained on.
Current AI models might be sufficiently aligned to be useful, but despite the best efforts of the labs that produce them, they still suffer from issues such as bias and susceptibility to adversarial prompts. Furthermore, resolving these issues may not be enough: there are reasons to expect that, as AI systems become more capable, systems that seem to display aligned behavior in some environments could turn out to be dangerously misaligned in others. Dive into the related articles to see what makes AI alignment a hard problem.