13: We are not reliably on track to solving these problems
Many people “in the know” now take extinction risk from AI seriously. But at the same time, humanity isn't making a coherent effort to deal with it. In some ways, the situation looks grim.
Alignment may take a long time
For AI labs to be able to build and deploy AI systems that are well-aligned, the theory of AI alignment (our techniques for keeping AI on our side) needs to keep pace with AI capabilities (the range of things AI can do).
But we don't know how long solving the alignment problem will take. Alignment is still receiving far fewer resources than capabilities, and it’s unclear how to spend those resources to make reliable progress. For all we know, it may take many years of clever thoughts building on clever thoughts.
One strategy is to use AI itself to solve AI alignment. This strategy has its own problems, but some players, including OpenAI, are betting on it.
AI labs are racing
Meanwhile, capabilities are racing ahead. Major AI companies are explicitly trying to build human-level intelligence. They feel like they are in a race: if one slows down for reasons of caution, that just means the others will get there first. And national governments are increasingly involved as well.
Those building AI generally argue that, although the existential risks are real, we can probably avoid them. They think it’s worth taking these risks because of the benefits the technology will bring, and because they consider it important to stay ahead of others who may develop the technology first and use it in ways contrary to their values. For example, in the US, some argue that it’s worth taking the risk in order to stay ahead of China.
But racing to be first only makes sense if alignment can be solved in time. If not, then whoever wins the race to superintelligence, we all lose.
Trial and error won’t work
Consider the systems we’ll build on the “first critical try”: the point when AI first becomes capable of taking over. Even if we manage to align those systems, that doesn’t mean we’re out of the woods. The risk of someone building a misaligned powerful agent will remain until AI can be used to make the world secure against takeover attempts. A strategy to “stop bad AI with good AI” requires us to figure out how to build powerful good AI in the first place, and then how to use it to reliably contain bad AI.
In contrast, if we fail to align these systems, then there’s no way back, because they’ll take over. Although we can learn some things from aligning weaker systems first, stronger systems are likely to come with their own new challenges that we’ll have to navigate without practice.
There is no plan
Given this situation, what can people do to make things work out well? Approaches in the AI safety community include trying to:
- Accelerate alignment research, to make it more likely that when we’re in a position to build very powerful systems, we can do so safely.
- Create governance mechanisms to ensure AI is created and deployed in safe ways, and not catastrophically abused.
- Slow or pause AI progress, e.g. through a moratorium, in hopes of getting more time to come up with a solution.
- Predict, understand, and strategize about how all this will play out, as an indirect step toward more direct solutions.
Governments such as those of the US and China aren’t prioritizing the kinds of measures that could substantially reduce the risk, such as building a “pause button” capacity that could be used to stop AI if it shows signs of posing an imminent danger. International summits have been held since 2023, but their emphasis has shifted from the initial “AI Safety Summit” to an “AI Action Summit”. AI companies, for their part, have policies about what to do if their systems become more dangerous, but these policies contain little that is specific and binding.
On the whole, humanity isn’t acting on any well-thought-out plan that can be expected to solve the problem. aisafety.info hopes to inform people in order to help remedy this situation.