How can I do conceptual, mathematical, or philosophical work on AI alignment?

General advice

There isn’t a standard career path in this area. AI alignment is a pre-paradigmatic field in which nobody has a good idea what the right prerequisite knowledge is or what an answer looks like. That means this is a path for people who are willing to wrestle with uncertainty.

Financially supporting your research can be hard: funding isn’t a reliably solved problem, but funding opportunities do exist.

Rather than thinking of your goal as trying to “become a researcher”, it might be better to think of it as trying to solve the alignment problem. You can get started by reading and thinking about the problem, commenting on posts, and writing down your own ideas in private docs or on LessWrong. Don’t count on feedback arriving on its own; actively reach out to people who might have useful thoughts. It will also help to find peers to stay in contact with.

One way to get into conceptual work is by writing distillations of other people’s work, or by critiquing key posts in places like LessWrong (which includes everything that has been posted on the Alignment Forum). This is important for developing your own “inside view” on the problem.

Consider asking around your personal network for an alignment research mentor, or for a collaborator who knows the literature and can give you pointers and feedback. This is unlikely to work with leading alignment researchers, who already get many requests for mentorship, but it is more likely to succeed with people you know locally who can teach you generic research skills; it depends a lot on the person. If you can get a mentor, that’s great, but you don’t need one to succeed, so don’t get blocked on it: almost everything you can get from a mentor, you can also get from a mix of learning by doing, having discussions with peers, and getting feedback from them. Without a mentor to guide you it will take a bit longer and you’ll probably hit a few more dead ends, but you can do it.

Training programs

Consider training programs (e.g. SERI-MATS) and internships. AI Safety Training has an overview of these. AGI Safety Fundamentals runs courses on AI alignment and governance. The 80,000 Hours AI safety syllabus lists a lot of reading material. For more suggestions, look at Linda Linsefors’s collection of do-it-yourself training programs.

If you’re applying to a program, choose whichever one you think you will most enjoy. The important thing is to start learning the field and to get some contacts. You’ll end up learning different things in different programs, but you won’t be locked into that path. You’re free to continue exploring whatever direction you want and to apply to other programs in the future, and you’ll have a much easier time navigating the space when you have some context and some connections.

Guides and resources

Some helpful guides:

Other resources: