How would we align an AGI whose learning algorithms / cognition look like human brains?
Steven Byrnes, a full-time, independent alignment researcher, works on answering the question: "How would we align an AGI whose learning algorithms / cognition look like human brains?"
Humans seem to robustly care about things; why is that? If we understood why, could we design AGIs that do the same? As far as I understand it, most of this work is grounded in neuroscience: figuring out how various parts of the brain work, and then applying that understanding to the alignment problem.
Three other independent researchers are working on related projects that Byrnes has proposed.