Episode

#236 – Max Harms on why teaching AI right from wrong could get everyone killed

Podcast
80,000 Hours Podcast
Published
Feb 24, 2026
Duration seconds
9622
Processing state
not_requested
Canonical source
https://80000hours.org/podcast/episodes/max-harms-miri-superintelligence-corrigibility/?utm_campaign=podcast__max-harms&utm_source=80000+Hours+Podcast&utm_medium=podcast
Audio
https://media.transistor.fm/6571daf0/04773272.mp3
JSON
/v1/public/podcasts/80-000-hours-podcast-747608/episodes/236-max-harms-on-why-teaching-ai-right-from-wrong-could-get-everyone-killed
Markdown
/podcast/80-000-hours-podcast-747608/236-max-harms-on-why-teaching-ai-right-from-wrong-could-get-everyone-killed.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/80-000-hours-podcast-747608/episodes/236-max-harms-on-why-teaching-ai-right-from-wrong-could-get-everyone-killed/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/80-000-hours-podcast-747608/236-max-harms-on-why-teaching-ai-right-from-wrong-could-get-everyone-killed.md
    Read the agent-friendly Markdown representation of this episode resource.

Summary

Most people in AI are trying to give AIs ‘good’ values. Max Harms wants us to give them no values at all. According to Max, the only safe design is an AGI that defers entirely to its human operators, has no views about how the world ought to be, is willingly modifiable, and completely indifferent to being shut down — a strategy no AI company is working on at all. In Max’s view any grander preferences about the world, even ones we agree with, will necessarily become distorted during a recursive self-improvement loop, and be the seeds that grow into a violent takeover attempt once that AI is powerful enough. It’s a vision that springs from the worldview laid out in If Anyone Builds It, Everyone Dies , the recent book by Eliezer Yudkowsky and Nate Soares, two of Max’s colleagues at the Machine Intelligence Research Institute . To Max, the book’s core thesis is common sense: if you build something vastly smarter than you, and its goals are misaligned with your own, then its actions will probably result in human extinction. And Max thinks misalignment is the default outcome. Consider evolution: its “goal” for humans was to maximise reproduction and pass on our genes as much as possible. But as technology has advanced we’ve learned to access the reward signal it set up for us, pleasure — without any reproduction at all, by having sex while on birth control for instance. We can understand intellectually that this is inconsistent with what evolution was trying to design and motivate us to do. We just don’t care. Max thinks current ML training has the same structural problem: our development processes are seeding AI models with a similar mismatch between goals and behaviour. Across virtually every training run, models designed to align with various human goals are also being re…