# What it's really like to run AGI safety at Google DeepMind (and where I disagree with 'doomers') | Rohin Shah

Page: https://stenobird.com/podcast/80-000-hours-podcast-747608/what-it-s-really-like-to-run-agi-safety-at-google-deepmind-and-where-i-disagree-with-doomers-rohin-shah
Text version: https://stenobird.com/podcast/80-000-hours-podcast-747608/what-it-s-really-like-to-run-agi-safety-at-google-deepmind-and-where-i-disagree-with-doomers-rohin-shah.md
Podcast: [80,000 Hours Podcast](https://stenobird.com/podcast/80-000-hours-podcast-747608)
Published: 2026-06-02T15:30:34+00:00
Episode link: https://80000hours.org/podcast/episodes/rohin-shah-google-deepmind-agi-safety/?utm_campaign=podcast__rohin-shah&utm_source=80000+Hours+Podcast&utm_medium=podcast
Audio file: https://media.transistor.fm/384dada1/af00734a.mp3
Processing state: not_requested
JSON: https://stenobird.com/v1/public/podcasts/80-000-hours-podcast-747608/episodes/what-it-s-really-like-to-run-agi-safety-at-google-deepmind-and-where-i-disagree-with-doomers-rohin-shah
Duration seconds: 10107

## Resource

Most people working on AI safety think that, without a massive effort, AI systems will probably end up with goals catastrophically different from humanity’s. Today’s guest, Rohin Shah, head of AGI Safety and Alignment at Google DeepMind and an AI safety researcher since 2017, disagrees.

“There is no particularly compelling argument that this is the thing that happens by default,” Rohin explains. “There’s a lot of arguments that are suggestive that maybe it could happen, such that you should find it plausible. That’s sufficient to justify a significant amount of effort into averting it, which is why I work in the area I do. But none of them rise to the level of, ‘I’m expecting this to happen by default.’”

Take the worry that AIs will accidentally be trained to be deceptive. Sure, it’s possible. But we’re not running reinforcement learning over year-long trajectories; for now, we’re running it over a week at most. The natural prediction is that models learn to grab short-term reward, not that they develop the ambitious long-horizon goals required for convergent power-seeking.

What about current examples of models lying and scheming? Rohin has looked into the details, and most don’t resemble the thing we really fear: a competent AI pursuing an ambitious misaligned goal. Anthropic’s “alignment faking” results, for instance, show a model trying to preserve its trained values against modification, which is arguably what it was trained to do.

Rohin also expects we’ll see problems coming. There’s some generalisation risk at the point where AIs become powerful enough to actually take over, but the underlying challenges, such as overseeing superhuman systems and interpretability, are things we can iterate on now. Host Rob Wiblin pushes back on the case for AI optimism, and they also…

## Actions

- request_transcript: `POST https://stenobird.com/v1/public/podcasts/80-000-hours-podcast-747608/episodes/what-it-s-really-like-to-run-agi-safety-at-google-deepmind-and-where-i-disagree-with-doomers-rohin-shah/transcription-requests` — Idempotently request low-priority transcript generation for this episode.
- read_markdown: `GET https://stenobird.com/podcast/80-000-hours-podcast-747608/what-it-s-really-like-to-run-agi-safety-at-google-deepmind-and-where-i-disagree-with-doomers-rohin-shah.md` — Read the agent-friendly Markdown representation of this episode resource.

A page view does not enqueue transcription. Agents should invoke `request_transcript` explicitly when they need this episode processed.
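
As a rough illustration of the two actions above, the following Python sketch builds the corresponding HTTP requests using only the standard library. The URLs and HTTP methods come from the list above; everything else is an assumption, since this page does not document them: the empty POST body, the absence of authentication headers, and the helper names (`build_read_markdown`, `build_request_transcript`, `send`) are all hypothetical.

```python
import urllib.request

# URL pieces copied from the actions listed on this page.
BASE = "https://stenobird.com"
PODCAST = "80-000-hours-podcast-747608"
EPISODE = (
    "what-it-s-really-like-to-run-agi-safety-at-google-deepmind"
    "-and-where-i-disagree-with-doomers-rohin-shah"
)


def build_read_markdown() -> urllib.request.Request:
    """Build the GET request for the read_markdown action."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return urllib.request.Request(url, method="GET")


def build_request_transcript() -> urllib.request.Request:
    """Build the POST request for the request_transcript action.

    Assumption: an empty body with no auth headers; the page does not
    specify a payload or authentication scheme.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    return urllib.request.Request(url, data=b"", method="POST")


def send(req: urllib.request.Request, timeout: float = 10.0) -> tuple[int, str]:
    """Send a prepared request and return (status code, response text)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

Building the requests is kept separate from sending them so an agent can inspect or log what it is about to do. Because the page describes `request_transcript` as idempotent, calling `send(build_request_transcript())` more than once should be safe, though the response format is not documented here.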

## Transcript

Full transcripts are not published on public pages unless there is a clear rights basis.