Episode

Don't Fight Backprop: Goodfire's Vision for Intentional Design, w/ Dan Balsam & Tom McGrath

Podcast
"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis
Published
Mar 5, 2026
Duration seconds
6440
Processing state
processed
Canonical source
https://www.cognitiverevolution.ai/don-t-fight-backprop-goodfire-s-vision-for-intentional-design-w-dan-balsam-tom-mcgrath/
Audio
https://pdst.fm/e/mgln.ai/e/1113/pscrb.fm/rss/p/traffic.megaphone.fm/RINTP1802421308.mp3?updated=1772715479
JSON
/v1/public/podcasts/the-cognitive-revolution/episodes/don-t-fight-backprop-goodfire-s-vision-for-intentional-design-w-dan-balsam-tom-mcgrath
Markdown
/podcast/the-cognitive-revolution/don-t-fight-backprop-goodfire-s-vision-for-intentional-design-w-dan-balsam-tom-mcgrath.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/the-cognitive-revolution/episodes/don-t-fight-backprop-goodfire-s-vision-for-intentional-design-w-dan-balsam-tom-mcgrath/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/the-cognitive-revolution/don-t-fight-backprop-goodfire-s-vision-for-intentional-design-w-dan-balsam-tom-mcgrath.md
    Read the agent-friendly Markdown representation of this episode resource.
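
The two actions above can be sketched as plain HTTP requests. This is a minimal illustration using only the Python standard library, not a documented client. It assumes the POST takes an empty body and that neither endpoint needs authentication or extra headers; the page states neither. The helper names (`transcription_request`, `markdown_request`, `send`) are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE = "https://stenobird.com"
PODCAST = "the-cognitive-revolution"
EPISODE = (
    "don-t-fight-backprop-goodfire-s-vision-for-intentional-design"
    "-w-dan-balsam-tom-mcgrath"
)


def transcription_request(podcast: str = PODCAST, episode: str = EPISODE) -> Request:
    """Build the POST that asks for low-priority transcript generation."""
    url = (
        f"{BASE}/v1/public/podcasts/{quote(podcast)}"
        f"/episodes/{quote(episode)}/transcription-requests"
    )
    # Assumption: an empty body is accepted; the page does not describe a payload.
    return Request(url, data=b"", method="POST")


def markdown_request(podcast: str = PODCAST, episode: str = EPISODE) -> Request:
    """Build the GET for the Markdown representation of the episode."""
    url = f"{BASE}/podcast/{quote(podcast)}/{quote(episode)}.md"
    return Request(url, method="GET")


def send(req: Request, timeout: float = 30.0) -> tuple[int, str]:
    """Send a prepared request and return (HTTP status, decoded body)."""
    with urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Because the POST is described as idempotent, calling `send(transcription_request())` more than once should be safe. The response format is not specified here, so inspect the returned body before relying on any field.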

Summary

Dan Balsam and Tom McGrath from Goodfire return to explore the frontier of mechanistic interpretability and their new research pillar, Intentional Design. They explain the shift from sparse autoencoders to understanding geometric structure in latent spaces, and share a proof-of-concept method for reducing hallucinations using probes and RL. The conversation tackles concerns about reward hacking, principles for shaping the loss landscape instead of fighting backprop, and what this means for aligning powerful models. They also discuss recent Goodfire results on Alzheimer’s prediction, disentangling memorization vs. reasoning weights, and how they balance commercial growth with a public benefit mission.

Nathan uses Granola to uncover blind spots in conversations and AI research. Try it at granola.ai/tcr with code TCR, and if you’re already using it, test his blind spot recipe here: https://bit.ly/granolablindspot

LINKS:

  • Detecting PII for Rakuten
  • Interpretability for Alzheimer's biomarker detection
  • You and Your Research Agent
  • Adversarial examples and superposition
  • Discovering rare behaviors with model diff
  • Priors in time for interpretability
  • Belief dynamics in in-context learning
  • Mixing mechanisms in language models
  • Sparse autoencoder scaling with manifolds

Sponsors:

  • VCX: VCX, by Fundrise, is the public ticker for private tech, giving everyday investors access to high-growth private companies in AI, space, defense tech, and more. Learn how to invest at https://getvcx.com
  • Claude: Claude is the AI collaborator that understands your entire workflow, from drafting and research to coding and complex problem-solving. Start tackling bigger problems with Claude and unlock Claude Pro’s full capabilities at https://claude.ai/tcr
  • Serval: Serval uses AI-powered automations to cut IT h…