Episode

The First Mechanistic Interpretability Frontier Lab — Myra Deng & Mark Bissell of Goodfire AI

Podcast
Latent Space: The AI Engineer Podcast
Published
Feb 6, 2026
Duration (seconds)
4081
Processing state
processed
Canonical source
https://www.latent.space/p/goodfire
Audio
https://api.substack.com/feed/podcast/187000315/45fdd1fce3ff7c69d24a13281311b152.mp3
JSON
/v1/public/podcasts/latent-space-ai-engineer/episodes/the-first-mechanistic-interpretability-frontier-lab-myra-deng-mark-bissell-of-goodfire-ai
Markdown
/podcast/latent-space-ai-engineer/the-first-mechanistic-interpretability-frontier-lab-myra-deng-mark-bissell-of-goodfire-ai.md

Summary

From Palantir and Two Sigma to building Goodfire into the poster child for actionable mechanistic interpretability, Mark Bissell (Member of Technical Staff) and Myra Deng (Head of Product) are trying to turn “peeking inside the model” into a repeatable production workflow. They are doing this by shipping APIs, landing real enterprise deployments, and now scaling the bet with a recent $150M Series B funding round at a $1.25B valuation.

In this episode, we go far beyond the usual “SAEs are cool” take. We talk about Goodfire’s core bet: that the AI lifecycle is still fundamentally broken because the only reliable control we have is data. We post-train, RLHF, and fine-tune by “slurping supervision through a straw,” hoping the model picks up the right behaviors while quietly absorbing the wrong ones. Goodfire’s answer is to build a bi-directional interface between humans and models: read what’s happening inside, edit it surgically, and eventually use interpretability during training so customization isn’t just brute-force guesswork.

Mark and Myra walk through what that looks like when you stop treating interpretability like a lab demo and start treating it like infrastructure: lightweight probes that add near-zero latency, token-level safety filters that can run at inference time, and interpretability workflows that survive messy constraints (multilingual inputs, synthetic→real transfer, regulated domains, no access to sensitive data). We also get a live window into what “frontier-scale interp” means operationally (i.e., steering a trillion-parameter model in real time by targeting internal features), plus why the same tooling generalizes cleanly from language models to genomics, medical imaging, and “pixel-space” world models.

We discuss:
* Myra + Mark’s path: Palantir (health systems, fo…