Episode

The Utility of Interpretability — Emmanuel Amiesen

Podcast
Latent Space: The AI Engineer Podcast
Published
Jun 6, 2025
Duration seconds
6782
Processing state
processed
Canonical source
https://www.latent.space/p/the-utility-of-interpretability-emmanuel
Audio
https://api.substack.com/feed/podcast/186632799/5f0d1a6cb0dc287bfa49b0f096ae08a9.mp3
JSON
/v1/public/podcasts/latent-space-ai-engineer/episodes/the-utility-of-interpretability-emmanuel-amiesen
Markdown
/podcast/latent-space-ai-engineer/the-utility-of-interpretability-emmanuel-amiesen.md

Actions

  • POST https://stenobird.com/v1/public/podcasts/latent-space-ai-engineer/episodes/the-utility-of-interpretability-emmanuel-amiesen/transcription-requests
    Idempotently request low-priority transcript generation for this episode.
  • GET https://stenobird.com/podcast/latent-space-ai-engineer/the-utility-of-interpretability-emmanuel-amiesen.md
    Read the agent-friendly Markdown representation of this episode resource.
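
As a rough illustration of the two actions above, the following Python sketch builds the corresponding HTTP requests with the standard library. The URLs and methods are taken from the list above. Everything else is an assumption: the helper names are hypothetical, and the sketch assumes the endpoints need no authentication, headers, or request body, none of which this page states. The response format is not documented here, so `send` simply returns the status code and the raw text.

```python
import urllib.request

# Values copied from the action URLs listed above.
BASE = "https://stenobird.com"
PODCAST = "latent-space-ai-engineer"
EPISODE = "the-utility-of-interpretability-emmanuel-amiesen"


def build_transcription_request() -> urllib.request.Request:
    """Build the POST that requests low-priority transcript generation.

    Assumption: no request body or auth header is required.
    """
    url = (
        f"{BASE}/v1/public/podcasts/{PODCAST}"
        f"/episodes/{EPISODE}/transcription-requests"
    )
    return urllib.request.Request(url, method="POST")


def build_markdown_request() -> urllib.request.Request:
    """Build the GET for the Markdown representation of the episode."""
    url = f"{BASE}/podcast/{PODCAST}/{EPISODE}.md"
    return urllib.request.Request(url, method="GET")


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send a prepared request and return (status code, body text).

    Assumption: the body is text; undecodable bytes are replaced.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
```

Because the POST is described as idempotent, calling `send(build_transcription_request())` more than once should presumably be harmless, though how the service reports a repeated request is not specified on this page.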

Summary

Emmanuel Amiesen is lead author of “Circuit Tracing: Revealing Computational Graphs in Language Models” (https://transformer-circuits.pub/2025/attribution-graphs/methods.html), one of a pair of MechInterp papers that Anthropic published in March (alongside https://transformer-circuits.pub/2025/attribution-graphs/biology.html). We recorded the initial conversation a month ago, but held off publishing until the open source tooling for the graph generation discussed in this work was released last week: https://www.anthropic.com/research/open-source-circuit-tracing

This is a two-part episode: an intro covering the open source release, then a deeper dive into the paper, with guest host Vibhu Sapra (https://x.com/vibhuuuus) and Mochi the MechInterp Pomsky (https://x.com/mochipomsky). Thanks to Vibhu for making this episode happen!

The original blogpost contained some fantastic guided visualizations (which we discuss at the end of this pod!). With the notebook and Neuronpedia visualization (https://www.neuronpedia.org/gemma-2-2b/graph) released this week, you can now explore on your own with Neuronpedia, as we show you in the video version of this pod.

Timestamps

00:00 Intro & Guest Introductions
01:00 Anthropic's Circuit Tracing Release
06:11 Exploring Circuit Tracing Tools & Demos
13:01 Model Behaviors and User Experiments
17:02 Behind the Research: Team and Community
24:19 Main Episode Start: Mech Interp Backgrounds
25:56 Getting Into Mech Interp Research
31:52 History and Foundations of Mech Interp
37:05 Core Concepts: Superposition & Features
39:54 Applications & Interventions in Models
45:59 Challenges & Open Questions in Interpretability
57:15 Understanding Model Mechanisms: Circuits & Reasoning
01:04:24 Model Planning, Reaso…